From github.com+25214855+casparcwang at openjdk.java.net Mon Feb 1 01:31:49 2021 From: github.com+25214855+casparcwang at openjdk.java.net (=?UTF-8?B?546L6LaF?=) Date: Mon, 1 Feb 2021 01:31:49 GMT Subject: [jdk16] Integrated: 8260473: [vector] ZGC: VectorReshape test produces incorrect results with ZGC enabled In-Reply-To: <5OfnHC5N00VVv3pWcU9gsAHa23RbAAX7ReEw9Ct6eug=.4f095083-7050-487d-94e0-3befce6744c5@github.com> References: <5OfnHC5N00VVv3pWcU9gsAHa23RbAAX7ReEw9Ct6eug=.4f095083-7050-487d-94e0-3befce6744c5@github.com> Message-ID: On Sat, 30 Jan 2021 12:02:25 GMT, ?? wrote: > https://bugs.openjdk.java.net/browse/JDK-8260473 > > Function "PhaseVector::expand_vunbox_node" creates a LoadNode, but forgets to make the LoadNode to pass gc barriers. > > Testing: all Vector API related tests have passed. > > Original pr: https://github.com/openjdk/jdk/pull/2253 This pull request has now been integrated. Changeset: 0fdf9cdd Author: casparcwang Committer: Jie Fu URL: https://git.openjdk.java.net/jdk16/commit/0fdf9cdd Stats: 174 lines in 2 files changed: 165 ins; 0 del; 9 mod 8260473: [vector] ZGC: VectorReshape test produces incorrect results with ZGC enabled Co-authored-by: Stuart Monteith Co-authored-by: Wang Chao Reviewed-by: vlivanov, neliasso ------------- PR: https://git.openjdk.java.net/jdk16/pull/139 From github.com+25214855+casparcwang at openjdk.java.net Mon Feb 1 01:45:46 2021 From: github.com+25214855+casparcwang at openjdk.java.net (=?UTF-8?B?546L6LaF?=) Date: Mon, 1 Feb 2021 01:45:46 GMT Subject: RFR: 8260473: [vector] ZGC: VectorReshape test produces incorrect results with ZGC enabled [v4] In-Reply-To: References: Message-ID: On Fri, 29 Jan 2021 16:47:53 GMT, Vladimir Ivanov wrote: >>> > ArrayCopyNode::load performs the same work as it does here in PhaseVector::optimize_vector_boxes . >>> > Is there a need to provide a similar function in PhaseVector or GraphKit? 
>>> >>> My point is since PhaseVector effectively enters the parsing phase (by signaling about the possibility of post-parse inlining), technically I don't see why `GraphKit::access_load_at` won't work. But I need to spend more time looking into the details. >>> >>> So far, I took a look at the review thread of 8212243 (which introduced `ArrayCopyNode::load`) and found the following discussion between Roland and Erik: >>> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2018-October/030971.html >>> >>> ``` >>> > ... Also it beats me that this is strictly speaking a load barrier for loads performed in >>> > arraycopy. Would it be possible to use something like access_load_at instead? ... >>> ... >>> GraphKit is a parse time only thing. So the existing gc interface >>> doesn't offer any way to add barriers once parsing is over. This code >>> runs after parsing in optimization phases. >>> ... >>> ``` >>> >>> Considering `PhaseVector::optimize_vector_boxes()` already has access to a usable `GraphKit` instance, it is possible that `GraphKit::access_load_at` will "just work". >> >> As far as I can see, during the parse phase, GraphKit contains the jvm state info which can be used to get the control and memory for creating new nodes. But during optimization, the jvm state info may be missing like the situation in `PhaseVector::optimize_vector_boxes` or Macro Expansion. So it should use C2OptAccess to create the Load Node directly by providing control and memory nodes. >> >> I think a similar api like `GraphKit::access_load_at ` should be provided for usage during optimization stages, but where should the API be placed? GraphKit or PhaseIterGVN or somewhere else? > >> As far as I can see, during the parse phase, GraphKit contains the jvm state info which can be used to get the control and memory for creating new nodes. But during optimization, the jvm state info may be missing like the situation in PhaseVector::optimize_vector_boxes or Macro Expansion. 
> > JVM state is irrelevant here (otherwise, `VectorUnbox` node would have captured relevant info during construction). What is actually missing is `GraphKit` instance lacks info about control and memory. You need to explicitly set it using `GraphKit::set_control()` and `GraphKit::set_all_memory()`. Thanks @iwanowww @neliasso @pliden @stooart-mo @XiaohongGong @fisk @DamonFool for the reviews and helping. The patch has integrated in jdk16 (https://github.com/openjdk/jdk16/pull/139), and this pr should be closed. ------------- PR: https://git.openjdk.java.net/jdk/pull/2253 From github.com+25214855+casparcwang at openjdk.java.net Mon Feb 1 01:45:45 2021 From: github.com+25214855+casparcwang at openjdk.java.net (=?UTF-8?B?546L6LaF?=) Date: Mon, 1 Feb 2021 01:45:45 GMT Subject: RFR: 8260473: [vector] ZGC: VectorReshape test produces incorrect results with ZGC enabled [v4] In-Reply-To: References: <6nZPJh_IZbeLrS2D1lrwq7NIIry0zGQ8EzAXD6fkSrE=.4b476693-5877-434e-9e97-b26f73870e33@github.com> Message-ID: On Fri, 29 Jan 2021 16:43:54 GMT, Vladimir Ivanov wrote: > > I suggest you keep this CR as it is since 16 is in rampdown and we need to get approval and push it before Feb 4th (and we do want some margin). > > I agree. @casparcwang, please, file an RFE. Jie Fu @DamonFool has helped to create an RFE. https://bugs.openjdk.java.net/browse/JDK-8260682 ------------- PR: https://git.openjdk.java.net/jdk/pull/2253 From github.com+25214855+casparcwang at openjdk.java.net Mon Feb 1 01:45:46 2021 From: github.com+25214855+casparcwang at openjdk.java.net (=?UTF-8?B?546L6LaF?=) Date: Mon, 1 Feb 2021 01:45:46 GMT Subject: Withdrawn: 8260473: [vector] ZGC: VectorReshape test produces incorrect results with ZGC enabled In-Reply-To: References: Message-ID: On Wed, 27 Jan 2021 10:05:56 GMT, ?? wrote: > https://bugs.openjdk.java.net/browse/JDK-8260473 > > Function "PhaseVector::expand_vunbox_node" creates a LoadNode, but forgets to make the LoadNode to pass gc barriers. 
> > > Testing: all Vector API related tests have passed. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.java.net/jdk/pull/2253 From thartmann at openjdk.java.net Mon Feb 1 06:35:42 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 1 Feb 2021 06:35:42 GMT Subject: RFR: 8260420: C2 compilation fails with assert(found_sfpt) failed: no node in loop that's not input to safepoint [v2] In-Reply-To: References: Message-ID: On Fri, 29 Jan 2021 19:27:49 GMT, Vladimir Kozlov wrote: >> Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove late ctrl update > > Good. Thanks for the review, Vladimir! ------------- PR: https://git.openjdk.java.net/jdk/pull/2315 From thartmann at openjdk.java.net Mon Feb 1 06:36:44 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 1 Feb 2021 06:36:44 GMT Subject: RFR: 8260577: Unused code in AbstractCompiler after Shark compiler removal [v3] In-Reply-To: <6Dzr-rCwYfZsFCZGf-JhNzlDqyqtQWRziOKTtvFbqNY=.ea8aeb54-eb1c-49e8-b3ed-bfb4d3fb047f@github.com> References: <6Dzr-rCwYfZsFCZGf-JhNzlDqyqtQWRziOKTtvFbqNY=.ea8aeb54-eb1c-49e8-b3ed-bfb4d3fb047f@github.com> Message-ID: On Fri, 29 Jan 2021 19:28:53 GMT, Vladimir Kozlov wrote: >> Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: >> >> Revert "Removed CICompileNatives and CICompileOSR" >> >> This reverts commit a95c358687495ee1ed701cd10d48a3a9c6a45f26. > > Looks good. Thanks Vladimir! 
-------------

PR: https://git.openjdk.java.net/jdk/pull/2281

From thartmann at openjdk.java.net Mon Feb 1 06:36:45 2021
From: thartmann at openjdk.java.net (Tobias Hartmann)
Date: Mon, 1 Feb 2021 06:36:45 GMT
Subject: Integrated: 8260577: Unused code in AbstractCompiler after Shark compiler removal
In-Reply-To: References: Message-ID: <6oglD9I7sXAsM6ZxVoAwucgjW6P1CJn5XCWWDyZBX_E=.0323b32d-a65d-4352-8e30-26523a4986b2@github.com>

On Thu, 28 Jan 2021 09:41:48 GMT, Tobias Hartmann wrote:

> After removal of the Shark compiler with [JDK-8171853](https://bugs.openjdk.java.net/browse/JDK-8171853) in JDK 10, some methods in AbstractCompiler are unused.
>
> Thanks,
> Tobias

This pull request has now been integrated.

Changeset: 039affc8
Author: Tobias Hartmann
URL: https://git.openjdk.java.net/jdk/commit/039affc8
Stats: 19 lines in 4 files changed: 0 ins; 16 del; 3 mod

8260577: Unused code in AbstractCompiler after Shark compiler removal

Reviewed-by: shade, chagedorn, kvn

-------------

PR: https://git.openjdk.java.net/jdk/pull/2281

From eric.c.liu at arm.com Mon Feb 1 07:21:52 2021
From: eric.c.liu at arm.com (Eric Liu)
Date: Mon, 1 Feb 2021 07:21:52 +0000
Subject: RFR: 8256438: AArch64: Implement match rules with ROR shift register value
In-Reply-To: <0f770a11-e235-d2ab-e927-8f192e80638b@redhat.com> References: <813ca59c-59a8-7fca-7566-0472017681ab@redhat.com> <0f770a11-e235-d2ab-e927-8f192e80638b@redhat.com> Message-ID:

Hi Andrew,

Thanks for your feedback. I refined those benchmarks, and two platforms now show better performance: one gains about 100%, the other about 30%. On the other platforms it was hard to see any performance change.

Before:

Benchmark                Mode  Cnt     Score   Error  Units
Rotation.andRotateRight  avgt   15  3860.994 ± 3.409  ns/op
Rotation.bicRotateRight  avgt   15  3861.247 ± 3.321  ns/op
Rotation.eonRotateRight  avgt   15  3860.865 ± 3.003  ns/op
Rotation.ornRotateRight  avgt   15  3860.884 ± 3.260  ns/op
Rotation.xorRotateRight  avgt   15  3860.886 ± 2.728  ns/op

After:

Benchmark                Mode  Cnt     Score   Error  Units
Rotation.andRotateRight  avgt   15  1933.495 ± 0.263  ns/op
Rotation.bicRotateRight  avgt   15  1933.436 ± 0.244  ns/op
Rotation.eonRotateRight  avgt   15  1933.459 ± 0.255  ns/op
Rotation.ornRotateRight  avgt   15  1933.559 ± 0.316  ns/op
Rotation.xorRotateRight  avgt   15  1933.467 ± 0.245  ns/op

I will update my patch with the benchmark tests once the full test run has finished.

--Eric

-----Original Message-----
From: hotspot-compiler-dev On Behalf Of Andrew Haley
Sent: Saturday, January 30, 2021 12:30 AM
To: hotspot-compiler-dev at openjdk.java.net
Subject: Re: RFR: 8256438: AArch64: Implement match rules with ROR shift register value

On 29/01/2021 10:11, Andrew Haley wrote:
> On 1/29/21 8:35 AM, Eric Liu wrote:
>> I benchmarked on 7 platforms with jmh test[1], most of them were hard to see any performance changes. On one platform the performance become bad, and seems a little unstable.

Your benchmark did not work for me. It did not generate the correct instructions. Please try with this or similar:

    @Benchmark
    public void xorRotateRight(MyState s, Blackhole blackhole) {
        int x = s.xi;
        int y = s.yi;
        for (int i = 0; i < COUNT; i++) {
            y = x ^ ((y >>> 5) | (y << -5));
            x = y ^ ((x >>> 5) | (x << -5));
        }
        blackhole.consume(x);
    }

I get:

Benchmark                         Mode  Cnt     Score    Error  Units
Rotation.xorRotateRight (before)  avgt    3  6142.575 ± 15.940  ns/op
Rotation.xorRotateRight (after)   avgt    3  4081.587 ± 33.904  ns/op

Please integrate the corrected benchmark into your patch.

--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd.
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From njian at openjdk.java.net Mon Feb 1 07:51:45 2021 From: njian at openjdk.java.net (Ningsheng Jian) Date: Mon, 1 Feb 2021 07:51:45 GMT Subject: [jdk16] RFR: 8260585: AArch64: Wrong code generated for shifting right and accumulating four unsigned short integers [v3] In-Reply-To: References: Message-ID: On Sun, 31 Jan 2021 10:30:06 GMT, Dong Bo wrote: >> This is a typo introduced by JDK-8255949. >> Compiler will generate `ushr` for shifting right and accumulating four short integers. >> It produces wrong results for specific case. The instruction should be `usra`. > > Dong Bo has updated the pull request incrementally with one additional commit since the last revision: > > make empty ins_encode when shift >= 16 (chars) Looks good to me. ------------- Marked as reviewed by njian (Committer). PR: https://git.openjdk.java.net/jdk16/pull/136 From thartmann at openjdk.java.net Mon Feb 1 07:58:40 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 1 Feb 2021 07:58:40 GMT Subject: RFR: 8259398: Super word not applied to a loop with byteArrayViewVarHandle In-Reply-To: References: Message-ID: <6BX-WlBPwAXUI8a94MD_Rj0i1ldtyUnDu64SCoW4pfU=.2d96a7c6-00bf-45df-91c4-19b19e7bba97@github.com> On Fri, 29 Jan 2021 17:40:39 GMT, Vladimir Kozlov wrote: > Address expressing in this case has CastII which is not range check related. > I think it is safe to skip any CastII nodes (similar to ConvI2L nodes) when parsing address for vectors - vectors will be constructed only if the same loop's variable and invariant are used for all memory operations regardless casts. Also vectors address depends on loop's variable so they will not be moved outside loop. > In 32-bit VM there is no ConvI2L nodes so I moved CastII checks from under ConvI2L check. > > New regression case added to TestBufferVectorization.java test. > > Testing hs-tier1-7, RenaissanceStressTest. 
Run all vectorizing tests locally to make sure no regression.

Looks good to me.

-------------

Marked as reviewed by thartmann (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/2317

From dongbo4 at huawei.com Mon Feb 1 08:11:35 2021
From: dongbo4 at huawei.com (dongbo (E))
Date: Mon, 1 Feb 2021 16:11:35 +0800
Subject: [jdk16] RFR: 8260585: AArch64: Wrong code generated for shifting right and accumulating four unsigned short integers [v3]
In-Reply-To: <6a306bd0-19a9-f8d2-a285-1239371f389e@redhat.com> References: <6a306bd0-19a9-f8d2-a285-1239371f389e@redhat.com> Message-ID:

On 2021/1/31 19:49, Andrew Haley wrote:
> On 1/31/21 10:34 AM, Dong Bo wrote:
>> This was wrong, both src and dst should have the same value as before.
>> Actually, when the shift is `>= 16`, the URShift is optimized to zero by the compiler.
>> So we don't have a `vsrla4S_imm` match if `shift >= 16`, the wrong `eor` is not generated.
>> Check the assembly code of the following test:
>> # test
>> public void shiftURightAccumulateChar() {
>>     for (int i = 0; i < count; i++) {
>>         charsD[i] = (char) (charsA[i] + (charsB[i] >>> 16));
>>     }
>> }
> We need to make sure this is in a regression test. Also, please make
> sure that a shift of e.g. 35 works correctly.

Hi,

Added regression tests for several shift counts.
The tests pass with the newest version and can still catch the typo.
I also tested a few injected errors; the tests failed as expected.

From dongbo at openjdk.java.net Mon Feb 1 08:12:58 2021
From: dongbo at openjdk.java.net (Dong Bo)
Date: Mon, 1 Feb 2021 08:12:58 GMT
Subject: [jdk16] RFR: 8260585: AArch64: Wrong code generated for shifting right and accumulating four unsigned short integers [v4]
In-Reply-To: References: Message-ID:

> This is a typo introduced by JDK-8255949.
> Compiler will generate `ushr` for shifting right and accumulating four short integers.
> It produces wrong results for specific case. The instruction should be `usra`.
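Editorial illustration: the Java-level behavior behind this exchange can be sketched as below. This is not the regression test added in the PR. The class and method names are made up for the example, and it only demonstrates language semantics (a `char` shifted right by 16 or more contributes zero, and an `int` shift count of 35 behaves like 3), not the generated AArch64 code.

```java
import java.util.Arrays;

final class ShiftAccumulateSketch {

    /** Computes d[i] = (char) (a[i] + (b[i] >>> shift)), the loop shape quoted in the thread. */
    static char[] shiftRightAccumulate(char[] a, char[] b, int shift) {
        char[] d = new char[a.length];
        for (int i = 0; i < a.length; i++) {
            // b[i] is promoted to int before the shift, so its upper 16 bits are zero.
            d[i] = (char) (a[i] + (b[i] >>> shift));
        }
        return d;
    }

    public static void main(String[] args) {
        char[] a = {1, 2};
        char[] b = {0xFFFF, 0x8000};
        // Shift by 16: every addend is zero, so the result equals a.
        System.out.println(Arrays.equals(shiftRightAccumulate(a, b, 16), a));
        // Shift by 35: int shifts use only the low five bits of the count, so this acts like 3.
        System.out.println(Arrays.equals(
                shiftRightAccumulate(a, b, 35), shiftRightAccumulate(a, b, 3)));
    }
}
```

A vectorized implementation has to reproduce exactly these results, which is why tests over several shift counts, including out-of-range ones, are useful here.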
Dong Bo has updated the pull request incrementally with two additional commits since the last revision: - fix trailing whitespace - add tests for shifting counts ------------- Changes: - all: https://git.openjdk.java.net/jdk16/pull/136/files - new: https://git.openjdk.java.net/jdk16/pull/136/files/b7ef8fb8..f2e490a3 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk16&pr=136&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk16&pr=136&range=02-03 Stats: 286 lines in 1 file changed: 245 ins; 6 del; 35 mod Patch: https://git.openjdk.java.net/jdk16/pull/136.diff Fetch: git fetch https://git.openjdk.java.net/jdk16 pull/136/head:pull/136 PR: https://git.openjdk.java.net/jdk16/pull/136 From roland at openjdk.java.net Mon Feb 1 08:18:41 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Mon, 1 Feb 2021 08:18:41 GMT Subject: RFR: 8260420: C2 compilation fails with assert(found_sfpt) failed: no node in loop that's not input to safepoint [v2] In-Reply-To: References: Message-ID: On Fri, 29 Jan 2021 14:59:14 GMT, Tobias Hartmann wrote: >> Loop strip mining verification fails because a `LoadNode` with no safepoint use ends up in the `OuterStripMinedLoop`. The root cause is very similar to [JDK-8249607](https://bugs.openjdk.java.net/browse/JDK-8249607): A `LoadNode` with two uses (a field store and a return, see `test2`) is cloned by `PhaseIdealLoop::split_if_with_blocks_post()` for each use, to allow it to flow out of the loop. Both clones end up in the `OuterStripMinedLoop`, the clone for the field store because the store has a safepoint use and the clone for the return because `late_load_ctrl` is too conservative by taking all initial uses into account. >> >> Now the load without a safepoint use is always detected by loop strip mining verification but without verification (for example, in a product build), we still hit several different issues depending on the exact use of the load (compare `test2/3/4`). 
The main issue is that loads without a safepoint use are not correctly wired when creating pre/main/post loops. Christian described this well in his RFR for [JDK-8249607](https://bugs.openjdk.java.net/browse/JDK-8249607): https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-August/039638.html >> >> Therefore, relaxing loop strip mining verification is not an option. >> >> The problem with the fix for [JDK-8249607](https://bugs.openjdk.java.net/browse/JDK-8249607) is that it relies on IGVN being executed before pre/main/post loops are created, merging the two clones such that the remaining one has a safepoint use and is therefore correctly handled. However, this is not guaranteed. In fact, if `major_progress` is not set, pre/main/post loops could be created in the same round of loop opts without IGVN in-between (`IdealLoopTree::iteration_split` is executed right after `PhaseIdealLoop::split_if_with_blocks`). >> >> I think we have the following options: >> 1) Bail out if the loop is strip mined. I think that would be too conservative. >> >> 2) Simply set `major_progress` when pinning a `LoadNode` to the `OuterStripMinedLoop` to make sure IGVN is executed and duplicate `LoadNodes` are merged. That seems quite invasive though. >> >> 3) Allow the cloned load without a safepoint use (and therefore no usages in the `OuterStripMinedLoop`) to completely flow out of the loop. Currently, this is blocked by `late_load_ctrl` being computed on the initial `LoadNode` instead of the clone and therefore also taking into account the store that is referenced by the safepoint. We could simply re-compute the late ctrl for the cloned load, allowing it to flow out of the `OuterStripMinedLoop`. However, that only works if there is no anti-dependency in the `OuterStripMinedLoop`. >> >> 4) Detect duplicate loads in the `OuterStripMinedLoop` on creation in `PhaseIdealLoop::split_if_with_blocks` and merge them right away. 
>> >> I've decided to go with 4) and also reverted the fix for JDK-8249607 which is no longer required. >> >> As Roland and Christian already noticed, there are various other issues with that code that hopefully will be addressed at some point by [JDK-8252372](https://bugs.openjdk.java.net/browse/JDK-8252372). >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Remove late ctrl update Changes requested by roland (Reviewer). src/hotspot/share/opto/loopopts.cpp line 1516: > 1514: // address expression) and the AddP and StoreP have > 1515: // different controls. > 1516: if (!x->is_Load() && !x->is_DecodeNarrowPtr()) { why this change? ------------- PR: https://git.openjdk.java.net/jdk/pull/2315 From ogatak at openjdk.java.net Mon Feb 1 08:28:06 2021 From: ogatak at openjdk.java.net (Kazunori Ogata) Date: Mon, 1 Feb 2021 08:28:06 GMT Subject: RFR: 8259822: [PPC64] Support the prefixed instruction format added in POWER10 [v3] In-Reply-To: References: Message-ID: > The POWER10 processor, which implements Power ISA 3.1 [1], supports new instruction formats where an instruction takes two 32bit words. The first word is called prefix, and the instructions with prefix are called prefixed instructions. With more bits in opcode and operand fields, POWER10 supports larger immediate value in an operand, as well as many new instructions. > > This is the first changes to handle prefixed instructions, and this adds support of prefixed addi (= paddi) instruction as an example of prefix usage. paddi accepts 34bit immediate value, while original addi accepts 16bit value. 
> > [1] https://ibm.ent.box.com/s/hhjfw0x0lrbtyzmiaffnbxh2fuo0fog0 Kazunori Ogata has updated the pull request incrementally with one additional commit since the last revision: Update (2nd round) based on review comments ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2095/files - new: https://git.openjdk.java.net/jdk/pull/2095/files/9a7ef64e..a8770539 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2095&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2095&range=01-02 Stats: 32 lines in 3 files changed: 6 ins; 23 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/2095.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2095/head:pull/2095 PR: https://git.openjdk.java.net/jdk/pull/2095 From ogatak at openjdk.java.net Mon Feb 1 08:32:40 2021 From: ogatak at openjdk.java.net (Kazunori Ogata) Date: Mon, 1 Feb 2021 08:32:40 GMT Subject: RFR: 8259822: [PPC64] Support the prefixed instruction format added in POWER10 [v3] In-Reply-To: References: Message-ID: On Thu, 21 Jan 2021 01:53:02 GMT, Corey Ashford wrote: >> Kazunori Ogata has updated the pull request incrementally with one additional commit since the last revision: >> >> Update (2nd round) based on review comments > > This looks good overall. I'm looking forward to being able to utilize this capability. @CoreyAshford Thank you for your comment. Regarding the unused code, I agree that unused code won't be tested enough, so I removed paddi_or_addi(), pli_or_li(), and is_pli(). For the comment on predicate in loadConI32 and loadConL34, I couldn't find the reason why it caused build error. So I added comments describing they are only for Power 10 and up but can't add predicate. I also added comments in is_paddi() as you suggested. 
------------- PR: https://git.openjdk.java.net/jdk/pull/2095 From redestad at openjdk.java.net Mon Feb 1 08:32:57 2021 From: redestad at openjdk.java.net (Claes Redestad) Date: Mon, 1 Feb 2021 08:32:57 GMT Subject: RFR: 8260605: Various java.lang.invoke cleanups [v3] In-Reply-To: <_nKSvOkUmpblUv3wP4_NMR8FnOEIWbifvs6SyJuV4ao=.642878bb-b8c5-4051-9177-e500217fe0a6@github.com> References: <_nKSvOkUmpblUv3wP4_NMR8FnOEIWbifvs6SyJuV4ao=.642878bb-b8c5-4051-9177-e500217fe0a6@github.com> Message-ID: > - Remove unused code > - Inline and simplify the bootstrap method invocation code (remove pointless reboxing checks etc) > - Apply pattern matching to make some code more readable Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: More cleanup, reduce allocations in InvokerBytecodeGenerator ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2300/files - new: https://git.openjdk.java.net/jdk/pull/2300/files/68d3475a..aa88b6fd Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2300&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2300&range=01-02 Stats: 45 lines in 1 file changed: 19 ins; 6 del; 20 mod Patch: https://git.openjdk.java.net/jdk/pull/2300.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2300/head:pull/2300 PR: https://git.openjdk.java.net/jdk/pull/2300 From rcastanedalo at openjdk.java.net Mon Feb 1 08:38:55 2021 From: rcastanedalo at openjdk.java.net (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 1 Feb 2021 08:38:55 GMT Subject: RFR: 8260581: IGV: enhance node search [v3] In-Reply-To: References: Message-ID: <99IPHcis01UPgM4dVdz0VQ9zhVBAwuOJOUl9APAhjOQ=.4746a6b2-95ab-4605-b567-d45354ba48bb@github.com> > Apply several enhancements to the quick node search functionality: > > - Allow users to search by node id or name by default (i.e. when no property is specified) instead of name only. 
> - Show partial matches when searching for a specific property (e.g. so that searching "type=con" lists all "control"-type nodes). > - Avoid showing the "All _N_ matching nodes" entry if there is a single match, or the user is searching a numeric value. > - Rank matches so that full matches are listed first, followed by matches at the beginning of the partially matched value, followed by the rest of matches in increasing size of the partially matched value. For example, searching "5" on a set of nodes with labels {"5 AddI", "25 AddL", "253 AddL", "554 MulI"} should list the matches as follows: > 1. **5** AddI > 2. **5**54 MulI > 3. 2**5** AddL > 4. 2**5**3 AddL > > As an illustration of some of these enhancements, this screenshot shows the behavior of the quick node search functionality before the changes: > > ![search-before](https://user-images.githubusercontent.com/8792647/106283438-374ba500-6242-11eb-8ef4-d18117eabcbb.png) > > and after: > > ![search-after](https://user-images.githubusercontent.com/8792647/106282880-7e856600-6241-11eb-8cb5-48fae5582cc2.png) > > > Tested manually on small and large (~10000 nodes) graphs. Thanks to Christian Hagedorn for feedback on several iterations of the enhancements. > > As part of the review, please evaluate not just the code changes but also the usability. 
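Editorial illustration: the ranking rule described above can be sketched in a few lines of Java. This is a minimal sketch, not the actual `NodeQuickSearch` code. The class `MatchRankingSketch`, its `search` helper, and the concrete rank values are assumptions made for the example, and only node ids are matched.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the quick-search ranking; not the IGV implementation. */
final class MatchRankingSketch {

    /** Lower rank sorts first: 0 = full match, 1 = prefix match, otherwise 2 + value length. */
    static int rankMatch(String query, String value) {
        if (value.equals(query)) {
            return 0;
        }
        if (value.startsWith(query)) {
            return 1;
        }
        return 2 + value.length();
    }

    /** Keeps the values that contain the query and orders them by rank (stable for ties). */
    static List<String> search(String query, List<String> values) {
        List<String> matches = new ArrayList<>();
        for (String value : values) {
            if (value.contains(query)) {
                matches.add(value);
            }
        }
        matches.sort((a, b) -> Integer.compare(rankMatch(query, a), rankMatch(query, b)));
        return matches;
    }

    public static void main(String[] args) {
        // Node ids of "5 AddI", "25 AddL", "253 AddL" and "554 MulI".
        System.out.println(search("5", List.of("25", "253", "5", "554"))); // prints [5, 554, 25, 253]
    }
}
```

The printed order matches the example in the description: the full match first, then the prefix match, then the remaining matches by increasing length.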
Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision:

  Use Integer.compare()

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/2285/files
  - new: https://git.openjdk.java.net/jdk/pull/2285/files/73d0ed02..537d023b

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2285&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2285&range=01-02

  Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2285.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2285/head:pull/2285

PR: https://git.openjdk.java.net/jdk/pull/2285

From rcastanedalo at openjdk.java.net Mon Feb 1 08:38:56 2021
From: rcastanedalo at openjdk.java.net (Roberto Castañeda Lozano)
Date: Mon, 1 Feb 2021 08:38:56 GMT
Subject: RFR: 8260581: IGV: enhance node search [v2]
In-Reply-To: References: <1OvXkUZq1-wp42Ik4uO1Po77am92EVM99S9JWB4mf4I=.32748364-a89f-4002-b7a7-378ca2623422@github.com> Message-ID:

On Fri, 29 Jan 2021 15:11:40 GMT, Christian Hagedorn wrote:

>> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Remove debug print
>
> src/utils/IdealGraphVisualizer/View/src/com/sun/hotspot/igv/view/NodeQuickSearch.java line 140:
>
>> 138:     (InputNode a, InputNode b) ->
>> 139:         Integer.valueOf(rankMatch(rawValue, a.getProperties().get(name)))
>> 140:             .compareTo(rankMatch(rawValue, b.getProperties().get(name))));
>
> You can use `Integer.compare()` which directly takes two `ints` instead of `compareTo()`.

Thanks, done!
-------------

PR: https://git.openjdk.java.net/jdk/pull/2285

From rcastanedalo at openjdk.java.net Mon Feb 1 08:41:43 2021
From: rcastanedalo at openjdk.java.net (Roberto Castañeda Lozano)
Date: Mon, 1 Feb 2021 08:41:43 GMT
Subject: RFR: 8260581: IGV: enhance node search [v2]
In-Reply-To: References: <1OvXkUZq1-wp42Ik4uO1Po77am92EVM99S9JWB4mf4I=.32748364-a89f-4002-b7a7-378ca2623422@github.com> Message-ID:

On Fri, 29 Jan 2021 15:36:11 GMT, Vladimir Ivanov wrote:

>> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Remove debug print
>
> Awesome! Finally!

Thanks Christian and Vladimir for reviewing! I will wait a couple more days in case anyone else has comments.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2285

From jrose at openjdk.java.net Mon Feb 1 08:57:43 2021
From: jrose at openjdk.java.net (John R Rose)
Date: Mon, 1 Feb 2021 08:57:43 GMT
Subject: RFR: 8259609: C2: optimize long range checks in long counted loops [v3]
In-Reply-To: References: Message-ID:

On Wed, 27 Jan 2021 08:32:42 GMT, Roland Westrelin wrote:

>> I did a first pass over the changes and added some comments but I need more time to review.
>
> @TobiHartmann thanks for the comments. I pushed a change that should address them.

Partial review:

Yes, good cleanups along with great new functionality for big data!

I like the refactorings which use the new generic factory methods from JDK-8255150.

The comment in `PhaseIdealLoop::transform_long_range_checks` is very good.

PhaseTransform::integercon has a bug left over from JDK-8256655, the cast in `assert(((long)int_con) == l, "not an int")` is wrong. Just use `intcon(checked_cast(l))` as you already do elsewhere.

(I also noticed that `PhaseIdealLoop::exact_limit` and `LoopLimitNode::value` have similar asserts which could be replaced with `checked_cast`.
In fact, `PhaseIdealLoop::exact_limit` is duplicated from `LoopLimitNode::value` just to avoid creating a node, yuck. That fiddly stuff should have been written as few times as possible. I suggest putting it into a static inline in the class `LoopLimitNode`, where it will also serve as documentation.)

typo: /had a change to be hoisted yet/s/change/chance/

Also there are unusually long comment lines there and inside `transform_long_range_checks`. 130 columns plus is longer than my widest laptop window holds unless I aggravate my presbyopia with a smaller font size. Maybe, consider reflowing? I know some of those lines are intrinsically long.

This variable is unused: `Node* long_stride = head->stride();`

The argument `signed_int` is unused in `MulNode::operates_on`.

Well done adapting `is_scaled_iv` and friends to T_LONG!

The logic for turning a shift amount into a scale factor looks flawed, at least if `shift_amount` is ever out of range for the corresponding basic type. I suggest being safe rather than sorry: `shift_amount &= (bt == T_INT ? 31 : 63); //clip` (There are about 3 possible bugs with out-of-range shifts.) Even better: Use both overloadings of `java_shift_left` from `globalDefinitions.hpp`; I think that would be the best practice.

> I realize there's some code duplication but I didn't see a way to share logic between IdealLoopTree::may_have_range_check() IdealLoopTree::policy_range_check() that would feel right.

I did a direct `diff -w` of the two function bodies and they are so similar that I think it is better to maintain them as one copy of the code. I suggest adding a boolean flag `provisional` that activates the changed bits of logic:

  bool IdealLoopTree::policy_range_check(PhaseIdealLoop *phase, bool provisional = false) const {
+   if (_head->is_CountedLoop()) {
      CountedLoopNode *cl = _head->as_CountedLoop();
      ...existing stuff about unroll_only etc...
+   }
+   BaseCountedLoopNode *cl = _head->as_BaseCountedLoop();
    Node *trip_counter = cl->phi();
+   BasicType bt = cl->bt();
    for (...) {
      ...
      Node *cmp = bol->in(1);
      Node *rc_exp = cmp->in(1);
      Node *limit = cmp->in(2);
+     if (provisional) {
+       // Try to pattern match with either cmp inputs, do not check whether one of the
+       // inputs is loop independent as it may not have had a chance to be hoisted yet.
+       if (!phase->is_scaled_iv_plus_offset(rc_exp, trip_counter, NULL, NULL, bt) &&
+           !phase->is_scaled_iv_plus_offset(limit, trip_counter, NULL, NULL, bt)) {
          continue;
+       }
+     } else { // check loop independence if non-provisional
        Node *limit_c = phase->get_ctrl(limit);
        ...existing checks & swap of rc_exp and limit...
        if (!phase->is_scaled_iv_plus_offset(rc_exp, trip_counter, NULL, NULL)) {
          continue;
        }
+     }
      if (is_loop_exit(iff)) {
        // Found valid reason to split iterations (if there is room).
        // NOTE: Usually a gross overestimate.
        return provisional || phase->may_require_nodes(est_loop_clone_sz(2));
      }
    }
  }

Now for the biggest part: You are replacing long range checks with 32-bit range checks which then can be RCE-ed. You dial back the strip-mining count as needed to make sure the various scaled index computations don't suffer overflow. Very clever!

I suggest placing `extract_long_range_checks` before `transform_long_range_checks` because that's the order they are executed. That way the two can be read more easily in sequence.

The variable `jlong stride_con` should be `int`. It is never used for anything bigger, or else an assert would fire.

IMO the expression `scale > 0 != stride_con > 0` needs parens around each of the two comparisons. This is a case of relying on obscure C language precedence rules. (I had to look it up myself!)
This group of definitions is loop-invariant, and should probably be set up once outside of the for-loop:

    // Compute lower and upper
    Node* last = new LoopLimitNode(C, int_zero, inner_iters_actual_int, int_stride);
    register_new_node(last, entry_control);
    last = new SubINode(last, int_stride);
    register_new_node(last, entry_control);
    Node* lower = outer_phi;
    Node* upper = new ConvI2LNode(last);
    register_new_node(upper, entry_control);
    upper = new AddLNode(lower, upper);
    register_new_node(upper, entry_control);

In `new AddLNode(upper, long_one)`, I think `long_one` should have the same sign as the stride. But I'm not sure (yet).

You have an expression for accumulating the maximum scale: `max_scale = MAX2(max_scale, scale)`. I think that the accumulated value should not be negative, so you probably want `max_scale = MAX2(max_scale, ABS(scale))`.

But, I suggest getting rid of `max_scale` altogether. The logic will be simpler if it works like this:

    jlong original_iters_limit = iters_limit;
    jlong reduced_iters_limit = iters_limit;
    for (...) {
      ...
      jlong new_limit = original_iters_limit / ABS(scale * stride_con);
      if (new_limit >= min_iters) {  // ??? or > ???
        reduced_iters_limit = MIN2(reduced_iters_limit, new_limit);
        range_checks.push(c);
      }
    }
    return checked_cast(reduced_iters_limit);

If you do this you won't need that delicate assert at the end.

More later. I'm still working through the core logic in `transform_long_range_checks`, and I will propose more changes there.
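The simplified limit computation proposed in this review can be sketched as follows. This is a minimal illustration in Java rather than HotSpot C++, under several assumptions: the class name, the `reducedItersLimit` helper, the array of scales and the `accepted` list are hypothetical stand-ins (the list plays the role of `range_checks.push(c)`), `Math.toIntExact` stands in for `checked_cast`, and `>=` is just one of the two comparisons the review leaves open. Scales and stride are assumed to be non-zero and small enough that their product does not overflow.

```java
import java.util.ArrayList;
import java.util.List;

public class ItersLimitSketch {
    // Returns the reduced iteration limit and records in 'accepted' the indices
    // of the range checks whose scaled stride still leaves at least minIters.
    static int reducedItersLimit(long itersLimit, long minIters,
                                 long[] scales, int strideCon,
                                 List<Integer> accepted) {
        long originalItersLimit = itersLimit;
        long reducedItersLimit = itersLimit;
        for (int i = 0; i < scales.length; i++) {
            // Each candidate is measured against the original limit, not the
            // running minimum, so the order of the range checks does not matter.
            long newLimit = originalItersLimit / Math.abs(scales[i] * strideCon);
            if (newLimit >= minIters) {
                reducedItersLimit = Math.min(reducedItersLimit, newLimit);
                accepted.add(i);
            }
        }
        // Throws ArithmeticException if the value does not fit in an int.
        return Math.toIntExact(reducedItersLimit);
    }

    public static void main(String[] args) {
        List<Integer> accepted = new ArrayList<>();
        int limit = reducedItersLimit(1000, 2, new long[] {4, -8, 1000}, 2, accepted);
        System.out.println(limit + " " + accepted);
    }
}
```

With the inputs in `main`, the third scale would leave fewer than `minIters` iterations and is skipped, while the other two tighten the limit. No running maximum of the scale is needed.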
------------- PR: https://git.openjdk.java.net/jdk/pull/2045 From thartmann at openjdk.java.net Mon Feb 1 08:57:54 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 1 Feb 2021 08:57:54 GMT Subject: RFR: 8260420: C2 compilation fails with assert(found_sfpt) failed: no node in loop that's not input to safepoint [v2] In-Reply-To: References: Message-ID: On Mon, 1 Feb 2021 08:16:04 GMT, Roland Westrelin wrote: >> Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove late ctrl update > > src/hotspot/share/opto/loopopts.cpp line 1516: > >> 1514: // address expression) and the AddP and StoreP have >> 1515: // different controls. >> 1516: if (!x->is_Load() && !x->is_DecodeNarrowPtr()) { > > why this change? This reverts Christian's fix for [JDK-8249607](https://bugs.openjdk.java.net/browse/JDK-8249607) which is no longer required. ------------- PR: https://git.openjdk.java.net/jdk/pull/2315 From roland at openjdk.java.net Mon Feb 1 09:07:41 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Mon, 1 Feb 2021 09:07:41 GMT Subject: RFR: 8260420: C2 compilation fails with assert(found_sfpt) failed: no node in loop that's not input to safepoint [v2] In-Reply-To: References: Message-ID: On Fri, 29 Jan 2021 14:59:14 GMT, Tobias Hartmann wrote: >> Loop strip mining verification fails because a `LoadNode` with no safepoint use ends up in the `OuterStripMinedLoop`. The root cause is very similar to [JDK-8249607](https://bugs.openjdk.java.net/browse/JDK-8249607): A `LoadNode` with two uses (a field store and a return, see `test2`) is cloned by `PhaseIdealLoop::split_if_with_blocks_post()` for each use, to allow it to flow out of the loop. Both clones end up in the `OuterStripMinedLoop`, the clone for the field store because the store has a safepoint use and the clone for the return because `late_load_ctrl` is too conservative by taking all initial uses into account. 
>> >> Now the load without a safepoint use is always detected by loop strip mining verification but without verification (for example, in a product build), we still hit several different issues depending on the exact use of the load (compare `test2/3/4`). The main issue is that loads without a safepoint use are not correctly wired when creating pre/main/post loops. Christian described this well in his RFR for [JDK-8249607](https://bugs.openjdk.java.net/browse/JDK-8249607): https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-August/039638.html >> >> Therefore, relaxing loop strip mining verification is not an option. >> >> The problem with the fix for [JDK-8249607](https://bugs.openjdk.java.net/browse/JDK-8249607) is that it relies on IGVN being executed before pre/main/post loops are created, merging the two clones such that the remaining one has a safepoint use and is therefore correctly handled. However, this is not guaranteed. In fact, if `major_progress` is not set, pre/main/post loops could be created in the same round of loop opts without IGVN in-between (`IdealLoopTree::iteration_split` is executed right after `PhaseIdealLoop::split_if_with_blocks`). >> >> I think we have the following options: >> 1) Bail out if the loop is strip mined. I think that would be too conservative. >> >> 2) Simply set `major_progress` when pinning a `LoadNode` to the `OuterStripMinedLoop` to make sure IGVN is executed and duplicate `LoadNodes` are merged. That seems quite invasive though. >> >> 3) Allow the cloned load without a safepoint use (and therefore no usages in the `OuterStripMinedLoop`) to completely flow out of the loop. Currently, this is blocked by `late_load_ctrl` being computed on the initial `LoadNode` instead of the clone and therefore also taking into account the store that is referenced by the safepoint. We could simply re-compute the late ctrl for the cloned load, allowing it to flow out of the `OuterStripMinedLoop`. 
However, that only works if there is no anti-dependency in the `OuterStripMinedLoop`. >> >> 4) Detect duplicate loads in the `OuterStripMinedLoop` on creation in `PhaseIdealLoop::split_if_with_blocks` and merge them right away. >> >> I've decided to go with 4) and also reverted the fix for JDK-8249607 which is no longer required. >> >> As Roland and Christian already noticed, there are various other issues with that code that hopefully will be addressed at some point by [JDK-8252372](https://bugs.openjdk.java.net/browse/JDK-8252372). >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Remove late ctrl update Marked as reviewed by roland (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/2315 From roland at openjdk.java.net Mon Feb 1 09:07:43 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Mon, 1 Feb 2021 09:07:43 GMT Subject: RFR: 8260420: C2 compilation fails with assert(found_sfpt) failed: no node in loop that's not input to safepoint [v2] In-Reply-To: References: Message-ID: On Mon, 1 Feb 2021 08:54:43 GMT, Tobias Hartmann wrote: >> src/hotspot/share/opto/loopopts.cpp line 1516: >> >>> 1514: // address expression) and the AddP and StoreP have >>> 1515: // different controls. >>> 1516: if (!x->is_Load() && !x->is_DecodeNarrowPtr()) { >> >> why this change? > > This reverts Christian's fix for [JDK-8249607](https://bugs.openjdk.java.net/browse/JDK-8249607) which is no longer required. Ok. 
------------- PR: https://git.openjdk.java.net/jdk/pull/2315 From thartmann at openjdk.java.net Mon Feb 1 09:13:42 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 1 Feb 2021 09:13:42 GMT Subject: RFR: 8260420: C2 compilation fails with assert(found_sfpt) failed: no node in loop that's not input to safepoint [v2] In-Reply-To: References: Message-ID: <8UPWuUAvL3fek6f7PmX_oJ-ycGw4jWyy-Vy2FZDt-u4=.632b5304-18ba-4ab5-86ee-5569acb64d5b@github.com> On Mon, 1 Feb 2021 09:05:23 GMT, Roland Westrelin wrote: >> Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove late ctrl update > > Marked as reviewed by roland (Reviewer). Thanks for the review, Roland! ------------- PR: https://git.openjdk.java.net/jdk/pull/2315 From chagedorn at openjdk.java.net Mon Feb 1 09:25:52 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Mon, 1 Feb 2021 09:25:52 GMT Subject: RFR: 8260420: C2 compilation fails with assert(found_sfpt) failed: no node in loop that's not input to safepoint [v2] In-Reply-To: References: Message-ID: On Fri, 29 Jan 2021 14:59:14 GMT, Tobias Hartmann wrote: >> Loop strip mining verification fails because a `LoadNode` with no safepoint use ends up in the `OuterStripMinedLoop`. The root cause is very similar to [JDK-8249607](https://bugs.openjdk.java.net/browse/JDK-8249607): A `LoadNode` with two uses (a field store and a return, see `test2`) is cloned by `PhaseIdealLoop::split_if_with_blocks_post()` for each use, to allow it to flow out of the loop. Both clones end up in the `OuterStripMinedLoop`, the clone for the field store because the store has a safepoint use and the clone for the return because `late_load_ctrl` is too conservative by taking all initial uses into account. 
>> >> Now the load without a safepoint use is always detected by loop strip mining verification but without verification (for example, in a product build), we still hit several different issues depending on the exact use of the load (compare `test2/3/4`). The main issue is that loads without a safepoint use are not correctly wired when creating pre/main/post loops. Christian described this well in his RFR for [JDK-8249607](https://bugs.openjdk.java.net/browse/JDK-8249607): https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-August/039638.html >> >> Therefore, relaxing loop strip mining verification is not an option. >> >> The problem with the fix for [JDK-8249607](https://bugs.openjdk.java.net/browse/JDK-8249607) is that it relies on IGVN being executed before pre/main/post loops are created, merging the two clones such that the remaining one has a safepoint use and is therefore correctly handled. However, this is not guaranteed. In fact, if `major_progress` is not set, pre/main/post loops could be created in the same round of loop opts without IGVN in-between (`IdealLoopTree::iteration_split` is executed right after `PhaseIdealLoop::split_if_with_blocks`). >> >> I think we have the following options: >> 1) Bail out if the loop is strip mined. I think that would be too conservative. >> >> 2) Simply set `major_progress` when pinning a `LoadNode` to the `OuterStripMinedLoop` to make sure IGVN is executed and duplicate `LoadNodes` are merged. That seems quite invasive though. >> >> 3) Allow the cloned load without a safepoint use (and therefore no usages in the `OuterStripMinedLoop`) to completely flow out of the loop. Currently, this is blocked by `late_load_ctrl` being computed on the initial `LoadNode` instead of the clone and therefore also taking into account the store that is referenced by the safepoint. We could simply re-compute the late ctrl for the cloned load, allowing it to flow out of the `OuterStripMinedLoop`. 
However, that only works if there is no anti-dependency in the `OuterStripMinedLoop`. >> >> 4) Detect duplicate loads in the `OuterStripMinedLoop` on creation in `PhaseIdealLoop::split_if_with_blocks` and merge them right away. >> >> I've decided to go with 4) and also reverted the fix for JDK-8249607 which is no longer required. >> >> As Roland and Christian already noticed, there are various other issues with that code that hopefully will be addressed at some point by [JDK-8252372](https://bugs.openjdk.java.net/browse/JDK-8252372). >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Remove late ctrl update Nice summary! I agree that is better to go with 4) and try option 3) in a general clean up of this code (JDK-8252372). Looks good to me! ------------- Marked as reviewed by chagedorn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2315 From thartmann at openjdk.java.net Mon Feb 1 09:36:43 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 1 Feb 2021 09:36:43 GMT Subject: RFR: 8260420: C2 compilation fails with assert(found_sfpt) failed: no node in loop that's not input to safepoint [v2] In-Reply-To: References: Message-ID: On Mon, 1 Feb 2021 09:22:52 GMT, Christian Hagedorn wrote: >> Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove late ctrl update > > Nice summary! I agree that is better to go with 4) and try option 3) in a general clean up of this code (JDK-8252372). Looks good to me! Thanks Christian! 
------------- PR: https://git.openjdk.java.net/jdk/pull/2315 From chagedorn at openjdk.java.net Mon Feb 1 09:43:44 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Mon, 1 Feb 2021 09:43:44 GMT Subject: Integrated: 8257498: Remove useless skeleton predicates In-Reply-To: <6dUhcj2h_EaBfFa41hTfaiGtSNRPXG4HFSEhQq8MGaw=.3a769ef8-5d0a-4264-9e08-e2598f6974bf@github.com> References: <6dUhcj2h_EaBfFa41hTfaiGtSNRPXG4HFSEhQq8MGaw=.3a769ef8-5d0a-4264-9e08-e2598f6974bf@github.com> Message-ID: On Thu, 14 Jan 2021 08:14:18 GMT, Christian Hagedorn wrote: > This enhancement removes useless skeleton predicates in the same way as we already remove normal useless predicates in `PhaseIdealLoop::eliminate_useless_predicates()`. > > Thanks, > Christian This pull request has now been integrated. Changeset: aec03772 Author: Christian Hagedorn URL: https://git.openjdk.java.net/jdk/commit/aec03772 Stats: 101 lines in 6 files changed: 66 ins; 16 del; 19 mod 8257498: Remove useless skeleton predicates Reviewed-by: roland, thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/2075 From redestad at openjdk.java.net Mon Feb 1 10:00:02 2021 From: redestad at openjdk.java.net (Claes Redestad) Date: Mon, 1 Feb 2021 10:00:02 GMT Subject: RFR: 8260605: Various java.lang.invoke cleanups [v4] In-Reply-To: <_nKSvOkUmpblUv3wP4_NMR8FnOEIWbifvs6SyJuV4ao=.642878bb-b8c5-4051-9177-e500217fe0a6@github.com> References: <_nKSvOkUmpblUv3wP4_NMR8FnOEIWbifvs6SyJuV4ao=.642878bb-b8c5-4051-9177-e500217fe0a6@github.com> Message-ID: <1gjeWjb-j-3CQq9cCgNw82cScyMdqXoSL7bWBgfrjEQ=.31e19c6b-56cf-45ae-b83d-0894c39c1d77@github.com> > - Remove unused code > - Inline and simplify the bootstrap method invocation code (remove pointless reboxing checks etc) > - Apply pattern matching to make some code more readable Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: Missing .values ------------- Changes: - all: 
https://git.openjdk.java.net/jdk/pull/2300/files - new: https://git.openjdk.java.net/jdk/pull/2300/files/aa88b6fd..0e3768b8 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2300&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2300&range=02-03 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/2300.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2300/head:pull/2300 PR: https://git.openjdk.java.net/jdk/pull/2300 From chagedorn at openjdk.java.net Mon Feb 1 10:40:42 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Mon, 1 Feb 2021 10:40:42 GMT Subject: RFR: 8259398: Super word not applied to a loop with byteArrayViewVarHandle In-Reply-To: References: Message-ID: On Fri, 29 Jan 2021 17:40:39 GMT, Vladimir Kozlov wrote: > Address expressing in this case has CastII which is not range check related. > I think it is safe to skip any CastII nodes (similar to ConvI2L nodes) when parsing address for vectors - vectors will be constructed only if the same loop's variable and invariant are used for all memory operations regardless casts. Also vectors address depends on loop's variable so they will not be moved outside loop. > In 32-bit VM there is no ConvI2L nodes so I moved CastII checks from under ConvI2L check. > > New regression case added to TestBufferVectorization.java test. > > Testing hs-tier1-7, RenaissanceStressTest. Run all vectorizing tests locally to make sure no regression. Looks good! ------------- Marked as reviewed by chagedorn (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/2317 From github.com+10482586+therealeliu at openjdk.java.net Mon Feb 1 11:14:11 2021 From: github.com+10482586+therealeliu at openjdk.java.net (Eric Liu) Date: Mon, 1 Feb 2021 11:14:11 GMT Subject: RFR: 8256438: AArch64: Implement match rules with ROR shift register value [v2] In-Reply-To: References: Message-ID: > This patch transforms '(x >>> rshift) + (x << lshift)' into > 'RotateRight(x, rshift)' during GVN phase when both the shift exponents > are constants and their sum equals to the number of bits for the type > of shift base. > > This patch implements some new match rules for AArch64 instructions > which can take ROR as the optional shift. Such instructions are 'and', > 'or', 'eor', 'eon', 'bic' and 'orn'. > > ror w11, w2, #5 > eor w0, w1, w11 > > With this patch, above code could be optimized to below: > > eor w0, w1, w2, ror #5 > > Finally, the patch refactors TestRotate.java[1][2]. > > Tested jtreg TestRotate.java, hotspot::hotspot_all_no_apps, > jdk::jdk_core, langtools::tier1. > > [1] https://bugs.openjdk.java.net/browse/JDK-8252776 > [2] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-September/039911.html Eric Liu has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: - Merge branch 'master' into 8256438 Change-Id: Ia357074efc8488a57030863b3eab7b27839cd3d0 - 8256438: AArch64: Implement match rules with ROR shift register value This patch transforms '(x >>> rshift) + (x << lshift)' into 'RotateRight(x, rshift)' during GVN phase when both the shift exponents are constants and their sum equals to the number of bits for the type of shift base. This patch implements some new match rules for AArch64 instructions which can take ROR as the optional shift. 
Such instructions are 'and', 'or', 'eor', 'eon', 'bic' and 'orn'. ror w11, w2, #5 eor w0, w1, w11 With this patch, above code could be optimized to below: eor w0, w1, w2, ror #5 Finally, the patch refactors TestRotate.java[1][2]. Tested jtreg TestRotate.java, hotspot::hotspot_all_no_apps, jdk::jdk_core, langtools::tier1. [1] https://bugs.openjdk.java.net/browse/JDK-8252776 [2] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-September/039911.html Change-Id: I70842bcdb7cbc31bdf261c3223ea882076c2c66b ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1858/files - new: https://git.openjdk.java.net/jdk/pull/1858/files/6135975d..afc68c27 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1858&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1858&range=00-01 Stats: 116364 lines in 2912 files changed: 46136 ins; 41070 del; 29158 mod Patch: https://git.openjdk.java.net/jdk/pull/1858.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1858/head:pull/1858 PR: https://git.openjdk.java.net/jdk/pull/1858 From github.com+10482586+therealeliu at openjdk.java.net Mon Feb 1 11:20:20 2021 From: github.com+10482586+therealeliu at openjdk.java.net (Eric Liu) Date: Mon, 1 Feb 2021 11:20:20 GMT Subject: RFR: 8256438: AArch64: Implement match rules with ROR shift register value [v3] In-Reply-To: References: Message-ID: > This patch transforms '(x >>> rshift) + (x << lshift)' into > 'RotateRight(x, rshift)' during GVN phase when both the shift exponents > are constants and their sum equals to the number of bits for the type > of shift base. > > This patch implements some new match rules for AArch64 instructions > which can take ROR as the optional shift. Such instructions are 'and', > 'or', 'eor', 'eon', 'bic' and 'orn'. > > ror w11, w2, #5 > eor w0, w1, w11 > > With this patch, above code could be optimized to below: > > eor w0, w1, w2, ror #5 > > Finally, the patch refactors TestRotate.java[1][2]. 
> > Tested jtreg TestRotate.java, hotspot::hotspot_all_no_apps, > jdk::jdk_core, langtools::tier1. > > [1] https://bugs.openjdk.java.net/browse/JDK-8252776 > [2] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-September/039911.html Eric Liu has updated the pull request incrementally with one additional commit since the last revision: Add benchmark test Change-Id: I63ca51d06070a07e5c20daf4b42d2c8d7237a1da ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1858/files - new: https://git.openjdk.java.net/jdk/pull/1858/files/afc68c27..492f4ca4 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1858&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1858&range=01-02 Stats: 110 lines in 3 files changed: 108 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/1858.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1858/head:pull/1858 PR: https://git.openjdk.java.net/jdk/pull/1858 From rcastanedalo at openjdk.java.net Mon Feb 1 11:34:59 2021 From: rcastanedalo at openjdk.java.net (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 1 Feb 2021 11:34:59 GMT Subject: RFR: 8260581: IGV: enhance node search [v4] In-Reply-To: References: Message-ID: > Apply several enhancements to the quick node search functionality: > > - Allow users to search by node id or name by default (i.e. when no property is specified) instead of name only. > - Show partial matches when searching for a specific property (e.g. so that searching "type=con" lists all "control"-type nodes). > - Avoid showing the "All _N_ matching nodes" entry if there is a single match, or the user is searching a numeric value. > - Rank matches so that full matches are listed first, followed by matches at the beginning of the partially matched value, followed by the rest of matches in increasing size of the partially matched value. 
For example, searching "5" on a set of nodes with labels {"5 AddI", "25 AddL", "253 AddL", "554 MulI"} should list the matches as follows:
>   1. **5** AddI
>   2. **5**54 MulI
>   3. 2**5** AddL
>   4. 2**5**3 AddL
>
> As an illustration of some of these enhancements, this screenshot shows the behavior of the quick node search functionality before the changes:
>
> ![search-before](https://user-images.githubusercontent.com/8792647/106283438-374ba500-6242-11eb-8ef4-d18117eabcbb.png)
>
> and after:
>
> ![search-after](https://user-images.githubusercontent.com/8792647/106282880-7e856600-6241-11eb-8cb5-48fae5582cc2.png)
>
>
> Tested manually on small and large (~10000 nodes) graphs. Thanks to Christian Hagedorn for feedback on several iterations of the enhancements.
>
> As part of the review, please evaluate not just the code changes but also the usability.

Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision:

  Sort same-rank matches by first-word numeric value

  Sort otherwise equally relevant matches by node id, which is by default
  the first word in node labels. Thanks to Christian Hagedorn for the
  suggestion and (slightly adapted) patch.
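The ranking described in this message can be sketched roughly as below. This is not the IGV implementation: the class and method names are hypothetical, matching is done only against the first word of each label (assumed to be a numeric node id), and applying the "increasing size" tie-break within every rank is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class NodeSearchRankSketch {
    // First word of a label such as "25 AddL"; assumed here to be the node id.
    static String firstWord(String label) {
        return label.split(" ", 2)[0];
    }

    // 0 = full match, 1 = match at the beginning, 2 = match elsewhere.
    static int rank(String value, String query) {
        if (value.equals(query)) return 0;
        if (value.startsWith(query)) return 1;
        return 2;
    }

    static List<String> search(List<String> labels, String query) {
        List<String> matches = new ArrayList<>();
        for (String label : labels) {
            if (firstWord(label).contains(query)) {
                matches.add(label);
            }
        }
        // Order by rank, then by size of the matched value, then by numeric id.
        matches.sort(Comparator
            .comparingInt((String l) -> rank(firstWord(l), query))
            .thenComparingInt(l -> firstWord(l).length())
            .thenComparingInt(l -> Integer.parseInt(firstWord(l))));
        return matches;
    }

    public static void main(String[] args) {
        List<String> labels = List.of("253 AddL", "5 AddI", "554 MulI", "25 AddL");
        System.out.println(search(labels, "5"));
    }
}
```

For the four labels used in the example above, this ordering yields "5 AddI", "554 MulI", "25 AddL", "253 AddL", which is the order listed in the pull request description.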
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2285/files - new: https://git.openjdk.java.net/jdk/pull/2285/files/537d023b..981e9038 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2285&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2285&range=02-03 Stats: 24 lines in 1 file changed: 22 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/2285.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2285/head:pull/2285 PR: https://git.openjdk.java.net/jdk/pull/2285 From vlivanov at openjdk.java.net Mon Feb 1 11:38:49 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Mon, 1 Feb 2021 11:38:49 GMT Subject: [jdk16] RFR: 8260473: [vector] ZGC: VectorReshape test produces incorrect results with ZGC enabled In-Reply-To: References: <5OfnHC5N00VVv3pWcU9gsAHa23RbAAX7ReEw9Ct6eug=.4f095083-7050-487d-94e0-3befce6744c5@github.com> Message-ID: On Sun, 31 Jan 2021 00:41:11 GMT, Jie Fu wrote: > compileonly and compilercount=1 will let the VM run slow enough to wait for a gc to be finished. That's a strange way to provoke the bug. You could just increase the number of iterations instead. But the right way to fix it is to stress ZGC to continuously run in the background while the test case aggressively unboxes vectors in compiled code. `-Xmx256m` helps with that while `-XX:CICompilerCount=1` is irrelevant. 
------------- PR: https://git.openjdk.java.net/jdk16/pull/139 From rcastanedalo at openjdk.java.net Mon Feb 1 11:41:43 2021 From: rcastanedalo at openjdk.java.net (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 1 Feb 2021 11:41:43 GMT Subject: RFR: 8260581: IGV: enhance node search [v2] In-Reply-To: References: <1OvXkUZq1-wp42Ik4uO1Po77am92EVM99S9JWB4mf4I=.32748364-a89f-4002-b7a7-378ca2623422@github.com> Message-ID: On Fri, 29 Jan 2021 15:21:27 GMT, Christian Hagedorn wrote: >> Roberto Casta?eda Lozano has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove debug print > > Otherwise, it looks good to me. > > I applied your patch and tested the new features in the IGV. They work as expected. Good to see the search being improved! Commit 981e903 adds a match sorting refinement contributed by @chhagedorn (numeric sorting of same-rank matches), please re-review. ------------- PR: https://git.openjdk.java.net/jdk/pull/2285 From github.com+25214855+casparcwang at openjdk.java.net Mon Feb 1 12:10:45 2021 From: github.com+25214855+casparcwang at openjdk.java.net (=?UTF-8?B?546L6LaF?=) Date: Mon, 1 Feb 2021 12:10:45 GMT Subject: [jdk16] RFR: 8260473: [vector] ZGC: VectorReshape test produces incorrect results with ZGC enabled In-Reply-To: References: <5OfnHC5N00VVv3pWcU9gsAHa23RbAAX7ReEw9Ct6eug=.4f095083-7050-487d-94e0-3befce6744c5@github.com> Message-ID: <_Wm-fi9j4TZ41F0G_92f7ioKQeDNgZiOEMmLkZ0lvvE=.0a9beba5-5089-4368-b4bc-73faf9d5e858@github.com> On Mon, 1 Feb 2021 11:35:13 GMT, Vladimir Ivanov wrote: > > compileonly and compilercount=1 will let the VM run slow enough to wait for a gc to be finished. > > That's a strange way to provoke the bug. You could just increase the number of iterations instead. > > But the right way to fix it is to stress ZGC to continuously run in the background while the test case aggressively unboxes vectors in compiled code. 
`-Xmx256m` helps with that while `-XX:CICompilerCount=1` is irrelevant. Yes, it's very weird to provoke the bug like this. If CICompilerCount=1 is removed, the test failed 60% roughly on my machine. And the iteration has already changed from 100 to 1000, the run time of the test is nearly 30s on release version of jvm. If I add the following patch, the test always fails on my machine, diff --git a/test/hotspot/jtreg/compiler/vectorapi/VectorRebracket128Test.java b/test/hotspot/jtreg/compiler/vectorapi/VectorRebracket128Test.java index 1843ec0..959b29a 100644 --- a/test/hotspot/jtreg/compiler/vectorapi/VectorRebracket128Test.java +++ b/test/hotspot/jtreg/compiler/vectorapi/VectorRebracket128Test.java @@ -44,7 +44,7 @@ import jdk.internal.vm.annotation.ForceInline; * @modules jdk.incubator.vector * @modules java.base/jdk.internal.vm.annotation * @run testng/othervm -XX:CompileCommand=compileonly,jdk/incubator/vector/ByteVector.fromByteBuffer - * -XX:-TieredCompilation -XX:CICompilerCount=1 -XX:+UseZGC -Xbatch -Xmx256m VectorRebracket128Test + * -XX:-TieredCompilation -XX:+UseZGC -Xmx256m VectorRebracket128Test */ @Test @@ -125,6 +125,14 @@ public class VectorRebracket128Test { @ForceInline static void testVectorRebracket(VectorSpecies a, VectorSpecies b, byte[] input, byte[] output) { + new Thread(() -> { + while (true) { + try { + System.gc(); + Thread.sleep(100); + } catch (Exception e) {} + } + }).start(); Vector av = a.fromByteArray(input, 0, ByteOrder.nativeOrder()); int block; assert(input.length == output.length); ------------- PR: https://git.openjdk.java.net/jdk16/pull/139 From github.com+25214855+casparcwang at openjdk.java.net Mon Feb 1 12:18:44 2021 From: github.com+25214855+casparcwang at openjdk.java.net (=?UTF-8?B?546L6LaF?=) Date: Mon, 1 Feb 2021 12:18:44 GMT Subject: [jdk16] RFR: 8260473: [vector] ZGC: VectorReshape test produces incorrect results with ZGC enabled In-Reply-To: 
<_Wm-fi9j4TZ41F0G_92f7ioKQeDNgZiOEMmLkZ0lvvE=.0a9beba5-5089-4368-b4bc-73faf9d5e858@github.com> References: <5OfnHC5N00VVv3pWcU9gsAHa23RbAAX7ReEw9Ct6eug=.4f095083-7050-487d-94e0-3befce6744c5@github.com> <_Wm-fi9j4TZ41F0G_92f7ioKQeDNgZiOEMmLkZ0lvvE=.0a9beba5-5089-4368-b4bc-73faf9d5e858@github.com> Message-ID: On Mon, 1 Feb 2021 12:06:26 GMT, ?? wrote: >>> compileonly and compilercount=1 will let the VM run slow enough to wait for a gc to be finished. >> >> That's a strange way to provoke the bug. You could just increase the number of iterations instead. >> >> But the right way to fix it is to stress ZGC to continuously run in the background while the test case aggressively unboxes vectors in compiled code. `-Xmx256m` helps with that while `-XX:CICompilerCount=1` is irrelevant. > >> > compileonly and compilercount=1 will let the VM run slow enough to wait for a gc to be finished. >> >> That's a strange way to provoke the bug. You could just increase the number of iterations instead. >> >> But the right way to fix it is to stress ZGC to continuously run in the background while the test case aggressively unboxes vectors in compiled code. `-Xmx256m` helps with that while `-XX:CICompilerCount=1` is irrelevant. > > Yes, it's very weird to provoke the bug like this. If CICompilerCount=1 is removed, the test failed 60% roughly on my machine. > And the iteration has already changed from 100 to 1000, the run time of the test is nearly 30s on release version of jvm. 
>
> If I add the following patch, the test always fails on my machine,
>
> diff --git a/test/hotspot/jtreg/compiler/vectorapi/VectorRebracket128Test.java b/test/hotspot/jtreg/compiler/vectorapi/VectorRebracket128Test.java
> index 1843ec0..959b29a 100644
> --- a/test/hotspot/jtreg/compiler/vectorapi/VectorRebracket128Test.java
> +++ b/test/hotspot/jtreg/compiler/vectorapi/VectorRebracket128Test.java
> @@ -44,7 +44,7 @@ import jdk.internal.vm.annotation.ForceInline;
>   * @modules jdk.incubator.vector
>   * @modules java.base/jdk.internal.vm.annotation
>   * @run testng/othervm -XX:CompileCommand=compileonly,jdk/incubator/vector/ByteVector.fromByteBuffer
> - *      -XX:-TieredCompilation -XX:CICompilerCount=1 -XX:+UseZGC -Xbatch -Xmx256m VectorRebracket128Test
> + *      -XX:-TieredCompilation -XX:+UseZGC -Xmx256m VectorRebracket128Test
>   */
>
>  @Test
> @@ -125,6 +125,14 @@ public class VectorRebracket128Test {
>      @ForceInline
>      static
>      void testVectorRebracket(VectorSpecies a, VectorSpecies b, byte[] input, byte[] output) {
> +        new Thread(() -> {
> +            while (true) {
> +                try {
> +                    System.gc();
> +                    Thread.sleep(100);
> +                } catch (Exception e) {}
> +            }
> +        }).start();
>          Vector av = a.fromByteArray(input, 0, ByteOrder.nativeOrder());
>          int block;
>          assert(input.length == output.length);

Sorry for the wrong patch above. It fails because thread stack creation fails (it creates 1000 threads), not because of the bug. The following is the correct GC stress patch.
diff --git a/test/hotspot/jtreg/compiler/vectorapi/VectorRebracket128Test.java b/test/hotspot/jtreg/compiler/vectorapi/VectorRebracket128Test.java
index 6b266db..a761ea2 100644
--- a/test/hotspot/jtreg/compiler/vectorapi/VectorRebracket128Test.java
+++ b/test/hotspot/jtreg/compiler/vectorapi/VectorRebracket128Test.java
@@ -44,7 +44,7 @@ import jdk.internal.vm.annotation.ForceInline;
  * @modules jdk.incubator.vector
  * @modules java.base/jdk.internal.vm.annotation
  * @run testng/othervm -XX:CompileCommand=compileonly,jdk/incubator/vector/ByteVector.fromByteBuffer
- *      -XX:-TieredCompilation -XX:CICompilerCount=1 -XX:+UseZGC -Xbatch -Xmx256m VectorRebracket128Test
+ *      -XX:-TieredCompilation -XX:+UseZGC -Xmx256m VectorRebracket128Test
  */

 @Test
@@ -59,6 +59,19 @@ public class VectorRebracket128Test {
     static final VectorSpecies bspec128 = ByteVector.SPECIES_128;
     static final VectorSpecies sspec128 = ShortVector.SPECIES_128;

+    static {
+        Thread t = new Thread(() -> {
+            while (true) {
+                try {
+                    System.gc();
+                    Thread.sleep(100);
+                } catch (Exception e) {}
+            }
+        });
+        t.setDaemon(true);
+        t.start();
+    }
+
     static IntFunction withToString(String s, IntFunction f) {
         return new IntFunction() {
             @Override

-------------

PR: https://git.openjdk.java.net/jdk16/pull/139

From aph at redhat.com Mon Feb 1 12:32:00 2021
From: aph at redhat.com (Andrew Haley)
Date: Mon, 1 Feb 2021 12:32:00 +0000
Subject: [jdk16] RFR: 8260585: AArch64: Wrong code generated for shifting right and accumulating four unsigned short integers [v3]
In-Reply-To:
References: <6a306bd0-19a9-f8d2-a285-1239371f389e@redhat.com>
Message-ID:

On 2/1/21 8:11 AM, dongbo (E) wrote:
> The tests passed with the newest version and still can catch the typo.
> Also tested a few of injected errors, the tests failed as expected.

One oddity has come up.
I'm running compiler/c2/TestShiftRightAndAccumulate
and the generated code I see for compiler.c2.TestShiftRightAndAccumulate::test_shorts
looks like:

;; B313: # out( B313 B314 ) <- in( B312 B313 ) Loop( B313-B313 inner main of N1776 strip mined) Freq: 1006.03
0x0000ffff70394cb0: sbfiz x16, x13, #1, #32 ;*saload {reexecute=0 rethrow=0 return_oop=0}
                                            ; - compiler.c2.TestShiftRightAndAccumulate::test_shorts at 268 (line 156)
0x0000ffff70394cb4: add xmethod, xbcp, x16  ;*saload {reexecute=0 rethrow=0 return_oop=0}
                                            ; - jdk.internal.util.ArraysSupport::mismatch at 9 (line 362)
                                            ; - java.util.Arrays::equals at 31 (line 2518)
                                            ; - compiler.c2.TestShiftRightAndAccumulate::test_shorts at 286 (line 158)
0x0000ffff70394cb8: add x14, xdispatch, x16 ;*iconst_m1 {reexecute=0 rethrow=0 return_oop=0}
                                            ; - jdk.internal.util.ArraysSupport::mismatch at 70 (line 376)
                                            ; - java.util.Arrays::equals at 31 (line 2518)
                                            ; - compiler.c2.TestShiftRightAndAccumulate::test_shorts at 213 (line 152)
0x0000ffff70394cbc: ldrsh w11, [xmethod,#16]
0x0000ffff70394cc0: ldrsh w15, [x14,#16]
0x0000ffff70394cc4: add w11, w15, w11, lsr #23 ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
                                            ; - jdk.internal.util.ArraysSupport::mismatch at 50 (line 372)
                                            ; - java.util.Arrays::equals at 31 (line 2518)
                                            ; - compiler.c2.TestShiftRightAndAccumulate::test_shorts at 213 (line 152)
0x0000ffff70394cc8: add x16, xesp, x16      ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
                                            ; - compiler.c2.TestShiftRightAndAccumulate::test_shorts at 226 (line 154)
0x0000ffff70394ccc: strh w11, [x16,#16]     ;*aaload {reexecute=0 rethrow=0 return_oop=0}
                                            ; - compiler.c2.TestShiftRightAndAccumulate::test_shorts at 212 (line 152)
0x0000ffff70394cd0: ldrsh w15, [xmethod,#16]
0x0000ffff70394cd4: ldrsh w17, [x14,#16]
0x0000ffff70394cd8: add x11, x17, w11, sxth
0x0000ffff70394cdc: add w11, w11, w15
0x0000ffff70394ce0: strh w11, [x16,#16]     ;*ifeq {reexecute=0 rethrow=0 return_oop=0}
                                            ; - jdk.internal.util.ArraysSupport::vectorizedMismatch at 62 (line 128)
                                            ; - jdk.internal.util.ArraysSupport::mismatch at 32 (line 364)
                                            ; - java.util.Arrays::equals at 31 (line 2518)
                                            ; - compiler.c2.TestShiftRightAndAccumulate::test_shorts at 140 (line 146)
0x0000ffff70394ce4: ldrsh w11, [xmethod,#18]

No vector instructions here. As far as I can see vectors are never used
for jshort, just for jchar. All very strange, and probably not your fault,
but since I'm looking I had to mention it.

The other weird thing is that {u,s}sra is never generated with the .8B form.

--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd.
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From chagedorn at openjdk.java.net Mon Feb 1 12:41:43 2021
From: chagedorn at openjdk.java.net (Christian Hagedorn)
Date: Mon, 1 Feb 2021 12:41:43 GMT
Subject: RFR: 8260581: IGV: enhance node search [v4]
In-Reply-To:
References:
Message-ID:

On Mon, 1 Feb 2021 11:34:59 GMT, Roberto Castañeda Lozano wrote:

>> Apply several enhancements to the quick node search functionality:
>>
>> - Allow users to search by node id or name by default (i.e. when no property is specified) instead of name only.
>> - Show partial matches when searching for a specific property (e.g. so that searching "type=con" lists all "control"-type nodes).
>> - Avoid showing the "All _N_ matching nodes" entry if there is a single match, or the user is searching a numeric value.
>> - Rank matches so that full matches are listed first, followed by matches at the beginning of the partially matched value, followed by the rest of matches in increasing size of the partially matched value. Numeric matches with the same rank are sorted increasingly. For example, searching "5" on a set of nodes with labels {"5 AddI", "25 AddL", "253 AddL", "554 MulI"} should list the matches as follows:
>>   1. **5** AddI
>>   2. **5**54 MulI
>>   3. 2**5** AddL
>>   4. 2**5**3 AddL
>>
>> As an illustration of some of these enhancements, this screenshot shows the behavior of the quick node search functionality before the changes:
>>
>> ![search-before](https://user-images.githubusercontent.com/8792647/106283438-374ba500-6242-11eb-8ef4-d18117eabcbb.png)
>>
>> and after:
>>
>> ![search-after](https://user-images.githubusercontent.com/8792647/106282880-7e856600-6241-11eb-8cb5-48fae5582cc2.png)
>>
>>
>> Tested manually on small and large (~10000 nodes) graphs. Thanks to Christian Hagedorn for feedback on several iterations of the enhancements.
>>
>> As part of the review, please evaluate not just the code changes but also the usability.
>
> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision:
>
>   Sort same-rank matches by first-word numeric value
>
>   Sort otherwise equally relevant matches by node id, which is by default the
>   first word in node labels. Thanks to Christian Hagedorn for the suggestion and
>   (slightly adapted) patch.

Thanks for adding the additional sorting suggestion! Looks good to me.

I applied the patch again and everything works as expected in the IGV.

-------------

Marked as reviewed by chagedorn (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/2285

From vlivanov at openjdk.java.net Mon Feb 1 12:47:46 2021
From: vlivanov at openjdk.java.net (Vladimir Ivanov)
Date: Mon, 1 Feb 2021 12:47:46 GMT
Subject: [jdk16] RFR: 8260473: [vector] ZGC: VectorReshape test produces incorrect results with ZGC enabled
In-Reply-To:
References: <5OfnHC5N00VVv3pWcU9gsAHa23RbAAX7ReEw9Ct6eug=.4f095083-7050-487d-94e0-3befce6744c5@github.com> <_Wm-fi9j4TZ41F0G_92f7ioKQeDNgZiOEMmLkZ0lvvE=.0a9beba5-5089-4368-b4bc-73faf9d5e858@github.com>
Message-ID: <226iFOsl1hXrEoSe9uzgBb1Z75wxQEv5azlJIfzCO4k=.69d5ed3a-7337-472d-b106-1ce2e5d361bf@github.com>

On Mon, 1 Feb 2021 12:15:38 GMT, 王超
wrote: >>> > compileonly and compilercount=1 will let the VM run slow enough to wait for a gc to be finished. >>> >>> That's a strange way to provoke the bug. You could just increase the number of iterations instead. >>> >>> But the right way to fix it is to stress ZGC to continuously run in the background while the test case aggressively unboxes vectors in compiled code. `-Xmx256m` helps with that while `-XX:CICompilerCount=1` is irrelevant. >> >> Yes, it's very weird to provoke the bug like this. If CICompilerCount=1 is removed, the test failed 60% roughly on my machine. >> And the iteration has already changed from 100 to 1000, the run time of the test is nearly 30s on release version of jvm. >> >> If I add the following patch, the test always fails on my machine, >> >> diff --git a/test/hotspot/jtreg/compiler/vectorapi/VectorRebracket128Test.java b/test/hotspot/jtreg/compiler/vectorapi/VectorRebracket128Test.java >> index 1843ec0..959b29a 100644 >> --- a/test/hotspot/jtreg/compiler/vectorapi/VectorRebracket128Test.java >> +++ b/test/hotspot/jtreg/compiler/vectorapi/VectorRebracket128Test.java >> @@ -44,7 +44,7 @@ import jdk.internal.vm.annotation.ForceInline; >> * @modules jdk.incubator.vector >> * @modules java.base/jdk.internal.vm.annotation >> * @run testng/othervm -XX:CompileCommand=compileonly,jdk/incubator/vector/ByteVector.fromByteBuffer >> - * -XX:-TieredCompilation -XX:CICompilerCount=1 -XX:+UseZGC -Xbatch -Xmx256m VectorRebracket128Test >> + * -XX:-TieredCompilation -XX:+UseZGC -Xmx256m VectorRebracket128Test >> */ >> >> @Test >> @@ -125,6 +125,14 @@ public class VectorRebracket128Test { >> @ForceInline >> static >> void testVectorRebracket(VectorSpecies a, VectorSpecies b, byte[] input, byte[] output) { >> + new Thread(() -> { >> + while (true) { >> + try { >> + System.gc(); >> + Thread.sleep(100); >> + } catch (Exception e) {} >> + } >> + }).start(); >> Vector av = a.fromByteArray(input, 0, ByteOrder.nativeOrder()); >> int block; >> 
assert(input.length == output.length); > > sorry for the wrong patch above, the failed reason of the patch above is due to stack creation failure (create 1000 threads). The following is the right stress gc patch. > > diff --git a/test/hotspot/jtreg/compiler/vectorapi/VectorRebracket128Test.java b/test/hotspot/jtreg/compiler/vectorapi/VectorRebracket128Test.java > index 6b266db..a761ea2 100644 > --- a/test/hotspot/jtreg/compiler/vectorapi/VectorRebracket128Test.java > +++ b/test/hotspot/jtreg/compiler/vectorapi/VectorRebracket128Test.java > @@ -44,7 +44,7 @@ import jdk.internal.vm.annotation.ForceInline; > * @modules jdk.incubator.vector > * @modules java.base/jdk.internal.vm.annotation > * @run testng/othervm -XX:CompileCommand=compileonly,jdk/incubator/vector/ByteVector.fromByteBuffer > - * -XX:-TieredCompilation -XX:CICompilerCount=1 -XX:+UseZGC -Xbatch -Xmx256m VectorRebracket128Test > + * -XX:-TieredCompilation -XX:+UseZGC -Xmx256m VectorRebracket128Test > */ > > @Test > @@ -59,6 +59,19 @@ public class VectorRebracket128Test { > static final VectorSpecies bspec128 = ByteVector.SPECIES_128; > static final VectorSpecies sspec128 = ShortVector.SPECIES_128; > > + static { > + Thread t = new Thread(() -> { > + while (true) { > + try { > + System.gc(); > + Thread.sleep(100); > + } catch (Exception e) {} > + } > + }); > + t.setDaemon(true); > + t.start(); > + } > + > static IntFunction withToString(String s, IntFunction f) { > return new IntFunction() { > @Override Good. Please, file a follow-up RFE to improve the test. 
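
For readers following the two stress patches quoted above: the first one started a new thread on every call of the test method, while the corrected one starts a single daemon thread once. A minimal, self-contained sketch of that second approach is shown below. The class name `GcStressSketch` and both helper methods are invented for illustration and are not part of the JDK test; unlike the patch, the loop here exits on interruption so that it can be stopped cleanly.

```java
final class GcStressSketch {

    /** Starts one daemon thread that requests a GC, sleeps, and repeats. */
    static Thread startGcStressThread(long pauseMillis) {
        Thread t = new Thread(() -> {
            while (true) {
                try {
                    System.gc();               // only a request; the VM decides what to do
                    Thread.sleep(pauseMillis);
                } catch (InterruptedException e) {
                    return;                    // interrupted: stop stressing
                }
            }
        }, "gc-stress-sketch");
        t.setDaemon(true);                     // must not keep the JVM alive after the test
        t.start();
        return t;
    }

    /** Interrupts the thread and waits for it; returns true if it has stopped. */
    static boolean stopAndJoin(Thread t, long timeoutMillis) {
        t.interrupt();
        try {
            t.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !t.isAlive();
    }

    public static void main(String[] args) {
        Thread t = startGcStressThread(100);
        System.out.println("daemon=" + t.isDaemon());
        System.out.println("stopped=" + stopAndJoin(t, 5000));
    }
}
```

Starting the thread once (from a static initializer in the quoted patch) keeps the thread count constant no matter how many times the test body runs, which is what avoids the stack creation failure mentioned above.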
-------------

PR: https://git.openjdk.java.net/jdk16/pull/139

From dongbo4 at huawei.com Mon Feb 1 14:35:09 2021
From: dongbo4 at huawei.com (dongbo (E))
Date: Mon, 1 Feb 2021 22:35:09 +0800
Subject: [jdk16] RFR: 8260585: AArch64: Wrong code generated for shifting right and accumulating four unsigned short integers [v3]
In-Reply-To:
References: <6a306bd0-19a9-f8d2-a285-1239371f389e@redhat.com>
Message-ID: <324d99d5-3fdb-084c-1ab8-4a2282aaa447@huawei.com>

On 2021/2/1 20:32, Andrew Haley wrote:
> On 2/1/21 8:11 AM, dongbo (E) wrote:
>> The tests passed with the newest version and still can catch the typo.
>> Also tested a few of injected errors, the tests failed as expected.
> One oddity has come up.
>
> I'm running compiler/c2/TestShiftRightAndAccumulate
> and the generated code I see for compiler.c2.TestShiftRightAndAccumulate::test_shorts
>
> No vector instructions here. As far as I can see vectors are never used
> for jshort, just for jchar. All very strange, and probably not your fault,
> but since I'm looking I had to mention it.
>
> The other weird thing is that {u,s}sra is never generated with the .8B form.

I guess the `usra` is not generated for `byte` and `short` because:
```
src/hotspot/share/opto/vectornode.cpp, line 182:
  case Op_URShiftI:
    switch (bt) {
    case T_BOOLEAN:return Op_URShiftVB;
    case T_CHAR:   return Op_URShiftVS;
    case T_BYTE:
    case T_SHORT:  return 0; // Vector logical right shift for signed short
                             // values produces incorrect Java result for
                             // negative data because java code should convert
                             // a short value into int value with sign
                             // extension before a shift.
    case T_INT:    return Op_URShiftVI;
    default:       ShouldNotReachHere(); return 0;
    }
```
For `byte` and `short`, we can have `ssra` for the code below:
```
        for (int i = 0; i < count; i++) {
            shortsC[i] = (short) (shortsA[i] + (shortsB[i] >> 5));
            shortsD[i] = (short) (shortsA[i] + shortsB[i]);
        }
```
I updated the tests to use this code, although I don't know why the similar pattern for `char` does not work for `short`.

For the `.8B` form, we only have `sshr + add`:
```
    0x0000ffff90085664:   sshr    v16.8b, v16.8b, #1
    0x0000ffff90085668:   add     v16.8b, v16.8b, v17.8b
```
According to the current implementation, it is strange that they are not combined into a `ssra`:
```
AddVB:    match(Set dst (AddVB src1 src2))
RShiftVB: match(Set dst (RShiftVB src (RShiftCntV shift)))
SSRA_VB:  match(Set dst (AddVB dst (RShiftVB src (RShiftCntV shift))))
```
I think we need to investigate the last two issues further.

From dongbo at openjdk.java.net Mon Feb 1 14:36:04 2021
From: dongbo at openjdk.java.net (Dong Bo)
Date: Mon, 1 Feb 2021 14:36:04 GMT
Subject: [jdk16] RFR: 8260585: AArch64: Wrong code generated for shifting right and accumulating four unsigned short integers [v5]
In-Reply-To:
References:
Message-ID: <3CKNS6J7Rr4dq69Vw5-PofaIvMgZulMUZCYk1N9Hy9E=.e6dc1c3f-080c-4d30-9f94-2f149613d9a0@github.com>

> This is a typo introduced by JDK-8255949.
> Compiler will generate `ushr` for shifting right and accumulating four short integers.
> It produces wrong results for specific case. The instruction should be `usra`.
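
As a side illustration of the `T_SHORT` comment from vectornode.cpp discussed in this thread, the sketch below compares Java's scalar semantics for `>>>` on a `short` (sign-extend to `int`, shift, narrow) with what a logical shift confined to a 16-bit lane would compute. The class and method names are invented for this example; it only models the arithmetic and says nothing about how C2 actually emits code.

```java
final class ShortShiftSketch {

    // Java semantics for `(short) (v >>> shift)`: v is sign-extended to int first.
    static short javaShiftShort(short v, int shift) {
        return (short) (v >>> shift);
    }

    // A logical right shift confined to a 16-bit lane: zeros enter at bit 15.
    static short laneShiftShort(short v, int shift) {
        return (short) ((v & 0xFFFF) >>> shift);
    }

    // For char the promotion to int is a zero extension, so both views agree.
    static char javaShiftChar(char v, int shift) {
        return (char) (v >>> shift);
    }

    static char laneShiftChar(char v, int shift) {
        return (char) ((v & 0xFFFF) >>> shift);
    }

    public static void main(String[] args) {
        int shortMismatches = 0;
        int charMismatches = 0;
        for (int bits = 0; bits <= 0xFFFF; bits++) {
            for (int shift = 1; shift < 16; shift++) {
                if (javaShiftShort((short) bits, shift) != laneShiftShort((short) bits, shift)) {
                    shortMismatches++;
                }
                if (javaShiftChar((char) bits, shift) != laneShiftChar((char) bits, shift)) {
                    charMismatches++;
                }
            }
        }
        System.out.println("short mismatches: " + shortMismatches);
        System.out.println("char mismatches:  " + charMismatches);
    }
}
```

For example, with `v = (short) 0x8000` and a shift of 5, the Java result is -1024 while the 16-bit lane result is 1024. That difference for negative inputs is the reason given in the quoted source comment for returning 0 (no vector opcode) in the `T_BYTE` and `T_SHORT` cases.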
Dong Bo has updated the pull request incrementally with one additional commit since the last revision: update tests for bytes and shorts ------------- Changes: - all: https://git.openjdk.java.net/jdk16/pull/136/files - new: https://git.openjdk.java.net/jdk16/pull/136/files/f2e490a3..ca3d2192 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk16&pr=136&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk16&pr=136&range=03-04 Stats: 74 lines in 1 file changed: 19 ins; 0 del; 55 mod Patch: https://git.openjdk.java.net/jdk16/pull/136.diff Fetch: git fetch https://git.openjdk.java.net/jdk16 pull/136/head:pull/136 PR: https://git.openjdk.java.net/jdk16/pull/136 From kvn at openjdk.java.net Mon Feb 1 15:51:46 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 1 Feb 2021 15:51:46 GMT Subject: RFR: 8259398: Super word not applied to a loop with byteArrayViewVarHandle In-Reply-To: References: Message-ID: On Mon, 1 Feb 2021 10:37:27 GMT, Christian Hagedorn wrote: >> Address expressing in this case has CastII which is not range check related. >> I think it is safe to skip any CastII nodes (similar to ConvI2L nodes) when parsing address for vectors - vectors will be constructed only if the same loop's variable and invariant are used for all memory operations regardless casts. Also vectors address depends on loop's variable so they will not be moved outside loop. >> In 32-bit VM there is no ConvI2L nodes so I moved CastII checks from under ConvI2L check. >> >> New regression case added to TestBufferVectorization.java test. >> >> Testing hs-tier1-7, RenaissanceStressTest. Run all vectorizing tests locally to make sure no regression. > > Looks good! 
Thank you all ------------- PR: https://git.openjdk.java.net/jdk/pull/2317 From kvn at openjdk.java.net Mon Feb 1 15:51:46 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 1 Feb 2021 15:51:46 GMT Subject: RFR: 8259398: Super word not applied to a loop with byteArrayViewVarHandle In-Reply-To: References: Message-ID: On Mon, 1 Feb 2021 15:47:17 GMT, Vladimir Kozlov wrote: >> Looks good! > > Thank you all Thank you all for reviews. ------------- PR: https://git.openjdk.java.net/jdk/pull/2317 From kvn at openjdk.java.net Mon Feb 1 15:51:47 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 1 Feb 2021 15:51:47 GMT Subject: Integrated: 8259398: Super word not applied to a loop with byteArrayViewVarHandle In-Reply-To: References: Message-ID: On Fri, 29 Jan 2021 17:40:39 GMT, Vladimir Kozlov wrote: > Address expressing in this case has CastII which is not range check related. > I think it is safe to skip any CastII nodes (similar to ConvI2L nodes) when parsing address for vectors - vectors will be constructed only if the same loop's variable and invariant are used for all memory operations regardless casts. Also vectors address depends on loop's variable so they will not be moved outside loop. > In 32-bit VM there is no ConvI2L nodes so I moved CastII checks from under ConvI2L check. > > New regression case added to TestBufferVectorization.java test. > > Testing hs-tier1-7, RenaissanceStressTest. Run all vectorizing tests locally to make sure no regression. This pull request has now been integrated. 
Changeset: 02d586e1 Author: Vladimir Kozlov URL: https://git.openjdk.java.net/jdk/commit/02d586e1 Stats: 57 lines in 2 files changed: 41 ins; 6 del; 10 mod 8259398: Super word not applied to a loop with byteArrayViewVarHandle Reviewed-by: vlivanov, thartmann, chagedorn ------------- PR: https://git.openjdk.java.net/jdk/pull/2317 From github.com+7837910+xpbob at openjdk.java.net Mon Feb 1 18:46:53 2021 From: github.com+7837910+xpbob at openjdk.java.net (xpbob) Date: Mon, 1 Feb 2021 18:46:53 GMT Subject: RFR: 8260576: Typo in compiler/runtime/safepoints/TestRegisterRestoring.java Message-ID: 8260576: Typo in compiler/runtime/safepoints/TestRegisterRestoring.java ------------- Commit messages: - 8260576: Typo in compiler/runtime/safepoints/TestRegisterRestoring.java Changes: https://git.openjdk.java.net/jdk/pull/2286/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2286&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8260576 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/2286.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2286/head:pull/2286 PR: https://git.openjdk.java.net/jdk/pull/2286 From thartmann at openjdk.java.net Mon Feb 1 18:46:53 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 1 Feb 2021 18:46:53 GMT Subject: RFR: 8260576: Typo in compiler/runtime/safepoints/TestRegisterRestoring.java In-Reply-To: References: Message-ID: <9jFka1iZIXLM0WVXVx2pH7XT9X9QNlYvI8hj5QyrDuY=.cb0b079b-e089-4360-b765-78b087a0ee4d@github.com> On Thu, 28 Jan 2021 11:23:22 GMT, xpbob wrote: > 8260576: Typo in compiler/runtime/safepoints/TestRegisterRestoring.java Changes requested by thartmann (Reviewer). Marked as reviewed by thartmann (Reviewer). 
test/hotspot/jtreg/compiler/runtime/safepoints/TestRegisterRestoring.java line 50: > 48: for (int i = 0; i < array.length; i++) { > 49: if (array[i] != 10_000) { > 50: throw new RuntimeException("Test failed: array[" + i + "] = " + array[i] + " but should be 10,000"); `.` is simply the (German) thousands separator. Not sure it's really worth it changing it to a `,`. But given the exception message is in English, looks good to me. ------------- PR: https://git.openjdk.java.net/jdk/pull/2286 From github.com+51754783+coreyashford at openjdk.java.net Mon Feb 1 18:51:47 2021 From: github.com+51754783+coreyashford at openjdk.java.net (Corey Ashford) Date: Mon, 1 Feb 2021 18:51:47 GMT Subject: RFR: 8259822: [PPC64] Support the prefixed instruction format added in POWER10 [v3] In-Reply-To: References: Message-ID: On Mon, 1 Feb 2021 08:28:06 GMT, Kazunori Ogata wrote: >> The POWER10 processor, which implements Power ISA 3.1 [1], supports new instruction formats where an instruction takes two 32bit words. The first word is called prefix, and the instructions with prefix are called prefixed instructions. With more bits in opcode and operand fields, POWER10 supports larger immediate value in an operand, as well as many new instructions. >> >> This is the first changes to handle prefixed instructions, and this adds support of prefixed addi (= paddi) instruction as an example of prefix usage. paddi accepts 34bit immediate value, while original addi accepts 16bit value. >> >> [1] https://ibm.ent.box.com/s/hhjfw0x0lrbtyzmiaffnbxh2fuo0fog0 > > Kazunori Ogata has updated the pull request incrementally with one additional commit since the last revision: > > Update (2nd round) based on review comments Just a few more minor things... several defined-but-not-used functions, and some little formatting issues. src/hotspot/cpu/ppc/assembler_ppc.inline.hpp line 43: > 41: // Add nop if a prefixed (two-word) instruction is going to cross a 64-byte boundary. 
> 42: // (See Section 1.6 of Power ISA Version 3.1) > 43: if(is_aligned(reinterpret_cast(pc()) + sizeof(int32_t), 64) || add space after 'if' src/hotspot/cpu/ppc/assembler_ppc.inline.hpp line 149: > 147: inline void Assembler::paddi( Register d, Register a, long si34, bool r = false) { > 148: assert(a != R0 || r, "r0 not allowed, unless R is set (CIA relative)"); > 149: paddi_r0ok( d, a, si34, r); The space after the ( isn't needed here, since it's not aligning with a similar call above or below. src/hotspot/cpu/ppc/assembler_ppc.hpp line 1322: > 1320: > 1321: static inline int hi18_signed( int x) { return hi16_signed(x); } > 1322: static inline int hi18_signed( long x) { return (int)((x << 30) >> 46); } The extra spaces for alignment look a bit strange here. I think it should be: static inline int hi18_signed( int x) ... static inline int hi18_signed(long x) ... Basically, remove the leading space from the ( long x) one and delete one from ( int x) to match. src/hotspot/cpu/ppc/assembler_ppc.hpp line 1389: > 1387: inline void pla( Register d, long si34); > 1388: inline void pla( Register d, Register a, long si34); > 1389: inline void psubi(Register d, Register a, long si34); pla and psubi are defined but not used. ------------- Changes requested by CoreyAshford at github.com (no known OpenJDK username). PR: https://git.openjdk.java.net/jdk/pull/2095 From github.com+51754783+coreyashford at openjdk.java.net Mon Feb 1 18:51:47 2021 From: github.com+51754783+coreyashford at openjdk.java.net (Corey Ashford) Date: Mon, 1 Feb 2021 18:51:47 GMT Subject: RFR: 8259822: [PPC64] Support the prefixed instruction format added in POWER10 [v3] In-Reply-To: References: Message-ID: On Wed, 27 Jan 2021 17:50:57 GMT, Corey Ashford wrote: >> "is_()" are the functions for testing instructions in the generated code and they are usually called in "assert()" or complex combination of if statements. 
So if this function returns false when p points at nop, the assertions or the if-conditions need to handle the nop case and the code will become difficult to read. Since the padding nop is a non-common case, I'd like to hide the existence of nop in this function.
>
> Ok, got it. Thanks for the explanation. Maybe an extra comment in the code saying essentially what you said here would be appropriate?

Looks good!

>> If predicate is added, adlc fails with an error message: "Syntax Error: :ADLC does not support instruction chain rules with predicates" I think addL_reg_imm34 allows predicate because it is not called from other rules. Is it better to leave some comments? (BTW, immI32 is only for POWER10 or higher. POWER9 version uses immI16 or immI16hi.)
>
> Hmm, I'm confused. I don't see any other reference to loadConL34 in ppc.ad.

I wish we knew why this predicate is causing an issue, but I guess it's not important because the operand type provides sufficient limiting of the instruct.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2095

From jiefu at openjdk.java.net Tue Feb 2 00:04:41 2021
From: jiefu at openjdk.java.net (Jie Fu)
Date: Tue, 2 Feb 2021 00:04:41 GMT
Subject: RFR: 8260576: Typo in compiler/runtime/safepoints/TestRegisterRestoring.java
In-Reply-To:
References:
Message-ID:

On Thu, 28 Jan 2021 11:23:22 GMT, xpbob wrote:

> 8260576: Typo in compiler/runtime/safepoints/TestRegisterRestoring.java

Looks good and trivial.
Will sponsor it.

-------------

Marked as reviewed by jiefu (Committer).

PR: https://git.openjdk.java.net/jdk/pull/2286

From xliu at openjdk.java.net Tue Feb 2 00:57:39 2021
From: xliu at openjdk.java.net (Xin Liu)
Date: Tue, 2 Feb 2021 00:57:39 GMT
Subject: RFR: 8260581: IGV: enhance node search [v4]
In-Reply-To:
References:
Message-ID:

On Mon, 1 Feb 2021 11:34:59 GMT, Roberto Castañeda Lozano wrote:

>> Apply several enhancements to the quick node search functionality:
>>
>> - Allow users to search by node id or name by default (i.e. when no property is specified) instead of name only.
>> - Show partial matches when searching for a specific property (e.g. so that searching "type=con" lists all "control"-type nodes).
>> - Avoid showing the "All _N_ matching nodes" entry if there is a single match, or the user is searching a numeric value.
>> - Rank matches so that full matches are listed first, followed by matches at the beginning of the partially matched value, followed by the rest of matches in increasing size of the partially matched value. Numeric matches with the same rank are sorted increasingly. For example, searching "5" on a set of nodes with labels {"5 AddI", "25 AddL", "253 AddL", "554 MulI"} should list the matches as follows:
>>   1. **5** AddI
>>   2. **5**54 MulI
>>   3. 2**5** AddL
>>   4. 2**5**3 AddL
>>
>> As an illustration of some of these enhancements, this screenshot shows the behavior of the quick node search functionality before the changes:
>>
>> ![search-before](https://user-images.githubusercontent.com/8792647/106283438-374ba500-6242-11eb-8ef4-d18117eabcbb.png)
>>
>> and after:
>>
>> ![search-after](https://user-images.githubusercontent.com/8792647/106282880-7e856600-6241-11eb-8cb5-48fae5582cc2.png)
>>
>>
>> Tested manually on small and large (~10000 nodes) graphs. Thanks to Christian Hagedorn for feedback on several iterations of the enhancements.
>>
>> As part of the review, please evaluate not just the code changes but also the usability.
>
> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision:
>
>   Sort same-rank matches by first-word numeric value
>
>   Sort otherwise equally relevant matches by node id, which is by default the
>   first word in node labels. Thanks to Christian Hagedorn for the suggestion and
>   (slightly adapted) patch.

LGTM. I ran it with your patch. it works as expected.

-------------

Marked as reviewed by xliu (no project role).
PR: https://git.openjdk.java.net/jdk/pull/2285 From jiefu at openjdk.java.net Tue Feb 2 02:01:48 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Tue, 2 Feb 2021 02:01:48 GMT Subject: [jdk16] RFR: 8260473: [vector] ZGC: VectorReshape test produces incorrect results with ZGC enabled In-Reply-To: <226iFOsl1hXrEoSe9uzgBb1Z75wxQEv5azlJIfzCO4k=.69d5ed3a-7337-472d-b106-1ce2e5d361bf@github.com> References: <5OfnHC5N00VVv3pWcU9gsAHa23RbAAX7ReEw9Ct6eug=.4f095083-7050-487d-94e0-3befce6744c5@github.com> <_Wm-fi9j4TZ41F0G_92f7ioKQeDNgZiOEMmLkZ0lvvE=.0a9beba5-5089-4368-b4bc-73faf9d5e858@github.com> <226iFOsl1hXrEoSe9uzgBb1Z75wxQEv5azlJIfzCO4k=.69d5ed3a-7337-472d-b106-1ce2e5d361bf@github.com> Message-ID: On Mon, 1 Feb 2021 12:44:59 GMT, Vladimir Ivanov wrote: > Good. Please, file a follow-up RFE to improve the test. OK. I will help to file a JBS bug once the fix has been merged into the jdk mainline. It will be only fixed in the jdk17, right? Thanks. ------------- PR: https://git.openjdk.java.net/jdk16/pull/139 From github.com+7837910+xpbob at openjdk.java.net Tue Feb 2 02:19:42 2021 From: github.com+7837910+xpbob at openjdk.java.net (xpbob) Date: Tue, 2 Feb 2021 02:19:42 GMT Subject: Integrated: 8260576: Typo in compiler/runtime/safepoints/TestRegisterRestoring.java In-Reply-To: References: Message-ID: On Thu, 28 Jan 2021 11:23:22 GMT, xpbob wrote: > 8260576: Typo in compiler/runtime/safepoints/TestRegisterRestoring.java This pull request has now been integrated. 
Changeset: 54e7a642 Author: bobpengxie Committer: Jie Fu URL: https://git.openjdk.java.net/jdk/commit/54e7a642 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8260576: Typo in compiler/runtime/safepoints/TestRegisterRestoring.java Reviewed-by: thartmann, jiefu ------------- PR: https://git.openjdk.java.net/jdk/pull/2286 From dongbo at openjdk.java.net Tue Feb 2 07:08:11 2021 From: dongbo at openjdk.java.net (Dong Bo) Date: Tue, 2 Feb 2021 07:08:11 GMT Subject: [jdk16] RFR: 8260585: AArch64: Wrong code generated for shifting right and accumulating four unsigned short integers [v6] In-Reply-To: References: Message-ID: > This is a typo introduced by JDK-8255949. > Compiler will generate `ushr` for shifting right and accumulating four short integers. > It produces wrong results for specific case. The instruction should be `usra`. Dong Bo has updated the pull request incrementally with one additional commit since the last revision: Update tests to match .2I. Still cannot match ssra for .8B, sshr+add are not combined. 
------------- Changes: - all: https://git.openjdk.java.net/jdk16/pull/136/files - new: https://git.openjdk.java.net/jdk16/pull/136/files/ca3d2192..693f8cbd Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk16&pr=136&range=05 - incr: https://webrevs.openjdk.java.net/?repo=jdk16&pr=136&range=04-05 Stats: 88 lines in 1 file changed: 39 ins; 41 del; 8 mod Patch: https://git.openjdk.java.net/jdk16/pull/136.diff Fetch: git fetch https://git.openjdk.java.net/jdk16 pull/136/head:pull/136 PR: https://git.openjdk.java.net/jdk16/pull/136 From thartmann at openjdk.java.net Tue Feb 2 07:26:48 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 2 Feb 2021 07:26:48 GMT Subject: Integrated: 8260420: C2 compilation fails with assert(found_sfpt) failed: no node in loop that's not input to safepoint In-Reply-To: References: Message-ID: On Fri, 29 Jan 2021 14:35:35 GMT, Tobias Hartmann wrote: > Loop strip mining verification fails because a `LoadNode` with no safepoint use ends up in the `OuterStripMinedLoop`. The root cause is very similar to [JDK-8249607](https://bugs.openjdk.java.net/browse/JDK-8249607): A `LoadNode` with two uses (a field store and a return, see `test2`) is cloned by `PhaseIdealLoop::split_if_with_blocks_post()` for each use, to allow it to flow out of the loop. Both clones end up in the `OuterStripMinedLoop`, the clone for the field store because the store has a safepoint use and the clone for the return because `late_load_ctrl` is too conservative by taking all initial uses into account. > > Now the load without a safepoint use is always detected by loop strip mining verification but without verification (for example, in a product build), we still hit several different issues depending on the exact use of the load (compare `test2/3/4`). The main issue is that loads without a safepoint use are not correctly wired when creating pre/main/post loops. 
Christian described this well in his RFR for [JDK-8249607](https://bugs.openjdk.java.net/browse/JDK-8249607): https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-August/039638.html > > Therefore, relaxing loop strip mining verification is not an option. > > The problem with the fix for [JDK-8249607](https://bugs.openjdk.java.net/browse/JDK-8249607) is that it relies on IGVN being executed before pre/main/post loops are created, merging the two clones such that the remaining one has a safepoint use and is therefore correctly handled. However, this is not guaranteed. In fact, if `major_progress` is not set, pre/main/post loops could be created in the same round of loop opts without IGVN in-between (`IdealLoopTree::iteration_split` is executed right after `PhaseIdealLoop::split_if_with_blocks`). > > I think we have the following options: > 1) Bail out if the loop is strip mined. I think that would be too conservative. > > 2) Simply set `major_progress` when pinning a `LoadNode` to the `OuterStripMinedLoop` to make sure IGVN is executed and duplicate `LoadNodes` are merged. That seems quite invasive though. > > 3) Allow the cloned load without a safepoint use (and therefore no usages in the `OuterStripMinedLoop`) to completely flow out of the loop. Currently, this is blocked by `late_load_ctrl` being computed on the initial `LoadNode` instead of the clone and therefore also taking into account the store that is referenced by the safepoint. We could simply re-compute the late ctrl for the cloned load, allowing it to flow out of the `OuterStripMinedLoop`. However, that only works if there is no anti-dependency in the `OuterStripMinedLoop`. > > 4) Detect duplicate loads in the `OuterStripMinedLoop` on creation in `PhaseIdealLoop::split_if_with_blocks` and merge them right away. > > I've decided to go with 4) and also reverted the fix for JDK-8249607 which is no longer required. 
> > As Roland and Christian already noticed, there are various other issues with that code that hopefully will be addressed at some point by [JDK-8252372](https://bugs.openjdk.java.net/browse/JDK-8252372). > > Thanks, > Tobias This pull request has now been integrated. Changeset: fe407cf1 Author: Tobias Hartmann URL: https://git.openjdk.java.net/jdk/commit/fe407cf1 Stats: 91 lines in 2 files changed: 75 ins; 1 del; 15 mod 8260420: C2 compilation fails with assert(found_sfpt) failed: no node in loop that's not input to safepoint Reviewed-by: kvn, roland, chagedorn ------------- PR: https://git.openjdk.java.net/jdk/pull/2315 From rcastanedalo at openjdk.java.net Tue Feb 2 08:14:54 2021 From: rcastanedalo at openjdk.java.net (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 2 Feb 2021 08:14:54 GMT Subject: RFR: 8260581: IGV: enhance node search [v4] In-Reply-To: References: Message-ID: On Mon, 1 Feb 2021 12:39:16 GMT, Christian Hagedorn wrote: > Thanks for adding the additional sorting suggestion! Looks good to me. > > I applied the patch again and everything works as expected in the IGV. Thanks again, Christian! ------------- PR: https://git.openjdk.java.net/jdk/pull/2285 From rcastanedalo at openjdk.java.net Tue Feb 2 08:14:54 2021 From: rcastanedalo at openjdk.java.net (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 2 Feb 2021 08:14:54 GMT Subject: RFR: 8260581: IGV: enhance node search [v4] In-Reply-To: References: Message-ID: <3mWXw-F3HmKKfHoRkt1RfIqYPn-G_moCdkx1F-wZ658=.7a5d49fd-33a2-4e85-9530-9b40b2ced589@github.com> On Tue, 2 Feb 2021 00:55:16 GMT, Xin Liu wrote: > LGTM. I ran it with your patch. it works as expected. Thanks for reviewing, Xin! 
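
The ranking rule described in the 8260581 pull request text (full matches first, then matches at the beginning of the value, then the remaining matches by increasing size of the matched value, with ties broken by node id) can be sketched roughly as follows. This is only one reading of that description, not the IGV implementation: `Node`, `Match`, `rank`, `bestMatch` and `search` are hypothetical names, and the sketch considers only the id and name of each node.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class NodeSearchRankingSketch {

    /** Hypothetical stand-in for a graph node: numeric id plus name. */
    static final class Node {
        final int id;
        final String name;
        Node(int id, String name) { this.id = id; this.name = name; }
        String label() { return id + " " + name; }
    }

    /** A node together with how well its best property value matched. */
    static final class Match {
        final Node node;
        final int rank;         // 0 = full match, 1 = match at the start, 2 = match elsewhere
        final int valueLength;  // size of the (partially) matched value
        Match(Node node, int rank, int valueLength) {
            this.node = node;
            this.rank = rank;
            this.valueLength = valueLength;
        }
    }

    /** Returns the rank of query within value, or -1 if there is no match. */
    static int rank(String value, String query) {
        if (value.equals(query)) return 0;
        if (value.startsWith(query)) return 1;
        if (value.contains(query)) return 2;
        return -1;
    }

    /** Default search: try the node id and the node name, keep the better match. */
    static Match bestMatch(Node node, String query) {
        Match best = null;
        for (String value : List.of(Integer.toString(node.id), node.name)) {
            int r = rank(value, query);
            if (r < 0) continue;
            if (best == null || r < best.rank
                    || (r == best.rank && value.length() < best.valueLength)) {
                best = new Match(node, r, value.length());
            }
        }
        return best;
    }

    /** Returns the labels of all matching nodes, most relevant first. */
    static List<String> search(List<Node> nodes, String query) {
        List<Match> matches = new ArrayList<>();
        for (Node n : nodes) {
            Match m = bestMatch(n, query);
            if (m != null) matches.add(m);
        }
        matches.sort(Comparator.comparingInt((Match m) -> m.rank)
                .thenComparingInt(m -> m.valueLength)
                .thenComparingInt(m -> m.node.id));
        List<String> labels = new ArrayList<>();
        for (Match m : matches) labels.add(m.node.label());
        return labels;
    }

    public static void main(String[] args) {
        List<Node> nodes = List.of(new Node(5, "AddI"), new Node(25, "AddL"),
                                   new Node(253, "AddL"), new Node(554, "MulI"));
        System.out.println(search(nodes, "5"));
    }
}
```

On the example set from the pull request, searching "5" yields the order 5 AddI, 554 MulI, 25 AddL, 253 AddL, which agrees with the list given in the description.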
-------------

PR: https://git.openjdk.java.net/jdk/pull/2285

From dongbo at openjdk.java.net Tue Feb 2 08:21:48 2021
From: dongbo at openjdk.java.net (Dong Bo)
Date: Tue, 2 Feb 2021 08:21:48 GMT
Subject: [jdk16] RFR: 8260585: AArch64: Wrong code generated for shifting right and accumulating four unsigned short integers [v3]
In-Reply-To:
References:
Message-ID:

On Mon, 1 Feb 2021 07:48:35 GMT, Ningsheng Jian wrote:

>> Dong Bo has updated the pull request incrementally with one additional commit since the last revision:
>>
>> make empty ins_encode when shift >= 16 (chars)
>
> Looks good to me.

Hi, Andrew.

`ssra` is not generated in the .8B form because, when the loop size is 16, the vector length is 4 rather than 8.
Since `vsraa8B_imm` only has `predicate(n->as_Vector()->length() == 8)`, the rule does not match.
We should fix this with the following code:

 instruct vsraa8B_imm(vecD dst, vecD src, immI shift) %{
-  predicate(n->as_Vector()->length() == 8);
+  predicate(n->as_Vector()->length() == 4 || n->as_Vector()->length() == 8);
   match(Set dst (AddVB dst (RShiftVB src (RShiftCntV shift))));
   ins_cost(INSN_COST);
   format %{ "ssra $dst, $src, $shift\t# vector (8B)" %}
@@ -18782,7 +18782,7 @@ instruct vsraa16B_imm(vecX dst, vecX src, immI shift) %{
 %}

 instruct vsraa4S_imm(vecD dst, vecD src, immI shift) %{
-  predicate(n->as_Vector()->length() == 4);
+  predicate(n->as_Vector()->length() == 2 || n->as_Vector()->length() == 4);
   match(Set dst (AddVS dst (RShiftVS src (RShiftCntV shift))));
   ins_cost(INSN_COST);
   format %{ "ssra $dst, $src, $shift\t# vector (4H)" %}
@@ -18849,7 +18849,7 @@ instruct vsraa2L_imm(vecX dst, vecX src, immI shift) %{
 %}

 instruct vsrla8B_imm(vecD dst, vecD src, immI shift) %{
-  predicate(n->as_Vector()->length() == 8);
+  predicate(n->as_Vector()->length() == 4 || n->as_Vector()->length() == 8);
   match(Set dst (AddVB dst (URShiftVB src (RShiftCntV shift))));
   ins_cost(INSN_COST);
   format %{ "usra $dst, $src, $shift\t# vector (8B)" %}
@@ -18879,7 +18879,7 @@ instruct vsrla16B_imm(vecX dst, vecX src, immI shift) %{
 %}

 instruct vsrla4S_imm(vecD dst, vecD src, immI shift) %{
-  predicate(n->as_Vector()->length() == 4);
+  predicate(n->as_Vector()->length() == 2 || n->as_Vector()->length() == 4);
   match(Set dst (AddVS dst (URShiftVS src (RShiftCntV shift))));
   ins_cost(INSN_COST);
   format %{ "usra $dst, $src, $shift\t# vector (4H)" %}

What do you think about making this change as part of this PR?

-------------

PR: https://git.openjdk.java.net/jdk16/pull/136

From shade at openjdk.java.net Tue Feb 2 09:29:53 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Tue, 2 Feb 2021 09:29:53 GMT
Subject: RFR: 8260899: ARM32: SyncOnValueBasedClassTest fails with assert(is_valid()) failed: invalid register
Message-ID: <3kcU-WrscBpRvHSepNBZWu4CNsBaP6pdeYTyIUT2z1E=.180e2e1d-7dec-4c28-b9c4-058627b12419@github.com>

$ CONF=linux-arm-server-fastdebug make run-test TEST=runtime/Monitor/SyncOnValueBasedClassTest.java
...
# Internal Error (/home/pi/jdk/src/hotspot/cpu/arm/register_arm.hpp:155), pid=3793, tid=3808
# assert(is_valid()) failed: invalid register
#
# JRE version: OpenJDK Runtime Environment (17.0) (fastdebug build 17-internal+0-adhoc.pi.jdk)
# Java VM: OpenJDK Server VM (fastdebug 17-internal+0-adhoc.pi.jdk, compiled mode, emulated-client, g1 gc, linux-arm)
# Problematic frame:
# V [libjvm.so+0xe1a6a8] MacroAssembler::load_klass(RegisterImpl*, RegisterImpl*, AsmCondition)+0xa4

Current CompileTask:
C1: 318 2 !b java.lang.Class::desiredAssertionStatus (54 bytes)

Stack: [0x72580000,0x72600000], sp=0x725fe170, free space=504k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0xe1a6a8] MacroAssembler::load_klass(RegisterImpl*, RegisterImpl*, AsmCondition)+0xa4
V [libjvm.so+0x43b6b4] C1_MacroAssembler::lock_object(RegisterImpl*, RegisterImpl*, RegisterImpl*, RegisterImpl*, Label&)+0xcf8
V [libjvm.so+0x3d731c]
LIR_Assembler::emit_lock(LIR_OpLock*)+0x160 The problem is in this code: if (DiagnoseSyncOnValueBasedClasses != 0) { load_klass(tmp1, obj); <--- asserts ldr_u32(tmp1, Address(tmp1, Klass::access_flags_offset())); tst(tmp1, JVM_ACC_IS_VALUE_BASED_CLASS); b(slow_case, ne); } `tmp1` is `noreg` when `!BiasedLocking`, because `c1_LIRGenerator_arm.cpp` provides it only when `UseBiasedLocking` is enabled: void LIRGenerator::do_MonitorEnter(MonitorEnter* x) { ... // Need a scratch register for biased locking on arm LIR_Opr scratch = LIR_OprFact::illegalOpr; if(UseBiasedLocking) { scratch = new_pointer_register(); } else { scratch = atomicLockOpr(); // <--- actually illegalOpr } ... monitor_enter(obj.result(), lock, hdr, scratch, x->monitor_no(), info_for_exception, info); } The way out is to use `tmp2`, which is the alias for `Rtemp` and always available. Additional testing: - [x] Linux ARM32 `SyncOnValueBasedClassTest`, `-XX:+UseBiasedLocking` - [x] Linux ARM32 `SyncOnValueBasedClassTest`, `-XX:-UseBiasedLocking` ------------- Commit messages: - 8260899: ARM32: SyncOnValueBasedClassTest fails with assert(is_valid()) failed: invalid register Changes: https://git.openjdk.java.net/jdk/pull/2349/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2349&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8260899 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/2349.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2349/head:pull/2349 PR: https://git.openjdk.java.net/jdk/pull/2349 From ningsheng.jian at arm.com Tue Feb 2 10:53:34 2021 From: ningsheng.jian at arm.com (Ningsheng Jian) Date: Tue, 2 Feb 2021 18:53:34 +0800 Subject: [jdk16] RFR: 8260585: AArch64: Wrong code generated for shifting right and accumulating four unsigned short integers [v3] In-Reply-To: <324d99d5-3fdb-084c-1ab8-4a2282aaa447@huawei.com> References: <6a306bd0-19a9-f8d2-a285-1239371f389e@redhat.com> 
<324d99d5-3fdb-084c-1ab8-4a2282aaa447@huawei.com>
Message-ID:

Hi,

On 2/1/21 10:35 PM, dongbo (E) wrote:
>
> On 2021/2/1 20:32, Andrew Haley wrote:
>> On 2/1/21 8:11 AM, dongbo (E) wrote:
>>> The tests passed with the newest version and still can catch the typo.
>>> Also tested a few of injected errors, the tests failed as expected.
>> One oddity has come up.
>>
>> I'm running compiler/c2/TestShiftRightAndAccumulate
>> and the generated code I see for
>> compiler.c2.TestShiftRightAndAccumulate::test_shorts
>>
>> No vector instructions here. As far as I can see vectors are never used
>> for jshort, just for jchar. All very strange, and probably not your fault,
>> but since I'm looking I had to mention it.
>>
>> The other weird thing is that {u,s}sra is never generated with the .8B form.
>
> I guess the `usra` is not generated for `byte` and `short` because:
> ```
> src/hotspot/share/opto/vectornode.cpp, line 182:
>   case Op_URShiftI:
>     switch (bt) {
>     case T_BOOLEAN:return Op_URShiftVB;
>     case T_CHAR:   return Op_URShiftVS;
>     case T_BYTE:
>     case T_SHORT:  return 0; // Vector logical right shift for signed short
>                              // values produces incorrect Java result for
>                              // negative data because java code should convert
>                              // a short value into int value with sign
>                              // extension before a shift.
>     case T_INT:    return Op_URShiftVI;
>     default:       ShouldNotReachHere(); return 0;
>     }
> ```
>

They can be used in vector api. E.g.
var vba = ByteVector.fromArray(ByteVector.SPECIES_PREFERRED, ba, i); vba.add(vba.lanewise(VectorOperators.LSHR, 3)).intoArray(ba, i); Thanks, Ningsheng From thartmann at openjdk.java.net Tue Feb 2 11:44:58 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 2 Feb 2021 11:44:58 GMT Subject: RFR: 8260928: InitArrayShortSize constraint func should print a helpful error message Message-ID: The `InitArrayShortSize` flag requires a value that is a multiple of `BytesPerLong` but no corresponding error message is printed: java -XX:InitArrayShortSize=7 Error: Could not create the Java Virtual Machine. Error: A fatal exception has occurred. Program will exit. Thanks, Tobias ------------- Commit messages: - 8260928: InitArrayShortSize constraint func should print a helpful error message Changes: https://git.openjdk.java.net/jdk/pull/2351/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2351&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8260928 Stats: 3 lines in 1 file changed: 3 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/2351.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2351/head:pull/2351 PR: https://git.openjdk.java.net/jdk/pull/2351 From njian at openjdk.java.net Tue Feb 2 11:44:58 2021 From: njian at openjdk.java.net (Ningsheng Jian) Date: Tue, 2 Feb 2021 11:44:58 GMT Subject: [jdk16] RFR: 8260585: AArch64: Wrong code generated for shifting right and accumulating four unsigned short integers [v3] In-Reply-To: References: Message-ID: <4KBCYEYrcTMEqEo89p3m4OfqSolqi2agyexH-k1vZrU=.813f7a97-b7ef-4cc0-809b-91e4c5f3c7fd@github.com> On Tue, 2 Feb 2021 08:19:21 GMT, Dong Bo wrote: > Because we only have predicate(n->as_Vector()->length() == 8) in vsraa8B_imm, so they are not matched. > We should fix this with the following code: I think this is an enhancement, and should be done in a separate patch in jdk mainline. 
------------- PR: https://git.openjdk.java.net/jdk16/pull/136 From vlivanov at openjdk.java.net Tue Feb 2 11:45:43 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Tue, 2 Feb 2021 11:45:43 GMT Subject: [jdk16] RFR: 8260473: [vector] ZGC: VectorReshape test produces incorrect results with ZGC enabled In-Reply-To: References: <5OfnHC5N00VVv3pWcU9gsAHa23RbAAX7ReEw9Ct6eug=.4f095083-7050-487d-94e0-3befce6744c5@github.com> Message-ID: On Sun, 31 Jan 2021 21:29:40 GMT, Nils Eliasson wrote: >> https://bugs.openjdk.java.net/browse/JDK-8260473 >> >> Function "PhaseVector::expand_vunbox_node" creates a LoadNode, but forgets to make the LoadNode to pass gc barriers. >> >> Testing: all Vector API related tests have passed. >> >> Original pr: https://github.com/openjdk/jdk/pull/2253 > > Approved. > > Now awaiting release team approval. > It will be only fixed in the jdk17, right? Yes, I'm OK with that. ------------- PR: https://git.openjdk.java.net/jdk16/pull/139 From shade at openjdk.java.net Tue Feb 2 11:53:45 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 2 Feb 2021 11:53:45 GMT Subject: RFR: 8260928: InitArrayShortSize constraint func should print a helpful error message In-Reply-To: References: Message-ID: <1Jm-oUtc2u-utHK3mkc8rZ0wO8V1A58x_7HN4FsPj90=.a80551e9-6c1d-418a-8537-1eeec8b0c9ce@github.com> On Tue, 2 Feb 2021 11:31:34 GMT, Tobias Hartmann wrote: > The `InitArrayShortSize` flag requires a value that is a multiple of `BytesPerLong` but no corresponding error message is printed: > java -XX:InitArrayShortSize=7 > Error: Could not create the Java Virtual Machine. > Error: A fatal exception has occurred. Program will exit. > > Thanks, > Tobias Looks fine to me. Marked as reviewed by shade (Reviewer). 
src/hotspot/share/runtime/flags/jvmFlagConstraintsCompiler.cpp line 307: > 305: JVMFlag::printError(verbose, > 306: "InitArrayShortSize (" INTX_FORMAT ") must be " > 307: "a multiple of %d\n", value, BytesPerLong); Consider indenting them a bit. I think the line starting from `"a...` has one excess whitespace. ------------- Marked as reviewed by shade (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2351 From thartmann at openjdk.java.net Tue Feb 2 12:10:56 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 2 Feb 2021 12:10:56 GMT Subject: RFR: 8260928: InitArrayShortSize constraint func should print a helpful error message [v2] In-Reply-To: References: Message-ID: > The `InitArrayShortSize` flag requires a value that is a multiple of `BytesPerLong` but no corresponding error message is printed: > java -XX:InitArrayShortSize=7 > Error: Could not create the Java Virtual Machine. > Error: A fatal exception has occurred. Program will exit. > > Thanks, > Tobias Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: Fixed intendation ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2351/files - new: https://git.openjdk.java.net/jdk/pull/2351/files/92dd1316..bb87bdd8 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2351&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2351&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/2351.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2351/head:pull/2351 PR: https://git.openjdk.java.net/jdk/pull/2351 From thartmann at openjdk.java.net Tue Feb 2 12:10:57 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 2 Feb 2021 12:10:57 GMT Subject: RFR: 8260928: InitArrayShortSize constraint func should print a helpful error message [v2] In-Reply-To: 
<1Jm-oUtc2u-utHK3mkc8rZ0wO8V1A58x_7HN4FsPj90=.a80551e9-6c1d-418a-8537-1eeec8b0c9ce@github.com> References: <1Jm-oUtc2u-utHK3mkc8rZ0wO8V1A58x_7HN4FsPj90=.a80551e9-6c1d-418a-8537-1eeec8b0c9ce@github.com> Message-ID: On Tue, 2 Feb 2021 11:50:50 GMT, Aleksey Shipilev wrote: >> Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: >> >> Fixed intendation > > src/hotspot/share/runtime/flags/jvmFlagConstraintsCompiler.cpp line 307: > >> 305: JVMFlag::printError(verbose, >> 306: "InitArrayShortSize (" INTX_FORMAT ") must be " >> 307: "a multiple of %d\n", value, BytesPerLong); > > Consider indenting them a bit. I think the line starting from `"a...` has one excess whitespace. Good catch. Fixed. Thanks for the review! ------------- PR: https://git.openjdk.java.net/jdk/pull/2351 From ogatak at openjdk.java.net Tue Feb 2 12:45:03 2021 From: ogatak at openjdk.java.net (Kazunori Ogata) Date: Tue, 2 Feb 2021 12:45:03 GMT Subject: RFR: 8259822: [PPC64] Support the prefixed instruction format added in POWER10 [v4] In-Reply-To: References: Message-ID: <2CXH3zvUkBfqLrr8SPpD-x3RcSbFzw0S6NGLWgtirQ8=.0df1a0c8-3ac2-4e21-9e6a-f0b43a3ae20c@github.com> > The POWER10 processor, which implements Power ISA 3.1 [1], supports new instruction formats where an instruction takes two 32bit words. The first word is called prefix, and the instructions with prefix are called prefixed instructions. With more bits in opcode and operand fields, POWER10 supports larger immediate value in an operand, as well as many new instructions. > > This is the first changes to handle prefixed instructions, and this adds support of prefixed addi (= paddi) instruction as an example of prefix usage. paddi accepts 34bit immediate value, while original addi accepts 16bit value. 
> > [1] https://ibm.ent.box.com/s/hhjfw0x0lrbtyzmiaffnbxh2fuo0fog0 Kazunori Ogata has updated the pull request incrementally with one additional commit since the last revision: Removed pla and psubi and adjusted spacing based on review comments ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2095/files - new: https://git.openjdk.java.net/jdk/pull/2095/files/a8770539..9d606d67 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2095&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2095&range=02-03 Stats: 23 lines in 3 files changed: 2 ins; 6 del; 15 mod Patch: https://git.openjdk.java.net/jdk/pull/2095.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2095/head:pull/2095 PR: https://git.openjdk.java.net/jdk/pull/2095 From ogatak at openjdk.java.net Tue Feb 2 12:45:04 2021 From: ogatak at openjdk.java.net (Kazunori Ogata) Date: Tue, 2 Feb 2021 12:45:04 GMT Subject: RFR: 8259822: [PPC64] Support the prefixed instruction format added in POWER10 [v3] In-Reply-To: References: Message-ID: <2gvv0ZqsA7SbK0VL5OScVLHNxicDeD153h42QOrR-wk=.46adcc11-2808-4d1f-88f1-3848a8162fbe@github.com> On Mon, 1 Feb 2021 18:49:20 GMT, Corey Ashford wrote: >> Kazunori Ogata has updated the pull request incrementally with one additional commit since the last revision: >> >> Update (2nd round) based on review comments > > Just a few more minor things... several defined-but-not-used functions, and some little formatting issues. @CoreyAshford Thank you for your review. I went through my changes to check spacing, as well as removed unused pla and psubi as you pointed out. 
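
The immediate widths mentioned in the PR description above (16 bits for `addi`, 34 bits for `paddi`) can be illustrated with a small range check. This is only a sketch in Java, not HotSpot assembler code: the class and method names are hypothetical, and it assumes both immediates are signed two's-complement values, which the description itself does not state.

```java
// Illustrative sketch only; not part of the PPC64 port.
// Assumption: both immediates are signed two's-complement fields.
final class ImmediateRangeDemo {
    // True if value is representable as a signed integer of the given bit width.
    static boolean fitsSigned(long value, int bits) {
        long min = -(1L << (bits - 1));
        long max = (1L << (bits - 1)) - 1;
        return value >= min && value <= max;
    }

    // Hypothetical helper: names the narrowest add-immediate form that could hold value.
    static String addImmediateForm(long value) {
        if (fitsSigned(value, 16)) {
            return "addi";   // classic 32-bit instruction, 16-bit immediate
        }
        if (fitsSigned(value, 34)) {
            return "paddi";  // prefixed instruction, 34-bit immediate
        }
        return "neither";    // the constant has to be built some other way
    }

    public static void main(String[] args) {
        System.out.println(addImmediateForm(32767L));   // addi
        System.out.println(addImmediateForm(32768L));   // paddi
        System.out.println(addImmediateForm(1L << 33)); // neither
    }
}
```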
------------- PR: https://git.openjdk.java.net/jdk/pull/2095 From chagedorn at openjdk.java.net Tue Feb 2 13:26:42 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Tue, 2 Feb 2021 13:26:42 GMT Subject: RFR: 8260928: InitArrayShortSize constraint func should print a helpful error message [v2] In-Reply-To: References: Message-ID: <19kujjbusieRxoz19CY4GCh0HORyefuHk6uBILYsUcM=.7c0ee83a-9a49-465f-a575-0b3eb010ba6c@github.com> On Tue, 2 Feb 2021 12:10:56 GMT, Tobias Hartmann wrote: >> The `InitArrayShortSize` flag requires a value that is a multiple of `BytesPerLong` but no corresponding error message is printed: >> java -XX:InitArrayShortSize=7 >> Error: Could not create the Java Virtual Machine. >> Error: A fatal exception has occurred. Program will exit. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Fixed intendation Looks good. ------------- Marked as reviewed by chagedorn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2351 From thartmann at openjdk.java.net Tue Feb 2 13:39:42 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 2 Feb 2021 13:39:42 GMT Subject: RFR: 8260928: InitArrayShortSize constraint func should print a helpful error message [v2] In-Reply-To: <19kujjbusieRxoz19CY4GCh0HORyefuHk6uBILYsUcM=.7c0ee83a-9a49-465f-a575-0b3eb010ba6c@github.com> References: <19kujjbusieRxoz19CY4GCh0HORyefuHk6uBILYsUcM=.7c0ee83a-9a49-465f-a575-0b3eb010ba6c@github.com> Message-ID: On Tue, 2 Feb 2021 13:24:21 GMT, Christian Hagedorn wrote: >> Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: >> >> Fixed intendation > > Looks good. Thanks for the review, Christian! 
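
The check discussed in this thread (JDK-8260928) can be modelled roughly as follows. This is a Java sketch of the logic only; the real constraint function is C++ in jvmFlagConstraintsCompiler.cpp. The class name, the method name, and the constant value 8 for `BytesPerLong` are assumptions made for illustration; the message text follows the format string quoted in the review.

```java
import java.util.Optional;

// Illustrative model of a flag constraint check; not HotSpot code.
final class InitArrayShortSizeCheck {
    // Assumption: BytesPerLong is 8, the size of a Java long in bytes.
    static final int BYTES_PER_LONG = 8;

    // Returns an error message when the value is rejected, or empty when it is accepted.
    static Optional<String> validate(long value) {
        if (value % BYTES_PER_LONG != 0) {
            return Optional.of(String.format(
                "InitArrayShortSize (%d) must be a multiple of %d\n",
                value, BYTES_PER_LONG));
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // 7 is the value used in the bug report; 64 is an arbitrary accepted value.
        System.out.print(validate(7).orElse("ok\n"));
        System.out.print(validate(64).orElse("ok\n"));
    }
}
```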
-------------

PR: https://git.openjdk.java.net/jdk/pull/2351

From dongbo4 at huawei.com Tue Feb 2 14:05:34 2021
From: dongbo4 at huawei.com (dongbo (E))
Date: Tue, 2 Feb 2021 22:05:34 +0800
Subject: [jdk16] RFR: 8260585: AArch64: Wrong code generated for shifting right and accumulating four unsigned short integers [v3]
In-Reply-To:
References: <6a306bd0-19a9-f8d2-a285-1239371f389e@redhat.com> <324d99d5-3fdb-084c-1ab8-4a2282aaa447@huawei.com>
Message-ID: <7a52499a-eacc-f5ad-c573-15a1ea3101b4@huawei.com>

On 2021/2/2 18:53, Ningsheng Jian wrote:
> Hi,
>
> On 2/1/21 10:35 PM, dongbo (E) wrote:
>>
>> On 2021/2/1 20:32, Andrew Haley wrote:
>>> On 2/1/21 8:11 AM, dongbo (E) wrote:
>>>> The tests passed with the newest version and still can catch the typo.
>>>> Also tested a few of injected errors, the tests failed as expected.
>>> One oddity has come up.
>>>
>>> I'm running compiler/c2/TestShiftRightAndAccumulate
>>> and the generated code I see for
>>> compiler.c2.TestShiftRightAndAccumulate::test_shorts
>>>
>>> No vector instructions here. As far as I can see vectors are never used
>>> for jshort, just for jchar. All very strange, and probably not your fault,
>>> but since I'm looking I had to mention it.
>>>
>>> The other weird thing is that {u,s}sra is never generated with the .8B form.
>>
>> I guess the `usra` is not generated for `byte` and `short` because:
>> ```
>> src/hotspot/share/opto/vectornode.cpp, line 182:
>>   case Op_URShiftI:
>>     switch (bt) {
>>     case T_BOOLEAN:return Op_URShiftVB;
>>     case T_CHAR:   return Op_URShiftVS;
>>     case T_BYTE:
>>     case T_SHORT:  return 0; // Vector logical right shift for signed short
>>                              // values produces incorrect Java result for
>>                              // negative data because java code should convert
>>                              // a short value into int value with sign
>>                              // extension before a shift.
>>     case T_INT:    return Op_URShiftVI;
>>     default:       ShouldNotReachHere(); return 0;
>>     }
>> ```
>>
>
> They can be used in vector api. E.g.
>
> var vba = ByteVector.fromArray(ByteVector.SPECIES_PREFERRED, ba, i);
> vba.add(vba.lanewise(VectorOperators.LSHR, 3)).intoArray(ba, i);
>
> Thanks,
> Ningsheng
>

I ran local tests with the code you mentioned; it does use the unsigned shift for byte and short.
Thank you.

From roland at openjdk.java.net Tue Feb 2 14:58:03 2021
From: roland at openjdk.java.net (Roland Westrelin)
Date: Tue, 2 Feb 2021 14:58:03 GMT
Subject: [jdk16] RFR: 8260709: C2: assert(false) failed: unscheduable graph
Message-ID:

The v = field load in the test case is found anti-dependent with the memory phi that merges the exception state of the 2 array allocations. Since JDK-8258393, anti-dependence computation only considers the Phi inputs that are reachable from the memory input of a load. As a consequence, the late control for the load is the control projection of the array allocation in the loop.

When loop opts run, PhaseIdealLoop::split_if_with_blocks_post() finds that the load's late control is different from its current control (which is inside the outer loop). It tries to sink the load out of the loop but ends up pinning it at its late control, the projection of the second AllocateNode. The logic that expands the AllocateNode doesn't expect a pinned node on the control projection and the result is a broken graph.

I think the fix for this would be to clone the load along both the exception and the fallthrough paths. But as noted in JDK-8252372, the whole process of sinking loads out of loops doesn't seem to work as expected (for instance in this case it sinks the load from the outer loop into the inner loop). So instead of going with a complicated fix, I propose simply to detect this corner case and make no attempt to sink the load.
Note that the current logic computes the late control for the load (which should be in the loop), will create a clone for each use and assign the dom_lca of the use's control and the load late control to that use, that is the load late control. So all uses end up at the same location, the load late control. So to detect that case, it's sufficient to test the load late control. ------------- Commit messages: - test & fix Changes: https://git.openjdk.java.net/jdk16/pull/144/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk16&pr=144&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8260709 Stats: 70 lines in 3 files changed: 69 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk16/pull/144.diff Fetch: git fetch https://git.openjdk.java.net/jdk16 pull/144/head:pull/144 PR: https://git.openjdk.java.net/jdk16/pull/144 From neliasso at openjdk.java.net Tue Feb 2 22:06:53 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Tue, 2 Feb 2021 22:06:53 GMT Subject: RFR: 8258799 : [Testbug] RandomCommandsTest must check if tested directive is added via jcmd Message-ID: RandomCommandsTest.java is checking a mix of valid and invalid compilecommands and compiler directives. When they fail they have different behaviour: A compilecommand that is malformed will result in a printed error, and then the VM continues. A compiler directive that is malformed, that is added via commandline, will abort the VM, much like any other VM-flag would. A compiler directive that is malformed, that is added via jcmd will print an error, and the VM continues - just like with any other jcmd. The RandomCommandsTest fails when generating a malformed compiler directive and adding it via jcmd - because it expects the VM to abort. This patch fixes that. 
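
The three behaviours listed in the description above can be summarised as a tiny decision table. The following Java sketch is only a model of what the message states; the enum and method names are hypothetical and do not come from RandomCommandsTest itself.

```java
// Illustrative model of the behaviours described in the RFR; not the test's code.
final class MalformedInputModel {
    // The three ways a malformed entry can reach the VM, as listed in the message.
    enum Source { COMPILE_COMMAND, DIRECTIVE_ON_COMMAND_LINE, DIRECTIVE_VIA_JCMD }

    // True if the VM is expected to abort; false if it prints an error and continues.
    static boolean vmAborts(Source source) {
        return source == Source.DIRECTIVE_ON_COMMAND_LINE;
    }

    public static void main(String[] args) {
        for (Source s : Source.values()) {
            System.out.println(s + " -> " + (vmAborts(s) ? "VM aborts" : "error printed, VM continues"));
        }
    }
}
```

Under this model, a test that expects an abort for `DIRECTIVE_VIA_JCMD` has the wrong expectation, which matches the failure the patch addresses.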
------------- Commit messages: - fix whitespace - remove tabs - Check if directive added via jcmd is valid Changes: https://git.openjdk.java.net/jdk/pull/2364/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2364&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8258799 Stats: 19 lines in 2 files changed: 16 ins; 0 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/2364.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2364/head:pull/2364 PR: https://git.openjdk.java.net/jdk/pull/2364 From kvn at openjdk.java.net Tue Feb 2 22:31:46 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 2 Feb 2021 22:31:46 GMT Subject: RFR: 8258799 : [Testbug] RandomCommandsTest must check if tested directive is added via jcmd In-Reply-To: References: Message-ID: <1dZCUYqdkzHmU2Bsz0c8Oxq8UPw_E9GBX6JvdOVeG6k=.f32f8e34-35ae-45b6-af15-d8be4cdefe77@github.com> On Tue, 2 Feb 2021 21:55:17 GMT, Nils Eliasson wrote: > RandomCommandsTest.java is checking a mix of valid and invalid compilecommands and compiler directives. When they fail they have different behaviour: > > A compilecommand that is malformed will result in a printed error, and then the VM continues. > A compiler directive that is malformed, that is added via commandline, will abort the VM, much like any other VM-flag would. > A compiler directive that is malformed, that is added via jcmd will print an error, and the VM continues - just like with any other jcmd. > > The RandomCommandsTest fails when generating a malformed compiler directive and adding it via jcmd - because it expects the VM to abort. > > This patch fixes that. Looks good. ------------- Marked as reviewed by kvn (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/2364 From iignatyev at openjdk.java.net Tue Feb 2 23:00:48 2021 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Tue, 2 Feb 2021 23:00:48 GMT Subject: RFR: 8258799 : [Testbug] RandomCommandsTest must check if tested directive is added via jcmd In-Reply-To: References: Message-ID: On Tue, 2 Feb 2021 21:55:17 GMT, Nils Eliasson wrote: > RandomCommandsTest.java is checking a mix of valid and invalid compilecommands and compiler directives. When they fail they have different behaviour: > > A compilecommand that is malformed will result in a printed error, and then the VM continues. > A compiler directive that is malformed, that is added via commandline, will abort the VM, much like any other VM-flag would. > A compiler directive that is malformed, that is added via jcmd will print an error, and the VM continues - just like with any other jcmd. > > The RandomCommandsTest fails when generating a malformed compiler directive and adding it via jcmd - because it expects the VM to abort. > > This patch fixes that. Marked as reviewed by iignatyev (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/2364 From dongbo at openjdk.java.net Wed Feb 3 02:04:03 2021 From: dongbo at openjdk.java.net (Dong Bo) Date: Wed, 3 Feb 2021 02:04:03 GMT Subject: [jdk16] RFR: 8260585: AArch64: Wrong code generated for shifting right and accumulating four unsigned short integers [v7] In-Reply-To: References: Message-ID: > This is a typo introduced by JDK-8255949. > Compiler will generate `ushr` for shifting right and accumulating four short integers. > It produces wrong results for specific case. The instruction should be `usra`. 
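
To make the difference between the two instructions concrete, here is a scalar Java sketch of what a shift-only operation and a shift-and-accumulate operation compute on four unsigned 16-bit lanes. It models the arithmetic only, under the assumption that `ushr` writes the shifted value and `usra` adds it to the destination; the class and method names are hypothetical and the shift is assumed to be in the range 0..15.

```java
// Illustrative scalar model; not the AArch64 backend code.
final class ShiftAccumulateDemo {
    // Shift only (models ushr): result[i] = src[i] >>> shift.
    static char[] shiftOnly(char[] src, int shift) {
        char[] result = new char[src.length];
        for (int i = 0; i < src.length; i++) {
            result[i] = (char) (src[i] >>> shift);
        }
        return result;
    }

    // Shift and accumulate (models usra): result[i] = dst[i] + (src[i] >>> shift),
    // truncated to 16 bits by the cast to char.
    static char[] shiftAndAccumulate(char[] dst, char[] src, int shift) {
        char[] result = new char[dst.length];
        for (int i = 0; i < dst.length; i++) {
            result[i] = (char) (dst[i] + (src[i] >>> shift));
        }
        return result;
    }

    public static void main(String[] args) {
        char[] dst = {1, 2, 3, 4};
        char[] src = {8, 16, 24, 32};
        char[] wrong = shiftOnly(src, 3);               // lanes: 1 2 3 4 (accumulator lost)
        char[] right = shiftAndAccumulate(dst, src, 3); // lanes: 2 4 6 8
        for (int i = 0; i < dst.length; i++) {
            System.out.println((int) wrong[i] + " vs " + (int) right[i]);
        }
    }
}
```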
Dong Bo has updated the pull request incrementally with one additional commit since the last revision: match ssra with 8B ------------- Changes: - all: https://git.openjdk.java.net/jdk16/pull/136/files - new: https://git.openjdk.java.net/jdk16/pull/136/files/693f8cbd..9e71e0f5 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk16&pr=136&range=06 - incr: https://webrevs.openjdk.java.net/?repo=jdk16&pr=136&range=05-06 Stats: 32 lines in 1 file changed: 15 ins; 9 del; 8 mod Patch: https://git.openjdk.java.net/jdk16/pull/136.diff Fetch: git fetch https://git.openjdk.java.net/jdk16 pull/136/head:pull/136 PR: https://git.openjdk.java.net/jdk16/pull/136 From dongbo at openjdk.java.net Wed Feb 3 02:04:03 2021 From: dongbo at openjdk.java.net (Dong Bo) Date: Wed, 3 Feb 2021 02:04:03 GMT Subject: [jdk16] RFR: 8260585: AArch64: Wrong code generated for shifting right and accumulating four unsigned short integers [v3] In-Reply-To: <4KBCYEYrcTMEqEo89p3m4OfqSolqi2agyexH-k1vZrU=.813f7a97-b7ef-4cc0-809b-91e4c5f3c7fd@github.com> References: <4KBCYEYrcTMEqEo89p3m4OfqSolqi2agyexH-k1vZrU=.813f7a97-b7ef-4cc0-809b-91e4c5f3c7fd@github.com> Message-ID: On Tue, 2 Feb 2021 11:04:24 GMT, Ningsheng Jian wrote: > > Because we only have predicate(n->as_Vector()->length() == 8) in vsraa8B_imm, so they are not matched. > > We should fix this with the following code: > > I think this is an enhancement, and should be done in a separate patch in jdk mainline. OK, I update a test with loop size 80 for bytes so that `ssra` for 8B can be matched now. 
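
Earlier in this thread, a comment from vectornode.cpp was quoted explaining why `>>>` on `byte` and `short` is not auto-vectorized while `char` is. The following Java sketch shows the difference numerically. The method names are hypothetical, and `laneShiftShort` is only a model of a logical shift confined to a 16-bit lane, not actual vector code.

```java
// Illustrative sketch of Java's shift semantics on 16-bit types.
final class UnsignedShiftSemantics {
    // Java semantics: the short is sign-extended to int, shifted, then narrowed.
    static short javaShiftShort(short value, int shift) {
        return (short) (value >>> shift);
    }

    // Model of a logical shift applied to the 16-bit lane only (no sign extension).
    static short laneShiftShort(short value, int shift) {
        return (short) ((value & 0xFFFF) >>> shift);
    }

    // For char the widening is a zero extension, so both views agree.
    static char javaShiftChar(char value, int shift) {
        return (char) (value >>> shift);
    }

    public static void main(String[] args) {
        short s = (short) -8;    // bit pattern 0xFFF8
        char c = (char) 0xFFF8;  // same bit pattern, unsigned
        System.out.println(javaShiftShort(s, 3));      // -1
        System.out.println(laneShiftShort(s, 3));      // 8191
        System.out.println((int) javaShiftChar(c, 3)); // 8191
    }
}
```

For the negative `short`, the Java result and the lane-only result differ, which is the incorrect result the quoted comment warns about; for `char` they coincide.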
------------- PR: https://git.openjdk.java.net/jdk16/pull/136 From dongbo at openjdk.java.net Wed Feb 3 02:20:51 2021 From: dongbo at openjdk.java.net (Dong Bo) Date: Wed, 3 Feb 2021 02:20:51 GMT Subject: [jdk16] RFR: 8260585: AArch64: Wrong code generated for shifting right and accumulating four unsigned short integers [v3] In-Reply-To: References: <4KBCYEYrcTMEqEo89p3m4OfqSolqi2agyexH-k1vZrU=.813f7a97-b7ef-4cc0-809b-91e4c5f3c7fd@github.com> Message-ID: On Wed, 3 Feb 2021 01:59:38 GMT, Dong Bo wrote: >>> Because we only have predicate(n->as_Vector()->length() == 8) in vsraa8B_imm, so they are not matched. >>> We should fix this with the following code: >> >> I think this is an enhancement, and should be done in a separate patch in jdk mainline. > >> > Because we only have predicate(n->as_Vector()->length() == 8) in vsraa8B_imm, so they are not matched. >> > We should fix this with the following code: >> >> I think this is an enhancement, and should be done in a separate patch in jdk mainline. > > OK, I update a test with loop size 80 for bytes so that `ssra` for 8B can be matched now. Ping... Can I get a review for the newest changes? Please let me know if we are ready to go. 
------------- PR: https://git.openjdk.java.net/jdk16/pull/136 From dlong at openjdk.java.net Wed Feb 3 03:17:41 2021 From: dlong at openjdk.java.net (Dean Long) Date: Wed, 3 Feb 2021 03:17:41 GMT Subject: RFR: 8260301: misc gc/g1/unloading tests fails with "RuntimeException: Method could not be enqueued for compilation at level N" In-Reply-To: References: Message-ID: On Tue, 2 Feb 2021 15:44:16 GMT, Vladimir Kozlov wrote: > On return WB wait to acquire Compile_lock before checking compilation status > https://github.com/openjdk/jdk/blob/master/src/hotspot/share/prims/whitebox.cpp#L988 > > This lock is used by ciEnv for compiled code publishing: > https://github.com/openjdk/jdk/blob/master/src/hotspot/share/ci/ciEnv.cpp#L981 > > So while WB waits the lock compiler thread can finish compilation, register nmethod and clear method's queued_for_compilation bit. > > The problem is that WB check `nm` value (compiled code) which it got before the lock and when method compilation is not finished. > > The fix is to check compiled code again similar to check in CompileBroker: > https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/compileBroker.cpp#L1501 > > Passed hs-tier1-4 testing and 100 x vmTestbase/gc/g1/unloading/tests/unloading_compilation_*. Looks good. ------------- Marked as reviewed by dlong (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2356 From serb at openjdk.java.net Wed Feb 3 04:13:52 2021 From: serb at openjdk.java.net (Sergey Bylokhov) Date: Wed, 3 Feb 2021 04:13:52 GMT Subject: RFR: 8261010: Delete the Netbeans "default" license header Message-ID: Trivial cleanup, the "default" license header is removed in a few components. 
------------- Commit messages: - Initial fix Changes: https://git.openjdk.java.net/jdk/pull/2368/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2368&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261010 Stats: 14 lines in 3 files changed: 0 ins; 14 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/2368.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2368/head:pull/2368 PR: https://git.openjdk.java.net/jdk/pull/2368 From iris at openjdk.java.net Wed Feb 3 04:22:41 2021 From: iris at openjdk.java.net (Iris Clark) Date: Wed, 3 Feb 2021 04:22:41 GMT Subject: RFR: 8261010: Delete the Netbeans "default" license header In-Reply-To: References: Message-ID: On Wed, 3 Feb 2021 04:01:51 GMT, Sergey Bylokhov wrote: > Trivial cleanup, the "default" license header is removed in a few components. Trivial removal of template instructions. ------------- Marked as reviewed by iris (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2368 From psadhukhan at openjdk.java.net Wed Feb 3 04:41:40 2021 From: psadhukhan at openjdk.java.net (Prasanta Sadhukhan) Date: Wed, 3 Feb 2021 04:41:40 GMT Subject: RFR: 8261010: Delete the Netbeans "default" license header In-Reply-To: References: Message-ID: On Wed, 3 Feb 2021 04:01:51 GMT, Sergey Bylokhov wrote: > Trivial cleanup, the "default" license header is removed in a few components. Marked as reviewed by psadhukhan (Reviewer). 
------------- PR: https://git.openjdk.java.net/jdk/pull/2368 From iignatyev at openjdk.java.net Wed Feb 3 05:42:52 2021 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Wed, 3 Feb 2021 05:42:52 GMT Subject: RFR: 8260296: SA's dumpreplaydata fails [v3] In-Reply-To: References: Message-ID: <0M7K991qRJQOhQfLlm--GlAwOzO8I77h51gWG4Frk28=.73b8361c-1673-4a09-8e08-7752a3ba5363@github.com> On Wed, 27 Jan 2021 08:07:04 GMT, Roland Westrelin wrote: >> I noticed that the SA's dumpreplaydata command fails with: >> >> java.lang.AssertionError: CLHSDB wasn't run successfully: Opening core file, please wait... >> hsdb> Exception in thread "main" java.lang.InternalError: ciMetadata does not appear to be polymorphic >> >> with a simple test program. This happens because the SA can't find the >> vtable symbol for ciMetadata (build produced by gcc 9.2.1). AFAIU, >> there's nothing in our build system that hides that symbol. I had to >> move one method's definition from the header file to the cpp file for >> the symbol to be visible again. >> >> We have a test that checks dumpreplaydata but it doesn't catch that >> problem. The test produces a replay file from a core file with the SA >> by running a simple test with -Xcomp and CICrash=1. So the replay data >> has very little or no profile data (which is what causes the problem >> above). I propose running a slightly more complicated test method and >> crashing after the method has had time to run for long enough to >> collect profile data. >> >> The other shortcoming of the test is that it doesn't look at the >> content of the replay file. It only warns if they differ. The replay >> file produced by the VM and the one produced by the SA should be >> identical (except for comment lines). So I propose we check that. >> >> Finally, I can't run that test on my system because core files are >> handled by systemd (I'm running some recent version of fedora). 
I >> suppose, the system can be configured differently but having the test >> work out the box is nice. I extended the test case to handle that. >> >> With the improved test, there are a few differences between the VM and >> SA replay files caused by VM changes that were not mirrored in the >> SA. I fixed those. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: > > - convert all tests > - Merge branch 'master' into JDK-8260296 > - use CoreUtils > - whitespaces > - SA fixes > - VM fix > - test test/lib/jdk/test/lib/util/CoreUtils.java line 166: > 164: for (int i = 0; i < 10; i++) { > 165: Thread.sleep(5000); > 166: OutputAnalyzer out = ProcessTools.executeProcess("coredumpctl", "dump", "-1", "-o", core, Long.valueOf(pid).toString()); you can use `String::valueOf` to get a string representation of a `long`: Suggestion: OutputAnalyzer out = ProcessTools.executeProcess("coredumpctl", "dump", "-1", "-o", core, String.valueOf(pid)); ------------- PR: https://git.openjdk.java.net/jdk/pull/2195 From iignatyev at openjdk.java.net Wed Feb 3 05:57:48 2021 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Wed, 3 Feb 2021 05:57:48 GMT Subject: RFR: 8260296: SA's dumpreplaydata fails [v3] In-Reply-To: References: Message-ID: On Wed, 27 Jan 2021 08:07:04 GMT, Roland Westrelin wrote: >> I noticed that the SA's dumpreplaydata command fails with: >> >> java.lang.AssertionError: CLHSDB wasn't run successfully: Opening core file, please wait... >> hsdb> Exception in thread "main" java.lang.InternalError: ciMetadata does not appear to be polymorphic >> >> with a simple test program. This happens because the SA can't find the >> vtable symbol for ciMetadata (build produced by gcc 9.2.1). AFAIU, >> there's nothing in our build system that hides that symbol.
I had to >> move one method's definition from the header file to the cpp file for >> the symbol to be visible again. >> >> We have a test that checks dumpreplaydata but it doesn't catch that >> problem. The test produces a replay file from a core file with the SA >> by running a simple test with -Xcomp and CICrash=1. So the replay data >> has very little or no profile data (which is what causes the problem >> above). I propose running a slightly more complicated test method and >> crashing after the method has had time to run for long enough to >> collect profile data. >> >> The other shortcoming of the test is that it doesn't look at the >> content of the replay file. It only warns if they differ. The replay >> file produced by the VM and the one produced by the SA should be >> identical (except for comment lines). So I propose we check that. >> >> Finally, I can't run that test on my system because core files are >> handled by systemd (I'm running some recent version of fedora). I >> suppose, the system can be configured differently but having the test >> work out the box is nice. I extended the test case to handle that. >> >> With the improved test, there are a few differences between the VM and >> SA replay files caused by VM changes that were not mirrored in the >> SA. I fixed those. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains seven additional commits since the last revision: > > - convert all tests > - Merge branch 'master' into JDK-8260296 > - use CoreUtils > - whitespaces > - SA fixes > - VM fix > - test a few more nits test/lib/jdk/test/lib/process/ProcessTools.java line 461: > 459: } > 460: > 461: static public class OutputAnalyzerAndPID { Suggestion: public static class OutputAnalyzerAndPID { test/lib/jdk/test/lib/process/ProcessTools.java line 461: > 459: } > 460: > 461: static public class OutputAnalyzerAndPID { can we either change `OutputAnalyzer` to store pid (and use -1 for cases when there is no one) or make `OutputAnalyzer` non-final and have `OutputAnalyzerAndPID` extending `OutputAnalyzer`? ------------- Changes requested by iignatyev (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2195 From kvn at openjdk.java.net Wed Feb 3 07:19:39 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 3 Feb 2021 07:19:39 GMT Subject: RFR: 8260301: misc gc/g1/unloading tests fails with "RuntimeException: Method could not be enqueued for compilation at level N" In-Reply-To: References: Message-ID: On Wed, 3 Feb 2021 03:15:16 GMT, Dean Long wrote: >> On return WB wait to acquire Compile_lock before checking compilation status >> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/prims/whitebox.cpp#L988 >> >> This lock is used by ciEnv for compiled code publishing: >> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/ci/ciEnv.cpp#L981 >> >> So while WB waits the lock compiler thread can finish compilation, register nmethod and clear method's queued_for_compilation bit. >> >> The problem is that WB check `nm` value (compiled code) which it got before the lock and when method compilation is not finished. 
>> >> The fix is to check compiled code again similar to check in CompileBroker: >> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/compileBroker.cpp#L1501 >> >> Passed hs-tier1-4 testing and 100 x vmTestbase/gc/g1/unloading/tests/unloading_compilation_*. > > Looks good. Thank you, Dean ------------- PR: https://git.openjdk.java.net/jdk/pull/2356 From chagedorn at openjdk.java.net Wed Feb 3 07:37:57 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Wed, 3 Feb 2021 07:37:57 GMT Subject: RFR: 8260581: IGV: enhance node search [v4] In-Reply-To: References: Message-ID: On Mon, 1 Feb 2021 11:34:59 GMT, Roberto Castañeda Lozano wrote: >> Apply several enhancements to the quick node search functionality: >> >> - Allow users to search by node id or name by default (i.e. when no property is specified) instead of name only. >> - Show partial matches when searching for a specific property (e.g. so that searching "type=con" lists all "control"-type nodes). >> - Avoid showing the "All _N_ matching nodes" entry if there is a single match, or the user is searching a numeric value. >> - Rank matches so that full matches are listed first, followed by matches at the beginning of the partially matched value, followed by the rest of matches in increasing size of the partially matched value. Numeric matches with the same rank are sorted increasingly. For example, searching "5" on a set of nodes with labels {"5 AddI", "25 AddL", "253 AddL", "554 MulI"} should list the matches as follows: >> 1. **5** AddI >> 2. **5**54 MulI >> 3. 2**5** AddL >> 4.
2**5**3 AddL >> >> As an illustration of some of these enhancements, this screenshot shows the behavior of the quick node search functionality before the changes: >> >> ![search-before](https://user-images.githubusercontent.com/8792647/106283438-374ba500-6242-11eb-8ef4-d18117eabcbb.png) >> >> and after: >> >> ![search-after](https://user-images.githubusercontent.com/8792647/106282880-7e856600-6241-11eb-8cb5-48fae5582cc2.png) >> >> >> Tested manually on small and large (~10000 nodes) graphs. Thanks to Christian Hagedorn for feedback on several iterations of the enhancements. >> >> As part of the review, please evaluate not just the code changes but also the usability. > > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Sort same-rank matches by first-word numeric value > > Sort otherwise equally relevant matches by node id, which is by default the > first word in node labels. Thanks to Christian Hagedorn for the suggestion and > (slightly adapted) patch. Marked as reviewed by chagedorn (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/2285 From neliasso at openjdk.java.net Wed Feb 3 08:06:54 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Wed, 3 Feb 2021 08:06:54 GMT Subject: RFR: 8258799 : [Testbug] RandomCommandsTest must check if tested directive is added via jcmd In-Reply-To: References: Message-ID: On Tue, 2 Feb 2021 22:58:19 GMT, Igor Ignatyev wrote: >> RandomCommandsTest.java is checking a mix of valid and invalid compilecommands and compiler directives. When they fail they have different behaviour: >> >> A compilecommand that is malformed will result in a printed error, and then the VM continues. >> A compiler directive that is malformed, that is added via commandline, will abort the VM, much like any other VM-flag would.
>> A compiler directive that is malformed, that is added via jcmd will print an error, and the VM continues - just like with any other jcmd. >> >> The RandomCommandsTest fails when generating a malformed compiler directive and adding it via jcmd - because it expects the VM to abort. >> >> This patch fixes that. > > Marked as reviewed by iignatyev (Reviewer). Thank you Igor and Vladimir! ------------- PR: https://git.openjdk.java.net/jdk/pull/2364 From thartmann at openjdk.java.net Wed Feb 3 08:13:42 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Wed, 3 Feb 2021 08:13:42 GMT Subject: Integrated: 8260928: InitArrayShortSize constraint func should print a helpful error message In-Reply-To: References: Message-ID: On Tue, 2 Feb 2021 11:31:34 GMT, Tobias Hartmann wrote: > The `InitArrayShortSize` flag requires a value that is a multiple of `BytesPerLong` but no corresponding error message is printed: > java -XX:InitArrayShortSize=7 > Error: Could not create the Java Virtual Machine. > Error: A fatal exception has occurred. Program will exit. > > Thanks, > Tobias This pull request has now been integrated. Changeset: 91e6c755 Author: Tobias Hartmann URL: https://git.openjdk.java.net/jdk/commit/91e6c755 Stats: 3 lines in 1 file changed: 3 ins; 0 del; 0 mod 8260928: InitArrayShortSize constraint func should print a helpful error message Reviewed-by: shade, chagedorn ------------- PR: https://git.openjdk.java.net/jdk/pull/2351 From thartmann at openjdk.java.net Wed Feb 3 08:28:53 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Wed, 3 Feb 2021 08:28:53 GMT Subject: [jdk16] RFR: 8260709: C2: assert(false) failed: unscheduable graph In-Reply-To: References: Message-ID: On Tue, 2 Feb 2021 14:53:15 GMT, Roland Westrelin wrote: > The v = field load in the test case is found anti-dependent with the > memory phi that merges the exception state of the 2 array > allocation. 
Since JDK-8258393, anti-dependence computation only > considers the Phi inputs that are reachable from the memory input of a > load. As a consequence, the late control for the load is the control > projection of the array allocation in the loop. When loop opts run, > PhaseIdealLoop::split_if_with_blocks_post() finds that the load's late > control is different from its current control (which is inside the > outer loop). It tries to sink the load out of loop but ends up pinning > it at its late control, the projection of the second AllocateNode. > > The logic that expands the AllocateNode doesn't expect a pinned node > on the control projection and the result is a broken graph. I think > the fix for this would be to clone the load along both the exception > and the fallthrough paths. But as noted in JDK-8252372, the whole > process of sinking loads out of loops doesn't seem to work as expected > (for instance in this case it sinks the load from the outer loop into > the inner loop). So instead of going with a complicated fix, I propose > simply to detect this corner and that no attempt be made to sink the > load. Note that the current logic computes the late control for the > load (which should be in the loop), will create a clone for each use > and assign the dom_lca of the use's control and the load late control > to that use, that is the load late control. So all uses end up at the > same location, the load late control. So to detect that case, it's > sufficient to test the load late control. This looks reasonable to me as a point fix for JDK 16, given the plan is to rework that code for JDK 17 with [JDK-8252372](https://bugs.openjdk.java.net/browse/JDK-8252372). ------------- Marked as reviewed by thartmann (Reviewer). 
PR: https://git.openjdk.java.net/jdk16/pull/144 From aph at openjdk.java.net Wed Feb 3 09:10:47 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Wed, 3 Feb 2021 09:10:47 GMT Subject: [jdk16] RFR: 8260585: AArch64: Wrong code generated for shifting right and accumulating four unsigned short integers [v7] In-Reply-To: References: Message-ID: On Wed, 3 Feb 2021 02:04:03 GMT, Dong Bo wrote: >> This is a typo introduced by JDK-8255949. >> Compiler will generate `ushr` for shifting right and accumulating four short integers. >> It produces wrong results for specific case. The instruction should be `usra`. > > Dong Bo has updated the pull request incrementally with one additional commit since the last revision: > > match ssra with 8B OK, thanks. ------------- Marked as reviewed by aph (Reviewer). PR: https://git.openjdk.java.net/jdk16/pull/136 From dongbo at openjdk.java.net Wed Feb 3 09:31:45 2021 From: dongbo at openjdk.java.net (Dong Bo) Date: Wed, 3 Feb 2021 09:31:45 GMT Subject: [jdk16] RFR: 8260585: AArch64: Wrong code generated for shifting right and accumulating four unsigned short integers [v7] In-Reply-To: References: Message-ID: <9WisRAG9qBk4FL87nQy3kNCLUyhKanVfnc_ZY2ZxkB8=.abf07896-4b00-4769-85b4-670d88f25aa3@github.com> On Wed, 3 Feb 2021 09:07:36 GMT, Andrew Haley wrote: >> Dong Bo has updated the pull request incrementally with one additional commit since the last revision: >> >> match ssra with 8B > > OK, thanks. Thank you all for the review. 
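For readers unfamiliar with the two instructions named in this thread, the difference can be modelled per 16-bit lane. This is only an illustrative sketch: the class and method names are hypothetical, and it models the arithmetic of `ushr` and `usra`, not the matcher rule that the patch corrects.

```java
// Sketch only: models the per-lane arithmetic of two AArch64 SIMD
// instructions on 16-bit lanes. Class and method names are hypothetical.
final class ShiftRightAccumulateSketch {

    // USHR: each destination lane becomes the unsigned right shift of the source lane.
    static short[] ushr(short[] src, int shift) {
        short[] dst = new short[src.length];
        for (int i = 0; i < src.length; i++) {
            dst[i] = (short) ((src[i] & 0xFFFF) >>> shift);
        }
        return dst;
    }

    // USRA: the shifted source lane is added to the current accumulator lane.
    static short[] usra(short[] acc, short[] src, int shift) {
        short[] dst = new short[src.length];
        for (int i = 0; i < src.length; i++) {
            dst[i] = (short) (acc[i] + ((src[i] & 0xFFFF) >>> shift));
        }
        return dst;
    }
}
```

With an accumulator of {1, 1, 1, 1}, a source of {0x8000, 16, 32, 64} and a shift of 4, `ushr` yields {0x0800, 1, 2, 4} while `usra` yields {0x0801, 2, 3, 5}. Emitting the former where the latter is intended silently drops the accumulated value.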
------------- PR: https://git.openjdk.java.net/jdk16/pull/136 From rcastanedalo at openjdk.java.net Wed Feb 3 09:59:43 2021 From: rcastanedalo at openjdk.java.net (Roberto Castañeda Lozano) Date: Wed, 3 Feb 2021 09:59:43 GMT Subject: RFR: 8260581: IGV: enhance node search [v4] In-Reply-To: References: Message-ID: On Wed, 3 Feb 2021 07:34:55 GMT, Christian Hagedorn wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: >> >> Sort same-rank matches by first-word numeric value >> >> Sort otherwise equally relevant matches by node id, which is by default the >> first word in node labels. Thanks to Christian Hagedorn for the suggestion and >> (slightly adapted) patch. > > Marked as reviewed by chagedorn (Reviewer). Please sponsor. ------------- PR: https://git.openjdk.java.net/jdk/pull/2285 From rcastanedalo at openjdk.java.net Wed Feb 3 11:14:50 2021 From: rcastanedalo at openjdk.java.net (Roberto Castañeda Lozano) Date: Wed, 3 Feb 2021 11:14:50 GMT Subject: Integrated: 8260581: IGV: enhance node search In-Reply-To: References: Message-ID: On Thu, 28 Jan 2021 11:07:38 GMT, Roberto Castañeda Lozano wrote: > Apply several enhancements to the quick node search functionality: > > - Allow users to search by node id or name by default (i.e. when no property is specified) instead of name only. > - Show partial matches when searching for a specific property (e.g. so that searching "type=con" lists all "control"-type nodes). > - Avoid showing the "All _N_ matching nodes" entry if there is a single match, or the user is searching a numeric value. > - Rank matches so that full matches are listed first, followed by matches at the beginning of the partially matched value, followed by the rest of matches in increasing size of the partially matched value. Numeric matches with the same rank are sorted increasingly.
For example, searching "5" on a set of nodes with labels {"5 AddI", "25 AddL", "253 AddL", "554 MulI"} should list the matches as follows: > 1. **5** AddI > 2. **5**54 MulI > 3. 2**5** AddL > 4. 2**5**3 AddL > > As an illustration of some of these enhancements, this screenshot shows the behavior of the quick node search functionality before the changes: > > ![search-before](https://user-images.githubusercontent.com/8792647/106283438-374ba500-6242-11eb-8ef4-d18117eabcbb.png) > > and after: > > ![search-after](https://user-images.githubusercontent.com/8792647/106282880-7e856600-6241-11eb-8cb5-48fae5582cc2.png) > > > Tested manually on small and large (~10000 nodes) graphs. Thanks to Christian Hagedorn for feedback on several iterations of the enhancements. > > As part of the review, please evaluate not just the code changes but also the usability. This pull request has now been integrated. Changeset: ae2c5f07 Author: Roberto Castañeda Lozano Committer: Vladimir Ivanov URL: https://git.openjdk.java.net/jdk/commit/ae2c5f07 Stats: 100 lines in 3 files changed: 76 ins; 5 del; 19 mod 8260581: IGV: enhance node search Allow users to search by node id or name by default, show partial matches when searching for a specific property, show 'All N matching nodes' entry only if relevant, and rank results by level of matching. Co-authored-by: Christian Hagedorn Reviewed-by: chagedorn, vlivanov, xliu ------------- PR: https://git.openjdk.java.net/jdk/pull/2285 From chagedorn at openjdk.java.net Wed Feb 3 11:20:47 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Wed, 3 Feb 2021 11:20:47 GMT Subject: [jdk16] RFR: 8260709: C2: assert(false) failed: unscheduable graph In-Reply-To: References: Message-ID: On Tue, 2 Feb 2021 14:53:15 GMT, Roland Westrelin wrote: > The v = field load in the test case is found anti-dependent with the > memory phi that merges the exception state of the 2 array > allocation.
Since JDK-8258393, anti-dependence computation only > considers the Phi inputs that are reachable from the memory input of a > load. As a consequence, the late control for the load is the control > projection of the array allocation in the loop. When loop opts run, > PhaseIdealLoop::split_if_with_blocks_post() finds that the load's late > control is different from its current control (which is inside the > outer loop). It tries to sink the load out of loop but ends up pinning > it at its late control, the projection of the second AllocateNode. > > The logic that expands the AllocateNode doesn't expect a pinned node > on the control projection and the result is a broken graph. I think > the fix for this would be to clone the load along both the exception > and the fallthrough paths. But as noted in JDK-8252372, the whole > process of sinking loads out of loops doesn't seem to work as expected > (for instance in this case it sinks the load from the outer loop into > the inner loop). So instead of going with a complicated fix, I propose > simply to detect this corner and that no attempt be made to sink the > load. Note that the current logic computes the late control for the > load (which should be in the loop), will create a clone for each use > and assign the dom_lca of the use's control and the load late control > to that use, that is the load late control. So all uses end up at the > same location, the load late control. So to detect that case, it's > sufficient to test the load late control. That sounds reasonable and looks good to me. ------------- Marked as reviewed by chagedorn (Reviewer). 
PR: https://git.openjdk.java.net/jdk16/pull/144 From roland at openjdk.java.net Wed Feb 3 11:55:51 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Wed, 3 Feb 2021 11:55:51 GMT Subject: [jdk16] RFR: 8260709: C2: assert(false) failed: unscheduable graph In-Reply-To: References: Message-ID: On Wed, 3 Feb 2021 08:25:27 GMT, Tobias Hartmann wrote: >> The v = field load in the test case is found anti-dependent with the >> memory phi that merges the exception state of the 2 array >> allocation. Since JDK-8258393, anti-dependence computation only >> considers the Phi inputs that are reachable from the memory input of a >> load. As a consequence, the late control for the load is the control >> projection of the array allocation in the loop. When loop opts run, >> PhaseIdealLoop::split_if_with_blocks_post() finds that the load's late >> control is different from its current control (which is inside the >> outer loop). It tries to sink the load out of loop but ends up pinning >> it at its late control, the projection of the second AllocateNode. >> >> The logic that expands the AllocateNode doesn't expect a pinned node >> on the control projection and the result is a broken graph. I think >> the fix for this would be to clone the load along both the exception >> and the fallthrough paths. But as noted in JDK-8252372, the whole >> process of sinking loads out of loops doesn't seem to work as expected >> (for instance in this case it sinks the load from the outer loop into >> the inner loop). So instead of going with a complicated fix, I propose >> simply to detect this corner and that no attempt be made to sink the >> load. Note that the current logic computes the late control for the >> load (which should be in the loop), will create a clone for each use >> and assign the dom_lca of the use's control and the load late control >> to that use, that is the load late control. So all uses end up at the >> same location, the load late control. 
So to detect that case, it's >> sufficient to test the load late control. > > This looks reasonable to me as a point fix for JDK 16, given the plan is to rework that code for JDK 17 with [JDK-8252372](https://bugs.openjdk.java.net/browse/JDK-8252372). thanks for the reviews @TobiHartmann @chhagedorn ------------- PR: https://git.openjdk.java.net/jdk16/pull/144 From roland at openjdk.java.net Wed Feb 3 12:06:45 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Wed, 3 Feb 2021 12:06:45 GMT Subject: RFR: 8260296: SA's dumpreplaydata fails [v3] In-Reply-To: References: Message-ID: On Wed, 3 Feb 2021 05:45:02 GMT, Igor Ignatyev wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: >> >> - convert all tests >> - Merge branch 'master' into JDK-8260296 >> - use CoreUtils >> - whitespaces >> - SA fixes >> - VM fix >> - test > > test/lib/jdk/test/lib/process/ProcessTools.java line 461: > >> 459: } >> 460: >> 461: static public class OutputAnalyzerAndPID { > > can we either change `OutputAnalyzer` to store pid (and use -1 for cases when there is no one) or make `OutputAnalyzer` non-final and have `OutputAnalyzerAndPID` extending `OutputAnalyzer`? Thanks for reviewing this. I did not store the pid in the OutputAnalyzer because it doesn't seem to belong there as it has nothing to do with the text output of a test. But if you think that's ok. that's fine with me too. Do you prefer an extra field in OutputAnalyzer or a new subclass? 
------------- PR: https://git.openjdk.java.net/jdk/pull/2195 From redestad at openjdk.java.net Wed Feb 3 12:14:05 2021 From: redestad at openjdk.java.net (Claes Redestad) Date: Wed, 3 Feb 2021 12:14:05 GMT Subject: RFR: 8261030: Avoid loading GenerateJLIClassesHelper at runtime Message-ID: This moves the tracing methods added to GenerateJLIClassesHelper in JDK-8252725 to MethodHandleStatics, which avoids loading at runtime some code meant for jlink. ------------- Commit messages: - Remove unused import, copyrights - Move tracing methods and constants from GenerateJLIClassesHelper to MethodHandleStatics Changes: https://git.openjdk.java.net/jdk/pull/2376/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2376&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261030 Stats: 69 lines in 4 files changed: 34 ins; 30 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/2376.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2376/head:pull/2376 PR: https://git.openjdk.java.net/jdk/pull/2376 From neliasso at openjdk.java.net Wed Feb 3 16:04:55 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Wed, 3 Feb 2021 16:04:55 GMT Subject: Integrated: 8258799 : [Testbug] RandomCommandsTest must check if tested directive is added via jcmd In-Reply-To: References: Message-ID: On Tue, 2 Feb 2021 21:55:17 GMT, Nils Eliasson wrote: > RandomCommandsTest.java is checking a mix of valid and invalid compilecommands and compiler directives. When they fail they have different behaviour: > > A compilecommand that is malformed will result in a printed error, and then the VM continues. > A compiler directive that is malformed, that is added via commandline, will abort the VM, much like any other VM-flag would. > A compiler directive that is malformed, that is added via jcmd will print an error, and the VM continues - just like with any other jcmd. 
> > The RandomCommandsTest fails when generating a malformed compiler directive and adding it via jcmd - because it expects the VM to abort. > > This patch fixes that. This pull request has now been integrated. Changeset: 472bf629 Author: Nils Eliasson URL: https://git.openjdk.java.net/jdk/commit/472bf629 Stats: 19 lines in 2 files changed: 16 ins; 0 del; 3 mod 8258799: [Testbug] RandomCommandsTest must check if tested directive is added via jcmd Reviewed-by: kvn, iignatyev ------------- PR: https://git.openjdk.java.net/jdk/pull/2364 From iignatyev at openjdk.java.net Wed Feb 3 16:30:41 2021 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Wed, 3 Feb 2021 16:30:41 GMT Subject: RFR: 8260301: misc gc/g1/unloading tests fails with "RuntimeException: Method could not be enqueued for compilation at level N" In-Reply-To: References: Message-ID: On Tue, 2 Feb 2021 15:44:16 GMT, Vladimir Kozlov wrote: > On return WB wait to acquire Compile_lock before checking compilation status > https://github.com/openjdk/jdk/blob/master/src/hotspot/share/prims/whitebox.cpp#L988 > > This lock is used by ciEnv for compiled code publishing: > https://github.com/openjdk/jdk/blob/master/src/hotspot/share/ci/ciEnv.cpp#L981 > > So while WB waits the lock compiler thread can finish compilation, register nmethod and clear method's queued_for_compilation bit. > > The problem is that WB check `nm` value (compiled code) which it got before the lock and when method compilation is not finished. > > The fix is to check compiled code again similar to check in CompileBroker: > https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/compileBroker.cpp#L1501 > > Passed hs-tier1-4 testing and 100 x vmTestbase/gc/g1/unloading/tests/unloading_compilation_*. LGTM ------------- Marked as reviewed by iignatyev (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/2356 From github.com+51754783+coreyashford at openjdk.java.net Wed Feb 3 17:38:45 2021 From: github.com+51754783+coreyashford at openjdk.java.net (Corey Ashford) Date: Wed, 3 Feb 2021 17:38:45 GMT Subject: RFR: 8259822: [PPC64] Support the prefixed instruction format added in POWER10 [v4] In-Reply-To: <2CXH3zvUkBfqLrr8SPpD-x3RcSbFzw0S6NGLWgtirQ8=.0df1a0c8-3ac2-4e21-9e6a-f0b43a3ae20c@github.com> References: <2CXH3zvUkBfqLrr8SPpD-x3RcSbFzw0S6NGLWgtirQ8=.0df1a0c8-3ac2-4e21-9e6a-f0b43a3ae20c@github.com> Message-ID: On Tue, 2 Feb 2021 12:45:03 GMT, Kazunori Ogata wrote: >> The POWER10 processor, which implements Power ISA 3.1 [1], supports new instruction formats where an instruction takes two 32bit words. The first word is called prefix, and the instructions with prefix are called prefixed instructions. With more bits in opcode and operand fields, POWER10 supports larger immediate value in an operand, as well as many new instructions. >> >> This is the first changes to handle prefixed instructions, and this adds support of prefixed addi (= paddi) instruction as an example of prefix usage. paddi accepts 34bit immediate value, while original addi accepts 16bit value. >> >> [1] https://ibm.ent.box.com/s/hhjfw0x0lrbtyzmiaffnbxh2fuo0fog0 > > Kazunori Ogata has updated the pull request incrementally with one additional commit since the last revision: > > Removed pla and psubi and adjusted spacing based on review comments Looks good! Thanks for the additional formatting cleanup ------------- Marked as reviewed by CoreyAshford at github.com (no known OpenJDK username). 
PR: https://git.openjdk.java.net/jdk/pull/2095 From kvn at openjdk.java.net Wed Feb 3 18:08:44 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 3 Feb 2021 18:08:44 GMT Subject: RFR: 8260301: misc gc/g1/unloading tests fails with "RuntimeException: Method could not be enqueued for compilation at level N" In-Reply-To: References: Message-ID: On Wed, 3 Feb 2021 16:27:54 GMT, Igor Ignatyev wrote: >> On return WB wait to acquire Compile_lock before checking compilation status >> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/prims/whitebox.cpp#L988 >> >> This lock is used by ciEnv for compiled code publishing: >> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/ci/ciEnv.cpp#L981 >> >> So while WB waits the lock compiler thread can finish compilation, register nmethod and clear method's queued_for_compilation bit. >> >> The problem is that WB check `nm` value (compiled code) which it got before the lock and when method compilation is not finished. >> >> The fix is to check compiled code again similar to check in CompileBroker: >> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/compileBroker.cpp#L1501 >> >> Passed hs-tier1-4 testing and 100 x vmTestbase/gc/g1/unloading/tests/unloading_compilation_*. 
> > LGTM Thank you, Igor ------------- PR: https://git.openjdk.java.net/jdk/pull/2356 From kvn at openjdk.java.net Wed Feb 3 18:08:45 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 3 Feb 2021 18:08:45 GMT Subject: Integrated: 8260301: misc gc/g1/unloading tests fails with "RuntimeException: Method could not be enqueued for compilation at level N" In-Reply-To: References: Message-ID: On Tue, 2 Feb 2021 15:44:16 GMT, Vladimir Kozlov wrote: > On return WB wait to acquire Compile_lock before checking compilation status > https://github.com/openjdk/jdk/blob/master/src/hotspot/share/prims/whitebox.cpp#L988 > > This lock is used by ciEnv for compiled code publishing: > https://github.com/openjdk/jdk/blob/master/src/hotspot/share/ci/ciEnv.cpp#L981 > > So while WB waits the lock compiler thread can finish compilation, register nmethod and clear method's queued_for_compilation bit. > > The problem is that WB check `nm` value (compiled code) which it got before the lock and when method compilation is not finished. > > The fix is to check compiled code again similar to check in CompileBroker: > https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/compileBroker.cpp#L1501 > > Passed hs-tier1-4 testing and 100 x vmTestbase/gc/g1/unloading/tests/unloading_compilation_*. This pull request has now been integrated. 
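The race described in this change can be illustrated with a toy model. This is a sketch under stated assumptions, not the WhiteBox or CompileBroker code: `MethodState`, `COMPILE_LOCK` and both method names are hypothetical stand-ins for the method's compiled code, its queued_for_compilation bit and Compile_lock.

```java
import java.util.concurrent.locks.ReentrantLock;

// Sketch only: a toy model of the race, with hypothetical names throughout.
final class RecheckUnderLockSketch {
    static final class MethodState {
        volatile Object code;     // stands in for the method's compiled code
        volatile boolean queued;  // stands in for the queued_for_compilation bit
    }

    static final ReentrantLock COMPILE_LOCK = new ReentrantLock();

    // Problematic shape: decides using a snapshot of the code taken before
    // the caller started waiting for the lock.
    static boolean enqueueSucceededStale(MethodState m, Object codeSeenBeforeLock) {
        COMPILE_LOCK.lock();
        try {
            return m.queued || codeSeenBeforeLock != null;
        } finally {
            COMPILE_LOCK.unlock();
        }
    }

    // Fixed shape: reads the compiled code again once the lock is held.
    static boolean enqueueSucceededFresh(MethodState m) {
        COMPILE_LOCK.lock();
        try {
            return m.queued || m.code != null;
        } finally {
            COMPILE_LOCK.unlock();
        }
    }
}
```

If the compiler installs the code and clears the queued flag while the caller waits for the lock, the stale variant sees neither a queued method nor compiled code and reports failure, whereas the re-reading variant observes the installed code.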
Changeset: f025bc1d Author: Vladimir Kozlov URL: https://git.openjdk.java.net/jdk/commit/f025bc1d Stats: 9 lines in 1 file changed: 9 ins; 0 del; 0 mod 8260301: misc gc/g1/unloading tests fails with "RuntimeException: Method could not be enqueued for compilation at level N" Reviewed-by: dlong, iignatyev ------------- PR: https://git.openjdk.java.net/jdk/pull/2356 From iignatyev at openjdk.java.net Wed Feb 3 18:54:43 2021 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Wed, 3 Feb 2021 18:54:43 GMT Subject: RFR: 8260296: SA's dumpreplaydata fails [v3] In-Reply-To: References: Message-ID: On Wed, 3 Feb 2021 12:04:00 GMT, Roland Westrelin wrote: >> test/lib/jdk/test/lib/process/ProcessTools.java line 461: >> >>> 459: } >>> 460: >>> 461: static public class OutputAnalyzerAndPID { >> >> can we either change `OutputAnalyzer` to store pid (and use -1 for cases when there is no one) or make `OutputAnalyzer` non-final and have `OutputAnalyzerAndPID` extending `OutputAnalyzer`? > > Thanks for reviewing this. I did not store the pid in the OutputAnalyzer because it doesn't seem to belong there as it has nothing to do with the text output of a test. But if you think that's ok. that's fine with me too. Do you prefer an extra field in OutputAnalyzer or a new subclass? `OutputAnalyzer`/ `OutputBuffer` already have an exit value, which arguable has nothing to do w/ the text output either, so I think it's ok to add a pid into `OutputBuffer` i-face and both of its implementations. 
------------- PR: https://git.openjdk.java.net/jdk/pull/2195 From mdoerr at openjdk.java.net Wed Feb 3 21:03:42 2021 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Wed, 3 Feb 2021 21:03:42 GMT Subject: RFR: 8259822: [PPC64] Support the prefixed instruction format added in POWER10 [v4] In-Reply-To: References: <2CXH3zvUkBfqLrr8SPpD-x3RcSbFzw0S6NGLWgtirQ8=.0df1a0c8-3ac2-4e21-9e6a-f0b43a3ae20c@github.com> Message-ID: On Wed, 3 Feb 2021 17:35:33 GMT, Corey Ashford wrote: >> Kazunori Ogata has updated the pull request incrementally with one additional commit since the last revision: >> >> Removed pla and psubi and adjusted spacing based on review comments > > Looks good! Thanks for the additional formatting cleanup Sorry that I didn't review it, yet. I didn't have enough time. Please note that C2 already has a mechanism to handle alignment: Insert "ins_alignment(n);" into the instruct. This allows C2 to insert up to n-1 nop instructions. Add a "compute_padding" function to determine the actual number of nops to insert given the current offset. They are used in s390.ad. Please take a look. ------------- PR: https://git.openjdk.java.net/jdk/pull/2095 From dongbo at openjdk.java.net Wed Feb 3 21:43:50 2021 From: dongbo at openjdk.java.net (Dong Bo) Date: Wed, 3 Feb 2021 21:43:50 GMT Subject: [jdk16] Integrated: 8260585: AArch64: Wrong code generated for shifting right and accumulating four unsigned short integers In-Reply-To: References: Message-ID: On Thu, 28 Jan 2021 13:01:07 GMT, Dong Bo wrote: > This is a typo introduced by JDK-8255949. > Compiler will generate `ushr` for shifting right and accumulating four short integers. > It produces wrong results for specific case. The instruction should be `usra`. This pull request has now been integrated. 
Changeset: 5307afa9 Author: Dong Bo Committer: Dean Long URL: https://git.openjdk.java.net/jdk16/commit/5307afa9 Stats: 479 lines in 2 files changed: 458 ins; 16 del; 5 mod 8260585: AArch64: Wrong code generated for shifting right and accumulating four unsigned short integers Reviewed-by: iveresov, dlong, njian, aph ------------- PR: https://git.openjdk.java.net/jdk16/pull/136 From serb at openjdk.java.net Wed Feb 3 23:54:51 2021 From: serb at openjdk.java.net (Sergey Bylokhov) Date: Wed, 3 Feb 2021 23:54:51 GMT Subject: Integrated: 8261010: Delete the Netbeans "default" license header In-Reply-To: References: Message-ID: On Wed, 3 Feb 2021 04:01:51 GMT, Sergey Bylokhov wrote: > Trivial cleanup, the "default" license header is removed in a few components. This pull request has now been integrated. Changeset: f279ff9d Author: Sergey Bylokhov URL: https://git.openjdk.java.net/jdk/commit/f279ff9d Stats: 14 lines in 3 files changed: 0 ins; 14 del; 0 mod 8261010: Delete the Netbeans "default" license header Reviewed-by: iris, psadhukhan ------------- PR: https://git.openjdk.java.net/jdk/pull/2368 From roland at openjdk.java.net Thu Feb 4 08:43:08 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Thu, 4 Feb 2021 08:43:08 GMT Subject: RFR: 8260637: Shenandoah: assert(_base == Tuple) failure during C2 compilation Message-ID: Another shenandoah bug with a fix in shared code. LRBRightAfterMemBar.test2() has 2 allocations that are non escaping but non scalarizable. As a result, the null check for a3.f is optimized out but the CastPP is left in the graph. That CastPP becomes control dependent on the o2 == null check which is later hoisted out of the loop. The CastPP is then right after the membar of the barrier = 0x42 volatile access but with an out of loop control. Because the node is considered pinned by loopopts, it is assigned the membar as control. 
The input of the CastPP is a Shenandoah barrier that's sandwiched between the membar and the CastPP and so is expanded right after the membar (that is, between the membar and its control projection). That causes the crash. I don't think cast nodes need to be pinned, so I propose that as a fix.

-------------

Commit messages:
 - test & fix

Changes: https://git.openjdk.java.net/jdk/pull/2400/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2400&range=00
Issue: https://bugs.openjdk.java.net/browse/JDK-8260637
Stats: 33 lines in 2 files changed: 27 ins; 0 del; 6 mod
Patch: https://git.openjdk.java.net/jdk/pull/2400.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/2400/head:pull/2400

PR: https://git.openjdk.java.net/jdk/pull/2400

From whuang at openjdk.java.net Thu Feb 4 08:55:07 2021
From: whuang at openjdk.java.net (Wang Huang)
Date: Thu, 4 Feb 2021 08:55:07 GMT
Subject: RFR: 8261137: Optimization of Box nodes in uncommon_trap
Message-ID: <8Riu9VCQLM7_vDp5DOMtLZK3yMLQzAkwlIKo4ab0F7Q=.662dbffe-c320-47ea-bc67-508e2c382b12@github.com>

JDK-8075052 has removed useless autoboxing. However, in some cases, the box is still retained. For instance:

    @Benchmark
    public void testMethod(Blackhole bh) {
        int sum = 0;
        for (int i = 0; i < data.length; i++) {
            Integer ii = Integer.valueOf(data[i]);
            if (i < data.length) {
                sum += ii.intValue();
            }
        }
        bh.consume(sum);
    }

Although the variable ii is only used at ii.intValue(), it cannot be eliminated because it is also used by a hidden uncommon_trap.
The uncommon_trap is generated by the optimized "if", because its condition is always true.

We can postpone the boxing into the uncommon_trap in this situation. We treat the box as a scalarized object by adding a SafePointScalarObjectNode to the uncommon_trap node, and deleting the use of the box:

There are no additional jtreg failures or errors after this patch.
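
The loop shape described above can also be tried without a JMH harness. The following is only a minimal sketch in plain Java: the class name `BoxInLoop`, the helper `sumPlain`, and the array contents are assumptions made for illustration and are not part of the patch. It shows the source-level pattern (a guard that is always true inside the loop, with the box used only under that guard) and checks that it computes the same sum as an unboxed loop; it does not demonstrate what C2 does with the box.

```java
public class BoxInLoop {
    // Same shape as the benchmark: the guard is always true inside the loop,
    // and the boxed value is used only under that guard.
    static int sumWithBox(int[] data) {
        int sum = 0;
        for (int i = 0; i < data.length; i++) {
            Integer ii = Integer.valueOf(data[i]);
            if (i < data.length) {
                sum += ii.intValue();
            }
        }
        return sum;
    }

    // Reference version without any boxing (hypothetical helper for comparison).
    static int sumPlain(int[] data) {
        int sum = 0;
        for (int v : data) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] data = new int[10_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i * 1337 % 7331;
        }
        // Both loops must agree regardless of how the JIT treats the box.
        System.out.println(sumWithBox(data) == sumPlain(data));
    }
}
```

Whether the allocation is actually removed would have to be checked separately, for example by comparing allocation rates before and after the patch.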
------------- Commit messages: - 8261137: Optimization of Box nodes in uncommon_trap Changes: https://git.openjdk.java.net/jdk/pull/2401/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2401&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261137 Stats: 46 lines in 1 file changed: 45 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/2401.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2401/head:pull/2401 PR: https://git.openjdk.java.net/jdk/pull/2401 From roland at openjdk.java.net Thu Feb 4 09:44:13 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Thu, 4 Feb 2021 09:44:13 GMT Subject: RFR: 8260296: SA's dumpreplaydata fails [v4] In-Reply-To: References: Message-ID: > I noticed that the SA's dumpreplaydata command fails with: > > java.lang.AssertionError: CLHSDB wasn't run successfully: Opening core file, please wait... > hsdb> Exception in thread "main" java.lang.InternalError: ciMetadata does not appear to be polymorphic > > with a simple test program. This happens because the SA can't find the > vtable symbol for ciMetadata (build produced by gcc 9.2.1). AFAIU, > there's nothing in our build system that hides that symbol. I had to > move one method's definition from the header file to the cpp file for > the symbol to be visible again. > > We have a test that checks dumpreplaydata but it doesn't catch that > problem. The test produces a replay file from a core file with the SA > by running a simple test with -Xcomp and CICrash=1. So the replay data > has very little or no profile data (which is what causes the problem > above). I propose running a slightly more complicated test method and > crashing after the method has had time to run for long enough to > collect profile data. > > The other shortcoming of the test is that it doesn't look at the > content of the replay file. It only warns if they differ. 
The replay > file produced by the VM and the one produced by the SA should be > identical (except for comment lines). So I propose we check that. > > Finally, I can't run that test on my system because core files are > handled by systemd (I'm running some recent version of fedora). I > suppose, the system can be configured differently but having the test > work out the box is nice. I extended the test case to handle that. > > With the improved test, there are a few differences between the VM and > SA replay files caused by VM changes that were not mirrored in the > SA. I fixed those. Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision: - pid in OutputAnalyzer - pid in OutputAnalyzer - Merge branch 'master' into JDK-8260296 - convert all tests - Merge branch 'master' into JDK-8260296 - use CoreUtils - whitespaces - SA fixes - VM fix - test ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2195/files - new: https://git.openjdk.java.net/jdk/pull/2195/files/d1e9381c..da2fe2be Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2195&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2195&range=02-03 Stats: 30246 lines in 796 files changed: 11471 ins; 7752 del; 11023 mod Patch: https://git.openjdk.java.net/jdk/pull/2195.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2195/head:pull/2195 PR: https://git.openjdk.java.net/jdk/pull/2195 From roland at openjdk.java.net Thu Feb 4 09:44:13 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Thu, 4 Feb 2021 09:44:13 GMT Subject: RFR: 8260296: SA's dumpreplaydata fails [v3] In-Reply-To: References: Message-ID: On Wed, 3 Feb 2021 18:51:57 GMT, Igor Ignatyev wrote: >> Thanks for reviewing this. 
I did not store the pid in the OutputAnalyzer because it doesn't seem to belong there as it has nothing to do with the text output of a test. But if you think that's ok. that's fine with me too. Do you prefer an extra field in OutputAnalyzer or a new subclass? > > `OutputAnalyzer`/ `OutputBuffer` already have an exit value, which arguable has nothing to do w/ the text output either, so I think it's ok to add a pid into `OutputBuffer` i-face and both of its implementations. I just pushed a change that does this. Does that look ok to you? ------------- PR: https://git.openjdk.java.net/jdk/pull/2195 From github.com+10482586+therealeliu at openjdk.java.net Thu Feb 4 10:04:39 2021 From: github.com+10482586+therealeliu at openjdk.java.net (Eric Liu) Date: Thu, 4 Feb 2021 10:04:39 GMT Subject: RFR: 8261137: Optimization of Box nodes in uncommon_trap In-Reply-To: <8Riu9VCQLM7_vDp5DOMtLZK3yMLQzAkwlIKo4ab0F7Q=.662dbffe-c320-47ea-bc67-508e2c382b12@github.com> References: <8Riu9VCQLM7_vDp5DOMtLZK3yMLQzAkwlIKo4ab0F7Q=.662dbffe-c320-47ea-bc67-508e2c382b12@github.com> Message-ID: On Thu, 4 Feb 2021 08:43:35 GMT, Wang Huang wrote: > JDK-8075052 has removed useless autobox. However, in some cases, the box is still saved. For instance: > @Benchmark > public void testMethod(Blackhole bh) { > int sum = 0; > for (int i = 0; i < data.length; i++) { > Integer ii = Integer.valueOf(data[i]); > if (i < data.length) { > sum += ii.intValue(); > } > } > bh.consume(sum); > } > Although the variable ii is only used at ii.intValue(), it cannot be eliminated as a result of being used by a hidden uncommon_trap. > The uncommon_trap is generated by the optimized "if", because its condition is always true. > > We can postpone box in uncommon_trap in this situation. We treat box as a scalarized object by adding a SafePointScalarObjectNode in the uncommon_trap node, > and deleting the use of box: > > There is no additional fail/error(s) of jtreg after this patch. 
I was wondering if we can remove the useless `if`, as it's always true in this case. Do you know why this kind of `if` hasn't been eliminated by the GVN phase?

-------------

PR: https://git.openjdk.java.net/jdk/pull/2401

From jiefu at openjdk.java.net Thu Feb 4 11:00:49 2021
From: jiefu at openjdk.java.net (Jie Fu)
Date: Thu, 4 Feb 2021 11:00:49 GMT
Subject: [jdk16] RFR: 8260473: [vector] ZGC: VectorReshape test produces incorrect results with ZGC enabled
In-Reply-To:
References: <5OfnHC5N00VVv3pWcU9gsAHa23RbAAX7ReEw9Ct6eug=.4f095083-7050-487d-94e0-3befce6744c5@github.com> <_Wm-fi9j4TZ41F0G_92f7ioKQeDNgZiOEMmLkZ0lvvE=.0a9beba5-5089-4368-b4bc-73faf9d5e858@github.com> <226iFOsl1hXrEoSe9uzgBb1Z75wxQEv5azlJIfzCO4k=.69d5ed3a-7337-472d-b106-1ce2e5d361bf@github.com>
Message-ID:

On Tue, 2 Feb 2021 01:58:56 GMT, Jie Fu wrote:

> Good. Please, file a follow-up RFE to improve the test.

The RFE has been filed here: https://bugs.openjdk.java.net/browse/JDK-8261152

Thanks.

-------------

PR: https://git.openjdk.java.net/jdk16/pull/139

From neliasso at openjdk.java.net Thu Feb 4 12:44:42 2021
From: neliasso at openjdk.java.net (Nils Eliasson)
Date: Thu, 4 Feb 2021 12:44:42 GMT
Subject: RFR: 8261137: Optimization of Box nodes in uncommon_trap
In-Reply-To: <8Riu9VCQLM7_vDp5DOMtLZK3yMLQzAkwlIKo4ab0F7Q=.662dbffe-c320-47ea-bc67-508e2c382b12@github.com>
References: <8Riu9VCQLM7_vDp5DOMtLZK3yMLQzAkwlIKo4ab0F7Q=.662dbffe-c320-47ea-bc67-508e2c382b12@github.com>
Message-ID:

On Thu, 4 Feb 2021 08:43:35 GMT, Wang Huang wrote:

> JDK-8075052 has removed useless autobox. However, in some cases, the box is still saved. For instance:
> @Benchmark
> public void testMethod(Blackhole bh) {
>     int sum = 0;
>     for (int i = 0; i < data.length; i++) {
>         Integer ii = Integer.valueOf(data[i]);
>         if (i < data.length) {
>             sum += ii.intValue();
>         }
>     }
>     bh.consume(sum);
> }
> Although the variable ii is only used at ii.intValue(), it cannot be eliminated as a result of being used by a hidden uncommon_trap.
> The uncommon_trap is generated by the optimized "if", because its condition is always true. > > We can postpone box in uncommon_trap in this situation. We treat box as a scalarized object by adding a SafePointScalarObjectNode in the uncommon_trap node, > and deleting the use of box: > > There is no additional fail/error(s) of jtreg after this patch. src/hotspot/share/opto/callGenerator.cpp line 606: > 604: } > 605: > 606: if (callprojs.resproj != NULL && call->is_CallStaticJava() && Please extract the new code into a method with a descriptive name. ------------- PR: https://git.openjdk.java.net/jdk/pull/2401 From roland at openjdk.java.net Thu Feb 4 15:14:54 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Thu, 4 Feb 2021 15:14:54 GMT Subject: [jdk16] Integrated: 8260709: C2: assert(false) failed: unscheduable graph In-Reply-To: References: Message-ID: <6YVllOtZR6uA1BONaiOKCdow7or4K4S0V52fLw6nWHs=.5dd58a60-95cd-4eee-8df0-fc4e9c35d090@github.com> On Tue, 2 Feb 2021 14:53:15 GMT, Roland Westrelin wrote: > The v = field load in the test case is found anti-dependent with the > memory phi that merges the exception state of the 2 array > allocation. Since JDK-8258393, anti-dependence computation only > considers the Phi inputs that are reachable from the memory input of a > load. As a consequence, the late control for the load is the control > projection of the array allocation in the loop. When loop opts run, > PhaseIdealLoop::split_if_with_blocks_post() finds that the load's late > control is different from its current control (which is inside the > outer loop). It tries to sink the load out of loop but ends up pinning > it at its late control, the projection of the second AllocateNode. > > The logic that expands the AllocateNode doesn't expect a pinned node > on the control projection and the result is a broken graph. I think > the fix for this would be to clone the load along both the exception > and the fallthrough paths. 
But as noted in JDK-8252372, the whole > process of sinking loads out of loops doesn't seem to work as expected > (for instance in this case it sinks the load from the outer loop into > the inner loop). So instead of going with a complicated fix, I propose > simply to detect this corner and that no attempt be made to sink the > load. Note that the current logic computes the late control for the > load (which should be in the loop), will create a clone for each use > and assign the dom_lca of the use's control and the load late control > to that use, that is the load late control. So all uses end up at the > same location, the load late control. So to detect that case, it's > sufficient to test the load late control. This pull request has now been integrated. Changeset: 4de3a6be Author: Roland Westrelin URL: https://git.openjdk.java.net/jdk16/commit/4de3a6be Stats: 70 lines in 3 files changed: 69 ins; 0 del; 1 mod 8260709: C2: assert(false) failed: unscheduable graph Reviewed-by: thartmann, chagedorn ------------- PR: https://git.openjdk.java.net/jdk16/pull/144 From iignatyev at openjdk.java.net Thu Feb 4 18:28:48 2021 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Thu, 4 Feb 2021 18:28:48 GMT Subject: RFR: 8260296: SA's dumpreplaydata fails [v4] In-Reply-To: References: Message-ID: On Thu, 4 Feb 2021 09:44:13 GMT, Roland Westrelin wrote: >> I noticed that the SA's dumpreplaydata command fails with: >> >> java.lang.AssertionError: CLHSDB wasn't run successfully: Opening core file, please wait... >> hsdb> Exception in thread "main" java.lang.InternalError: ciMetadata does not appear to be polymorphic >> >> with a simple test program. This happens because the SA can't find the >> vtable symbol for ciMetadata (build produced by gcc 9.2.1). AFAIU, >> there's nothing in our build system that hides that symbol. I had to >> move one method's definition from the header file to the cpp file for >> the symbol to be visible again. 
>> >> We have a test that checks dumpreplaydata but it doesn't catch that >> problem. The test produces a replay file from a core file with the SA >> by running a simple test with -Xcomp and CICrash=1. So the replay data >> has very little or no profile data (which is what causes the problem >> above). I propose running a slightly more complicated test method and >> crashing after the method has had time to run for long enough to >> collect profile data. >> >> The other shortcoming of the test is that it doesn't look at the >> content of the replay file. It only warns if they differ. The replay >> file produced by the VM and the one produced by the SA should be >> identical (except for comment lines). So I propose we check that. >> >> Finally, I can't run that test on my system because core files are >> handled by systemd (I'm running some recent version of fedora). I >> suppose, the system can be configured differently but having the test >> work out the box is nice. I extended the test case to handle that. >> >> With the improved test, there are a few differences between the VM and >> SA replay files caused by VM changes that were not mirrored in the >> SA. I fixed those. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision: > > - pid in OutputAnalyzer > - pid in OutputAnalyzer > - Merge branch 'master' into JDK-8260296 > - convert all tests > - Merge branch 'master' into JDK-8260296 > - use CoreUtils > - whitespaces > - SA fixes > - VM fix > - test LGTM, you still need to update the copyright years as @plummercj [mentioned](https://github.com/openjdk/jdk/pull/2195#issuecomment-768115732). ------------- Marked as reviewed by iignatyev (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/2195 From iignatyev at openjdk.java.net Thu Feb 4 18:28:48 2021 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Thu, 4 Feb 2021 18:28:48 GMT Subject: RFR: 8260296: SA's dumpreplaydata fails [v3] In-Reply-To: References: Message-ID: On Thu, 4 Feb 2021 09:40:33 GMT, Roland Westrelin wrote: >> `OutputAnalyzer`/ `OutputBuffer` already have an exit value, which arguable has nothing to do w/ the text output either, so I think it's ok to add a pid into `OutputBuffer` i-face and both of its implementations. > > I just pushed a change that does this. Does that look ok to you? yes, thank you. ------------- PR: https://git.openjdk.java.net/jdk/pull/2195 From cjplummer at openjdk.java.net Thu Feb 4 19:18:46 2021 From: cjplummer at openjdk.java.net (Chris Plummer) Date: Thu, 4 Feb 2021 19:18:46 GMT Subject: RFR: 8260296: SA's dumpreplaydata fails [v4] In-Reply-To: References: Message-ID: On Thu, 4 Feb 2021 18:25:45 GMT, Igor Ignatyev wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision: >> >> - pid in OutputAnalyzer >> - pid in OutputAnalyzer >> - Merge branch 'master' into JDK-8260296 >> - convert all tests >> - Merge branch 'master' into JDK-8260296 >> - use CoreUtils >> - whitespaces >> - SA fixes >> - VM fix >> - test > > LGTM, you still need to update the copyright years as @plummercj [mentioned](https://github.com/openjdk/jdk/pull/2195#issuecomment-768115732). Yes, this is a much better solution for getting the pid. 
------------- PR: https://git.openjdk.java.net/jdk/pull/2195 From pli at openjdk.java.net Fri Feb 5 02:33:54 2021 From: pli at openjdk.java.net (Pengfei Li) Date: Fri, 5 Feb 2021 02:33:54 GMT Subject: RFR: 8261022: Fix incorrect result of Math.abs() with char type Message-ID: Math.abs() with char type may return incorrect result after C2 superword optimization. It can be reproduced by below Java code and commands. public class Bug { private static int SIZE = 60000; private static char[] a = new char[SIZE]; private static char[] b = new char[SIZE]; public static void main(String[] args) { for (int i = 0; i < SIZE; i++) { a[i] = b[i] = (char) i; } for (int i = 0; i < SIZE; i++) { a[i] = (char) Math.abs(a[i]); } for (int i = 0; i < SIZE; i++) { if (a[i] != b[i]) { throw new RuntimeException("Broken!"); } } System.out.println("OK"); } } // $ java -Xint Bug // OK // $ java -Xcomp -XX:-TieredCompilation Bug // Exception in thread "main" java.lang.RuntimeException: Broken! // at Bug.main(Bug.java:15) In Java, 'char' is a 16-bit unsigned integer type and the abs() method should always return the value of its input. But with C2 vectorization, the sign bit of the 16-bit value is cleared because it's regarded as a signed value. Root cause is that we get an imprecise vector element type for AbsINode from SuperWord::compute_vector_element_type(). In any Java arithmetic operation, operands of small integer types (boolean, byte, char & short) should be promoted to int first. As vector elements of small types don't have upper bits of int, for RShiftI or AbsI operations, the compiler has to know the precise signedness info of the 1st operand. These operations shouldn't be vectorized if the signedness info is imprecise. In code SuperWord::compute_vector_element_type(), we have some special handling for right shift. It limited the vectorization of small integer right shift to operations only after loads. 
The reason is that in the C2 compiler, only LoadNode has precise signedness info of its value. When JDK-8222074 enabled abs vectorization, it didn't involve AbsI operation into the special handling and thus introduced this bug. This patch just does the fix at this point. Tested hotspot::hotspot_all_no_apps, jdk::jdk_core and langtools::tier1, no new failure is found. Also created a new jtreg with this fix. ------------- Commit messages: - 8261022: Fix incorrect result of Math.abs() with char type Changes: https://git.openjdk.java.net/jdk/pull/2419/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2419&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261022 Stats: 76 lines in 2 files changed: 66 ins; 1 del; 9 mod Patch: https://git.openjdk.java.net/jdk/pull/2419.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2419/head:pull/2419 PR: https://git.openjdk.java.net/jdk/pull/2419 From github.com+2249648+johntortugo at openjdk.java.net Fri Feb 5 03:35:56 2021 From: github.com+2249648+johntortugo at openjdk.java.net (John Tortugo) Date: Fri, 5 Feb 2021 03:35:56 GMT Subject: RFR: 8241502: Migrate x86_64.ad to MacroAssembler Message-ID: Relates to: https://bugs.openjdk.java.net/browse/JDK-8241502 Tested on: Linux tier1, 2 and 3 Can you please take a look whether these changes are going in the direction expected or not? If it is, I'll continue working on the `JDK-8241502` but I'd like to split it in a few PRs since it's a lot of changes. 
------------- Commit messages: - Merge branch 'jdk-8241502' of https://github.com/JohnTortugo/jdk into jdk-8241502 - First part of Migrate x86_64 to MacroAssembler - First part of Migrate x86_64 to MacroAssembler - Merge pull request #1 from openjdk/master Changes: https://git.openjdk.java.net/jdk/pull/2420/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2420&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8241502 Stats: 174 lines in 1 file changed: 44 ins; 21 del; 109 mod Patch: https://git.openjdk.java.net/jdk/pull/2420.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2420/head:pull/2420 PR: https://git.openjdk.java.net/jdk/pull/2420 From github.com+25214855+casparcwang at openjdk.java.net Fri Feb 5 04:47:47 2021 From: github.com+25214855+casparcwang at openjdk.java.net (=?UTF-8?B?546L6LaF?=) Date: Fri, 5 Feb 2021 04:47:47 GMT Subject: RFR: 8261152: Refine the compiler/vectorapi/VectorRebracket128Test.java test Message-ID: Refine test VectorRebracket128Test.java as discussed here https://github.com/openjdk/jdk16/pull/139#discussion_r567796847 1, Explicit trigger gc in the test 2, Remove redundant imports 3, Remove weird java options ------------- Commit messages: - 8261152: Refine the compiler/vectorapi/VectorRebracket128Test.java test Changes: https://git.openjdk.java.net/jdk/pull/2422/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2422&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261152 Stats: 26 lines in 1 file changed: 17 ins; 7 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/2422.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2422/head:pull/2422 PR: https://git.openjdk.java.net/jdk/pull/2422 From whuang at openjdk.java.net Fri Feb 5 06:49:42 2021 From: whuang at openjdk.java.net (Wang Huang) Date: Fri, 5 Feb 2021 06:49:42 GMT Subject: RFR: 8261137: Optimization of Box nodes in uncommon_trap In-Reply-To: References: 
<8Riu9VCQLM7_vDp5DOMtLZK3yMLQzAkwlIKo4ab0F7Q=.662dbffe-c320-47ea-bc67-508e2c382b12@github.com>
Message-ID:

On Thu, 4 Feb 2021 12:41:50 GMT, Nils Eliasson wrote:

>> JDK-8075052 has removed useless autobox. However, in some cases, the box is still saved. For instance:
>> @Benchmark
>> public void testMethod(Blackhole bh) {
>>     int sum = 0;
>>     for (int i = 0; i < data.length; i++) {
>>         Integer ii = Integer.valueOf(data[i]);
>>         if (i < data.length) {
>>             sum += ii.intValue();
>>         }
>>     }
>>     bh.consume(sum);
>> }
>> Although the variable ii is only used at ii.intValue(), it cannot be eliminated as a result of being used by a hidden uncommon_trap.
>> The uncommon_trap is generated by the optimized "if", because its condition is always true.
>>
>> We can postpone box in uncommon_trap in this situation. We treat box as a scalarized object by adding a SafePointScalarObjectNode in the uncommon_trap node,
>> and deleting the use of box:
>>
>> There is no additional fail/error(s) of jtreg after this patch.
>
> src/hotspot/share/opto/callGenerator.cpp line 606:
>
>> 604: }
>> 605:
>> 606: if (callprojs.resproj != NULL && call->is_CallStaticJava() &&
>
> Please extract the new code into a method with a descriptive name.

OK. I will refactor this code ASAP.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2401

From whuang at openjdk.java.net Fri Feb 5 07:18:43 2021
From: whuang at openjdk.java.net (Wang Huang)
Date: Fri, 5 Feb 2021 07:18:43 GMT
Subject: RFR: 8261137: Optimization of Box nodes in uncommon_trap
In-Reply-To:
References: <8Riu9VCQLM7_vDp5DOMtLZK3yMLQzAkwlIKo4ab0F7Q=.662dbffe-c320-47ea-bc67-508e2c382b12@github.com>
Message-ID:

On Thu, 4 Feb 2021 10:01:50 GMT, Eric Liu wrote:

> I was wondering if we can remove the useless `if`, as it's always true in this case. Do you know why this kind of `if` hasn't been eliminated by the GVN phase?

It is a good idea.
However, C2 might not optimize some `if` statements, for several reasons:

* This situation can also be triggered by calling another method. For example:

```
public class MyBenchmark {
    static int[] data = new int[10000];

    static {
        for(int i = 0; i < data.length; ++i) {
            data[i] = 299;
        }
    }

    @Benchmark
    public void testMethod(Blackhole bh) {
        int sum = 0;
        for (int i = 0; i < data.length; i++) {
            Integer ii = Integer.valueOf(data[i]);
            sum += abs(ii);
        }
        bh.consume(sum);
    }

    public int abs(Integer ii) {
        if (ii.intValue() > 0) {
            return ii.intValue();
        } else {
            return 0 - ii.intValue();
        }
    }
}
```

  The method `abs` will be inlined when it is hot enough. However, this `if` cannot be eliminated.

* The case in JDK-8261137 is just like this case:

    public class MyBenchmark {
        static int[] data = new int[10000];

        static {
            for(int i = 0; i < data.length; ++i) {
                data[i] = i * 1337 % 7331;
            }
        }

        @Benchmark
        public void testMethod(Blackhole bh) {
            int sum = 0;
            for (int i = 0; i < data.length; i++) {
                Integer ii = Integer.valueOf(data[i]);
                if (i < 100000) {
                    sum += ii.intValue();
                }
            }
            bh.consume(sum);
        }
    }

-------------

PR: https://git.openjdk.java.net/jdk/pull/2401

From whuang at openjdk.java.net Fri Feb 5 07:29:00 2021
From: whuang at openjdk.java.net (Wang Huang)
Date: Fri, 5 Feb 2021 07:29:00 GMT
Subject: RFR: 8261137: Optimization of Box nodes in uncommon_trap [v2]
In-Reply-To: <8Riu9VCQLM7_vDp5DOMtLZK3yMLQzAkwlIKo4ab0F7Q=.662dbffe-c320-47ea-bc67-508e2c382b12@github.com>
References: <8Riu9VCQLM7_vDp5DOMtLZK3yMLQzAkwlIKo4ab0F7Q=.662dbffe-c320-47ea-bc67-508e2c382b12@github.com>
Message-ID:

> JDK-8075052 has removed useless autobox. However, in some cases, the box is still saved.
> For instance:
> 
>     @Benchmark
>     public void testMethod(Blackhole bh) {
>       int sum = 0;
>       for (int i = 0; i < data.length; i++) {
>         Integer ii = Integer.valueOf(data[i]);
>         if (i < data.length) {
>           sum += ii.intValue();
>         }
>       }
>       bh.consume(sum);
>     }
> 
> Although the variable ii is only used at ii.intValue(), it cannot be eliminated because it is also used by a hidden uncommon_trap.
> The uncommon_trap is generated by the optimized "if", because its condition is always true.
> 
> We can postpone the box to the uncommon_trap in this situation. We treat the box as a scalarized object by adding a SafePointScalarObjectNode to the uncommon_trap node,
> and deleting the use of the box:
> 
> There are no additional jtreg failures or errors with this patch.

Wang Huang has updated the pull request incrementally with one additional commit since the last revision:

  refactor codes

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/2401/files
  - new: https://git.openjdk.java.net/jdk/pull/2401/files/73be94ce..4dfee52a

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2401&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2401&range=00-01

  Stats: 91 lines in 1 file changed: 47 ins; 43 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2401.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2401/head:pull/2401

PR: https://git.openjdk.java.net/jdk/pull/2401

From thartmann at openjdk.java.net Fri Feb 5 07:41:46 2021
From: thartmann at openjdk.java.net (Tobias Hartmann)
Date: Fri, 5 Feb 2021 07:41:46 GMT
Subject: RFR: 8261022: Fix incorrect result of Math.abs() with char type
In-Reply-To:
References:
Message-ID:

On Fri, 5 Feb 2021 02:29:33 GMT, Pengfei Li wrote:

> Math.abs() with char type may return an incorrect result after C2 superword optimization. It can be reproduced with the Java code and commands below.
> 
>     public class Bug {
>       private static int SIZE = 60000;
>       private static char[] a = new char[SIZE];
>       private static char[] b = new char[SIZE];
> 
>       public static void main(String[] args) {
>         for (int i = 0; i < SIZE; i++) {
>           a[i] = b[i] = (char) i;
>         }
>         for (int i = 0; i < SIZE; i++) {
>           a[i] = (char) Math.abs(a[i]);
>         }
>         for (int i = 0; i < SIZE; i++) {
>           if (a[i] != b[i]) {
>             throw new RuntimeException("Broken!");
>           }
>         }
>         System.out.println("OK");
>       }
>     }
> 
>     // $ java -Xint Bug
>     // OK
> 
>     // $ java -Xcomp -XX:-TieredCompilation Bug
>     // Exception in thread "main" java.lang.RuntimeException: Broken!
>     //         at Bug.main(Bug.java:15)
> 
> In Java, 'char' is a 16-bit unsigned integer type and the abs() method should always return the value of its input. But with C2 vectorization, the sign bit of the 16-bit value is cleared because it's regarded as a signed value.
> 
> The root cause is that we get an imprecise vector element type for AbsINode from SuperWord::compute_vector_element_type(). In any Java arithmetic operation, operands of small integer types (boolean, byte, char & short) should be promoted to int first. As vector elements of small types don't have the upper bits of int, for RShiftI or AbsI operations, the compiler has to know the precise signedness info of the 1st operand. These operations shouldn't be vectorized if the signedness info is imprecise.
> 
> In SuperWord::compute_vector_element_type(), we have some special handling for right shift. It limits the vectorization of small integer right shifts to operations directly after loads. The reason is that in the C2 compiler, only LoadNode has precise signedness info of its value. When JDK-8222074 enabled abs vectorization, it didn't include the AbsI operation in the special handling and thus introduced this bug. This patch just does the fix at this point.
> 
> Tested hotspot::hotspot_all_no_apps, jdk::jdk_core and langtools::tier1, no new failure is found. Also created a new jtreg with this fix.
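To make the signedness point in the quoted description concrete, here is a minimal scalar sketch. It is not part of the patch and it does not trigger the C2 bug; the class `CharAbsSketch` and both method names are made up for illustration, and `signedLaneAbs` is only an assumed model of what happens when a 16-bit vector lane holding a `char` is treated as a signed value.

```java
public class CharAbsSketch {
    // Scalar Java semantics: the char operand is promoted to a
    // non-negative int, so Math.abs() returns the value unchanged.
    static char scalarAbs(char c) {
        return (char) Math.abs(c);
    }

    // Assumed model of the miscompiled lane (hypothetical helper): the same
    // 16 bits are reinterpreted as a signed short before abs is applied.
    static char signedLaneAbs(char c) {
        short lane = (short) c;
        return (char) Math.abs(lane);
    }

    public static void main(String[] args) {
        int mismatches = 0;
        for (int i = 0; i <= Character.MAX_VALUE; i++) {
            char c = (char) i;
            if (scalarAbs(c) != c) {
                throw new AssertionError("scalar abs changed a char value");
            }
            if (signedLaneAbs(c) != c) {
                mismatches++;
            }
        }
        System.out.println("values changed by the signed model: " + mismatches);
    }
}
```

Under this model only values with the top bit set can come back changed, which would be consistent with the reproducer needing SIZE to exceed 32768 before it fails.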
Looks good to me.

src/hotspot/share/opto/superword.cpp line 3235:

> 3233:           vt = velt_type(load);
> 3234:         } else if (in->Opcode() != Op_LShiftI) {
> 3235:           // Widen type to int to avoid the creation of vector nodes. Note

`in->Opcode()` can be replaced by `op`

-------------

Marked as reviewed by thartmann (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/2419

From pli at openjdk.java.net Fri Feb 5 08:26:56 2021
From: pli at openjdk.java.net (Pengfei Li)
Date: Fri, 5 Feb 2021 08:26:56 GMT
Subject: RFR: 8261022: Fix incorrect result of Math.abs() with char type [v2]
In-Reply-To:
References:
Message-ID:

Pengfei Li has updated the pull request incrementally with one additional commit since the last revision:

  Address review comments

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/2419/files
  - new: https://git.openjdk.java.net/jdk/pull/2419/files/7ec48429..cf5659b1

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2419&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2419&range=00-01

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2419.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2419/head:pull/2419

PR: https://git.openjdk.java.net/jdk/pull/2419

From pli at openjdk.java.net Fri Feb 5 08:26:57 2021
From: pli at openjdk.java.net (Pengfei Li)
Date: Fri, 5 Feb 2021 08:26:57 GMT
Subject: RFR: 8261022: Fix incorrect result of Math.abs() with char type [v2]
In-Reply-To:
References:
Message-ID:

On Fri, 5 Feb 2021 07:32:38 GMT, Tobias Hartmann wrote:

>> Pengfei Li has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Address review comments
> 
> src/hotspot/share/opto/superword.cpp line 3235:
> 
>> 3233:           vt = velt_type(load);
>> 3234:         } else if (in->Opcode() != Op_LShiftI) {
>> 3235:           // Widen type to int to avoid the creation of vector nodes. Note
> 
> `in->Opcode()` can be replaced by `op`

Fixed, thanks!

-------------

PR: https://git.openjdk.java.net/jdk/pull/2419

From neliasso at openjdk.java.net Fri Feb 5 08:32:43 2021
From: neliasso at openjdk.java.net (Nils Eliasson)
Date: Fri, 5 Feb 2021 08:32:43 GMT
Subject: RFR: 8261022: Fix incorrect result of Math.abs() with char type [v2]
In-Reply-To:
References:
Message-ID:

On Fri, 5 Feb 2021 08:26:56 GMT, Pengfei Li wrote:

> Pengfei Li has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Address review comments

Looks good.

-------------

Marked as reviewed by neliasso (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/2419

From linuxhippy at gmail.com Fri Feb 5 08:39:13 2021
From: linuxhippy at gmail.com (Clemens Eisserer)
Date: Fri, 5 Feb 2021 09:39:13 +0100
Subject: Frequent c1 compiler errors running IntelliJ with Openjdk 15 (JavaCallWrapper::JavaCallWrapper - failed: cannot make java calls from the native compiler)
In-Reply-To: <1632ce78-f490-d586-fe47-c254561e6cd8@oracle.com>
References: <1632ce78-f490-d586-fe47-c254561e6cd8@oracle.com>
Message-ID:

Hi Vladimir,

> It does look very similar to JDK-8217765 [1], but the bug was fixed long
> time ago.
> Please, file new bug.

I've just filed bug 9068933 - after I was able to reproduce the crash also with the official 15.0.2 openjdk builds.
Thanks & best regards, Clemens

From roland at openjdk.java.net Fri Feb 5 09:36:58 2021
From: roland at openjdk.java.net (Roland Westrelin)
Date: Fri, 5 Feb 2021 09:36:58 GMT
Subject: RFR: 8260296: SA's dumpreplaydata fails [v5]
In-Reply-To:
References:
Message-ID:

> I noticed that the SA's dumpreplaydata command fails with:
> 
> java.lang.AssertionError: CLHSDB wasn't run successfully: Opening core file, please wait...
> hsdb> Exception in thread "main" java.lang.InternalError: ciMetadata does not appear to be polymorphic
> 
> with a simple test program. This happens because the SA can't find the
> vtable symbol for ciMetadata (build produced by gcc 9.2.1). AFAIU,
> there's nothing in our build system that hides that symbol. I had to
> move one method's definition from the header file to the cpp file for
> the symbol to be visible again.
> 
> We have a test that checks dumpreplaydata but it doesn't catch that
> problem. The test produces a replay file from a core file with the SA
> by running a simple test with -Xcomp and CICrash=1. So the replay data
> has very little or no profile data (which is what causes the problem
> above). I propose running a slightly more complicated test method and
> crashing after the method has had time to run for long enough to
> collect profile data.
> 
> The other shortcoming of the test is that it doesn't look at the
> content of the replay file. It only warns if they differ. The replay
> file produced by the VM and the one produced by the SA should be
> identical (except for comment lines). So I propose we check that.
> 
> Finally, I can't run that test on my system because core files are
> handled by systemd (I'm running some recent version of Fedora). I
> suppose the system can be configured differently, but having the test
> work out of the box is nice. I extended the test case to handle that.
> 
> With the improved test, there are a few differences between the VM and
> SA replay files caused by VM changes that were not mirrored in the
> SA. I fixed those.

Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision:

  copyrights

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/2195/files
  - new: https://git.openjdk.java.net/jdk/pull/2195/files/da2fe2be..150ea54c

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2195&range=04
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2195&range=03-04

  Stats: 18 lines in 18 files changed: 0 ins; 0 del; 18 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2195.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2195/head:pull/2195

PR: https://git.openjdk.java.net/jdk/pull/2195

From roland at openjdk.java.net Fri Feb 5 09:37:02 2021
From: roland at openjdk.java.net (Roland Westrelin)
Date: Fri, 5 Feb 2021 09:37:02 GMT
Subject: RFR: 8260296: SA's dumpreplaydata fails [v4]
In-Reply-To:
References:
Message-ID:

On Thu, 4 Feb 2021 18:25:45 GMT, Igor Ignatyev wrote:

>> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision:
>> 
>>  - pid in OutputAnalyzer
>>  - pid in OutputAnalyzer
>>  - Merge branch 'master' into JDK-8260296
>>  - convert all tests
>>  - Merge branch 'master' into JDK-8260296
>>  - use CoreUtils
>>  - whitespaces
>>  - SA fixes
>>  - VM fix
>>  - test
> 
> LGTM, you still need to update the copyright years as @plummercj [mentioned](https://github.com/openjdk/jdk/pull/2195#issuecomment-768115732).

@iignatev thanks for the review. I updated the copyrights.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2195

From roland at openjdk.java.net Fri Feb 5 09:37:03 2021
From: roland at openjdk.java.net (Roland Westrelin)
Date: Fri, 5 Feb 2021 09:37:03 GMT
Subject: Integrated: 8260296: SA's dumpreplaydata fails
In-Reply-To:
References:
Message-ID:

On Fri, 22 Jan 2021 14:21:52 GMT, Roland Westrelin wrote:

> I noticed that the SA's dumpreplaydata command fails with:
> 
> java.lang.AssertionError: CLHSDB wasn't run successfully: Opening core file, please wait...
> hsdb> Exception in thread "main" java.lang.InternalError: ciMetadata does not appear to be polymorphic
> 
> with a simple test program.

This pull request has now been integrated.

Changeset: 3495febf
Author:    Roland Westrelin
URL:       https://git.openjdk.java.net/jdk/commit/3495febf
Stats:     228 lines in 20 files changed: 109 ins; 55 del; 64 mod

8260296: SA's dumpreplaydata fails

Reviewed-by: kvn, cjplummer, iignatyev

-------------

PR: https://git.openjdk.java.net/jdk/pull/2195

From eirbjo at gmail.com Fri Feb 5 09:51:33 2021
From: eirbjo at gmail.com (Eirik Bjørsnøs)
Date: Fri, 5 Feb 2021 10:51:33 +0100
Subject: C1 crash in LinearScan::eliminate_spill_moves
Message-ID:

Hi,

While developing a Java agent which does some instrumentation, I'm observing the following C1 compilation crash quite reliably on 11, 15 and 17:

Current CompileTask:
C1: 1468 434 ! 3 org.jaxen.saxpath.base.Verifier::isXMLLetter (7201 bytes)

Stack: [0x000070000f92d000,0x000070000fa2d000], sp=0x000070000fa2c3e0, free space=1020k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.dylib+0x1fb1d0]  LinearScan::eliminate_spill_moves()+0x230
V  [libjvm.dylib+0x203ae0]  LinearScan::do_linear_scan()+0xc0
V  [libjvm.dylib+0x197093]  Compilation::emit_lir()+0x213
V  [libjvm.dylib+0x197dee]  Compilation::compile_java_method()+0x29e
V  [libjvm.dylib+0x1980cc]  Compilation::compile_method()+0x11c
V  [libjvm.dylib+0x1984ee]  Compilation::Compilation(AbstractCompiler*, ciEnv*, ciMethod*, int, BufferBlob*, DirectiveSet*)+0x22e
V  [libjvm.dylib+0x199bde]  Compiler::compile_method(ciEnv*, ciMethod*, int, DirectiveSet*)+0x5e
V  [libjvm.dylib+0x2d8292]  CompileBroker::invoke_compiler_on_method(CompileTask*)+0x5b2
V  [libjvm.dylib+0x2d7bb2]  CompileBroker::compiler_thread_loop()+0x4c2

With a Java 17 fastdebug build, I observe this assertion fail:

V  [libjvm.dylib+0x123e8dd]  VMError::report_and_die(int, char const*, char const*,
__va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long)+0x6cd
V  [libjvm.dylib+0x123eefb]  VMError::report_and_die(Thread*, void*, char const*, int, char const*, char const*, __va_list_tag*)+0x3b
V  [libjvm.dylib+0x62a7ad]  report_vm_error(char const*, int, char const*, char const*, ...)+0xdd
V  [libjvm.dylib+0x42c132]  LIR_OprFact::virtual_register(int, BasicType)+0x132
V  [libjvm.dylib+0x4692a7]  MoveResolver::insert_move(Interval*, Interval*)+0x1e7
V  [libjvm.dylib+0x469800]  MoveResolver::resolve_mappings()+0x250
V  [libjvm.dylib+0x469f02]  MoveResolver::move_insert_position(LIR_List*, int)+0x72
V  [libjvm.dylib+0x46d40b]  LinearScanWalker::insert_move(int, Interval*, Interval*)+0x26b
V  [libjvm.dylib+0x470ba1]  LinearScanWalker::activate_current()+0x371
V  [libjvm.dylib+0x46c682]  IntervalWalker::walk_to(int)+0xe2
V  [libjvm.dylib+0x45963d]  LinearScan::allocate_registers()+0x4ad
V  [libjvm.dylib+0x46298d]  LinearScan::do_linear_scan()+0x46d
V  [libjvm.dylib+0x3c2630]  Compilation::emit_lir()+0x150
V  [libjvm.dylib+0x3c3694]  Compilation::compile_java_method()+0x344

Some context:

As seen in the Java source file [1] for the class, the uninstrumented method is quite large and has an unusual number of returns per instruction.

The agent is basically a code coverage instrumenter which inserts a local variable per line of code in the beginning of the method, increments the count on each line number and reports the total counts by calling methods in a catch block.

To reduce the amount of instrumented code, the agent also replaces *RETURN instructions with GOTOs to a common target where the count reporting happens. The catch handler also jumps to this target.

If I limit the number of code lines which are allowed to be instrumented, the compilation no longer crashes. So seems to be related to code size / complexity somehow.

I can provide core files on request.

Cheers,
Eirik.
[1] https://github.com/jenkinsci/jaxen/blob/V_1_1_6_Final/src/java/main/org/jaxen/saxpath/base/Verifier.java#L95

From eirbjo at gmail.com Fri Feb 5 10:16:51 2021
From: eirbjo at gmail.com (Eirik Bjørsnøs)
Date: Fri, 5 Feb 2021 11:16:51 +0100
Subject: C1 crash in LinearScan::eliminate_spill_moves
In-Reply-To:
References:
Message-ID:

> With a Java 17 fastdebug build, I observe this assertion fail:

Forgot to include the actual assertion failure message:

# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error ([..]/jdk/src/hotspot/share/c1/c1_LIR.hpp:732), pid=5366, tid=23811
#  assert(res->vreg_number() == index) failed: conversion check

From vlivanov at openjdk.java.net Fri Feb 5 10:32:41 2021
From: vlivanov at openjdk.java.net (Vladimir Ivanov)
Date: Fri, 5 Feb 2021 10:32:41 GMT
Subject: RFR: 8261152: Refine the compiler/vectorapi/VectorRebracket128Test.java test
In-Reply-To:
References:
Message-ID:

On Fri, 5 Feb 2021 04:42:30 GMT, ?? wrote:

> Refine test VectorRebracket128Test.java as discussed here https://github.com/openjdk/jdk16/pull/139#discussion_r567796847
> 
> 1, Explicit trigger gc in the test
> 2, Remove redundant imports
> 3, Remove weird java options

test/hotspot/jtreg/compiler/vectorapi/VectorRebracket128Test.java line 63:

> 61:         while (true) {
> 62:             try {
> 63:                 System.gc();

Please, give the following options a try: `-XX:ZCollectionInterval=0.01 -XX:ZFragmentationLimit=0`. According to ZGC folks, it should force continuous GC cycles w/ ZGC.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2422

From vlivanov at openjdk.java.net Fri Feb 5 11:04:40 2021
From: vlivanov at openjdk.java.net (Vladimir Ivanov)
Date: Fri, 5 Feb 2021 11:04:40 GMT
Subject: RFR: 8241502: Migrate x86_64.ad to MacroAssembler
In-Reply-To:
References:
Message-ID:

On Fri, 5 Feb 2021 03:15:15 GMT, John Tortugo wrote:

> Can you please take a look whether these changes are going in the direction expected or not?

Yes, the patch is perfectly aligned with what JDK-8241502 proposes. Thanks for taking care of it.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2420

From tobias.hartmann at oracle.com Fri Feb 5 12:57:49 2021
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Fri, 5 Feb 2021 13:57:49 +0100
Subject: C1 crash in LinearScan::eliminate_spill_moves
In-Reply-To:
References:
Message-ID: <00626958-1740-ff4b-9520-51b59485a27b@oracle.com>

Hi Eirik,

thanks for reporting this issue. Could you provide the instrumented .class file and the replay_pid*.log file that should have been generated by the crashing VM?
Thanks,
Tobias

From eirbjo at gmail.com Fri Feb 5 13:28:41 2021
From: eirbjo at gmail.com (Eirik Bjørsnøs)
Date: Fri, 5 Feb 2021 14:28:41 +0100
Subject: C1 crash in LinearScan::eliminate_spill_moves
In-Reply-To: <00626958-1740-ff4b-9520-51b59485a27b@oracle.com>
References: <00626958-1740-ff4b-9520-51b59485a27b@oracle.com>
Message-ID:

Tobias,

Thanks for looking into this! I've sent you the instrumented class file and replay file off-list.
I'm also trying to create a more minimal reproducer that I can share, but having a hard time getting that to crash in the same way.

Thanks,
Eirik.

From tobias.hartmann at oracle.com Fri Feb 5 13:33:53 2021
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Fri, 5 Feb 2021 14:33:53 +0100
Subject: [External] : Re: C1 crash in LinearScan::eliminate_spill_moves
In-Reply-To:
References: <00626958-1740-ff4b-9520-51b59485a27b@oracle.com>
Message-ID:

Hi Eirik,

On 05.02.21 14:28, Eirik Bjørsnøs wrote:
> Thanks for looking into this! I've sent you the instrumented class file and replay file off-list.

Perfect, thanks a lot! I can reproduce the issue and filed https://bugs.openjdk.java.net/browse/JDK-8261235 to investigate.

Best regards,
Tobias

From eirbjo at gmail.com Fri Feb 5 15:20:17 2021
From: eirbjo at gmail.com (Eirik Bjørsnøs)
Date: Fri, 5 Feb 2021 16:20:17 +0100
Subject: C1 crash in LinearScan::eliminate_spill_moves
In-Reply-To:
References: <00626958-1740-ff4b-9520-51b59485a27b@oracle.com>
Message-ID:

> I'm also trying to create a more minimal reproducer that I can share, but
> having a hard time getting that to crash in the same way.

Here's a self-contained Maven project with a minimal (or at least "reduced") reproducer for JDK-8261235:

https://github.com/eirbjo/jdk-8261235-reproducer

Here's the main method:

https://github.com/eirbjo/jdk-8261235-reproducer/blob/main/src/main/java/com/github/eirbjo/JDK8261235Reproducer.java

Tobias: This produces an instrumented class file with a lot less cruft than the one I sent you, so it could possibly simplify the analysis.

Thanks,
Eirik.
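For readers following the thread, the instrumentation shape described in the original crash report (one counter local per line, every return redirected to a single exit where the counts are reported, and a catch handler that ends up at the same exit) can be approximated at source level as below. This is only a sketch under assumptions: the real agent rewrites bytecode, and `CoverageShapeSketch`, `report` and `lastReport` are hypothetical names that do not come from the agent or from the reproducer project.

```java
import java.util.Arrays;

public class CoverageShapeSketch {
    // Hypothetical reporting hook; the agent's real reporting API is not shown in this thread.
    static int[] lastReport = new int[0];

    static void report(int... counts) {
        lastReport = counts;
    }

    // Source-level analogue of one small instrumented method.
    static boolean isLetter(char c) {
        // One counter local per source line, all declared at method entry.
        int line1 = 0, line2 = 0, line3 = 0;
        boolean result = false;
        exit:
        {
            try {
                line1++;
                if (c >= 'a' && c <= 'z') {
                    line2++;
                    result = true;
                    break exit; // stands in for a *RETURN rewritten to a GOTO
                }
                line3++;
                result = c >= 'A' && c <= 'Z';
            } catch (Throwable t) {
                // In this sketch the handler simply falls through to the common exit.
            }
        }
        // Common exit: report all counters, then return.
        report(line1, line2, line3);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(isLetter('q') + " " + Arrays.toString(lastReport));
        System.out.println(isLetter('7') + " " + Arrays.toString(lastReport));
    }
}
```

With thousands of lines and returns, as in `Verifier::isXMLLetter`, this shape would keep a very large number of locals live across a single merge point, which may be relevant to the register allocation failure, although the thread does not confirm the cause.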
From vlivanov at openjdk.java.net Fri Feb 5 17:35:52 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Fri, 5 Feb 2021 17:35:52 GMT Subject: RFR: 8259430: C2: assert(in_vt->length() == out_vt->length()) failed: mismatch on number of elements Message-ID: Another problem caused by pathological cases (in effectively dead code): `VectorUnboxNode::Ideal()/Value()` ignore cast nodes (even the ones carrying control dependency) to reveal `VectorBox` and sometimes it exposes type mismatches between box/unbox operations which are impossible in practice. Proposed fix turns the assert into a runtime check to ignore problematic IR shape. ------------- Commit messages: - 8259430: C2: assert(in_vt->length() == out_vt->length()) failed: mismatch on number of elements Changes: https://git.openjdk.java.net/jdk/pull/2353/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2353&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8259430 Stats: 23 lines in 2 files changed: 6 ins; 2 del; 15 mod Patch: https://git.openjdk.java.net/jdk/pull/2353.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2353/head:pull/2353 PR: https://git.openjdk.java.net/jdk/pull/2353 From kvn at openjdk.java.net Fri Feb 5 17:49:42 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 5 Feb 2021 17:49:42 GMT Subject: RFR: 8241502: Migrate x86_64.ad to MacroAssembler In-Reply-To: References: Message-ID: <0azhJ4pD5Tq_lkpPtYMpQBjokflcSQEdWP2Rz9HBm6k=.c3ece6fd-1ae7-49ea-a6eb-ec88a9fbd54d@github.com> On Fri, 5 Feb 2021 11:02:04 GMT, Vladimir Ivanov wrote: >> Relates to: https://bugs.openjdk.java.net/browse/JDK-8241502 >> Tested on: Linux tier1, 2 and 3 >> >> Can you please take a look whether these changes are going in the direction expected or not? If it is, I'll continue working on the `JDK-8241502` but I'd like to split it in a few PRs since it's a lot of changes. > >> Can you please take a look whether these changes are going in the direction expected or not? 
> > Yes, the patch is perfectly aligned with what JDK-8241502 proposes. Thanks for taking care of it. I agree. We wanted to do that for long time. ------------- PR: https://git.openjdk.java.net/jdk/pull/2420 From kvn at openjdk.java.net Fri Feb 5 17:56:48 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 5 Feb 2021 17:56:48 GMT Subject: RFR: 8259430: C2: assert(in_vt->length() == out_vt->length()) failed: mismatch on number of elements In-Reply-To: References: Message-ID: On Tue, 2 Feb 2021 12:57:11 GMT, Vladimir Ivanov wrote: > Another problem caused by pathological cases (in effectively dead code): `VectorUnboxNode::Ideal()/Value()` ignore cast nodes (even the ones carrying control dependency) to reveal `VectorBox` and sometimes it exposes type mismatches between box/unbox operations which are impossible in practice. > > Proposed fix turns the assert into a runtime check to ignore problematic IR shape. Good. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2353 From vlivanov at openjdk.java.net Fri Feb 5 18:08:52 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Fri, 5 Feb 2021 18:08:52 GMT Subject: RFR: 8261250: Dependencies: Remove unused dependency types Message-ID: <2-SBc7bcjQnndYbBWgQFNs4nY_OjM-5_E5f3g2-OEiA=.d235e241-adc8-4f51-8bf9-66915f068a95@github.com> Remove support of unused dependency types from Dependencies. Testing: - [x] hs-precheckin-comp, hs-tier1, hs-tier2. 
------------- Commit messages: - Formatting - Formatting cleanups - Remove assert_common_3 - Remove find_exclusive_concrete_subtypes - Remove concrete_with_no_concrete_subtype - Remove abstract_with_no_concrete_subtype - Remove abstract_with_exclusive_concrete_subtype - Remove exclusive_concrete_methods Changes: https://git.openjdk.java.net/jdk/pull/2431/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2431&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261250 Stats: 253 lines in 2 files changed: 1 ins; 238 del; 14 mod Patch: https://git.openjdk.java.net/jdk/pull/2431.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2431/head:pull/2431 PR: https://git.openjdk.java.net/jdk/pull/2431 From iklam at openjdk.java.net Fri Feb 5 18:46:43 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Fri, 5 Feb 2021 18:46:43 GMT Subject: RFR: 8241502: Migrate x86_64.ad to MacroAssembler In-Reply-To: <0azhJ4pD5Tq_lkpPtYMpQBjokflcSQEdWP2Rz9HBm6k=.c3ece6fd-1ae7-49ea-a6eb-ec88a9fbd54d@github.com> References: <0azhJ4pD5Tq_lkpPtYMpQBjokflcSQEdWP2Rz9HBm6k=.c3ece6fd-1ae7-49ea-a6eb-ec88a9fbd54d@github.com> Message-ID: On Fri, 5 Feb 2021 17:47:07 GMT, Vladimir Kozlov wrote: >>> Can you please take a look whether these changes are going in the direction expected or not? >> >> Yes, the patch is perfectly aligned with what JDK-8241502 proposes. Thanks for taking care of it. > > I agree. We wanted to do that for long time. I am curious if the x86_64.o file changes in any significant way (speed of size). 
------------- PR: https://git.openjdk.java.net/jdk/pull/2420 From kvn at openjdk.java.net Fri Feb 5 19:02:41 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 5 Feb 2021 19:02:41 GMT Subject: RFR: 8261250: Dependencies: Remove unused dependency types In-Reply-To: <2-SBc7bcjQnndYbBWgQFNs4nY_OjM-5_E5f3g2-OEiA=.d235e241-adc8-4f51-8bf9-66915f068a95@github.com> References: <2-SBc7bcjQnndYbBWgQFNs4nY_OjM-5_E5f3g2-OEiA=.d235e241-adc8-4f51-8bf9-66915f068a95@github.com> Message-ID: On Fri, 5 Feb 2021 17:58:06 GMT, Vladimir Ivanov wrote: > Remove support of unused dependency types from Dependencies. > > Testing: > - [x] hs-precheckin-comp, hs-tier1, hs-tier2. Good. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2431 From ogatak at openjdk.java.net Fri Feb 5 21:21:43 2021 From: ogatak at openjdk.java.net (Kazunori Ogata) Date: Fri, 5 Feb 2021 21:21:43 GMT Subject: RFR: 8259822: [PPC64] Support the prefixed instruction format added in POWER10 [v4] In-Reply-To: References: <2CXH3zvUkBfqLrr8SPpD-x3RcSbFzw0S6NGLWgtirQ8=.0df1a0c8-3ac2-4e21-9e6a-f0b43a3ae20c@github.com> Message-ID: On Wed, 3 Feb 2021 17:35:33 GMT, Corey Ashford wrote: >> Kazunori Ogata has updated the pull request incrementally with one additional commit since the last revision: >> >> Removed pla and psubi and adjusted spacing based on review comments > > Looks good! Thanks for the additional formatting cleanup @CoreyAshford Thank you for your review. @TheRealMDoerr Thank you for your suggestion. I think I understand the mechanism. I'll update my patch after verifing my change and running jtreg. ------------- PR: https://git.openjdk.java.net/jdk/pull/2095 From enikitin at openjdk.java.net Fri Feb 5 22:20:00 2021 From: enikitin at openjdk.java.net (Evgeny Nikitin) Date: Fri, 5 Feb 2021 22:20:00 GMT Subject: RFR: 8058176: [mlvm] Tests should tolerate exceptions caused by code cache exhaustion. Message-ID: A repetition of the #1622. 1. 
Normalise meth/stress/compiler/i2c_c2i/Test.java to use MultiThreadedTest framework; 2. Adjust MultiThreadedTest in order to accomodate the i2c_c2i test (add prepareThread method and logic); 3. Add ThrowableTolerance and DefaultThrowableTolerance as ways to control what Throwables are accepted; 4. Adjust MultiThreadedTest to catch Throwables and check if they are accepted; 5. Adjust individual tests to catch possible Throwables and check if they are accepted; 6. Un-problemlist the failing tests. Testing vmTestBase/vm/mlvm/meth/stress run on macos-linux-windows (30 runs each) in x64 configurations, rebased on top of latest code base. Code cache was limited `-XX:ReservedCodeCacheSize=8M` as suggested in the case. ------------- Commit messages: - 8058176: [mlvm] tests should tolerate exceptions caused by code cache exhaustion Changes: https://git.openjdk.java.net/jdk/pull/2440/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2440&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8058176 Stats: 340 lines in 12 files changed: 245 ins; 31 del; 64 mod Patch: https://git.openjdk.java.net/jdk/pull/2440.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2440/head:pull/2440 PR: https://git.openjdk.java.net/jdk/pull/2440 From jwilhelm at openjdk.java.net Sat Feb 6 00:26:03 2021 From: jwilhelm at openjdk.java.net (Jesper Wilhelmsson) Date: Sat, 6 Feb 2021 00:26:03 GMT Subject: RFR: Merge jdk16 Message-ID: Forwardport JDK 16 -> JDK 17 ------------- Commit messages: - Merge - 8260709: C2: assert(false) failed: unscheduable graph - 8260585: AArch64: Wrong code generated for shifting right and accumulating four unsigned short integers The webrevs contain the adjustments done while merging with regards to each parent branch: - master: https://webrevs.openjdk.java.net/?repo=jdk&pr=2441&range=00.0 - jdk16: https://webrevs.openjdk.java.net/?repo=jdk&pr=2441&range=00.1 Changes: https://git.openjdk.java.net/jdk/pull/2441/files Stats: 549 lines in 5 files 
changed: 527 ins; 16 del; 6 mod Patch: https://git.openjdk.java.net/jdk/pull/2441.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2441/head:pull/2441 PR: https://git.openjdk.java.net/jdk/pull/2441 From jwilhelm at openjdk.java.net Sat Feb 6 00:32:44 2021 From: jwilhelm at openjdk.java.net (Jesper Wilhelmsson) Date: Sat, 6 Feb 2021 00:32:44 GMT Subject: Integrated: Merge jdk16 In-Reply-To: References: Message-ID: On Sat, 6 Feb 2021 00:19:38 GMT, Jesper Wilhelmsson wrote: > Forwardport JDK 16 -> JDK 17 This pull request has now been integrated. Changeset: d7acfae3 Author: Jesper Wilhelmsson URL: https://git.openjdk.java.net/jdk/commit/d7acfae3 Stats: 549 lines in 5 files changed: 527 ins; 16 del; 6 mod Merge ------------- PR: https://git.openjdk.java.net/jdk/pull/2441 From iveresov at openjdk.java.net Sat Feb 6 01:38:48 2021 From: iveresov at openjdk.java.net (Igor Veresov) Date: Sat, 6 Feb 2021 01:38:48 GMT Subject: RFR: 8261270: MakeMethodNotCompilableTest fails with -XX:TieredStopAtLevel={1, 2, 3} Message-ID: JDK-8251462 changed semantics of CompilationPolicy::highest_compile_level() and CompilationPolicy::can_be_compiled() to take into the account compiler availability due to command line flags restrictions. For example, if C2 is not available because of the TieredStopAtLevel setting, the answer to these queries will be that the highest level is 3, and that methods can't be compiled at level 4. MakeMethodNotCompilableTest.java assumes (even in the presence of the mentioned restrictions) that with tiered compilation it is always the case that both compilers can be used. This change fixes the test to take this into account. 
------------- Commit messages: - Adjust the test to support new semantics Changes: https://git.openjdk.java.net/jdk/pull/2443/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2443&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261270 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/2443.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2443/head:pull/2443 PR: https://git.openjdk.java.net/jdk/pull/2443 From kvn at openjdk.java.net Sat Feb 6 04:01:41 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Sat, 6 Feb 2021 04:01:41 GMT Subject: RFR: 8261270: MakeMethodNotCompilableTest fails with -XX:TieredStopAtLevel={1, 2, 3} In-Reply-To: References: Message-ID: <9Lu_O8QDeuZ0Zn8MQcMnXs0X3Kxw9IXpVqmosXlyrmo=.5d89a8aa-7e20-435f-b21c-bc16c00f49d5@github.com> On Sat, 6 Feb 2021 01:34:15 GMT, Igor Veresov wrote: > JDK-8251462 changed semantics of CompilationPolicy::highest_compile_level() and CompilationPolicy::can_be_compiled() to take into the account compiler availability due to command line flags restrictions. For example, if C2 is not available because of the TieredStopAtLevel setting, the answer to these queries will be that the highest level is 3, and that methods can't be compiled at level 4. > > MakeMethodNotCompilableTest.java assumes (even in the presence of the mentioned restrictions) that with tiered compilation it is always the case that both compilers can be used. > > This change fixes the test to take this into account. Good. ------------- Marked as reviewed by kvn (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/2443 From iveresov at openjdk.java.net Sat Feb 6 20:47:00 2021 From: iveresov at openjdk.java.net (Igor Veresov) Date: Sat, 6 Feb 2021 20:47:00 GMT Subject: RFR: 8261229: MethodData is not correctly initialized with TieredStopAtLevel=3 [v2] In-Reply-To: <-ReDb8Jf-EIuYNyQltIaKSF1tUPmswp6kx_mciU5Fvk=.31d8565c-26fe-4f7f-8b88-55bcc996bfca@github.com> References: <-ReDb8Jf-EIuYNyQltIaKSF1tUPmswp6kx_mciU5Fvk=.31d8565c-26fe-4f7f-8b88-55bcc996bfca@github.com> Message-ID: <-RNQ0M9cHiIX7na2DnUnhucXZ35JILV8sdMO0oAeaOI=.42b67e78-71c8-4bcf-aaa7-22889fbfbba5@github.com> > Mostly a typo in compilation mode ergonomics that selected a quick-only mode essentially when the user specified TieredStopAtLevel={1,2,3}. The quick-only mode has an optimization that eliminates parts of the MDO since they are not needed. Meanwhile, the WB API considered it a fair game to request a level 3 compile, that requires a full MDO. > > The fix corrects the original issue and also tries to be extra defensive with WB API (since it's semantics is not clearly specified) by always allocating full MDO if WB API is on. 
Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: Remove WB defence ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2444/files - new: https://git.openjdk.java.net/jdk/pull/2444/files/4086df33..c57457b2 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2444&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2444&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/2444.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2444/head:pull/2444 PR: https://git.openjdk.java.net/jdk/pull/2444 From iveresov at openjdk.java.net Sat Feb 6 20:47:00 2021 From: iveresov at openjdk.java.net (Igor Veresov) Date: Sat, 6 Feb 2021 20:47:00 GMT Subject: RFR: 8261229: MethodData is not correctly initialized with TieredStopAtLevel=3 In-Reply-To: References: <-ReDb8Jf-EIuYNyQltIaKSF1tUPmswp6kx_mciU5Fvk=.31d8565c-26fe-4f7f-8b88-55bcc996bfca@github.com> Message-ID: On Sat, 6 Feb 2021 17:49:43 GMT, Igor Ignatyev wrote: > I don't think we should adjust the product code to behave differently just to satisfy the incorrect assumptions of WhiteBox. it also kinda defeats the purpose of WhiteBox API as we won't be able to go thru the same code path. > > -- Igor Ok, fair enough, I'll remove it. But we should probably make WB more robust in this department - it needs to be throwing exceptions in cases when the user tries to submit compilation requests for unsupported levels. 
------------- PR: https://git.openjdk.java.net/jdk/pull/2444 From github.com+2249648+johntortugo at openjdk.java.net Sat Feb 6 23:56:00 2021 From: github.com+2249648+johntortugo at openjdk.java.net (John Tortugo) Date: Sat, 6 Feb 2021 23:56:00 GMT Subject: RFR: 8241502: Migrate x86_64.ad to MacroAssembler [v2] In-Reply-To: References: Message-ID: > Relates to: https://bugs.openjdk.java.net/browse/JDK-8241502 > Tested on: Linux tier1, 2 and 3 > > Can you please take a look whether these changes are going in the direction expected or not? If it is, I'll continue working on the `JDK-8241502` but I'd like to split it in a few PRs since it's a lot of changes. John Tortugo has updated the pull request incrementally with one additional commit since the last revision: Second part of conversions. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2420/files - new: https://git.openjdk.java.net/jdk/pull/2420/files/8198a988..25824fde Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2420&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2420&range=00-01 Stats: 141 lines in 1 file changed: 45 ins; 0 del; 96 mod Patch: https://git.openjdk.java.net/jdk/pull/2420.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2420/head:pull/2420 PR: https://git.openjdk.java.net/jdk/pull/2420 From pli at openjdk.java.net Sun Feb 7 01:18:43 2021 From: pli at openjdk.java.net (Pengfei Li) Date: Sun, 7 Feb 2021 01:18:43 GMT Subject: Integrated: 8261022: Fix incorrect result of Math.abs() with char type In-Reply-To: References: Message-ID: <1aUM7cENTBMCGaw2vbRyhhFRSD2vZsQ44nYpqZAiKuc=.89115373-caf7-465a-b723-50d00a0592e5@github.com> On Fri, 5 Feb 2021 02:29:33 GMT, Pengfei Li wrote: > Math.abs() with char type may return incorrect result after C2 superword optimization. It can be reproduced by below Java code and commands. 
>
>     public class Bug {
>       private static int SIZE = 60000;
>       private static char[] a = new char[SIZE];
>       private static char[] b = new char[SIZE];
>
>       public static void main(String[] args) {
>         for (int i = 0; i < SIZE; i++) {
>           a[i] = b[i] = (char) i;
>         }
>         for (int i = 0; i < SIZE; i++) {
>           a[i] = (char) Math.abs(a[i]);
>         }
>         for (int i = 0; i < SIZE; i++) {
>           if (a[i] != b[i]) {
>             throw new RuntimeException("Broken!");
>           }
>         }
>         System.out.println("OK");
>       }
>     }
>
>     // $ java -Xint Bug
>     // OK
>
>     // $ java -Xcomp -XX:-TieredCompilation Bug
>     // Exception in thread "main" java.lang.RuntimeException: Broken!
>     //         at Bug.main(Bug.java:15)
>
> In Java, 'char' is a 16-bit unsigned integer type and the abs() method should always return the value of its input. But with C2 vectorization, the sign bit of the 16-bit value is cleared because it's regarded as a signed value.
>
> Root cause is that we get an imprecise vector element type for AbsINode from SuperWord::compute_vector_element_type(). In any Java arithmetic operation, operands of small integer types (boolean, byte, char & short) should be promoted to int first. As vector elements of small types don't have upper bits of int, for RShiftI or AbsI operations, the compiler has to know the precise signedness info of the 1st operand. These operations shouldn't be vectorized if the signedness info is imprecise.
>
> In code SuperWord::compute_vector_element_type(), we have some special handling for right shift. It limited the vectorization of small integer right shift to operations only after loads. The reason is that in the C2 compiler, only LoadNode has precise signedness info of its value. When JDK-8222074 enabled abs vectorization, it didn't involve AbsI operation into the special handling and thus introduced this bug. This patch just does the fix at this point.
>
> Tested hotspot::hotspot_all_no_apps, jdk::jdk_core and langtools::tier1, no new failure is found. Also created a new jtreg with this fix.
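
The signedness problem in the description above can be illustrated in plain Java, independent of C2. The following is a minimal sketch under stated assumptions: the class name `CharAbsDemo` and both helper methods are hypothetical, and the second helper only models what happens when the 16 bits of a `char` are interpreted as a signed value before `abs`. It is not the SuperWord code itself.

```java
public class CharAbsDemo {
    // Java semantics: the char is promoted to a non-negative int,
    // so Math.abs(int) returns the same value.
    static char absWithPromotion(char c) {
        return (char) Math.abs(c);
    }

    // Hypothetical model of the faulty case: the same 16 bits are
    // reinterpreted as a signed short before abs is applied.
    static char absAsSignedLane(char c) {
        return (char) Math.abs((short) c);
    }

    public static void main(String[] args) {
        char c = (char) 40000; // bit 15 is set
        System.out.println((int) absWithPromotion(c)); // 40000
        System.out.println((int) absAsSignedLane(c));  // 25536
    }
}
```

For values below 32768 both helpers agree, which may explain why such a bug only shows up once the input covers the upper half of the `char` range, as the reproducer's `SIZE = 60000` does.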
This pull request has now been integrated. Changeset: 7a2db858 Author: Pengfei Li URL: https://git.openjdk.java.net/jdk/commit/7a2db858 Stats: 77 lines in 2 files changed: 66 ins; 1 del; 10 mod 8261022: Fix incorrect result of Math.abs() with char type Reviewed-by: thartmann, neliasso ------------- PR: https://git.openjdk.java.net/jdk/pull/2419 From iveresov at openjdk.java.net Sun Feb 7 02:27:43 2021 From: iveresov at openjdk.java.net (Igor Veresov) Date: Sun, 7 Feb 2021 02:27:43 GMT Subject: Integrated: 8261270: MakeMethodNotCompilableTest fails with -XX:TieredStopAtLevel={1, 2, 3} In-Reply-To: References: Message-ID: On Sat, 6 Feb 2021 01:34:15 GMT, Igor Veresov wrote: > JDK-8251462 changed semantics of CompilationPolicy::highest_compile_level() and CompilationPolicy::can_be_compiled() to take into the account compiler availability due to command line flags restrictions. For example, if C2 is not available because of the TieredStopAtLevel setting, the answer to these queries will be that the highest level is 3, and that methods can't be compiled at level 4. > > MakeMethodNotCompilableTest.java assumes (even in the presence of the mentioned restrictions) that with tiered compilation it is always the case that both compilers can be used. > > This change fixes the test to take this into account. This pull request has now been integrated. 
Changeset: 0e18634b Author: Igor Veresov URL: https://git.openjdk.java.net/jdk/commit/0e18634b Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8261270: MakeMethodNotCompilableTest fails with -XX:TieredStopAtLevel={1,2,3} Reviewed-by: kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/2443 From dongbo at openjdk.java.net Sun Feb 7 07:57:57 2021 From: dongbo at openjdk.java.net (Dong Bo) Date: Sun, 7 Feb 2021 07:57:57 GMT Subject: RFR: 8258953: AArch64: move NEON instructions to aarch64_neon.ad [v2] In-Reply-To: References: Message-ID: > As discussed in [1], all NEON instructions should be moved from `aarch64.ad` to `aarch64_neon.ad`. > > In the first commit [2] of this PR, the NEON instructions are deleted from `aarch64.ad` and appended to `aarch64_neon.ad`. > I compared the generated code in `aarch64_neon.ad` with original code in `aarch64.ad`, no suspicious differences found. > The last two commits just simply move code around in `aarch64_neon.ad` to put related instructions together, i.e. `LoadStore` [3], `Reduction` [4]. > > This also supports vector length 4 for `vsraa8B_imm` and `vsrla8B_imm`, vector length 2 for `vsraa4S_imm` and `vsrla4S_imm`, fixes few typos, e.g. `vor8B`, `vsrla4S_imm`. > > [1] https://github.com/openjdk/jdk/pull/1215#issuecomment-728186803 > [2] https://github.com/dgbo/jdk/commit/40cbe99e647cdf93712edf8f77ab3b5b30ea0a95 > [3] https://github.com/dgbo/jdk/commit/695fb8f8ef009b733a8f804e791347f4bfe2572e > [4] https://github.com/dgbo/jdk/commit/e0c38aa9aaa6af9925a3821328384b1e2b2c2070 Dong Bo has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains one commit: 8258953: AArch64: move NEON instructions to aarch64_neon.ad ------------- Changes: https://git.openjdk.java.net/jdk/pull/2273/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2273&range=01 Stats: 5661 lines in 3 files changed: 3216 ins; 2435 del; 10 mod Patch: https://git.openjdk.java.net/jdk/pull/2273.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2273/head:pull/2273 PR: https://git.openjdk.java.net/jdk/pull/2273 From dongbo at openjdk.java.net Sun Feb 7 08:05:43 2021 From: dongbo at openjdk.java.net (Dong Bo) Date: Sun, 7 Feb 2021 08:05:43 GMT Subject: RFR: 8258953: AArch64: move NEON instructions to aarch64_neon.ad [v2] In-Reply-To: References: Message-ID: On Thu, 28 Jan 2021 12:09:36 GMT, Dong Bo wrote: >> I managed to sort all the instructs and compare them with and without the patch. They are general the same except for some trailing whitespaces and typos you mentioned. > >> _Mailing list message from [Andrew Haley](mailto:aph at redhat.com) on [hotspot-dev](mailto:hotspot-dev at openjdk.java.net):_ >> >> On 1/28/21 10:40 AM, Ningsheng Jian wrote: >> >> > I see you have fixed this typo, from ushr to usra. I presume original version generates wrong code and produces wrong results for specific case? If so, do you think it deserves a separate fix, e.g. for jdk16? >> >> It does. This patch should change nothing at all, except moving >> text from A to B. >> >> -- >> Andrew Haley (he/him) >> Java Platform Lead Engineer >> Red Hat UK Ltd. >> https://keybase.io/andrewhaley >> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 > > @nsjian @theRealAph Thank you for the comments. I'll raise a seperate PR to fix this right now. > > BTW, since Andrew says we should change nothing at all in this move, do you think we should also do the things below in separtate PRs? > 1. fix the typo of `vor8B`. > 2. 
supporting vector length 4 for `vsraa8B_imm` and `vsrla8B_imm`, vector length 2 for `vsraa4S_imm` and `vsrla4S_imm`. Updated. The whitespaces mentioned are addressed. The format typo fix in `vor8B` is kept, other instructions are appended to aarch64_neon.ad. ------------- PR: https://git.openjdk.java.net/jdk/pull/2273 From github.com+25214855+casparcwang at openjdk.java.net Sun Feb 7 10:01:46 2021 From: github.com+25214855+casparcwang at openjdk.java.net (=?UTF-8?B?546L6LaF?=) Date: Sun, 7 Feb 2021 10:01:46 GMT Subject: RFR: 8261152: Refine the compiler/vectorapi/VectorRebracket128Test.java test In-Reply-To: References: Message-ID: On Fri, 5 Feb 2021 10:29:34 GMT, Vladimir Ivanov wrote: >> Refine test VectorRebracket128Test.java as discussed here https://github.com/openjdk/jdk16/pull/139#discussion_r567796847 >> >> 1, Explicit trigger gc in the test >> 2, Remove redundant imports >> 3, Remove weird java options > > test/hotspot/jtreg/compiler/vectorapi/VectorRebracket128Test.java line 63: > >> 61: while (true) { >> 62: try { >> 63: System.gc(); > > Please, give the following options a try: `-XX:ZCollectionInterval=0.01 -XX:ZFragmentationLimit=0`. > According to ZGC folks, it should force continuous GC cycles w/ ZGC. The original version with option 'CICompilerCount' passed (passed means the bug is not triggered) 5 times in 100 runs, the background gc version passed 8 times in 100 runs. `ZCollectionInterval` passed 44 in 100 runs. Explicit trigger gc in the background thread and timer-based gc triggering perform the same thing, it's really strange to behave differently in triggering the bug. The reason I guess is: the load barrier missing bug will only be triggered when the object is relocated and the pointer in another object is not remapped, which means the time window is very short, different options may pose different execution path (which creates different objects, threads, etc.). 
-------------

PR: https://git.openjdk.java.net/jdk/pull/2422

From github.com+10482586+therealeliu at openjdk.java.net Sun Feb 7 10:08:42 2021
From: github.com+10482586+therealeliu at openjdk.java.net (Eric Liu)
Date: Sun, 7 Feb 2021 10:08:42 GMT
Subject: RFR: 8261137: Optimization of Box nodes in uncommon_trap
In-Reply-To:
References: <8Riu9VCQLM7_vDp5DOMtLZK3yMLQzAkwlIKo4ab0F7Q=.662dbffe-c320-47ea-bc67-508e2c382b12@github.com>
Message-ID:

On Fri, 5 Feb 2021 07:15:55 GMT, Wang Huang wrote:

> > I was wondering if we can remove the useless `if` as it's always true in this case. Do you know why this kind of `if` hasn't been eliminated by the GVN phase?
> It is a good idea. However, C2 might not optimize some `if` for many reasons:
>
> * This situation can also be triggered when calling another method. For example:
>
> ```
> public class MyBenchmark {
>
>     static int[] data = new int[10000];
>
>     static {
>         for(int i = 0; i < data.length; ++i) {
>             data[i] = 299;
>         }
>     }
>
>     @Benchmark
>     public void testMethod(Blackhole bh) {
>         int sum = 0;
>         for (int i = 0; i < data.length; i++) {
>             Integer ii = Integer.valueOf(data[i]);
>             sum += abs(ii);
>         }
>         bh.consume(sum);
>     }
>
>     public int abs(Integer ii) {
>         if (ii.intValue() > 0) {
>             return ii.intValue();
>         } else {
>             return 0 - ii.intValue();
>         }
>     }
> }
> ```
>
> The method `abs` will be inlined when it is hot enough. However, this `If` cannot be eliminated.
>
> * The case in JDK-8261137 is just like this case:
>
> ```
> public class MyBenchmark {
>
>     static int[] data = new int[10000];
>
>     static {
>         for(int i = 0; i < data.length; ++i) {
>             data[i] = i * 1337 % 7331;
>         }
>     }
>
>     @Benchmark
>     public void testMethod(Blackhole bh) {
>         int sum = 0;
>         for (int i = 0; i < data.length; i++) {
>             Integer ii = Integer.valueOf(data[i]);
>             if (i < 100000) {
>                 sum += ii.intValue();
>             }
>         }
>         bh.consume(sum);
>     }
> }
> ```

Thanks for your explanation. This makes more sense to me now.
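
For readers without a JMH setup, the first benchmark quoted above can be approximated with a plain `main` method. This is only a sketch: the class name `BoxAbsDemo`, the `sumOnce` helper and the repetition count are assumptions made for illustration, and the code makes no claim about timings or about what C2 actually does with the box.

```java
public class BoxAbsDemo {
    static final int[] data = new int[10000];

    static {
        // 299 lies outside the default Integer cache range [-128, 127],
        // so Integer.valueOf allocates a box unless the JIT removes it.
        for (int i = 0; i < data.length; ++i) {
            data[i] = 299;
        }
    }

    // Same shape as the abs method in the quoted benchmark.
    static int abs(Integer ii) {
        if (ii.intValue() > 0) {
            return ii.intValue();
        } else {
            return 0 - ii.intValue();
        }
    }

    // One pass over the array, boxing every element before calling abs.
    static int sumOnce() {
        int sum = 0;
        for (int i = 0; i < data.length; i++) {
            Integer ii = Integer.valueOf(data[i]);
            sum += abs(ii);
        }
        return sum;
    }

    public static void main(String[] args) {
        int last = 0;
        // Repeat so that the loop has a chance to become hot (count chosen arbitrarily).
        for (int iter = 0; iter < 20_000; iter++) {
            last = sumOnce();
        }
        System.out.println(last); // 2990000
    }
}
```

Running it with and without the patch and comparing `-XX:+PrintCompilation` or allocation profiles is one possible way to look at the effect, though JMH remains the more reliable tool for measurements.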
------------- PR: https://git.openjdk.java.net/jdk/pull/2401 From aph at openjdk.java.net Sun Feb 7 10:58:44 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Sun, 7 Feb 2021 10:58:44 GMT Subject: RFR: 8258953: AArch64: move NEON instructions to aarch64_neon.ad [v2] In-Reply-To: References: Message-ID: On Sun, 7 Feb 2021 07:57:57 GMT, Dong Bo wrote: >> As discussed in [1], all NEON instructions should be moved from `aarch64.ad` to `aarch64_neon.ad`. >> >> In the first commit [2] of this PR, the NEON instructions are deleted from `aarch64.ad` and appended to `aarch64_neon.ad`. >> I compared the generated code in `aarch64_neon.ad` with original code in `aarch64.ad`, no suspicious differences found. >> The last two commits just simply move code around in `aarch64_neon.ad` to put related instructions together, i.e. `LoadStore` [3], `Reduction` [4]. >> >> This also supports vector length 4 for `vsraa8B_imm` and `vsrla8B_imm`, vector length 2 for `vsraa4S_imm` and `vsrla4S_imm`, fixes few typos, e.g. `vor8B`, `vsrla4S_imm`. >> >> [1] https://github.com/openjdk/jdk/pull/1215#issuecomment-728186803 >> [2] https://github.com/dgbo/jdk/commit/40cbe99e647cdf93712edf8f77ab3b5b30ea0a95 >> [3] https://github.com/dgbo/jdk/commit/695fb8f8ef009b733a8f804e791347f4bfe2572e >> [4] https://github.com/dgbo/jdk/commit/e0c38aa9aaa6af9925a3821328384b1e2b2c2070 > > Dong Bo has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit: > > 8258953: AArch64: move NEON instructions to aarch64_neon.ad That looks fine. I haven't been able to check that all this patch does is move code from aarch64.ad to aarch64_neon.ad, but I believe you. ------------- Marked as reviewed by aph (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/2273 From njian at openjdk.java.net Mon Feb 8 02:08:41 2021 From: njian at openjdk.java.net (Ningsheng Jian) Date: Mon, 8 Feb 2021 02:08:41 GMT Subject: RFR: 8258953: AArch64: move NEON instructions to aarch64_neon.ad [v2] In-Reply-To: References: Message-ID: On Sun, 7 Feb 2021 07:57:57 GMT, Dong Bo wrote: >> As discussed in [1], all NEON instructions should be moved from `aarch64.ad` to `aarch64_neon.ad`. >> >> In the first commit [2] of this PR, the NEON instructions are deleted from `aarch64.ad` and appended to `aarch64_neon.ad`. >> I compared the generated code in `aarch64_neon.ad` with original code in `aarch64.ad`, no suspicious differences found. >> The last two commits just simply move code around in `aarch64_neon.ad` to put related instructions together, i.e. `LoadStore` [3], `Reduction` [4]. >> >> This also supports vector length 4 for `vsraa8B_imm` and `vsrla8B_imm`, vector length 2 for `vsraa4S_imm` and `vsrla4S_imm`, fixes few typos, e.g. `vor8B`, `vsrla4S_imm`. >> >> [1] https://github.com/openjdk/jdk/pull/1215#issuecomment-728186803 >> [2] https://github.com/dgbo/jdk/commit/40cbe99e647cdf93712edf8f77ab3b5b30ea0a95 >> [3] https://github.com/dgbo/jdk/commit/695fb8f8ef009b733a8f804e791347f4bfe2572e >> [4] https://github.com/dgbo/jdk/commit/e0c38aa9aaa6af9925a3821328384b1e2b2c2070 > > Dong Bo has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit: > > 8258953: AArch64: move NEON instructions to aarch64_neon.ad I compared all-ad-src.ad with and without the patch, and it looked good to me. ------------- Marked as reviewed by njian (Committer). 
PR: https://git.openjdk.java.net/jdk/pull/2273 From dongbo at openjdk.java.net Mon Feb 8 02:15:44 2021 From: dongbo at openjdk.java.net (Dong Bo) Date: Mon, 8 Feb 2021 02:15:44 GMT Subject: RFR: 8258953: AArch64: move NEON instructions to aarch64_neon.ad [v2] In-Reply-To: References: Message-ID: On Mon, 8 Feb 2021 02:05:59 GMT, Ningsheng Jian wrote: >> Dong Bo has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit: >> >> 8258953: AArch64: move NEON instructions to aarch64_neon.ad > > I compared all-ad-src.ad with and without the patch, and it looked good to me. Thanks for the review. ------------- PR: https://git.openjdk.java.net/jdk/pull/2273 From dongbo at openjdk.java.net Mon Feb 8 02:15:45 2021 From: dongbo at openjdk.java.net (Dong Bo) Date: Mon, 8 Feb 2021 02:15:45 GMT Subject: Integrated: 8258953: AArch64: move NEON instructions to aarch64_neon.ad In-Reply-To: References: Message-ID: On Thu, 28 Jan 2021 01:37:33 GMT, Dong Bo wrote: > As discussed in [1], all NEON instructions should be moved from `aarch64.ad` to `aarch64_neon.ad`. > > In the first commit [2] of this PR, the NEON instructions are deleted from `aarch64.ad` and appended to `aarch64_neon.ad`. > I compared the generated code in `aarch64_neon.ad` with original code in `aarch64.ad`, no suspicious differences found. > The last two commits just simply move code around in `aarch64_neon.ad` to put related instructions together, i.e. `LoadStore` [3], `Reduction` [4]. > > This also supports vector length 4 for `vsraa8B_imm` and `vsrla8B_imm`, vector length 2 for `vsraa4S_imm` and `vsrla4S_imm`, fixes few typos, e.g. `vor8B`, `vsrla4S_imm`. 
> > [1] https://github.com/openjdk/jdk/pull/1215#issuecomment-728186803 > [2] https://github.com/dgbo/jdk/commit/40cbe99e647cdf93712edf8f77ab3b5b30ea0a95 > [3] https://github.com/dgbo/jdk/commit/695fb8f8ef009b733a8f804e791347f4bfe2572e > [4] https://github.com/dgbo/jdk/commit/e0c38aa9aaa6af9925a3821328384b1e2b2c2070 This pull request has now been integrated. Changeset: aa5bc6ed Author: Dong Bo Committer: Fei Yang URL: https://git.openjdk.java.net/jdk/commit/aa5bc6ed Stats: 5661 lines in 3 files changed: 3216 ins; 2435 del; 10 mod 8258953: AArch64: move NEON instructions to aarch64_neon.ad Reviewed-by: njian, aph ------------- PR: https://git.openjdk.java.net/jdk/pull/2273 From thartmann at openjdk.java.net Mon Feb 8 07:05:42 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 8 Feb 2021 07:05:42 GMT Subject: RFR: 8261229: MethodData is not correctly initialized with TieredStopAtLevel=3 [v2] In-Reply-To: <-RNQ0M9cHiIX7na2DnUnhucXZ35JILV8sdMO0oAeaOI=.42b67e78-71c8-4bcf-aaa7-22889fbfbba5@github.com> References: <-ReDb8Jf-EIuYNyQltIaKSF1tUPmswp6kx_mciU5Fvk=.31d8565c-26fe-4f7f-8b88-55bcc996bfca@github.com> <-RNQ0M9cHiIX7na2DnUnhucXZ35JILV8sdMO0oAeaOI=.42b67e78-71c8-4bcf-aaa7-22889fbfbba5@github.com> Message-ID: On Sat, 6 Feb 2021 20:47:00 GMT, Igor Veresov wrote: >> Mostly a typo in compilation mode ergonomics that selected a quick-only mode essentially when the user specified TieredStopAtLevel={1,2,3}. The quick-only mode has an optimization that eliminates parts of the MDO since they are not needed. Meanwhile, the WB API considered it a fair game to request a level 3 compile, that requires a full MDO. >> >> The fix corrects the original issue and also tries to be extra defensive with WB API (since it's semantics is not clearly specified) by always allocating full MDO if WB API is on. > > Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: > > Remove WB defence Looks good to me. 
------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2444 From thartmann at openjdk.java.net Mon Feb 8 07:08:42 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 8 Feb 2021 07:08:42 GMT Subject: RFR: 8261229: MethodData is not correctly initialized with TieredStopAtLevel=3 [v2] In-Reply-To: References: <-ReDb8Jf-EIuYNyQltIaKSF1tUPmswp6kx_mciU5Fvk=.31d8565c-26fe-4f7f-8b88-55bcc996bfca@github.com> <-RNQ0M9cHiIX7na2DnUnhucXZ35JILV8sdMO0oAeaOI=.42b67e78-71c8-4bcf-aaa7-22889fbfbba5@github.com> Message-ID: On Mon, 8 Feb 2021 07:02:46 GMT, Tobias Hartmann wrote: >> Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove WB defence > > Looks good to me. Just wondering if we should add a regression test (i.e. add a `TieredStopAtLevel=3` run) to the test(s) you've mentioned in the bug comments? ------------- PR: https://git.openjdk.java.net/jdk/pull/2444 From iveresov at openjdk.java.net Mon Feb 8 07:13:41 2021 From: iveresov at openjdk.java.net (Igor Veresov) Date: Mon, 8 Feb 2021 07:13:41 GMT Subject: RFR: 8261229: MethodData is not correctly initialized with TieredStopAtLevel=3 [v2] In-Reply-To: References: <-ReDb8Jf-EIuYNyQltIaKSF1tUPmswp6kx_mciU5Fvk=.31d8565c-26fe-4f7f-8b88-55bcc996bfca@github.com> <-RNQ0M9cHiIX7na2DnUnhucXZ35JILV8sdMO0oAeaOI=.42b67e78-71c8-4bcf-aaa7-22889fbfbba5@github.com> Message-ID: On Mon, 8 Feb 2021 07:06:09 GMT, Tobias Hartmann wrote: > Just wondering if we should add a regression test (i.e. add a `TieredStopAtLevel=3` run) to the test(s) you've mentioned in the bug comments? I think we should rather start running the existing tests in compiler/whitebox and compiler/tiered with TieredStopAtLevel={1,2,3} instead. Many of them failed when I did that; 3 of them failed because of this particular problem. Many of these tests are specifically crafted to be run with different TieredStopAtLevel values.
------------- PR: https://git.openjdk.java.net/jdk/pull/2444 From thartmann at openjdk.java.net Mon Feb 8 07:23:41 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 8 Feb 2021 07:23:41 GMT Subject: RFR: 8261229: MethodData is not correctly initialized with TieredStopAtLevel=3 [v2] In-Reply-To: References: <-ReDb8Jf-EIuYNyQltIaKSF1tUPmswp6kx_mciU5Fvk=.31d8565c-26fe-4f7f-8b88-55bcc996bfca@github.com> <-RNQ0M9cHiIX7na2DnUnhucXZ35JILV8sdMO0oAeaOI=.42b67e78-71c8-4bcf-aaa7-22889fbfbba5@github.com> Message-ID: On Mon, 8 Feb 2021 07:09:24 GMT, Igor Veresov wrote: >> Just wondering if we should add a regression test (i.e. add a `TieredStopAtLevel=3` run) to the test(s) you've mentioned in the bug comments? > > I think we should rather start running the existing tests in compiler/whitebox and compiler/tiered with TieredStopAtLevel={1,2,3} instead. Many of them failed when I did that; 3 of them failed because of this particular problem. > > Many of these tests are specifically crafted to be run with different TieredStopAtLevel values. Okay, even better. ------------- PR: https://git.openjdk.java.net/jdk/pull/2444 From iveresov at openjdk.java.net Mon Feb 8 17:11:40 2021 From: iveresov at openjdk.java.net (Igor Veresov) Date: Mon, 8 Feb 2021 17:11:40 GMT Subject: Integrated: 8261229: MethodData is not correctly initialized with TieredStopAtLevel=3 In-Reply-To: <-ReDb8Jf-EIuYNyQltIaKSF1tUPmswp6kx_mciU5Fvk=.31d8565c-26fe-4f7f-8b88-55bcc996bfca@github.com> References: <-ReDb8Jf-EIuYNyQltIaKSF1tUPmswp6kx_mciU5Fvk=.31d8565c-26fe-4f7f-8b88-55bcc996bfca@github.com> Message-ID: On Sat, 6 Feb 2021 06:16:45 GMT, Igor Veresov wrote: > Mostly a typo in compilation mode ergonomics that selected a quick-only mode essentially when the user specified TieredStopAtLevel={1,2,3}.
The quick-only mode has an optimization that eliminates parts of the MDO since they are not needed. Meanwhile, the WB API considered it a fair game to request a level 3 compile, that requires a full MDO. > > The fix corrects the original issue and also tries to be extra defensive with WB API (since it's semantics is not clearly specified) by always allocating full MDO if WB API is on. This pull request has now been integrated. Changeset: 29a428f5 Author: Igor Veresov URL: https://git.openjdk.java.net/jdk/commit/29a428f5 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8261229: MethodData is not correctly initialized with TieredStopAtLevel=3 Reviewed-by: thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/2444 From kvn at openjdk.java.net Mon Feb 8 18:48:46 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 8 Feb 2021 18:48:46 GMT Subject: RFR: 8261137: Optimization of Box nodes in uncommon_trap [v2] In-Reply-To: References: <8Riu9VCQLM7_vDp5DOMtLZK3yMLQzAkwlIKo4ab0F7Q=.662dbffe-c320-47ea-bc67-508e2c382b12@github.com> Message-ID: On Fri, 5 Feb 2021 07:29:00 GMT, Wang Huang wrote: >> JDK-8075052 has removed useless autobox. However, in some cases, the box is still saved. For instance: >> @Benchmark >> public void testMethod(Blackhole bh) { >> int sum = 0; >> for (int i = 0; i < data.length; i++) { >> Integer ii = Integer.valueOf(data[i]); >> if (i < data.length) { >> sum += ii.intValue(); >> } >> } >> bh.consume(sum); >> } >> Although the variable ii is only used at ii.intValue(), it cannot be eliminated as a result of being used by a hidden uncommon_trap. >> The uncommon_trap is generated by the optimized "if", because its condition is always true. >> >> We can postpone box in uncommon_trap in this situation. We treat box as a scalarized object by adding a SafePointScalarObjectNode in the uncommon_trap node, >> and deleting the use of box: >> >> There is no additional fail/error(s) of jtreg after this patch. 
> > Wang Huang has updated the pull request incrementally with one additional commit since the last revision: > > refactor codes I am a little concern about stretching uses of input value outside scope where it is created (for example, loop's variable or a value depending on it). This optimization may work only because boxed values are immutable. I will run our tests with this changes. src/hotspot/share/opto/callGenerator.cpp line 582: > 580: Node* uncommon_trap_node = delay_boxes.pop(); > 581: int in_edge = uncommon_trap_node->find_edge(res); > 582: assert(in_edge > 0, "sanity"); If there are several references you need to replace all of them. src/hotspot/share/opto/callGenerator.cpp line 591: > 589: Node* sobj = new SafePointScalarObjectNode(gvn.type(res)->isa_oopptr(), > 590: #ifdef ASSERT > 591: NULL, I would suggest to record `call` node here treating it as allocation. ------------- PR: https://git.openjdk.java.net/jdk/pull/2401 From enikitin at openjdk.java.net Mon Feb 8 20:01:42 2021 From: enikitin at openjdk.java.net (Evgeny Nikitin) Date: Mon, 8 Feb 2021 20:01:42 GMT Subject: RFR: 8058176: [mlvm] Tests should tolerate exceptions caused by code cache exhaustion. In-Reply-To: References: Message-ID: On Fri, 5 Feb 2021 22:16:14 GMT, Evgeny Nikitin wrote: > A repetition of the #1622. > > 1. Normalise meth/stress/compiler/i2c_c2i/Test.java to use MultiThreadedTest framework; > 2. Adjust MultiThreadedTest in order to accomodate the i2c_c2i test (add prepareThread method and logic); > 3. Add ThrowableTolerance and DefaultThrowableTolerance as ways to control what Throwables are accepted; > 4. Adjust MultiThreadedTest to catch Throwables and check if they are accepted; > 5. Adjust individual tests to catch possible Throwables and check if they are accepted; > 6. Un-problemlist the failing tests. > > Testing vmTestBase/vm/mlvm/meth/stress run on macos-linux-windows (30 runs each) in x64 configurations, rebased on top of latest code base. 
Code cache was limited with `-XX:ReservedCodeCacheSize=8M` as suggested in the case. Needs reworking. ------------- PR: https://git.openjdk.java.net/jdk/pull/2440 From enikitin at openjdk.java.net Mon Feb 8 20:01:43 2021 From: enikitin at openjdk.java.net (Evgeny Nikitin) Date: Mon, 8 Feb 2021 20:01:43 GMT Subject: Withdrawn: 8058176: [mlvm] Tests should tolerate exceptions caused by code cache exhaustion. In-Reply-To: References: Message-ID: On Fri, 5 Feb 2021 22:16:14 GMT, Evgeny Nikitin wrote: > A repetition of #1622. > > 1. Normalise meth/stress/compiler/i2c_c2i/Test.java to use MultiThreadedTest framework; > 2. Adjust MultiThreadedTest in order to accommodate the i2c_c2i test (add prepareThread method and logic); > 3. Add ThrowableTolerance and DefaultThrowableTolerance as ways to control what Throwables are accepted; > 4. Adjust MultiThreadedTest to catch Throwables and check if they are accepted; > 5. Adjust individual tests to catch possible Throwables and check if they are accepted; > 6. Un-problemlist the failing tests. > > Testing vmTestBase/vm/mlvm/meth/stress run on macos-linux-windows (30 runs each) in x64 configurations, rebased on top of latest code base. Code cache was limited with `-XX:ReservedCodeCacheSize=8M` as suggested in the case. This pull request has been closed without being integrated.
------------- PR: https://git.openjdk.java.net/jdk/pull/2440 From kvn at openjdk.java.net Mon Feb 8 22:50:42 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 8 Feb 2021 22:50:42 GMT Subject: RFR: 8261137: Optimization of Box nodes in uncommon_trap [v2] In-Reply-To: References: <8Riu9VCQLM7_vDp5DOMtLZK3yMLQzAkwlIKo4ab0F7Q=.662dbffe-c320-47ea-bc67-508e2c382b12@github.com> Message-ID: On Mon, 8 Feb 2021 18:45:37 GMT, Vladimir Kozlov wrote: >> Wang Huang has updated the pull request incrementally with one additional commit since the last revision: >> >> refactor codes > > I am a little concern about stretching uses of input value outside scope where it is created (for example, loop's variable or a value depending on it). > This optimization may work only because boxed values are immutable. > I will run our tests with this changes. Our tier1-4 testing passed ------------- PR: https://git.openjdk.java.net/jdk/pull/2401 From xliu at openjdk.java.net Tue Feb 9 00:33:51 2021 From: xliu at openjdk.java.net (Xin Liu) Date: Tue, 9 Feb 2021 00:33:51 GMT Subject: RFR: 8261137: Optimization of Box nodes in uncommon_trap [v2] In-Reply-To: References: <8Riu9VCQLM7_vDp5DOMtLZK3yMLQzAkwlIKo4ab0F7Q=.662dbffe-c320-47ea-bc67-508e2c382b12@github.com> Message-ID: On Mon, 8 Feb 2021 22:47:43 GMT, Vladimir Kozlov wrote: >> I am a little concern about stretching uses of input value outside scope where it is created (for example, loop's variable or a value depending on it). >> This optimization may work only because boxed values are immutable. >> I will run our tests with this changes. > > Our tier1-4 testing passed > I was wandering if we can remove the useless `if` as it's always true in this case. Do you know why this kind of `if` haven't been eliminated by GVN phase? Eventually, c2 should know that `i < data.length' is true. it should be done by GCP later. The example is a special case. you can remove this " if (i < data.length)" here. 
Generally, target code can look like this. c2 speculatively generates an uncommon_trap of "unstable_if". for (int i = 0; i < data.length; i++) { Integer ii = Integer.valueOf(data[i]); if (cond) { //likely sum += ii.intValue(); } } ------------- PR: https://git.openjdk.java.net/jdk/pull/2401 From xliu at openjdk.java.net Tue Feb 9 00:59:43 2021 From: xliu at openjdk.java.net (Xin Liu) Date: Tue, 9 Feb 2021 00:59:43 GMT Subject: RFR: 8261137: Optimization of Box nodes in uncommon_trap [v2] In-Reply-To: References: <8Riu9VCQLM7_vDp5DOMtLZK3yMLQzAkwlIKo4ab0F7Q=.662dbffe-c320-47ea-bc67-508e2c382b12@github.com> Message-ID: On Fri, 5 Feb 2021 07:29:00 GMT, Wang Huang wrote: >> JDK-8075052 has removed useless autobox. However, in some cases, the box is still saved. For instance: >> @Benchmark >> public void testMethod(Blackhole bh) { >> int sum = 0; >> for (int i = 0; i < data.length; i++) { >> Integer ii = Integer.valueOf(data[i]); >> if (i < data.length) { >> sum += ii.intValue(); >> } >> } >> bh.consume(sum); >> } >> Although the variable ii is only used at ii.intValue(), it cannot be eliminated as a result of being used by a hidden uncommon_trap. >> The uncommon_trap is generated by the optimized "if", because its condition is always true. >> >> We can postpone box in uncommon_trap in this situation. We treat box as a scalarized object by adding a SafePointScalarObjectNode in the uncommon_trap node, >> and deleting the use of box: >> >> There is no additional fail/error(s) of jtreg after this patch. > > Wang Huang has updated the pull request incrementally with one additional commit since the last revision: > > refactor codes src/hotspot/share/opto/callGenerator.cpp line 558: > 556: > 557: static void delay_box_in_uncommon_trap(CallNode* call, Node* resproj) { > 558: if (resproj != NULL && call->is_CallStaticJava() && IMHO, we should use nullptr here because hotspot now is using c++14. 
src/hotspot/share/opto/callGenerator.cpp line 560: > 558: if (resproj != NULL && call->is_CallStaticJava() && > 559: call->as_CallStaticJava()->is_boxing_method()) { > 560: GraphKit kit(call->jvms()); you can postpone to construct this object in if(no_use). src/hotspot/share/opto/callGenerator.cpp line 586: > 584: ciInstanceKlass* klass = call->as_CallStaticJava()->method()->holder(); > 585: int n_fields = klass->nof_nonstatic_fields(); > 586: assert(n_fields == 1, "sanity"); I think you also need to check the only non-static field of klass must be a scalar. "sanity" is too concise. I think we should leave a message to say it's an auto-boxing class. ------------- PR: https://git.openjdk.java.net/jdk/pull/2401 From xliu at openjdk.java.net Tue Feb 9 00:59:44 2021 From: xliu at openjdk.java.net (Xin Liu) Date: Tue, 9 Feb 2021 00:59:44 GMT Subject: RFR: 8261137: Optimization of Box nodes in uncommon_trap [v2] In-Reply-To: References: <8Riu9VCQLM7_vDp5DOMtLZK3yMLQzAkwlIKo4ab0F7Q=.662dbffe-c320-47ea-bc67-508e2c382b12@github.com> Message-ID: <3eCQyq3X_TP8Si_Dok_DEsAWGLpy94RM00iOmWo03ao=.7b2b4d17-7656-4c24-8fea-eb0976d98359@github.com> On Mon, 8 Feb 2021 18:15:54 GMT, Vladimir Kozlov wrote: >> Wang Huang has updated the pull request incrementally with one additional commit since the last revision: >> >> refactor codes > > src/hotspot/share/opto/callGenerator.cpp line 582: > >> 580: Node* uncommon_trap_node = delay_boxes.pop(); >> 581: int in_edge = uncommon_trap_node->find_edge(res); >> 582: assert(in_edge > 0, "sanity"); > > If there are several references you need to replace all of them. 
+1. Scalar replacement uses range substitution: https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/macro.cpp#L892 ------------- PR: https://git.openjdk.java.net/jdk/pull/2401 From dlong at openjdk.java.net Tue Feb 9 01:38:52 2021 From: dlong at openjdk.java.net (Dean Long) Date: Tue, 9 Feb 2021 01:38:52 GMT Subject: RFR: 8241502: Migrate x86_64.ad to MacroAssembler In-Reply-To: References: <0azhJ4pD5Tq_lkpPtYMpQBjokflcSQEdWP2Rz9HBm6k=.c3ece6fd-1ae7-49ea-a6eb-ec88a9fbd54d@github.com> Message-ID: On Fri, 5 Feb 2021 18:43:27 GMT, Ioi Lam wrote: >> I agree. We wanted to do that for a long time. > > I am curious if the x86_64.o file changes in any significant way (speed or size). I wish there was a way for the old and new versions to co-exist at the same time, so we could generate the code the old way and the new way, then compare, for automatic verification of the MacroAssembler version. ------------- PR: https://git.openjdk.java.net/jdk/pull/2420 From whuang at openjdk.java.net Tue Feb 9 01:43:43 2021 From: whuang at openjdk.java.net (Wang Huang) Date: Tue, 9 Feb 2021 01:43:43 GMT Subject: RFR: 8261137: Optimization of Box nodes in uncommon_trap [v2] In-Reply-To: References: <8Riu9VCQLM7_vDp5DOMtLZK3yMLQzAkwlIKo4ab0F7Q=.662dbffe-c320-47ea-bc67-508e2c382b12@github.com> Message-ID: On Tue, 9 Feb 2021 00:35:08 GMT, Xin Liu wrote: >> Wang Huang has updated the pull request incrementally with one additional commit since the last revision: >> >> refactor codes > > src/hotspot/share/opto/callGenerator.cpp line 558: > >> 556: >> 557: static void delay_box_in_uncommon_trap(CallNode* call, Node* resproj) { >> 558: if (resproj != NULL && call->is_CallStaticJava() && > > IMHO, we should use nullptr here because hotspot now is using c++14. Thank you for your review. I will change that.
> src/hotspot/share/opto/callGenerator.cpp line 560: > >> 558: if (resproj != NULL && call->is_CallStaticJava() && >> 559: call->as_CallStaticJava()->is_boxing_method()) { >> 560: GraphKit kit(call->jvms()); > > you can postpone to construct this object in if(no_use). Sure. I will change that. ------------- PR: https://git.openjdk.java.net/jdk/pull/2401 From whuang at openjdk.java.net Tue Feb 9 02:36:42 2021 From: whuang at openjdk.java.net (Wang Huang) Date: Tue, 9 Feb 2021 02:36:42 GMT Subject: RFR: 8261137: Optimization of Box nodes in uncommon_trap [v2] In-Reply-To: <3eCQyq3X_TP8Si_Dok_DEsAWGLpy94RM00iOmWo03ao=.7b2b4d17-7656-4c24-8fea-eb0976d98359@github.com> References: <8Riu9VCQLM7_vDp5DOMtLZK3yMLQzAkwlIKo4ab0F7Q=.662dbffe-c320-47ea-bc67-508e2c382b12@github.com> <3eCQyq3X_TP8Si_Dok_DEsAWGLpy94RM00iOmWo03ao=.7b2b4d17-7656-4c24-8fea-eb0976d98359@github.com> Message-ID: On Tue, 9 Feb 2021 00:52:31 GMT, Xin Liu wrote: >> src/hotspot/share/opto/callGenerator.cpp line 582: >> >>> 580: Node* uncommon_trap_node = delay_boxes.pop(); >>> 581: int in_edge = uncommon_trap_node->find_edge(res); >>> 582: assert(in_edge > 0, "sanity"); >> >> If there are several references you need to replace all of them. > > +1 > scalar replacement uses range substitution > https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/macro.cpp#L892 Thank you for your review. It's my fault. I will revise this in next commit. 
------------- PR: https://git.openjdk.java.net/jdk/pull/2401 From github.com+10482586+therealeliu at openjdk.java.net Tue Feb 9 03:10:40 2021 From: github.com+10482586+therealeliu at openjdk.java.net (Eric Liu) Date: Tue, 9 Feb 2021 03:10:40 GMT Subject: RFR: 8256438: AArch64: Implement match rules with ROR shift register value [v3] In-Reply-To: References: Message-ID: On Thu, 28 Jan 2021 14:26:32 GMT, Andrew Haley wrote: >> Eric Liu has updated the pull request incrementally with one additional commit since the last revision: >> >> Add benchmark test >> >> Change-Id: I63ca51d06070a07e5c20daf4b42d2c8d7237a1da > > All that remains to do is the benchmarks. @theRealAph Could you help take a look? ------------- PR: https://git.openjdk.java.net/jdk/pull/1858 From Pengfei.Li at arm.com Tue Feb 9 03:21:32 2021 From: Pengfei.Li at arm.com (Pengfei Li) Date: Tue, 9 Feb 2021 03:21:32 +0000 Subject: [11u] RFR(S): 8261022: Fix incorrect result of Math.abs() with char type Message-ID: Hi, I'd like to backport JDK-8261022 to jdk11u. Original JBS: https://bugs.openjdk.java.net/browse/JDK-8261022 Modified webrev: http://cr.openjdk.java.net/~pli/rfr/8261022/backport11u/ This issue causes vectorized abs to generate incorrect results when the argument has char type. The root cause is that the vector abs operation is not specially handled when computing vector element types after we enabled that in JDK-8222074 in jdk13. As JDK-8222074 was backported to jdk11u, jdk11u is also affected. The patch to fix this is in jdk17 now. The fix does not apply to jdk11u cleanly, as VectorNode::is_shift_opcode() is not defined in jdk11u. I have modified the patch a little bit to fit this difference. Tested jtreg hotspot::tier1 and the newly added jtreg case. No failure after the modified patch.
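For readers who want to see the shape of the problem, the following is a minimal sketch of the kind of loop involved. The class and method names are hypothetical and are not taken from the JDK-8261022 test. A `char` is an unsigned 16-bit value that is promoted to a non-negative `int` before `Math.abs` is applied, so the result must always equal the input. Whether a particular JVM auto-vectorizes this loop is an assumption; the sketch only repeats the call so that the JIT has a chance to compile it.

```java
final class AbsCharSketch {
    // Hypothetical reproducer shape: a simple counted loop that C2 may auto-vectorize.
    static void absAll(char[] src, char[] dst) {
        for (int i = 0; i < src.length; i++) {
            // src[i] is promoted to a non-negative int, so abs must be the identity here.
            dst[i] = (char) Math.abs(src[i]);
        }
    }

    // Runs the loop over every possible char value and reports whether abs was the identity.
    static boolean check() {
        char[] src = new char[1 << 16];
        for (int i = 0; i < src.length; i++) {
            src[i] = (char) i;
        }
        char[] dst = new char[src.length];
        absAll(src, dst);
        for (int i = 0; i < src.length; i++) {
            if (dst[i] != src[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Repeat so that the loop becomes hot enough to be JIT-compiled (an assumption).
        for (int iter = 0; iter < 2000; iter++) {
            if (!check()) {
                throw new AssertionError("Math.abs(char) changed a value at iteration " + iter);
            }
        }
        System.out.println("all char values unchanged by Math.abs");
    }
}
```

Values of 0x8000 and above are the interesting ones, since they would look negative if the lanes were treated as signed 16-bit elements.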
-- Thanks, Pengfei From dongbo at openjdk.java.net Tue Feb 9 07:03:58 2021 From: dongbo at openjdk.java.net (Dong Bo) Date: Tue, 9 Feb 2021 07:03:58 GMT Subject: RFR: 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width Message-ID: In vectorAPI, when right-shifting a vector with a shift equals to the element width, the shift is transformed to zero, see `src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorOperators.java`: /** Produce {@code a>>>(n&(ESIZE*8-1))}. Integral only. */ public static final /*bitwise*/ Binary LSHR = binary("LSHR", ">>>", VectorSupport.VECTOR_OP_URSHIFT, VO_SHIFT); The aarch64 assembler generates wrong or illegal instructions in this case, e.g. for the JAVA code below on aarch64, assembler call `__ ushr(dst, __ T8B, src, 0)`, the instruction generated is not `ushr dst.8B, src.8B, 8`, but `ushr dst.4H, src.4H, 16` instead. According to local tests, JVM gives wrong results for byte/short and crashes with SIGILL for integer/long. ByteVector vba = ByteVector.fromArray(byte64SPECIES, bytesA, 8 * i); vbb.lanewise(VectorOperators.ASHR, 8).intoArray(arrBytes, 8 * i); The legal right shift amount should be in the range 1 to the element width in bits on aarch64: https://developer.arm.com/documentation/dui0801/f/A64-SIMD-Vector-Instructions/USHR--vector-?lang=en This fix handles zero shift separately. If the shift is zero, it generates `orr` for right shift, `addv` for right shift and accumulate. Verified with linux-aarch64-server-fastdebug, tier1. Also created a jtreg to reproduce the issue and for regression tests. 
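The arithmetic behind the description above can be sketched in plain Java. This is an illustration only, not the HotSpot assembler code: the class and method names are hypothetical, and the encoding rule (`immh:immb = 2 * esize - shift`, with the lane size taken from the highest set bit of `immh`) is my reading of the Arm documentation for the vector right-shift-by-immediate instructions.

```java
final class ShiftEncodingSketch {
    // Vector API masking as documented for LSHR: n & (ESIZE*8 - 1), with ESIZE in bytes.
    static int maskedShift(int n, int elemBytes) {
        return n & (elemBytes * 8 - 1);
    }

    // Assumed A64 rule for right shifts by immediate: immh:immb = 2*esize - shift.
    // The field is 7 bits wide; legal shifts are 1..esize, so 0 is outside the legal range.
    static int encodeImmhImmb(int elemBits, int shift) {
        return (2 * elemBits - shift) & 0x7f;
    }

    // Decodes the 7-bit field the way a disassembler would.
    static String decode(int immhImmb) {
        int immh = immhImmb >>> 3;
        if (immh == 0) {
            return "not a shift-by-immediate encoding";
        }
        // The highest set bit of immh selects the lane size: 0001 -> 8, 001x -> 16, 01xx -> 32, 1xxx -> 64.
        int elemBits = 8 << (31 - Integer.numberOfLeadingZeros(immh));
        int shift = 2 * elemBits - immhImmb;
        return elemBits + "-bit lanes, shift " + shift;
    }

    public static void main(String[] args) {
        // A shift of 8 on byte lanes is masked to 0 by the Vector API.
        System.out.println(maskedShift(8, 1));              // prints 0
        // Encoding shift 0 for 8-bit lanes yields a field that decodes as a 16-bit shift by 16.
        System.out.println(decode(encodeImmhImmb(8, 0)));   // prints "16-bit lanes, shift 16"
        // A legal shift round-trips as expected.
        System.out.println(decode(encodeImmhImmb(8, 8)));   // prints "8-bit lanes, shift 8"
    }
}
```

Under these assumptions the out-of-range shift of 0 on byte lanes lands in the bit pattern of a halfword shift by 16, which matches the `ushr dst.4H, src.4H, 16` observation in the message.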
------------- Commit messages: - fix trailing whitespaces - 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width Changes: https://git.openjdk.java.net/jdk/pull/2472/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2472&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261142 Stats: 771 lines in 3 files changed: 641 ins; 17 del; 113 mod Patch: https://git.openjdk.java.net/jdk/pull/2472.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2472/head:pull/2472 PR: https://git.openjdk.java.net/jdk/pull/2472 From njian at openjdk.java.net Tue Feb 9 07:55:43 2021 From: njian at openjdk.java.net (Ningsheng Jian) Date: Tue, 9 Feb 2021 07:55:43 GMT Subject: RFR: 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width In-Reply-To: References: Message-ID: <8kMxMFAYtb0B-yUVEt-HLfhji3Gj-gog8OHvWW_tKfw=.f7c9422b-3574-4c31-9489-7286ee98332f@github.com> On Tue, 9 Feb 2021 06:55:50 GMT, Dong Bo wrote: > In vectorAPI, when right-shifting a vector with a shift equals to the element width, the shift is transformed to zero, > see `src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorOperators.java`: > /** Produce {@code a>>>(n&(ESIZE*8-1))}. Integral only. */ > public static final /*bitwise*/ Binary LSHR = binary("LSHR", ">>>", VectorSupport.VECTOR_OP_URSHIFT, VO_SHIFT); > > The aarch64 assembler generates wrong or illegal instructions in this case, e.g. for the JAVA code below on aarch64, > assembler call `__ ushr(dst, __ T8B, src, 0)`, the instruction generated is not `ushr dst.8B, src.8B, 0`, but `ushr dst.4H, src.4H, 16` instead. > According to local tests, JVM gives wrong results for byte/short and crashes with SIGILL for integer/long. 
> ByteVector vba = ByteVector.fromArray(byte64SPECIES, bytesA, 8 * i); > vbb.lanewise(VectorOperators.ASHR, 8).intoArray(arrBytes, 8 * i); > > The legal right shift amount should be in the range 1 to the element width in bits on aarch64: > https://developer.arm.com/documentation/dui0801/f/A64-SIMD-Vector-Instructions/USHR--vector-?lang=en > > This fix handles zero shift separately. If the shift is zero, it generates `orr` for right shift, `addv` for right shift and accumulate. > Verified with linux-aarch64-server-fastdebug, tier1. Also created a jtreg to reproduce the issue and for regression tests. Thanks for the fix. src/hotspot/cpu/aarch64/aarch64_neon.ad line 5285: > 5283: ins_encode %{ > 5284: int sh = (int)$shift$$constant; > 5285: if (sh == 0) { If src and dst are the same reg, no need to emit code. Or maybe c2 can even be improved to optimize this (sh=0 case) out? src/hotspot/cpu/aarch64/aarch64_neon.ad line 5271: > 5269: } else { > 5270: if (sh >= 8) sh = 7; > 5271: __ sshr(as_FloatRegister($dst$$reg), __ T8B, I think we should add an assert to make sure 0 is not passed to the assembler. ------------- PR: https://git.openjdk.java.net/jdk/pull/2472 From dongbo at openjdk.java.net Tue Feb 9 08:57:12 2021 From: dongbo at openjdk.java.net (Dong Bo) Date: Tue, 9 Feb 2021 08:57:12 GMT Subject: RFR: 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width In-Reply-To: <8kMxMFAYtb0B-yUVEt-HLfhji3Gj-gog8OHvWW_tKfw=.f7c9422b-3574-4c31-9489-7286ee98332f@github.com> References: <8kMxMFAYtb0B-yUVEt-HLfhji3Gj-gog8OHvWW_tKfw=.f7c9422b-3574-4c31-9489-7286ee98332f@github.com> Message-ID: On Tue, 9 Feb 2021 07:47:57 GMT, Ningsheng Jian wrote: > If src and dst are the same reg, no need to emit code. If we want to do this enhancement, I think we need do it for left shifting and all SVE left/right shifting as well for completeness. > Or maybe c2 can even be improved to optimize this (sh=0 case) out? 
We can add code in `Ideal` to optimize it to ORR, but I'm not sure `orr` performs better than `shift` on other platforms. Seems we have to created a generic new node to do `vector move` here. > src/hotspot/cpu/aarch64/aarch64_neon.ad line 5271: > >> 5269: } else { >> 5270: if (sh >= 8) sh = 7; >> 5271: __ sshr(as_FloatRegister($dst$$reg), __ T8B, > > I think we should add an assert to make sure 0 is not passed to the assembler. Agree, I'll do this. ------------- PR: https://git.openjdk.java.net/jdk/pull/2472 From dongbo at openjdk.java.net Tue Feb 9 09:13:47 2021 From: dongbo at openjdk.java.net (Dong Bo) Date: Tue, 9 Feb 2021 09:13:47 GMT Subject: RFR: 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width [v2] In-Reply-To: References: Message-ID: > In vectorAPI, when right-shifting a vector with a shift equals to the element width, the shift is transformed to zero, > see `src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorOperators.java`: > /** Produce {@code a>>>(n&(ESIZE*8-1))}. Integral only. */ > public static final /*bitwise*/ Binary LSHR = binary("LSHR", ">>>", VectorSupport.VECTOR_OP_URSHIFT, VO_SHIFT); > > The aarch64 assembler generates wrong or illegal instructions in this case, e.g. for the JAVA code below on aarch64, > assembler call `__ ushr(dst, __ T8B, src, 0)`, the instruction generated is not `ushr dst.8B, src.8B, 0`, but `ushr dst.4H, src.4H, 16` instead. > According to local tests, JVM gives wrong results for byte/short and crashes with SIGILL for integer/long. > ByteVector vba = ByteVector.fromArray(byte64SPECIES, bytesA, 8 * i); > vbb.lanewise(VectorOperators.ASHR, 8).intoArray(arrBytes, 8 * i); > > The legal right shift amount should be in the range 1 to the element width in bits on aarch64: > https://developer.arm.com/documentation/dui0801/f/A64-SIMD-Vector-Instructions/USHR--vector-?lang=en > > This fix handles zero shift separately. 
If the shift is zero, it generates `orr` for right shift, `addv` for right shift and accumulate. > Verified with linux-aarch64-server-fastdebug, tier1. Also created a jtreg to reproduce the issue and for regression tests. Dong Bo has updated the pull request incrementally with one additional commit since the last revision: add assertion in the assembler ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2472/files - new: https://git.openjdk.java.net/jdk/pull/2472/files/c44bebb0..8439f167 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2472&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2472&range=00-01 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/2472.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2472/head:pull/2472 PR: https://git.openjdk.java.net/jdk/pull/2472 From aph at openjdk.java.net Tue Feb 9 09:32:37 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Tue, 9 Feb 2021 09:32:37 GMT Subject: RFR: 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width [v2] In-Reply-To: References: Message-ID: <_ach7OekIqkqmFRW3JqA5h4Q_HQUbRni0vkFzx5q3MA=.536a9faa-98c9-4dd9-9798-dcc794e23cd0@github.com> On Tue, 9 Feb 2021 09:13:47 GMT, Dong Bo wrote: >> In vectorAPI, when right-shifting a vector with a shift equals to the element width, the shift is transformed to zero, >> see `src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorOperators.java`: >> /** Produce {@code a>>>(n&(ESIZE*8-1))}. Integral only. */ >> public static final /*bitwise*/ Binary LSHR = binary("LSHR", ">>>", VectorSupport.VECTOR_OP_URSHIFT, VO_SHIFT); >> >> The aarch64 assembler generates wrong or illegal instructions in this case, e.g. 
for the JAVA code below on aarch64, >> assembler call `__ ushr(dst, __ T8B, src, 0)`, the instruction generated is not `ushr dst.8B, src.8B, 0`, but `ushr dst.4H, src.4H, 16` instead. >> According to local tests, JVM gives wrong results for byte/short and crashes with SIGILL for integer/long. >> ByteVector vba = ByteVector.fromArray(byte64SPECIES, bytesA, 8 * i); >> vbb.lanewise(VectorOperators.ASHR, 8).intoArray(arrBytes, 8 * i); >> >> The legal right shift amount should be in the range 1 to the element width in bits on aarch64: >> https://developer.arm.com/documentation/dui0801/f/A64-SIMD-Vector-Instructions/USHR--vector-?lang=en >> >> This fix handles zero shift separately. If the shift is zero, it generates `orr` for right shift, `addv` for right shift and accumulate. >> Verified with linux-aarch64-server-fastdebug, tier1. Also created a jtreg to reproduce the issue and for regression tests. > > Dong Bo has updated the pull request incrementally with one additional commit since the last revision: > > add assertion in the assembler src/hotspot/cpu/aarch64/aarch64_neon_ad.m4 line 2057: > 2055: as_FloatRegister($src$$reg), as_FloatRegister($src$$reg)); > 2056: } else {ifelse($4, B,` > 2057: if (sh >= 8) sh = 7; I think it would be possible to move some of this logic from the AD file into MacroAssembler, with macros to generate the appropriate instruction based on their arguments. This might be cleaner: the logic here is very hard to follow. ------------- PR: https://git.openjdk.java.net/jdk/pull/2472 From sjohanss at openjdk.java.net Tue Feb 9 15:06:05 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Tue, 9 Feb 2021 15:06:05 GMT Subject: RFR: 8261029: Code heap page sizes not traced correctly using os::trace_page_sizes Message-ID: When adding a code heap the page size used with the underlying mapping is traced using `os::trace_page_sizes`. 
The old code tried to estimate the page-size based on the size, but the mapping has already been done so it is better to check the passed in `ReservedSpace`. Today we don't record the page size in the ReservedSpace, but we have a helper to do a good estimate: `ReservedSpace::actual_reserved_page_size()`. The proposal is to use this function. When changing this I also realized that the traced min-size used un-aligned value while the actual `initialize`-call correctly uses the aligned size. Changed so that we also use the aligned size for tracing. I'm currently doing some more work in this area and while I haven't added a specific test for this issue I have created a test I plan to integrate separately when a few more needed changes have gone in. The test is Linux-only and validates the output from `os::trace_page_sizes` against the information in `/proc/self/smaps`. ------------- Commit messages: - 8261029: Code heap page sizes not traced correctly using os::trace_page_sizes Changes: https://git.openjdk.java.net/jdk/pull/2481/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2481&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261029 Stats: 9 lines in 1 file changed: 1 ins; 6 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/2481.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2481/head:pull/2481 PR: https://git.openjdk.java.net/jdk/pull/2481 From kvn at openjdk.java.net Tue Feb 9 18:47:39 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 9 Feb 2021 18:47:39 GMT Subject: RFR: 8261029: Code heap page sizes not traced correctly using os::trace_page_sizes In-Reply-To: References: Message-ID: <2Z_5m14VBbD6IWQz17vzBfXu0UQm1m0SByRvxv2K91I=.ee206e21-0fb1-4170-9ab8-4df2d478f5bc@github.com> On Tue, 9 Feb 2021 13:45:38 GMT, Stefan Johansson wrote: > When adding a code heap the page size used with the underlying mapping is traced using `os::trace_page_sizes`. 
The old code tried to estimate the page-size based on the size, but the mapping has already been done so it is better to check the passed in `ReservedSpace`. Today we don't record the page size in the ReservedSpace, but we have a helper to do a good estimate: `ReservedSpace::actual_reserved_page_size()`. The proposal is to use this function. > > When changing this I also realized that the traced min-size used un-aligned value while the actual `initialize`-call correctly uses the aligned size. Changed so that we also use the aligned size for tracing. > > I'm currently doing some more work in this area and while I haven't added a specific test for this issue I have created a test I plan to integrate separately when a few more needed changes have gone in. The test is Linux-only and validates the output from `os::trace_page_sizes` against the information in `/proc/self/smaps`. I think the difference comes from JDK-8087339 changes which did not update this code: https://github.com/openjdk/jdk/commit/925a508b2bb1600909418405e9e5ea1a93a94580 I agree that alignment here should match one used in CodeCache::reserve_heap_memory() and not recalculated. But I am not sure actual_reserved_page_size() returns correct value. May be we should record value in CodeHeap object when it is created. ------------- PR: https://git.openjdk.java.net/jdk/pull/2481 From sjohanss at openjdk.java.net Tue Feb 9 19:25:37 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Tue, 9 Feb 2021 19:25:37 GMT Subject: RFR: 8261029: Code heap page sizes not traced correctly using os::trace_page_sizes In-Reply-To: <2Z_5m14VBbD6IWQz17vzBfXu0UQm1m0SByRvxv2K91I=.ee206e21-0fb1-4170-9ab8-4df2d478f5bc@github.com> References: <2Z_5m14VBbD6IWQz17vzBfXu0UQm1m0SByRvxv2K91I=.ee206e21-0fb1-4170-9ab8-4df2d478f5bc@github.com> Message-ID: On Tue, 9 Feb 2021 18:44:27 GMT, Vladimir Kozlov wrote: > I agree that alignment here should match one used in CodeCache::reserve_heap_memory() and not recalculated. 
But I am not sure actual_reserved_page_size() returns correct value. > May be we should record value in CodeHeap object when it is created. Yes, the `actual_reserved_page_size()` is far from perfect and I plan to update `ReservedSpace` to have a page size member that can be queried in places like this and then we can remove this helper. This will be required once we allow multiple large page sizes ([PR#1153](https://github.com/openjdk/jdk/pull/1153)). That said, `actual_reserved_page_size()` is currently doing a good job of returning the correct page size, since it considers whether the space is "special", what alignment it used, and whether transparent huge pages are enabled. I would prefer doing the change that records the page size in `ReservedSpace` as a separate patch and in that patch also remove all uses of `actual_reserved_page_size()`. Doing this change now is required to be able to integrate the [new test](https://github.com/openjdk/jdk/compare/master...kstefanj:test-for-trace-page-sizes) I mentioned, and I think it will be helpful for work in this area going forward. ------------- PR: https://git.openjdk.java.net/jdk/pull/2481 From njian at openjdk.java.net Wed Feb 10 01:38:39 2021 From: njian at openjdk.java.net (Ningsheng Jian) Date: Wed, 10 Feb 2021 01:38:39 GMT Subject: RFR: 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width [v2] In-Reply-To: References: <8kMxMFAYtb0B-yUVEt-HLfhji3Gj-gog8OHvWW_tKfw=.f7c9422b-3574-4c31-9489-7286ee98332f@github.com> Message-ID: On Tue, 9 Feb 2021 08:53:14 GMT, Dong Bo wrote: >> src/hotspot/cpu/aarch64/aarch64_neon.ad line 5285: >> >>> 5283: ins_encode %{ >>> 5284: int sh = (int)$shift$$constant; >>> 5285: if (sh == 0) { >> >> If src and dst are the same reg, no need to emit code. Or maybe c2 can even be improved to optimize this (sh=0 case) out? > >> If src and dst are the same reg, no need to emit code.
> > If we want to do this enhancement, I think we need do it for left shifting and all SVE left/right shifting as well for completeness. > >> Or maybe c2 can even be improved to optimize this (sh=0 case) out? > > We can add code in `Ideal` to optimize it to ORR, but I'm not sure `orr` performs better than `shift` on other platforms. > Seems we have to created a generic new node to do `vector move` here. I think with proper optimization, no move is required. But I agree it's beyond the scope of this patch. I will have a look. ------------- PR: https://git.openjdk.java.net/jdk/pull/2472 From dongbo at openjdk.java.net Wed Feb 10 02:56:57 2021 From: dongbo at openjdk.java.net (Dong Bo) Date: Wed, 10 Feb 2021 02:56:57 GMT Subject: RFR: 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width [v3] In-Reply-To: References: Message-ID: > In vectorAPI, when right-shifting a vector with a shift equals to the element width, the shift is transformed to zero, > see `src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorOperators.java`: > /** Produce {@code a>>>(n&(ESIZE*8-1))}. Integral only. */ > public static final /*bitwise*/ Binary LSHR = binary("LSHR", ">>>", VectorSupport.VECTOR_OP_URSHIFT, VO_SHIFT); > > The aarch64 assembler generates wrong or illegal instructions in this case, e.g. for the JAVA code below on aarch64, > assembler call `__ ushr(dst, __ T8B, src, 0)`, the instruction generated is not `ushr dst.8B, src.8B, 0`, but `ushr dst.4H, src.4H, 16` instead. > According to local tests, JVM gives wrong results for byte/short and crashes with SIGILL for integer/long. 
> ByteVector vba = ByteVector.fromArray(byte64SPECIES, bytesA, 8 * i); > vbb.lanewise(VectorOperators.ASHR, 8).intoArray(arrBytes, 8 * i); > > The legal right shift amount should be in the range 1 to the element width in bits on aarch64: > https://developer.arm.com/documentation/dui0801/f/A64-SIMD-Vector-Instructions/USHR--vector-?lang=en > > This fix handles zero shift separately. If the shift is zero, it generates `orr` for right shift, `addv` for right shift and accumulate. > Verified with linux-aarch64-server-fastdebug, tier1. Also created a jtreg to reproduce the issue and for regression tests. Dong Bo has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: back out AD modifications and handle zero shift in assembler ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2472/files - new: https://git.openjdk.java.net/jdk/pull/2472/files/8439f167..af3f2a15 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2472&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2472&range=01-02 Stats: 284 lines in 3 files changed: 19 ins; 143 del; 122 mod Patch: https://git.openjdk.java.net/jdk/pull/2472.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2472/head:pull/2472 PR: https://git.openjdk.java.net/jdk/pull/2472 From dongbo at openjdk.java.net Wed Feb 10 03:02:41 2021 From: dongbo at openjdk.java.net (Dong Bo) Date: Wed, 10 Feb 2021 03:02:41 GMT Subject: RFR: 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width [v2] In-Reply-To: <_ach7OekIqkqmFRW3JqA5h4Q_HQUbRni0vkFzx5q3MA=.536a9faa-98c9-4dd9-9798-dcc794e23cd0@github.com> References: <_ach7OekIqkqmFRW3JqA5h4Q_HQUbRni0vkFzx5q3MA=.536a9faa-98c9-4dd9-9798-dcc794e23cd0@github.com> Message-ID: 
<0wUxJ4QUIzC-Hg4qSPtf8nFP0ov9J69nA3gjaoEJcWY=.ede23adb-4010-460d-8ac4-d560ace8ffc0@github.com> On Tue, 9 Feb 2021 09:29:50 GMT, Andrew Haley wrote: >> Dong Bo has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. > > src/hotspot/cpu/aarch64/aarch64_neon_ad.m4 line 2057: > >> 2055: as_FloatRegister($src$$reg), as_FloatRegister($src$$reg)); >> 2056: } else {ifelse($4, B,` >> 2057: if (sh >= 8) sh = 7; > > I think it would be possible to move some of this logic from the AD file into MacroAssembler, with macros to generate the appropriate instruction based on their arguments. This might be cleaner: the logic here is very hard to follow. I backed out the modifications of `aarch64_neon.ad` and `aarch64_neon_ad.m4`. The `shift == 0` case is handled by the assembler now. Verified with the regression tests. ------------- PR: https://git.openjdk.java.net/jdk/pull/2472 From whuang at openjdk.java.net Wed Feb 10 06:45:53 2021 From: whuang at openjdk.java.net (Wang Huang) Date: Wed, 10 Feb 2021 06:45:53 GMT Subject: RFR: 8261137: Optimization of Box nodes in uncommon_trap [v3] In-Reply-To: <8Riu9VCQLM7_vDp5DOMtLZK3yMLQzAkwlIKo4ab0F7Q=.662dbffe-c320-47ea-bc67-508e2c382b12@github.com> References: <8Riu9VCQLM7_vDp5DOMtLZK3yMLQzAkwlIKo4ab0F7Q=.662dbffe-c320-47ea-bc67-508e2c382b12@github.com> Message-ID: > JDK-8075052 has removed useless autobox. However, in some cases, the box is still saved. For instance: > @Benchmark > public void testMethod(Blackhole bh) { > int sum = 0; > for (int i = 0; i < data.length; i++) { > Integer ii = Integer.valueOf(data[i]); > if (i < data.length) { > sum += ii.intValue(); > } > } > bh.consume(sum); > } > Although the variable ii is only used at ii.intValue(), it cannot be eliminated as a result of being used by a hidden uncommon_trap. > The uncommon_trap is generated by the optimized "if", because its condition is always true. 
> > We can postpone box in uncommon_trap in this situation. We treat box as a scalarized object by adding a SafePointScalarObjectNode in the uncommon_trap node, > and deleting the use of box: > > There is no additional fail/error(s) of jtreg after this patch. Wang Huang has updated the pull request incrementally with one additional commit since the last revision: fix some bugs ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2401/files - new: https://git.openjdk.java.net/jdk/pull/2401/files/4dfee52a..4c62ec8d Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2401&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2401&range=01-02 Stats: 20 lines in 1 file changed: 6 ins; 5 del; 9 mod Patch: https://git.openjdk.java.net/jdk/pull/2401.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2401/head:pull/2401 PR: https://git.openjdk.java.net/jdk/pull/2401 From stuefe at openjdk.java.net Wed Feb 10 06:48:37 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Wed, 10 Feb 2021 06:48:37 GMT Subject: RFR: 8261029: Code heap page sizes not traced correctly using os::trace_page_sizes In-Reply-To: References: Message-ID: On Tue, 9 Feb 2021 13:45:38 GMT, Stefan Johansson wrote: > When adding a code heap the page size used with the underlying mapping is traced using `os::trace_page_sizes`. The old code tried to estimate the page-size based on the size, but the mapping has already been done so it is better to check the passed in `ReservedSpace`. Today we don't record the page size in the ReservedSpace, but we have a helper to do a good estimate: `ReservedSpace::actual_reserved_page_size()`. The proposal is to use this function. > > When changing this I also realized that the traced min-size used un-aligned value while the actual `initialize`-call correctly uses the aligned size. Changed so that we also use the aligned size for tracing. 
> > I'm currently doing some more work in this area and while I haven't added a specific test for this issue I have created a test I plan to integrate separately when a few more needed changes have gone in. The test is Linux-only and validates the output from `os::trace_page_sizes` against the information in `/proc/self/smaps`. Hi Stefan, I think this is okay if we keep in mind that `ReservedSpace::actual_reserved_page_size` needs fixing up. At the latest after JDK-8256155 hits. Rather use one function than to have to hunt down all the places where caller code tries to guess the page size. Are you currently working on this? If yes, what are your plans? I have a half finished prototype where I changed os::reserve_xxx() to return meta information about the reservation alongside the pointer, one of them the reserved page size. Have you guys decided whether its okay to remove the "multiple page sizes per reservation" feature? There is also another possible simplification I was thinking about, which is to remove the "UseSHM" feature from Linux. I honestly do not know why we still need it. That would simplify rework of large page handling on Linux a lot. I did ask around in December: https://mail.openjdk.java.net/pipermail/hotspot-dev/2020-December/046885.html , but did not get many answers. Cheers, Thomas ------------- Marked as reviewed by stuefe (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2481 From dongbo at openjdk.java.net Wed Feb 10 06:56:53 2021 From: dongbo at openjdk.java.net (Dong Bo) Date: Wed, 10 Feb 2021 06:56:53 GMT Subject: RFR: 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width [v4] In-Reply-To: References: Message-ID: > In vectorAPI, when right-shifting a vector with a shift equals to the element width, the shift is transformed to zero, > see `src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorOperators.java`: > /** Produce {@code a>>>(n&(ESIZE*8-1))}. 
Integral only. */ > public static final /*bitwise*/ Binary LSHR = binary("LSHR", ">>>", VectorSupport.VECTOR_OP_URSHIFT, VO_SHIFT); > > The aarch64 assembler generates wrong or illegal instructions in this case, e.g. for the JAVA code below on aarch64, > assembler call `__ ushr(dst, __ T8B, src, 0)`, the instruction generated is not `ushr dst.8B, src.8B, 0`, but `ushr dst.4H, src.4H, 16` instead. > According to local tests, JVM gives wrong results for byte/short and crashes with SIGILL for integer/long. > ByteVector vba = ByteVector.fromArray(byte64SPECIES, bytesA, 8 * i); > vbb.lanewise(VectorOperators.ASHR, 8).intoArray(arrBytes, 8 * i); > > The legal right shift amount should be in the range 1 to the element width in bits on aarch64: > https://developer.arm.com/documentation/dui0801/f/A64-SIMD-Vector-Instructions/USHR--vector-?lang=en > > This fix handles zero shift separately. If the shift is zero, it generates `orr` for right shift, `addv` for right shift and accumulate. > Verified with linux-aarch64-server-fastdebug, tier1. Also created a jtreg to reproduce the issue and for regression tests. 
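For readers following the description above, here is a minimal Java sketch of why a shift by the element width ends up as a zero shift, and why a zero right shift cannot be encoded. The masking formula `n & (ESIZE*8-1)` is taken from the quoted Javadoc. The encoding model (a 7-bit `immh:immb` field holding `2 * laneBits - shift`, with the lane width inferred from the highest set bit of `immh`) is an assumption based on my reading of the A64 instruction description, not code from the HotSpot assembler, and the class and method names are hypothetical.

```java
public class ShiftMaskSketch {
    /** Shift count after the Vector API's masking: n & (ESIZE*8 - 1). */
    static int maskedShift(int n, int laneBytes) {
        return n & (laneBytes * 8 - 1);
    }

    /** Assumed model of the 7-bit immh:immb field for a right shift by immediate. */
    static int immField(int laneBits, int shift) {
        return (2 * laneBits - shift) & 0x7f;
    }

    /** Lane width a decoder would infer from the highest set bit of immh (bits 6..3). */
    static int decodedLaneBits(int imm7) {
        int immh = imm7 >> 3;
        if (immh == 0) {
            return 0; // not a shift-by-immediate encoding in this model
        }
        return 8 << (31 - Integer.numberOfLeadingZeros(immh));
    }

    /** Shift amount a decoder would recover from the field. */
    static int decodedShift(int imm7) {
        return 2 * decodedLaneBits(imm7) - imm7;
    }

    public static void main(String[] args) {
        int sh = maskedShift(8, 1);   // byte lanes, requested shift of 8
        int imm = immField(8, sh);    // field produced for a zero shift on 8-bit lanes
        System.out.println("masked shift = " + sh
                + ", decoded lane bits = " + decodedLaneBits(imm)
                + ", decoded shift = " + decodedShift(imm));
    }
}
```

Under these assumptions a zero shift on byte lanes decodes as a 16-bit-lane shift by 16, which is consistent with the `ushr dst.4H, src.4H, 16` reported in the description.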
Dong Bo has updated the pull request incrementally with one additional commit since the last revision: generate add if shift == 0 for accumulation and fix some test code ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2472/files - new: https://git.openjdk.java.net/jdk/pull/2472/files/af3f2a15..a7b72b0a Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2472&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2472&range=02-03 Stats: 127 lines in 2 files changed: 27 ins; 0 del; 100 mod Patch: https://git.openjdk.java.net/jdk/pull/2472.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2472/head:pull/2472 PR: https://git.openjdk.java.net/jdk/pull/2472 From whuang at openjdk.java.net Wed Feb 10 06:58:57 2021 From: whuang at openjdk.java.net (Wang Huang) Date: Wed, 10 Feb 2021 06:58:57 GMT Subject: RFR: 8261137: Optimization of Box nodes in uncommon_trap [v4] In-Reply-To: <8Riu9VCQLM7_vDp5DOMtLZK3yMLQzAkwlIKo4ab0F7Q=.662dbffe-c320-47ea-bc67-508e2c382b12@github.com> References: <8Riu9VCQLM7_vDp5DOMtLZK3yMLQzAkwlIKo4ab0F7Q=.662dbffe-c320-47ea-bc67-508e2c382b12@github.com> Message-ID: > JDK-8075052 has removed useless autobox. However, in some cases, the box is still saved. For instance: > @Benchmark > public void testMethod(Blackhole bh) { > int sum = 0; > for (int i = 0; i < data.length; i++) { > Integer ii = Integer.valueOf(data[i]); > if (i < data.length) { > sum += ii.intValue(); > } > } > bh.consume(sum); > } > Although the variable ii is only used at ii.intValue(), it cannot be eliminated as a result of being used by a hidden uncommon_trap. > The uncommon_trap is generated by the optimized "if", because its condition is always true. > > We can postpone box in uncommon_trap in this situation. We treat box as a scalarized object by adding a SafePointScalarObjectNode in the uncommon_trap node, > and deleting the use of box: > > There is no additional fail/error(s) of jtreg after this patch. 
Wang Huang has updated the pull request incrementally with one additional commit since the last revision: delete useless line ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2401/files - new: https://git.openjdk.java.net/jdk/pull/2401/files/4c62ec8d..e80e4959 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2401&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2401&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/2401.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2401/head:pull/2401 PR: https://git.openjdk.java.net/jdk/pull/2401 From chagedorn at openjdk.java.net Wed Feb 10 09:02:55 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Wed, 10 Feb 2021 09:02:55 GMT Subject: RFR: 8240281: Remove failing assertion code when selecting first memory state in SuperWord::co_locate_pack Message-ID: While working on [JDK-8238438](https://bugs.openjdk.java.net/browse/JDK-8238438), it was not clear if there is a case where `SuperWord::co_locate_pack()` should pick the memory state of the first load of a pack. We could not find an example and therefore added an `assert(false)` and left the code there to clean it up at some point if the assert was never hit. Now, a newly found fuzzer test showed that there are cases where we need to pick the first memory state, triggering the `assert(false)`. In the test case `test()`, a store and a load pack are created but the store pack is filtered out in `SuperWord::filter_packs()`. The load `x += iArrFld[j]` must read the old value before the store `iArrFld[j] = j` overwrites it. Therefore, the load vector must be executed before any of the stores to `iArrFld[j]`. This, however, is not the case if we pick the memory state of the last load, which results in wrong values for `iArrFld`: The stores and the loads are dependent and some of the stores are already executed before the load vector.
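The ordering constraint described above can be illustrated with a small Java sketch. This is not the fuzzer test from the PR (which is not shown in this message); the class, the method names, and the array contents are hypothetical. It only contrasts the scalar semantics, where each load sees the old element, with an ordering in which the stores have already run before the loads.

```java
public class LoadStoreOrderSketch {
    /** Scalar semantics: each load of a[j] happens before the store to a[j]. */
    static int scalarOrder(int[] a) {
        int x = 0;
        for (int j = 0; j < a.length; j++) {
            x += a[j]; // must read the old value
            a[j] = j;  // then overwrite it
        }
        return x;
    }

    /** Models the broken ordering: stores are executed before the loads. */
    static int storesBeforeLoads(int[] a) {
        for (int j = 0; j < a.length; j++) {
            a[j] = j;
        }
        int x = 0;
        for (int j = 0; j < a.length; j++) {
            x += a[j]; // now reads the freshly stored values
        }
        return x;
    }

    public static void main(String[] args) {
        System.out.println(scalarOrder(new int[] {10, 20, 30, 40}));
        System.out.println(storesBeforeLoads(new int[] {10, 20, 30, 40}));
    }
}
```

The two methods return different sums for the same input, which is the kind of miscompilation that picking the wrong memory state for the load pack would produce.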
The fix is to remove the assertion code added by JDK-8238438 and keep the code for selecting the first memory state of a load pack. Thanks, Christian ------------- Commit messages: - Remove trailing whitespaces - 8240281: Remove failing assertion code when selecting first memory state in SuperWord::co_locate_pack Changes: https://git.openjdk.java.net/jdk/pull/2495/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2495&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8240281 Stats: 146 lines in 2 files changed: 134 ins; 5 del; 7 mod Patch: https://git.openjdk.java.net/jdk/pull/2495.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2495/head:pull/2495 PR: https://git.openjdk.java.net/jdk/pull/2495 From roland at openjdk.java.net Wed Feb 10 09:40:38 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Wed, 10 Feb 2021 09:40:38 GMT Subject: RFR: 8240281: Remove failing assertion code when selecting first memory state in SuperWord::co_locate_pack In-Reply-To: References: Message-ID: On Wed, 10 Feb 2021 08:49:10 GMT, Christian Hagedorn wrote: > While working on [JDK-8238438](https://bugs.openjdk.java.net/browse/JDK-8238438), it was not clear if there is a case where `SuperWord::co_locate_pack()` should pick the memory state of the first load of a pack. We could not find an example and therefore added an `assert(false)` and left the code there to clean it up at some point if the assert was never hit. > > Now, a newly found fuzzer test showed that are cases where we need to pick the first memory state, triggering the `assert(false)`. In the test case `test()`, a store and a load pack are created but the store pack is filtered in `SuperWord::filter_packs()`. The load `x += iArrFld[j]` must read the old value before the store `iArrFld[j] = j` overrides it. Therefore, the load vector must be executed before any of the stores to `iArrFld[j]`. 
This, however, is not the case if we pick the memory state of the last load which results in wrong values for `iArrFld`: The stores and the loads are dependent and some of the stores are already executed before the load vector. > > The fix is to remove the assertion code added by JDK-8238438 and keep the code for selecting the first memory state of a load pack. > > Thanks, > Christian Looks good to me. ------------- Marked as reviewed by roland (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2495 From chagedorn at openjdk.java.net Wed Feb 10 09:58:38 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Wed, 10 Feb 2021 09:58:38 GMT Subject: RFR: 8240281: Remove failing assertion code when selecting first memory state in SuperWord::co_locate_pack In-Reply-To: References: Message-ID: On Wed, 10 Feb 2021 09:37:41 GMT, Roland Westrelin wrote: >> While working on [JDK-8238438](https://bugs.openjdk.java.net/browse/JDK-8238438), it was not clear if there is a case where `SuperWord::co_locate_pack()` should pick the memory state of the first load of a pack. We could not find an example and therefore added an `assert(false)` and left the code there to clean it up at some point if the assert was never hit. >> >> Now, a newly found fuzzer test showed that are cases where we need to pick the first memory state, triggering the `assert(false)`. In the test case `test()`, a store and a load pack are created but the store pack is filtered in `SuperWord::filter_packs()`. The load `x += iArrFld[j]` must read the old value before the store `iArrFld[j] = j` overrides it. Therefore, the load vector must be executed before any of the stores to `iArrFld[j]`. This, however, is not the case if we pick the memory state of the last load which results in wrong values for `iArrFld`: The stores and the loads are dependent and some of the stores are already executed before the load vector. 
>> >> The fix is to remove the assertion code added by JDK-8238438 and keep the code for selecting the first memory state of a load pack. >> >> Thanks, >> Christian > > Looks good to me. Thank you Roland for your review! ------------- PR: https://git.openjdk.java.net/jdk/pull/2495 From dongbo at openjdk.java.net Wed Feb 10 09:59:55 2021 From: dongbo at openjdk.java.net (Dong Bo) Date: Wed, 10 Feb 2021 09:59:55 GMT Subject: RFR: 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width [v5] In-Reply-To: References: Message-ID: > In vectorAPI, when right-shifting a vector with a shift equals to the element width, the shift is transformed to zero, > see `src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorOperators.java`: > /** Produce {@code a>>>(n&(ESIZE*8-1))}. Integral only. */ > public static final /*bitwise*/ Binary LSHR = binary("LSHR", ">>>", VectorSupport.VECTOR_OP_URSHIFT, VO_SHIFT); > > The aarch64 assembler generates wrong or illegal instructions in this case, e.g. for the JAVA code below on aarch64, > assembler call `__ ushr(dst, __ T8B, src, 0)`, the instruction generated is not `ushr dst.8B, src.8B, 0`, but `ushr dst.4H, src.4H, 16` instead. > According to local tests, JVM gives wrong results for byte/short and crashes with SIGILL for integer/long. > ByteVector vba = ByteVector.fromArray(byte64SPECIES, bytesA, 8 * i); > vbb.lanewise(VectorOperators.ASHR, 8).intoArray(arrBytes, 8 * i); > > The legal right shift amount should be in the range 1 to the element width in bits on aarch64: > https://developer.arm.com/documentation/dui0801/f/A64-SIMD-Vector-Instructions/USHR--vector-?lang=en > > This fix handles zero shift separately. If the shift is zero, it generates `orr` for right shift, `addv` for right shift and accumulate. > Verified with linux-aarch64-server-fastdebug, tier1. Also created a jtreg to reproduce the issue and for regression tests. 
Dong Bo has updated the pull request incrementally with one additional commit since the last revision: fix windows build failure ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2472/files - new: https://git.openjdk.java.net/jdk/pull/2472/files/a7b72b0a..d75ee99e Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2472&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2472&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/2472.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2472/head:pull/2472 PR: https://git.openjdk.java.net/jdk/pull/2472 From sjohanss at openjdk.java.net Wed Feb 10 10:28:38 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Wed, 10 Feb 2021 10:28:38 GMT Subject: RFR: 8261029: Code heap page sizes not traced correctly using os::trace_page_sizes In-Reply-To: References: Message-ID: On Wed, 10 Feb 2021 06:45:51 GMT, Thomas Stuefe wrote: > Are you currently working on this? If yes, what are your plans? I have a half finished prototype where I changed os::reserve_xxx() to return meta information about the reservation alongside the pointer, one of them the reserved page size. > Yes, I also have a prototype that I've been playing around with. It currently uses an out-parameter to return the page size from `os::reserve_memory_special*` calls. The page sizes is then saved in the ReservedSpace for later use. > Have you guys decided whether its okay to remove the "multiple page sizes per reservation" feature? > I've done some investigations but nothing have been decided. For now my prototype will return the smallest page size used by a mapping. Going forward I would like to do this, but it feels more urgent to get the other things in place first. > There is also another possible simplification I was thinking about, which is to remove the "UseSHM" feature from Linux. I honestly do not know why we still need it. 
That would simplify rework of large page handling on Linux a lot. I did ask around in December: https://mail.openjdk.java.net/pipermail/hotspot-dev/2020-December/046885.html , but did not get many answers. > I know and I've been digging around in this area and agree even more now that it would be great to get rid of `UseSHM`. Not sure if everybody agrees, though, and I'm currently working on a small fix for `UseSHM` so that at least we don't leave it enabled every time someone sets `+UseLargePages` without having any explicit large pages enabled ([PR#2488](https://github.com/openjdk/jdk/pull/2488)). ------------- PR: https://git.openjdk.java.net/jdk/pull/2481 From eirbjo at gmail.com Wed Feb 10 12:50:29 2021 From: eirbjo at gmail.com (Eirik Bjørsnøs) Date: Wed, 10 Feb 2021 13:50:29 +0100 Subject: C1 crash in LinearScan::eliminate_spill_moves In-Reply-To: References: Message-ID: > > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error ([..]/jdk/src/hotspot/share/c1/c1_LIR.hpp:732), > pid=5366, tid=23811 > # assert(res->vreg_number() == index) failed: conversion check > FWIW, I added some print statements to see what res->vreg_number() and index are when this assertion fails: CONVERSION CHECK FAILED vreg_number: -131071 index: 131073 Does a negative vreg_number make any sense here? Eirik. From eirbjo at gmail.com Wed Feb 10 14:24:39 2021 From: eirbjo at gmail.com (Eirik Bjørsnøs) Date: Wed, 10 Feb 2021 15:24:39 +0100 Subject: C1 crash in LinearScan::eliminate_spill_moves In-Reply-To: References: Message-ID: > Does a negative vreg_number make any sense here? > c1_LIR.hpp:673 does the following: (index << LIR_OprDesc::data_shift) LIR_OprDesc::data_shift is 14 in this case and index is 131073. The left shift results in an integer overflow where the result is -2147467264. Having no idea what this function actually does, I'm not sure where to proceed debugging from here.
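The numbers reported above can be reproduced with plain 32-bit arithmetic. The following Java sketch is only a model of the shift, assuming a 32-bit signed value and a shift of 14 as stated in the message; it is not the HotSpot `LIR_OprDesc` code, which packs further bits into the same word, and the class and method names are hypothetical.

```java
public class VregShiftOverflow {
    static final int DATA_SHIFT = 14; // value of data_shift quoted in the message

    /** Models (index << data_shift) in 32-bit signed arithmetic. */
    static int encode(int index) {
        return index << DATA_SHIFT;
    }

    /** Models reading the index back with an arithmetic right shift. */
    static int decode(int encoded) {
        return encoded >> DATA_SHIFT;
    }

    /** Largest non-negative index that survives the round trip in this model. */
    static int maxRoundTrip() {
        return (1 << (31 - DATA_SHIFT)) - 1;
    }

    public static void main(String[] args) {
        int index = 131073;
        int encoded = encode(index);
        System.out.println("encoded = " + encoded);          // -2147467264
        System.out.println("decoded = " + decode(encoded));  // -131071
        System.out.println("max index = " + maxRoundTrip()); // 131071
    }
}
```

In this model any index above 131071 sets the sign bit after the shift, so the value read back is negative, matching both the -2147467264 and the -131071 printed in the two messages.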
(My C++ skills are that of a four-year) What does the index represent anyway? Some kind of virtual register? Why would there be 131073 indexes? Seems a tad excessive? Eirik. From stuefe at openjdk.java.net Wed Feb 10 14:48:37 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Wed, 10 Feb 2021 14:48:37 GMT Subject: RFR: 8261029: Code heap page sizes not traced correctly using os::trace_page_sizes In-Reply-To: References: Message-ID: <6bnV1cycpmBBCDixEal0b-BcuNRtKgYjr0oogXAzDNE=.3daa8c6e-9155-4fd8-a915-bd4fc157f94f@github.com> On Wed, 10 Feb 2021 10:25:43 GMT, Stefan Johansson wrote: > > Are you currently working on this? If yes, what are your plans? I have a half finished prototype where I changed os::reserve_xxx() to return meta information about the reservation alongside the pointer, one of them the reserved page size. > > Yes, I also have a prototype that I've been playing around with. It currently uses an out-parameter to return the page size from `os::reserve_memory_special*` calls. The page sizes is then saved in the ReservedSpace for later use. I think that makes sense as a solution for this. My attempt was along this: void* os::reserve_xxx(size, ... blabla..., reservation_info_t* info = NULL); with reservation_info_t being a holder for information both "public" and opaque: e.g. whether this is executable memory (e.g. for MacOS MAP_JIT issue on committiong), the page size of course, as well as a way for platforms to piggyback internal information (eg. memory type used on AIX). But your solution sounds simpler, and its sufficient at least for now. So I don't think we work at cross purposes. > > > Have you guys decided whether its okay to remove the "multiple page sizes per reservation" feature? > > I've done some investigations but nothing have been decided. For now my prototype will return the smallest page size used by a mapping. Going forward I would like to do this, but it feels more urgent to get the other things in place first. 
No problem, but good to know its not forgotten. > > > There is also another possible simplification I was thinking about, which is to remove the "UseSHM" feature from Linux. I honestly do not know why we still need it. That would simplify rework of large page handling on Linux a lot. I did ask around in December: https://mail.openjdk.java.net/pipermail/hotspot-dev/2020-December/046885.html , but did not get many answers. > > I know and I've been digging around in this area and agree even more now that it would be great to get rid of `UseSHM`. Not sure how if everybody agrees though and I'm currently working on small fix for `UseSHM` so that at least we don't leave it enabled everytime someone sets `+UseLargePages` without having any explicit large pages enabled ([PR#2488](https://github.com/openjdk/jdk/pull/2488)). Nice that you think the same. I am not sure many people are around which know the history. Maybe we should ask Andrew Haley, I believe he wrote some of that coding. I commented on your PR in your PR. Cheers, Thomas ------------- PR: https://git.openjdk.java.net/jdk/pull/2481 From kvn at openjdk.java.net Wed Feb 10 17:36:39 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 10 Feb 2021 17:36:39 GMT Subject: RFR: 8240281: Remove failing assertion code when selecting first memory state in SuperWord::co_locate_pack In-Reply-To: References: Message-ID: On Wed, 10 Feb 2021 08:49:10 GMT, Christian Hagedorn wrote: > While working on [JDK-8238438](https://bugs.openjdk.java.net/browse/JDK-8238438), it was not clear if there is a case where `SuperWord::co_locate_pack()` should pick the memory state of the first load of a pack. We could not find an example and therefore added an `assert(false)` and left the code there to clean it up at some point if the assert was never hit. > > Now, a newly found fuzzer test showed that are cases where we need to pick the first memory state, triggering the `assert(false)`. 
In the test case `test()`, a store and a load pack are created but the store pack is filtered in `SuperWord::filter_packs()`. The load `x += iArrFld[j]` must read the old value before the store `iArrFld[j] = j` overrides it. Therefore, the load vector must be executed before any of the stores to `iArrFld[j]`. This, however, is not the case if we pick the memory state of the last load which results in wrong values for `iArrFld`: The stores and the loads are dependent and some of the stores are already executed before the load vector. > > The fix is to remove the assertion code added by JDK-8238438 and keep the code for selecting the first memory state of a load pack. > > Thanks, > Christian Good. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2495 From kvn at openjdk.java.net Wed Feb 10 17:57:40 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 10 Feb 2021 17:57:40 GMT Subject: RFR: 8261029: Code heap page sizes not traced correctly using os::trace_page_sizes In-Reply-To: <6bnV1cycpmBBCDixEal0b-BcuNRtKgYjr0oogXAzDNE=.3daa8c6e-9155-4fd8-a915-bd4fc157f94f@github.com> References: <6bnV1cycpmBBCDixEal0b-BcuNRtKgYjr0oogXAzDNE=.3daa8c6e-9155-4fd8-a915-bd4fc157f94f@github.com> Message-ID: On Wed, 10 Feb 2021 14:45:55 GMT, Thomas Stuefe wrote: >>> Are you currently working on this? If yes, what are your plans? I have a half finished prototype where I changed os::reserve_xxx() to return meta information about the reservation alongside the pointer, one of them the reserved page size. >>> >> >> Yes, I also have a prototype that I've been playing around with. It currently uses an out-parameter to return the page size from `os::reserve_memory_special*` calls. The page sizes is then saved in the ReservedSpace for later use. >> >>> Have you guys decided whether its okay to remove the "multiple page sizes per reservation" feature? >>> >> >> I've done some investigations but nothing have been decided. 
For now my prototype will return the smallest page size used by a mapping. Going forward I would like to do this, but it feels more urgent to get the other things in place first. >> >>> There is also another possible simplification I was thinking about, which is to remove the "UseSHM" feature from Linux. I honestly do not know why we still need it. That would simplify rework of large page handling on Linux a lot. I did ask around in December: https://mail.openjdk.java.net/pipermail/hotspot-dev/2020-December/046885.html , but did not get many answers. >>> >> >> I know and I've been digging around in this area and agree even more now that it would be great to get rid of `UseSHM`. Not sure how if everybody agrees though and I'm currently working on small fix for `UseSHM` so that at least we don't leave it enabled everytime someone sets `+UseLargePages` without having any explicit large pages enabled ([PR#2488](https://github.com/openjdk/jdk/pull/2488)). > >> > Are you currently working on this? If yes, what are your plans? I have a half finished prototype where I changed os::reserve_xxx() to return meta information about the reservation alongside the pointer, one of them the reserved page size. >> >> Yes, I also have a prototype that I've been playing around with. It currently uses an out-parameter to return the page size from `os::reserve_memory_special*` calls. The page sizes is then saved in the ReservedSpace for later use. > > I think that makes sense as a solution for this. My attempt was along this: > void* os::reserve_xxx(size, ... blabla..., reservation_info_t* info = NULL); > with reservation_info_t being a holder for information both "public" and opaque: e.g. whether this is executable memory (e.g. for MacOS MAP_JIT issue on committiong), the page size of course, as well as a way for platforms to piggyback internal information (eg. memory type used on AIX). > > But your solution sounds simpler, and its sufficient at least for now. 
So I don't think we work at cross purposes. > >> >> > Have you guys decided whether its okay to remove the "multiple page sizes per reservation" feature? >> >> I've done some investigations but nothing have been decided. For now my prototype will return the smallest page size used by a mapping. Going forward I would like to do this, but it feels more urgent to get the other things in place first. > > No problem, but good to know its not forgotten. > >> >> > There is also another possible simplification I was thinking about, which is to remove the "UseSHM" feature from Linux. I honestly do not know why we still need it. That would simplify rework of large page handling on Linux a lot. I did ask around in December: https://mail.openjdk.java.net/pipermail/hotspot-dev/2020-December/046885.html , but did not get many answers. >> >> I know and I've been digging around in this area and agree even more now that it would be great to get rid of `UseSHM`. Not sure how if everybody agrees though and I'm currently working on small fix for `UseSHM` so that at least we don't leave it enabled everytime someone sets `+UseLargePages` without having any explicit large pages enabled ([PR#2488](https://github.com/openjdk/jdk/pull/2488)). > > Nice that you think the same. I am not sure many people are around which know the history. Maybe we should ask Andrew Haley, I believe he wrote some of that coding. > > I commented on your PR in your PR. > > Cheers, Thomas > > I agree that alignment here should match one used in CodeCache::reserve_heap_memory() and not recalculated. But I am not sure actual_reserved_page_size() returns correct value. > > May be we should record value in CodeHeap object when it is created. > > Yes, the `actual_reserved_page_size()` is far from perfect and I plan to update `ReservedSpace` to have a page size member that can be queried in places like this and then we can remove this helper. 
This will be required once we allow multiple large page sizes ([PR#1153](https://github.com/openjdk/jdk/pull/1153)). That said, `actual_reserved_page_size()` is currently doing a good job returning the correct page size since it is considering both if the space is "special", what alignment it used and if transparent huge pages are enabled. > > I would prefer doing the change that records the page size i `ReservedSpace` as a separate patch and in that patch also remove all uses of `actual_reserved_page_size()`. Doing this change now is required to be able to integrate the [new test](https://github.com/openjdk/jdk/compare/master...kstefanj:test-for-trace-page-sizes) I mentioned, and I think it will be helpful for work in this area going forward. Are you talking about [8261230](https://bugs.openjdk.java.net/browse/JDK-8261230) to do recording? Okay, then I am fine with this change. What testing you did for current changes? ------------- PR: https://git.openjdk.java.net/jdk/pull/2481 From sjohanss at openjdk.java.net Wed Feb 10 17:57:41 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Wed, 10 Feb 2021 17:57:41 GMT Subject: RFR: 8261029: Code heap page sizes not traced correctly using os::trace_page_sizes In-Reply-To: <6bnV1cycpmBBCDixEal0b-BcuNRtKgYjr0oogXAzDNE=.3daa8c6e-9155-4fd8-a915-bd4fc157f94f@github.com> References: <6bnV1cycpmBBCDixEal0b-BcuNRtKgYjr0oogXAzDNE=.3daa8c6e-9155-4fd8-a915-bd4fc157f94f@github.com> Message-ID: On Wed, 10 Feb 2021 14:45:55 GMT, Thomas Stuefe wrote: > > ``` > void* os::reserve_xxx(size, ... blabla..., reservation_info_t* info = NULL); > ``` > > with reservation_info_t being a holder for information both "public" and opaque: e.g. whether this is executable memory (e.g. for MacOS MAP_JIT issue on committiong), the page size of course, as well as a way for platforms to piggyback internal information (eg. memory type used on AIX). > > But your solution sounds simpler, and its sufficient at least for now. 
So I don't think we work at cross purposes. > Yes, we'll see who gets around to making it happen first :) > > > There is also another possible simplification I was thinking about, which is to remove the "UseSHM" feature from Linux. I honestly do not know why we still need it. That would simplify rework of large page handling on Linux a lot. I did ask around in December: https://mail.openjdk.java.net/pipermail/hotspot-dev/2020-December/046885.html , but did not get many answers. > > > > > > I know and I've been digging around in this area and agree even more now that it would be great to get rid of `UseSHM`. Not sure how if everybody agrees though and I'm currently working on small fix for `UseSHM` so that at least we don't leave it enabled everytime someone sets `+UseLargePages` without having any explicit large pages enabled ([PR#2488](https://github.com/openjdk/jdk/pull/2488)). > > Nice that you think the same. I am not sure many people are around which know the history. Maybe we should ask Andrew Haley, I believe he wrote some of that coding. > > I commented on your PR in your PR. Thanks for all your input, very helpful =) ------------- PR: https://git.openjdk.java.net/jdk/pull/2481 From sjohanss at openjdk.java.net Wed Feb 10 19:21:38 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Wed, 10 Feb 2021 19:21:38 GMT Subject: RFR: 8261029: Code heap page sizes not traced correctly using os::trace_page_sizes In-Reply-To: References: <6bnV1cycpmBBCDixEal0b-BcuNRtKgYjr0oogXAzDNE=.3daa8c6e-9155-4fd8-a915-bd4fc157f94f@github.com> Message-ID: On Wed, 10 Feb 2021 17:55:00 GMT, Vladimir Kozlov wrote: > > I would prefer doing the change that records the page size i `ReservedSpace` as a separate patch and in that patch also remove all uses of `actual_reserved_page_size()`. 
Doing this change now is required to be able to integrate the [new test](https://github.com/openjdk/jdk/compare/master...kstefanj:test-for-trace-page-sizes) I mentioned, and I think it will be helpful for work in this area going forward. > > Are you talking about [8261230](https://bugs.openjdk.java.net/browse/JDK-8261230) to do recording? > Okay, then I am fine with this change. > No that change is also just fixing places where the traced page size is incorrect. Recording the page size in `ReservedSpace` will be handled by [JDK-8261527](https://bugs.openjdk.java.net/browse/JDK-8261527) going forward. > What testing you did for current changes? I've done a fair amount of local JTREG testing as well as manual testing to verify we get the correct result with and without explicit large pages enabled. Are there any specific compiler tests you know of that rely on the output from this tracing? I've also done mach5 tier 1-3 but we have few to none systems in there with explicit huge pages enabled, so that is mostly for sanity. ------------- PR: https://git.openjdk.java.net/jdk/pull/2481 From vlivanov at openjdk.java.net Wed Feb 10 21:28:39 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Wed, 10 Feb 2021 21:28:39 GMT Subject: RFR: 8261137: Optimization of Box nodes in uncommon_trap [v4] In-Reply-To: References: <8Riu9VCQLM7_vDp5DOMtLZK3yMLQzAkwlIKo4ab0F7Q=.662dbffe-c320-47ea-bc67-508e2c382b12@github.com> Message-ID: On Wed, 10 Feb 2021 06:58:57 GMT, Wang Huang wrote: >> JDK-8075052 has removed useless autobox. However, in some cases, the box is still saved. For instance: >> @Benchmark >> public void testMethod(Blackhole bh) { >> int sum = 0; >> for (int i = 0; i < data.length; i++) { >> Integer ii = Integer.valueOf(data[i]); >> if (i < data.length) { >> sum += ii.intValue(); >> } >> } >> bh.consume(sum); >> } >> Although the variable ii is only used at ii.intValue(), it cannot be eliminated as a result of being used by a hidden uncommon_trap. 
>> The uncommon_trap is generated by the optimized "if", because its condition is always true. >> >> We can postpone box in uncommon_trap in this situation. We treat box as a scalarized object by adding a SafePointScalarObjectNode in the uncommon_trap node, >> and deleting the use of box: >> >> There is no additional fail/error(s) of jtreg after this patch. > > Wang Huang has updated the pull request incrementally with one additional commit since the last revision: > > delete useless line The improvement you are proposing is not specific to uncommon traps, but can be generalized to any debug usage at safepoints. The downside is that, in general, rematerialization logic has to use the corresponding pure function in order to materialize the eliminated instance. In this particular case (primitive boxing), it has to take into account the caching effects of primitive box factories. Otherwise, user code can encounter identity paradoxes with rematerialized primitive box instances. I don't see how the scalarization logic you propose preserves identity constraints imposed by `valueOf` factories. ------------- Changes requested by vlivanov (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2401 From xliu at openjdk.java.net Wed Feb 10 21:58:43 2021 From: xliu at openjdk.java.net (Xin Liu) Date: Wed, 10 Feb 2021 21:58:43 GMT Subject: RFR: 8261137: Optimization of Box nodes in uncommon_trap [v4] In-Reply-To: References: <8Riu9VCQLM7_vDp5DOMtLZK3yMLQzAkwlIKo4ab0F7Q=.662dbffe-c320-47ea-bc67-508e2c382b12@github.com> Message-ID: On Wed, 10 Feb 2021 06:58:57 GMT, Wang Huang wrote: >> JDK-8075052 has removed useless autobox. However, in some cases, the box is still saved. 
For instance: >> @Benchmark >> public void testMethod(Blackhole bh) { >> int sum = 0; >> for (int i = 0; i < data.length; i++) { >> Integer ii = Integer.valueOf(data[i]); >> if (i < data.length) { >> sum += ii.intValue(); >> } >> } >> bh.consume(sum); >> } >> Although the variable ii is only used at ii.intValue(), it cannot be eliminated as a result of being used by a hidden uncommon_trap. >> The uncommon_trap is generated by the optimized "if", because its condition is always true. >> >> We can postpone box in uncommon_trap in this situation. We treat box as a scalarized object by adding a SafePointScalarObjectNode in the uncommon_trap node, >> and deleting the use of box: >> >> There is no additional fail/error(s) of jtreg after this patch. > > Wang Huang has updated the pull request incrementally with one additional commit since the last revision: > > delete useless line src/hotspot/share/opto/callGenerator.cpp line 588: > 586: Node* sobj = new SafePointScalarObjectNode(gvn.type(res)->isa_oopptr(), > 587: #ifdef ASSERT > 588: (AllocateNode*)call, You can use call->isa_Allocate(); It utilizes node's ad-hoc RTTI to do type casting. ------------- PR: https://git.openjdk.java.net/jdk/pull/2401 From kvn at openjdk.java.net Wed Feb 10 23:00:45 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 10 Feb 2021 23:00:45 GMT Subject: RFR: 8261029: Code heap page sizes not traced correctly using os::trace_page_sizes In-Reply-To: References: Message-ID: <7bUn1l-RNEYH7LE-4NzMdUU1EVOVE3brf7KQaQ9lzL4=.bc261d90-cbbb-4a79-9765-ab0a886e1e2d@github.com> On Tue, 9 Feb 2021 13:45:38 GMT, Stefan Johansson wrote: > When adding a code heap the page size used with the underlying mapping is traced using `os::trace_page_sizes`. The old code tried to estimate the page-size based on the size, but the mapping has already been done so it is better to check the passed in `ReservedSpace`. 
Today we don't record the page size in the ReservedSpace, but we have a helper to do a good estimate: `ReservedSpace::actual_reserved_page_size()`. The proposal is to use this function. > > When changing this I also realized that the traced min-size used un-aligned value while the actual `initialize`-call correctly uses the aligned size. Changed so that we also use the aligned size for tracing. > > I'm currently doing some more work in this area and while I haven't added a specific test for this issue I have created a test I plan to integrate separately when a few more needed changes have gone in. The test is Linux-only and validates the output from `os::trace_page_sizes` against the information in `/proc/self/smaps`. Good. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2481 From kvn at openjdk.java.net Wed Feb 10 23:21:43 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 10 Feb 2021 23:21:43 GMT Subject: RFR: 8261137: Optimization of Box nodes in uncommon_trap [v4] In-Reply-To: References: <8Riu9VCQLM7_vDp5DOMtLZK3yMLQzAkwlIKo4ab0F7Q=.662dbffe-c320-47ea-bc67-508e2c382b12@github.com> Message-ID: On Wed, 10 Feb 2021 21:26:17 GMT, Vladimir Ivanov wrote: > The improvement you are proposing is not specific to uncommon traps, but can be generalized to any debug usage at safepoints. > > The downside is that, in general, rematerialization logic has to use the corresponding pure function in order to materialize the eliminated instance. In this particular case (primitive boxing), it has to take into account the caching effects of primitive box factories. Otherwise, user code can encounter identity paradoxes with rematerialized primitive box instances. > > I don't see how the scalarization logic you propose preserves identity constraints imposed by `valueOf` factories. Yes, it seems this optimization introduces the issue we had with Graal (8223320): "C2 doesn't model Integer.valueOf() as anything special. 
It just inlines it. So the check that determines whether to allocate a new Integer or take one from the cache always happens at runtime. Graal models it as a BoxNode. It is correctly lowered, however, if it needs to be present in a JVM state, it is described as an allocation. So the decision whether to allocate or take the cached value has to happen during the deopt." There is code in deoptimizer for JVMCI which looks for cached Boxed values. We may need to adopt it for C2 EA for this optimization to work. ------------- PR: https://git.openjdk.java.net/jdk/pull/2401 From github.com+2249648+johntortugo at openjdk.java.net Thu Feb 11 05:12:55 2021 From: github.com+2249648+johntortugo at openjdk.java.net (John Tortugo) Date: Thu, 11 Feb 2021 05:12:55 GMT Subject: RFR: 8241502: Migrate x86_64.ad to MacroAssembler [v3] In-Reply-To: References: Message-ID: > Relates to: https://bugs.openjdk.java.net/browse/JDK-8241502 > Tested on: Linux tier1, 2 and 3 > > Can you please take a look whether these changes are going in the direction expected or not? If it is, I'll continue working on the `JDK-8241502` but I'd like to split it in a few PRs since it's a lot of changes. John Tortugo has updated the pull request incrementally with one additional commit since the last revision: Third part of conversions. Small fix in Assembler::cmovl. 
-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/2420/files
  - new: https://git.openjdk.java.net/jdk/pull/2420/files/25824fde..1e8361cc

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2420&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2420&range=01-02

  Stats: 156 lines in 2 files changed: 46 ins; 13 del; 97 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2420.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2420/head:pull/2420

PR: https://git.openjdk.java.net/jdk/pull/2420

From github.com+2249648+johntortugo at openjdk.java.net  Thu Feb 11 05:15:37 2021
From: github.com+2249648+johntortugo at openjdk.java.net (John Tortugo)
Date: Thu, 11 Feb 2021 05:15:37 GMT
Subject: RFR: 8241502: Migrate x86_64.ad to MacroAssembler
In-Reply-To:
References: <0azhJ4pD5Tq_lkpPtYMpQBjokflcSQEdWP2Rz9HBm6k=.c3ece6fd-1ae7-49ea-a6eb-ec88a9fbd54d@github.com>
Message-ID:

On Tue, 9 Feb 2021 01:35:35 GMT, Dean Long wrote:

>> I am curious if the x86_64.o file changes in any significant way (speed or size).
>
> I wish there was a way for the old and new versions to co-exist at the same time, so we could generate the code the old way and the new way, then compare, for automatic verification of the MacroAssembler version.

Thank you all for the feedback!

@iklam - I'll check that and let you know once I make more conversions.

@dean-long - That would be great. I'm all ears for the best way to test this!
------------- PR: https://git.openjdk.java.net/jdk/pull/2420 From thartmann at openjdk.java.net Thu Feb 11 07:47:37 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 11 Feb 2021 07:47:37 GMT Subject: RFR: 8240281: Remove failing assertion code when selecting first memory state in SuperWord::co_locate_pack In-Reply-To: References: Message-ID: On Wed, 10 Feb 2021 08:49:10 GMT, Christian Hagedorn wrote: > While working on [JDK-8238438](https://bugs.openjdk.java.net/browse/JDK-8238438), it was not clear if there is a case where `SuperWord::co_locate_pack()` should pick the memory state of the first load of a pack. We could not find an example and therefore added an `assert(false)` and left the code there to clean it up at some point if the assert was never hit. > > Now, a newly found fuzzer test showed that are cases where we need to pick the first memory state, triggering the `assert(false)`. In the test case `test()`, a store and a load pack are created but the store pack is filtered in `SuperWord::filter_packs()`. The load `x += iArrFld[j]` must read the old value before the store `iArrFld[j] = j` overrides it. Therefore, the load vector must be executed before any of the stores to `iArrFld[j]`. This, however, is not the case if we pick the memory state of the last load which results in wrong values for `iArrFld`: The stores and the loads are dependent and some of the stores are already executed before the load vector. > > The fix is to remove the assertion code added by JDK-8238438 and keep the code for selecting the first memory state of a load pack. > > Thanks, > Christian Looks good. Great that we finally have a test for this case! ------------- Marked as reviewed by thartmann (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/2495 From thartmann at openjdk.java.net Thu Feb 11 07:49:39 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 11 Feb 2021 07:49:39 GMT Subject: RFR: 8261029: Code heap page sizes not traced correctly using os::trace_page_sizes In-Reply-To: References: Message-ID: On Tue, 9 Feb 2021 13:45:38 GMT, Stefan Johansson wrote: > When adding a code heap the page size used with the underlying mapping is traced using `os::trace_page_sizes`. The old code tried to estimate the page-size based on the size, but the mapping has already been done so it is better to check the passed in `ReservedSpace`. Today we don't record the page size in the ReservedSpace, but we have a helper to do a good estimate: `ReservedSpace::actual_reserved_page_size()`. The proposal is to use this function. > > When changing this I also realized that the traced min-size used un-aligned value while the actual `initialize`-call correctly uses the aligned size. Changed so that we also use the aligned size for tracing. > > I'm currently doing some more work in this area and while I haven't added a specific test for this issue I have created a test I plan to integrate separately when a few more needed changes have gone in. The test is Linux-only and validates the output from `os::trace_page_sizes` against the information in `/proc/self/smaps`. Looks good to me, thanks for fixing! ------------- Marked as reviewed by thartmann (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/2481 From thartmann at openjdk.java.net Thu Feb 11 07:54:38 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 11 Feb 2021 07:54:38 GMT Subject: RFR: 8259430: C2: assert(in_vt->length() == out_vt->length()) failed: mismatch on number of elements In-Reply-To: References: Message-ID: On Tue, 2 Feb 2021 12:57:11 GMT, Vladimir Ivanov wrote: > Another problem caused by pathological cases (in effectively dead code): `VectorUnboxNode::Ideal()/Value()` ignore cast nodes (even the ones carrying control dependency) to reveal `VectorBox` and sometimes it exposes type mismatches between box/unbox operations which are impossible in practice. > > Proposed fix turns the assert into a runtime check to ignore problematic IR shape. Looks reasonable to me. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2353 From thartmann at openjdk.java.net Thu Feb 11 07:55:38 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 11 Feb 2021 07:55:38 GMT Subject: RFR: 8261250: Dependencies: Remove unused dependency types In-Reply-To: <2-SBc7bcjQnndYbBWgQFNs4nY_OjM-5_E5f3g2-OEiA=.d235e241-adc8-4f51-8bf9-66915f068a95@github.com> References: <2-SBc7bcjQnndYbBWgQFNs4nY_OjM-5_E5f3g2-OEiA=.d235e241-adc8-4f51-8bf9-66915f068a95@github.com> Message-ID: On Fri, 5 Feb 2021 17:58:06 GMT, Vladimir Ivanov wrote: > Remove support of unused dependency types from Dependencies. > > Testing: > - [x] hs-precheckin-comp, hs-tier1, hs-tier2. Nice cleanup, looks good! ------------- Marked as reviewed by thartmann (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/2431

From jbhateja at openjdk.java.net  Thu Feb 11 08:38:49 2021
From: jbhateja at openjdk.java.net (Jatin Bhateja)
Date: Thu, 11 Feb 2021 08:38:49 GMT
Subject: RFR: 8261553: Efficient mask generation using BMI2 BZHI instruction
Message-ID:

The BMI2 BZHI instruction can be used to optimize the instruction sequence
used for mask generation in various places in the array copy stubs and in partial in-lining for small copy operations.

-------------

Commit messages:
 - 8261553: Efficient mask generation using BMI2 BZHI instruction.

Changes: https://git.openjdk.java.net/jdk/pull/2522/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2522&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8261553
  Stats: 30 lines in 5 files changed: 8 ins; 11 del; 11 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2522.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2522/head:pull/2522

PR: https://git.openjdk.java.net/jdk/pull/2522

From vlivanov at openjdk.java.net  Thu Feb 11 10:19:40 2021
From: vlivanov at openjdk.java.net (Vladimir Ivanov)
Date: Thu, 11 Feb 2021 10:19:40 GMT
Subject: RFR: 8259430: C2: assert(in_vt->length() == out_vt->length()) failed: mismatch on number of elements
In-Reply-To:
References:
Message-ID:

On Thu, 11 Feb 2021 07:51:54 GMT, Tobias Hartmann wrote:

>> Another problem caused by pathological cases (in effectively dead code): `VectorUnboxNode::Ideal()/Value()` ignore cast nodes (even the ones carrying control dependency) to reveal `VectorBox` and sometimes it exposes type mismatches between box/unbox operations which are impossible in practice.
>>
>> Proposed fix turns the assert into a runtime check to ignore problematic IR shape.
>
> Looks reasonable to me.

Thanks for the reviews, Vladimir and Tobias.
------------- PR: https://git.openjdk.java.net/jdk/pull/2353 From vlivanov at openjdk.java.net Thu Feb 11 10:19:39 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Thu, 11 Feb 2021 10:19:39 GMT Subject: RFR: 8261250: Dependencies: Remove unused dependency types In-Reply-To: References: <2-SBc7bcjQnndYbBWgQFNs4nY_OjM-5_E5f3g2-OEiA=.d235e241-adc8-4f51-8bf9-66915f068a95@github.com> Message-ID: On Thu, 11 Feb 2021 07:53:08 GMT, Tobias Hartmann wrote: >> Remove support of unused dependency types from Dependencies. >> >> Testing: >> - [x] hs-precheckin-comp, hs-tier1, hs-tier2. > > Nice cleanup, looks good! Thanks for the reviews, Vladimir and Tobias. ------------- PR: https://git.openjdk.java.net/jdk/pull/2431 From vlivanov at openjdk.java.net Thu Feb 11 10:19:41 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Thu, 11 Feb 2021 10:19:41 GMT Subject: Integrated: 8259430: C2: assert(in_vt->length() == out_vt->length()) failed: mismatch on number of elements In-Reply-To: References: Message-ID: On Tue, 2 Feb 2021 12:57:11 GMT, Vladimir Ivanov wrote: > Another problem caused by pathological cases (in effectively dead code): `VectorUnboxNode::Ideal()/Value()` ignore cast nodes (even the ones carrying control dependency) to reveal `VectorBox` and sometimes it exposes type mismatches between box/unbox operations which are impossible in practice. > > Proposed fix turns the assert into a runtime check to ignore problematic IR shape. This pull request has now been integrated. 
Changeset: 3ede231d
Author:    Vladimir Ivanov
URL:       https://git.openjdk.java.net/jdk/commit/3ede231d
Stats:     23 lines in 2 files changed: 6 ins; 2 del; 15 mod

8259430: C2: assert(in_vt->length() == out_vt->length()) failed: mismatch on number of elements

Reviewed-by: kvn, thartmann

-------------

PR: https://git.openjdk.java.net/jdk/pull/2353

From vlivanov at openjdk.java.net  Thu Feb 11 10:19:41 2021
From: vlivanov at openjdk.java.net (Vladimir Ivanov)
Date: Thu, 11 Feb 2021 10:19:41 GMT
Subject: Integrated: 8261250: Dependencies: Remove unused dependency types
In-Reply-To: <2-SBc7bcjQnndYbBWgQFNs4nY_OjM-5_E5f3g2-OEiA=.d235e241-adc8-4f51-8bf9-66915f068a95@github.com>
References: <2-SBc7bcjQnndYbBWgQFNs4nY_OjM-5_E5f3g2-OEiA=.d235e241-adc8-4f51-8bf9-66915f068a95@github.com>
Message-ID:

On Fri, 5 Feb 2021 17:58:06 GMT, Vladimir Ivanov wrote:

> Remove support of unused dependency types from Dependencies.
>
> Testing:
> - [x] hs-precheckin-comp, hs-tier1, hs-tier2.

This pull request has now been integrated.

Changeset: a9c36805
Author:    Vladimir Ivanov
URL:       https://git.openjdk.java.net/jdk/commit/a9c36805
Stats:     253 lines in 2 files changed: 1 ins; 238 del; 14 mod

8261250: Dependencies: Remove unused dependency types

Reviewed-by: kvn, thartmann

-------------

PR: https://git.openjdk.java.net/jdk/pull/2431

From redestad at openjdk.java.net  Thu Feb 11 10:30:38 2021
From: redestad at openjdk.java.net (Claes Redestad)
Date: Thu, 11 Feb 2021 10:30:38 GMT
Subject: RFR: 8261553: Efficient mask generation using BMI2 BZHI instruction
In-Reply-To:
References:
Message-ID:

On Thu, 11 Feb 2021 08:31:40 GMT, Jatin Bhateja wrote:

> The BMI2 BZHI instruction can be used to optimize the instruction sequence
> used for mask generation in various places in the array copy stubs and in partial in-lining for small copy operations.

- Rather than removing the old code, I believe the code calling bzhiq needs to be in a branch checking `VM_Version::supports_bmi2`.
Otherwise you'll hit asserts on older hardware without this extension - Some demonstration of the performance benefit would be nice - either a new microbenchmark or a statistically significant result running some existing ones, e.g. `make test TEST=micro:ArrayCopy` ------------- Changes requested by redestad (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2522 From eirbjo at gmail.com Thu Feb 11 11:10:37 2021 From: eirbjo at gmail.com (Eirik Bjørsnøs) Date: Thu, 11 Feb 2021 12:10:37 +0100 Subject: C1 crash in LinearScan::eliminate_spill_moves In-Reply-To: References: Message-ID:

> > Why would there be 131073 indexes? Seems a tad excessive?

Some further analysis reveals what I think leads to the pathological case. It's somehow tripped by replacing RETURNs with GOTOs.

First a bit of context:

The agent instruments methods by injecting a local variable for each unique line number of the line number table, using a pair of ICONST_0, ISTORE N for each line N:

ICONST_0
ISTORE 1
[..]
ICONST_0
ISTORE N

This counter is then incremented at each line entry using IINC:

IINC 1 1
[..]
IINC N 1

Finally, before each RETURN we report each line count by calling out with ILOAD X / INVOKESTATIC (X):

ILOAD 1
INVOKESTATIC com/github/eirbjo/JDK8261235Reproducer.countVisits (I)V
[..]
ILOAD N
INVOKESTATIC com/github/eirbjo/JDK8261235Reproducer.countVisits (I)V

This works nicely for methods with one (or a few) RETURNs. For methods with many returns, the byte code size grows dramatically because the (ILOAD X, INVOKESTATIC X) sequence is repeated for every RETURN. (The agent actually uses INVOKEVIRTUAL instead, and the method takes the line number as an additional argument, adding an extra bytecode to load or DUP the receiver object and an ICONST/BIPUSH/SIPUSH to put the line number on the stack).

Our example method (org.jaxen.saxpath.base.Verifier::isXMLLetter) has 206 line numbers and 358 IRETURNs.

Each ILOAD, INVOKESTATIC consumes 2 + 3 = 5 bytes.
For isXMLLetter, that means 1040 bytes per RETURN, adding up to a total of 372320 bytes for the reporting part, bringing us well above the 64K byte code limit.

By replacing each RETURN with a GOTO, we can instead inject the 1040 bytes only once, which is a considerable saving.

This of course changes the control flow graph of the method considerably, which is what I think might be tripping up C1's register allocation.

I used -XX:TraceLinearScanLevel=1 to compare the compilations (without the reporting for the RETURN case, since it would hit the 64K roof).

Here's a compilation using RETURNs:

518 93 2 org.jaxen.saxpath.base.Verifier::isXMLLetter (5414 bytes)
----- linear-scan block order:
0: B716 loop: -1 depth: 0 dom: NULL sux: B717
1: B717 loop: -1 depth: 0 dom: B716 preds: B716 sux: B0
2: B 0 loop: -1 depth: 0 dom: B717 preds: B717 sux: B2 B1
[..]
716: B 1 loop: -1 depth: 0 dom: B0 preds: B0

Compare this with a GOTO compilation:

437 95 3 org.jaxen.saxpath.base.Verifier::isXMLLetter (6130 bytes)
----- linear-scan block order:
0: B1106 loop: -1 depth: 0 dom: NULL sux: B1107
1: B1107 loop: -1 depth: 0 dom: B1106 preds: B1106 sux: B0
2: B 0 loop: -1 depth: 0 dom: B1107 preds: B1107 sux: B2 B1
[..]
716: B 1 loop: -1 depth: 0 dom: B0 preds: B0 sux: B3 717: B 3 loop: -1 depth: 0 dom: B0 preds: B1 B4 B6 B8 B10 B12 B14 B16 B18 B20 B22 B24 B26 B28 B30 B32 B34 B36 B38 B40 B42 B44 B46 B48 B50 B52 B54 B56 B58 B60 B62 B64 B66 B68 B70 B72 B74 B76 B78 B80 B82 B84 B86 B88 B90 B92 B94 B96 B98 B100 B102 B104 B106 B108 B110 B112 B114 B116 B118 B120 B122 B124 B126 B128 B130 B132 B134 B136 B138 B140 B142 B144 B146 B148 B150 B152 B154 B156 B158 B160 B162 B164 B166 B168 B170 B172 B174 B176 B178 B180 B182 B184 B186 B188 B190 B192 B194 B196 B198 B200 B202 B204 B206 B208 B210 B212 B214 B216 B218 B220 B222 B224 B226 B228 B230 B232 B234 B236 B238 B240 B242 B244 B246 B248 B250 B252 B254 B256 B258 B260 B262 B264 B266 B268 B270 B272 B274 B276 B278 B280 B282 B284 B286 B288 B290 B292 B294 B296 B298 B300 B302 B304 B306 B308 B310 B312 B314 B316 B318 B320 B322 B324 B326 B328 B330 B332 B334 B336 B338 B340 B342 B344 B346 B348 B350 B352 B354 B356 B358 B360 B362 B364 B366 B368 B370 B372 B374 B376 B378 B380 B382 B384 B386 B388 B390 B392 B394 B396 B398 B400 B402 B404 B406 B408 B410 B412 B414 B416 B418 B420 B422 B424 B426 B428 B430 B432 B434 B436 B438 B440 B442 B444 B446 B448 B450 B452 B454 B456 B458 B460 B462 B464 B466 B468 B470 B472 B474 B476 B478 B480 B482 B484 B486 B488 B490 B492 B494 B496 B498 B500 B502 B504 B506 B508 B510 B512 B514 B516 B518 B520 B522 B524 B526 B528 B530 B532 B534 B536 B538 B540 B542 B544 B546 B548 B550 B552 B554 B556 B558 B560 B562 B564 B566 B568 B570 B572 B574 B576 B578 B580 B582 B584 B586 B588 B590 B592 B594 B596 B598 B600 B602 B604 B606 B608 B610 B612 B614 B616 B618 B620 B622 B624 B626 B628 B630 B632 B634 B636 B638 B640 B642 B644 B646 B648 B650 B652 B654 B656 B658 B660 B662 B664 B666 B668 B670 B672 B674 B676 B678 B680 B682 B684 B686 B688 B690 B692 B694 B696 B698 B700 B702 B704 B706 B708 B710 B712 B714 B715 Notice the 717 line with 715 'preds'? 
The next part of the trace shows 'Before Register Allocation', which looks innocuous until we get to the 714 block: B714 [5146, 5147] preds: B713 sux: B3 __id_Instruction___________________________________________ 6454 label [label:0x00007fb855027ae0] 6456 move [metadata:0x0000000118878000|M] [R1643|M] 6458 add [Base:[R1643|M] Disp: 20320|J] [int:1|I] [Base:[R1643|M] Disp: 20320|J] 6460 move [int:1|I] [R1838|I] 6462 move [int:1|I] [R1837|I] [..] // move repeated 191 times! 6844 move [int:1|I] [R1646|I] 6846 move [int:1|I] [R1645|I] 6848 move [int:1|I] [R1644|I] 6850 branch [AL] [B3] This then repeats for B715 and we get to B712: B712 [5136, 5137] preds: B711 sux: B3 __id_Instruction___________________________________________ 7250 label [label:0x00007fb8550274e0] 7252 move [metadata:0x0000000118878000|M] [R1840|M] 7254 add [Base:[R1840|M] Disp: 20264|J] [int:1|I] [Base:[R1840|M] Disp: 20264|J] 7256 move [int:1|I] [R1838|I] 7258 move [int:1|I] [R1837|I] 7260 move [int:1|I] [R1836|I] And then in inverse order: B710 [5123, 5124] preds: B709 sux: B3 (lots of moves) B708 [5113, 5114] preds: B707 sux: B3 (lots of moves) (continues for 4MBs) B1 [624, 625] preds: B0 sux: B3 (lots of moves) Finally, we get to B3 which actually seems to show some real code: B3 [5157, 6129] preds: B1 B4 B6 B8 B10 B12 B14 B16 B18 B20 B22 B24 B26 B28 B30 B32 B34 B36 B38 B40 B42 B44 B46 B48 B50 B52 B54 B56 B58 B60 B62 B64 B66 B68 B70 B72 B74 B76 B78 B80 B82 B84 B86 B88 B90 B92 B94 B96 B98 B100 B102 B104 B106 B108 B110 B112 B114 B116 B118 B120 B122 B124 B126 B128 B130 B132 B134 B136 B138 B140 B142 B144 B146 B148 B150 B152 B154 B156 B158 B160 B162 B164 B166 B168 B170 B172 B174 B176 B178 B180 B182 B184 B186 B188 B190 B192 B194 B196 B198 B200 B202 B204 B206 B208 B210 B212 B214 B216 B218 B220 B222 B224 B226 B228 B230 B232 B234 B236 B238 B240 B242 B244 B246 B248 B250 B252 B254 B256 B258 B260 B262 B264 B266 B268 B270 B272 B274 B276 B278 B280 B282 B284 B286 B288 B290 B292 B294 B296 B298 B300 B302 B304 
B306 B308 B310 B312 B314 B316 B318 B320 B322 B324 B326 B328 B330 B332 B334 B336 B338 B340 B342 B344 B346 B348 B350 B352 B354 B356 B358 B360 B362 B364 B366 B368 B370 B372 B374 B376 B378 B380 B382 B384 B386 B388 B390 B392 B394 B396 B398 B400 B402 B404 B406 B408 B410 B412 B414 B416 B418 B420 B422 B424 B426 B428 B430 B432 B434 B436 B438 B440 B442 B444 B446 B448 B450 B452 B454 B456 B458 B460 B462 B464 B466 B468 B470 B472 B474 B476 B478 B480 B482 B484 B486 B488 B490 B492 B494 B496 B498 B500 B502 B504 B506 B508 B510 B512 B514 B516 B518 B520 B522 B524 B526 B528 B530 B532 B534 B536 B538 B540 B542 B544 B546 B548 B550 B552 B554 B556 B558 B560 B562 B564 B566 B568 B570 B572 B574 B576 B578 B580 B582 B584 B586 B588 B590 B592 B594 B596 B598 B600 B602 B604 B606 B608 B610 B612 B614 B616 B618 B620 B622 B624 B626 B628 B630 B632 B634 B636 B638 B640 B642 B644 B646 B648 B650 B652 B654 B656 B658 B660 B662 B664 B666 B668 B670 B672 B674 B676 B678 B680 B682 B684 B686 B688 B690 B692 B694 B696 B698 B700 B702 B704 B706 B708 B710 B712 B714 B715 __id_Instruction___________________________________________ 148938 label [label:0x00007fb854776bd0] 148940 profile_call isXMLLetter.org/jaxen/saxpath/base/Verifier @ 5158 [R2196|M] [R2197|J] 148942 move [metadata:0x000000011886d478|M] [R2198|M] 148944 move [Base:[R2198|M] Disp: 284|I] [R2199|I] 148946 add [R2199|I] [int:2|I] [R2199|I] 148948 move [R2199|I] [Base:[R2198|M] Disp: 284|I] 148950 logic_and [R2199|I] [int:2097150|I] [R2199|I] 148952 cmp [EQ] [R2199|I] [int:0|I] 148954 branch [EQ] [CounterOverflowStub: 0x00007fb85715f4d0] [etc..]. The output then continues with a wall of text which makes a beautiful pattern if I get my terminal width exactly right, but otherwise makes no sense to me :-) Here's hoping that this analysis may be of some use. Cheers, Eirik. 
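The size arithmetic behind the RETURN-versus-GOTO change described in the message above can be sketched roughly as follows. This is only an illustration, not the agent's code: the class and method names are made up, the 1040-byte report size and the 358 returns are the figures quoted in the message, and the 3-byte `goto` and the 65535-byte ceiling are the JVM specification values for that instruction and for a method's `code_length`. The sketch counts only the added reporting bytes and charges the full size of each GOTO.

```java
// Hypothetical sketch of the bytecode-size arithmetic; not the agent's actual code.
final class ReportingSizeSketch {
    // JVM limit: a method's code_length must be less than 65536.
    static final int MAX_CODE_BYTES = 65535;
    // goto = one opcode byte plus a two-byte branch offset.
    static final int GOTO_BYTES = 3;

    /** Reporting sequence copied in front of every RETURN. */
    static long inlinedAtEveryReturn(int reportBytes, int returns) {
        return (long) reportBytes * returns;
    }

    /** Reporting sequence emitted once; every RETURN becomes a GOTO to it. */
    static long sharedEpilogue(int reportBytes, int returns) {
        return reportBytes + (long) GOTO_BYTES * returns;
    }

    public static void main(String[] args) {
        int reportBytes = 1040; // per-RETURN figure quoted in the message
        int returns = 358;      // IRETURNs in isXMLLetter, per the message
        long inlined = inlinedAtEveryReturn(reportBytes, returns);
        long shared = sharedEpilogue(reportBytes, returns);
        System.out.println("inlined: " + inlined + " bytes, over limit: " + (inlined > MAX_CODE_BYTES));
        System.out.println("shared:  " + shared + " bytes, over limit: " + (shared > MAX_CODE_BYTES));
    }
}
```

With the quoted figures the inlined variant needs 372320 bytes for reporting alone, while the shared variant needs 2114, which is why the GOTO form fits under the limit at the cost of one block with a very large number of predecessors.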
From jbhateja at openjdk.java.net Thu Feb 11 12:25:53 2021 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Thu, 11 Feb 2021 12:25:53 GMT Subject: RFR: 8261553: Efficient mask generation using BMI2 BZHI instruction [v2] In-Reply-To: References: Message-ID: <7AXx69j_wscf8ENt98_apHTd1OKbeO80nNVpU68Z794=.7d776104-0f08-4e92-b229-bc210123fc4e@github.com> > BMI2 BZHI instruction can be used to optimize the instruction sequence > used for mask generation at various places in array copy stubs and partial in-lining for small copy operations. Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: 8261553: Adding BMI2 missing check for partial in-lining. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2522/files - new: https://git.openjdk.java.net/jdk/pull/2522/files/38495aec..84c9c2da Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2522&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2522&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/2522.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2522/head:pull/2522 PR: https://git.openjdk.java.net/jdk/pull/2522 From jbhateja at openjdk.java.net Thu Feb 11 12:25:53 2021 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Thu, 11 Feb 2021 12:25:53 GMT Subject: RFR: 8261553: Efficient mask generation using BMI2 BZHI instruction [v2] In-Reply-To: References: Message-ID: On Thu, 11 Feb 2021 10:28:05 GMT, Claes Redestad wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> 8261553: Adding BMI2 missing check for partial in-lining. > > - Rather than removing the old code, I believe the code calling bzhiq needs to be in a branch checking `VM_Version::supports_bmi2`.
Otherwise you'll hit asserts on older hardware without this extension > - Some demonstration of the performance benefit would be nice - either a new microbenchmark or a statistically significant result running some existing ones, e.g. `make test TEST=micro:ArrayCopy` Hi Claes, Here is the JMH performance data over CLX for arraycopy benchmarks: http://cr.openjdk.java.net/~jbhateja/8261553/JMH_PERF_CLX_BASELINE.txt http://cr.openjdk.java.net/~jbhateja/8261553/JMH_PERF_CLX_WITH_OPTS.txt Regards, Jatin ------------- PR: https://git.openjdk.java.net/jdk/pull/2522 From jbhateja at openjdk.java.net Thu Feb 11 12:33:38 2021 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Thu, 11 Feb 2021 12:33:38 GMT Subject: RFR: 8261553: Efficient mask generation using BMI2 BZHI instruction [v2] In-Reply-To: References: Message-ID: On Thu, 11 Feb 2021 10:28:05 GMT, Claes Redestad wrote: > * Rather than removing the old code, I believe the code calling bzhiq needs to be in a branch checking `VM_Version::supports_bmi2`. Otherwise you'll hit asserts on older hardware without this extension Hi Claes, I added the missing safety check for BMI2; in general it is rare that a target supporting AVX-512 does not support BMI2. > * Some demonstration of the performance benefit would be nice - either a new microbenchmark or a statistically significant result running some existing ones, e.g. `make test TEST=micro:ArrayCopy` ------------- PR: https://git.openjdk.java.net/jdk/pull/2522 From christian.hagedorn at oracle.com Thu Feb 11 12:56:02 2021 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Thu, 11 Feb 2021 13:56:02 +0100 Subject: C1 crash in LinearScan::eliminate_spill_moves In-Reply-To: References: Message-ID: <1a7cc007-8de7-5e86-7f34-0d2a17ea847b@oracle.com> Hi Eirik Thanks for investigating further. I had a look at it and I think we are just running out of virtual registers in the MoveResolver but do not handle it (thus the virtual register number overflow).
We should probably just bail out in this case. We can handle it similarly as in LIRGenerator::new_register() where we also bail out if we need too many virtual registers. I'm working on such a fix for it. Best regards, Christian
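The bail-out idea in the reply above can be illustrated with a small sketch. It is written in Java for brevity and is not HotSpot code: `VirtualRegisterPool`, `VREG_MAX`, `NO_REGISTER`, and the limit value are hypothetical stand-ins chosen for illustration. The only point is that the allocator records a bailout and stops handing out numbers once the limit is reached, instead of letting the register number run past the range the compiler can represent.

```java
// Hypothetical sketch; names and limits are illustrative, not HotSpot's.
final class VirtualRegisterPool {
    static final int VREG_MAX = 20_000; // made-up upper bound on register numbers
    static final int NO_REGISTER = -1;  // sentinel returned after a bailout

    private int next;
    private String bailoutReason; // null while compilation may continue

    VirtualRegisterPool(int firstVirtualRegister) {
        this.next = firstVirtualRegister;
    }

    /** Returns a fresh register number, or NO_REGISTER after recording a bailout. */
    int newRegister() {
        if (bailoutReason != null) {
            return NO_REGISTER;
        }
        if (next >= VREG_MAX) {
            bailoutReason = "out of virtual registers";
            return NO_REGISTER;
        }
        return next++;
    }

    boolean bailedOut() {
        return bailoutReason != null;
    }

    String bailoutReason() {
        return bailoutReason;
    }

    public static void main(String[] args) {
        // Start two registers below the limit so the third request fails.
        VirtualRegisterPool pool = new VirtualRegisterPool(VREG_MAX - 2);
        System.out.println(pool.newRegister());
        System.out.println(pool.newRegister());
        System.out.println(pool.newRegister());
        System.out.println("bailed out: " + pool.bailedOut() + " (" + pool.bailoutReason() + ")");
    }
}
```

In this sketch a caller would check `bailedOut()` after requesting registers and abandon compiling the method, so the failure shows up as a method left uncompiled by that compiler rather than as a crash.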
From redestad at openjdk.java.net Thu Feb 11 12:59:39 2021 From: redestad at openjdk.java.net (Claes Redestad) Date: Thu, 11 Feb 2021 12:59:39 GMT Subject: RFR: 8261553: Efficient mask generation using BMI2 BZHI instruction [v2] In-Reply-To: References: Message-ID: On Thu, 11 Feb 2021 12:22:29 GMT, Jatin Bhateja wrote: > Hi Claes, > > Here is the JMH performance data over CLX for arraycopy benchmarks: > http://cr.openjdk.java.net/~jbhateja/8261553/JMH_PERF_CLX_BASELINE.txt > http://cr.openjdk.java.net/~jbhateja/8261553/JMH_PERF_CLX_WITH_OPTS.txt > > Regards, > Jatin Thanks! Eyeballing the results it looks like a mixed bag. There even seem to be a few regressions such as this: o.o.b.java.lang.ArrayCopyUnalignedSrc.testLong 1200 N/A avgt 2 61.663 ns/op --> o.o.b.java.lang.ArrayCopyUnalignedSrc.testLong 1200 N/A avgt 2 74.160 ns/op ------------- PR: https://git.openjdk.java.net/jdk/pull/2522 From chagedorn at openjdk.java.net Thu Feb 11 13:03:38 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Thu, 11 Feb 2021 13:03:38 GMT Subject: RFR: 8240281: Remove failing assertion code when selecting first memory state in SuperWord::co_locate_pack In-Reply-To: References: Message-ID: On Thu, 11 Feb 2021 07:44:30 GMT, Tobias Hartmann wrote: >> While working on [JDK-8238438](https://bugs.openjdk.java.net/browse/JDK-8238438), it was not clear if there is a case where `SuperWord::co_locate_pack()` should pick the memory state of the first load of a pack. We could not find an example and therefore added an `assert(false)` and left the code there to clean it up at some point if the assert was never hit. >> >> Now, a newly found fuzzer test showed that there are cases where we need to pick the first memory state, triggering the `assert(false)`. In the test case `test()`, a store and a load pack are created but the store pack is filtered in `SuperWord::filter_packs()`. The load `x += iArrFld[j]` must read the old value before the store `iArrFld[j] = j` overwrites it.
Therefore, the load vector must be executed before any of the stores to `iArrFld[j]`. This, however, is not the case if we pick the memory state of the last load which results in wrong values for `iArrFld`: The stores and the loads are dependent and some of the stores are already executed before the load vector. >> >> The fix is to remove the assertion code added by JDK-8238438 and keep the code for selecting the first memory state of a load pack. >> >> Thanks, >> Christian > > Looks good. Great that we finally have a test for this case! Thanks Vladimir and Tobias for your reviews! Yes, it's good to finally have a testcase for it. Was already thinking about cleaning this code up soon - good that I've waited. ------------- PR: https://git.openjdk.java.net/jdk/pull/2495 From chagedorn at openjdk.java.net Thu Feb 11 13:03:41 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Thu, 11 Feb 2021 13:03:41 GMT Subject: Integrated: 8240281: Remove failing assertion code when selecting first memory state in SuperWord::co_locate_pack In-Reply-To: References: Message-ID: On Wed, 10 Feb 2021 08:49:10 GMT, Christian Hagedorn wrote: > While working on [JDK-8238438](https://bugs.openjdk.java.net/browse/JDK-8238438), it was not clear if there is a case where `SuperWord::co_locate_pack()` should pick the memory state of the first load of a pack. We could not find an example and therefore added an `assert(false)` and left the code there to clean it up at some point if the assert was never hit. > > Now, a newly found fuzzer test showed that there are cases where we need to pick the first memory state, triggering the `assert(false)`. In the test case `test()`, a store and a load pack are created but the store pack is filtered in `SuperWord::filter_packs()`. The load `x += iArrFld[j]` must read the old value before the store `iArrFld[j] = j` overwrites it. Therefore, the load vector must be executed before any of the stores to `iArrFld[j]`.
This, however, is not the case if we pick the memory state of the last load which results in wrong values for `iArrFld`: The stores and the loads are dependent and some of the stores are already executed before the load vector. > > The fix is to remove the assertion code added by JDK-8238438 and keep the code for selecting the first memory state of a load pack. > > Thanks, > Christian This pull request has now been integrated. Changeset: 0a89987a Author: Christian Hagedorn URL: https://git.openjdk.java.net/jdk/commit/0a89987a Stats: 146 lines in 2 files changed: 134 ins; 5 del; 7 mod 8240281: Remove failing assertion code when selecting first memory state in SuperWord::co_locate_pack Reviewed-by: roland, kvn, thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/2495 From Pengfei.Li at arm.com Thu Feb 11 13:09:29 2021 From: Pengfei.Li at arm.com (Pengfei Li) Date: Thu, 11 Feb 2021 13:09:29 +0000 Subject: [11u] RFR(S): 8261022: Fix incorrect result of Math.abs() with char type In-Reply-To: References: Message-ID: Resend > Hi, > > I'd like to backport JDK-8261022 to jdk11u. > > Original JBS: https://bugs.openjdk.java.net/browse/JDK-8261022 > Modified webrev: http://cr.openjdk.java.net/~pli/rfr/8261022/backport11u/ > > This issue causes vectorized abs to generate incorrect results when the argument has char type. The root cause is that the vector abs operation is not specially handled in computing vector element types after we enabled that in JDK-8222074 in jdk13. As JDK-8222074 was backported to jdk11u, jdk11u is also affected. > > The patch to fix this is in jdk17 now. The fix does not apply to jdk11u cleanly, as VectorNode::is_shift_opcode() is not defined in jdk11u. I have modified the patch a little bit to fit this difference. > > Tested jtreg hotspot::tier1 and the newly added jtreg case. No failure after the modified patch.
> > -- > Thanks, > Pengfei From sjohanss at openjdk.java.net Thu Feb 11 13:13:37 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Thu, 11 Feb 2021 13:13:37 GMT Subject: RFR: 8261029: Code heap page sizes not traced correctly using os::trace_page_sizes In-Reply-To: <7bUn1l-RNEYH7LE-4NzMdUU1EVOVE3brf7KQaQ9lzL4=.bc261d90-cbbb-4a79-9765-ab0a886e1e2d@github.com> References: <7bUn1l-RNEYH7LE-4NzMdUU1EVOVE3brf7KQaQ9lzL4=.bc261d90-cbbb-4a79-9765-ab0a886e1e2d@github.com> Message-ID: On Wed, 10 Feb 2021 22:57:28 GMT, Vladimir Kozlov wrote: >> When adding a code heap the page size used with the underlying mapping is traced using `os::trace_page_sizes`. The old code tried to estimate the page-size based on the size, but the mapping has already been done so it is better to check the passed in `ReservedSpace`. Today we don't record the page size in the ReservedSpace, but we have a helper to do a good estimate: `ReservedSpace::actual_reserved_page_size()`. The proposal is to use this function. >> >> When changing this I also realized that the traced min-size used un-aligned value while the actual `initialize`-call correctly uses the aligned size. Changed so that we also use the aligned size for tracing. >> >> I'm currently doing some more work in this area and while I haven't added a specific test for this issue I have created a test I plan to integrate separately when a few more needed changes have gone in. The test is Linux-only and validates the output from `os::trace_page_sizes` against the information in `/proc/self/smaps`. > > Good. 
Thanks for the reviews, @vnkozlov, @tstuefe and @TobiHartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/2481 From sjohanss at openjdk.java.net Thu Feb 11 13:13:38 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Thu, 11 Feb 2021 13:13:38 GMT Subject: Integrated: 8261029: Code heap page sizes not traced correctly using os::trace_page_sizes In-Reply-To: References: Message-ID: <04a_J9upMfUVtRiu6-CB0_zeWUP7HtbzoEj-w8Nkkno=.d424b527-6798-4f1b-aa6a-c1221113256a@github.com> On Tue, 9 Feb 2021 13:45:38 GMT, Stefan Johansson wrote: > When adding a code heap the page size used with the underlying mapping is traced using `os::trace_page_sizes`. The old code tried to estimate the page-size based on the size, but the mapping has already been done so it is better to check the passed in `ReservedSpace`. Today we don't record the page size in the ReservedSpace, but we have a helper to do a good estimate: `ReservedSpace::actual_reserved_page_size()`. The proposal is to use this function. > > When changing this I also realized that the traced min-size used un-aligned value while the actual `initialize`-call correctly uses the aligned size. Changed so that we also use the aligned size for tracing. > > I'm currently doing some more work in this area and while I haven't added a specific test for this issue I have created a test I plan to integrate separately when a few more needed changes have gone in. The test is Linux-only and validates the output from `os::trace_page_sizes` against the information in `/proc/self/smaps`. This pull request has now been integrated. 
Changeset: eef86a80 Author: Stefan Johansson URL: https://git.openjdk.java.net/jdk/commit/eef86a80 Stats: 9 lines in 1 file changed: 1 ins; 6 del; 2 mod 8261029: Code heap page sizes not traced correctly using os::trace_page_sizes Reviewed-by: kvn, stuefe, thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/2481 From jbhateja at openjdk.java.net Thu Feb 11 13:54:38 2021 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Thu, 11 Feb 2021 13:54:38 GMT Subject: RFR: 8261553: Efficient mask generation using BMI2 BZHI instruction [v2] In-Reply-To: References: Message-ID: On Thu, 11 Feb 2021 12:56:24 GMT, Claes Redestad wrote: > > Hi Claes, > > Here is the JMH performance data over CLX for arraycopy benchmarks: > > http://cr.openjdk.java.net/~jbhateja/8261553/JMH_PERF_CLX_BASELINE.txt > > http://cr.openjdk.java.net/~jbhateja/8261553/JMH_PERF_CLX_WITH_OPTS.txt > > Regards, > > Jatin > > Thanks! Eyeballing the results it looks like a mixed bag. There even seem to be a few regressions such as this: > > ``` > o.o.b.java.lang.ArrayCopyUnalignedSrc.testLong 1200 N/A avgt 2 61.663 ns/op > --> > o.o.b.java.lang.ArrayCopyUnalignedSrc.testLong 1200 N/A avgt 2 74.160 ns/op > ``` Hi Claes, This could be run-to-run variation. In general, we now have fewer instructions (one shift operation saved per mask computation) compared to the previous mask generation sequence, and thus it will always offer better execution latencies. ------------- PR: https://git.openjdk.java.net/jdk/pull/2522 From sviswanathan at openjdk.java.net Thu Feb 11 14:26:45 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Thu, 11 Feb 2021 14:26:45 GMT Subject: RFR: 8261542: X86 slice and unslice intrinsics for 256-bit byte/short vectors Message-ID: The slice and unslice intrinsics for 256-bit byte/short vectors can be implemented for x86 platforms supporting AVX2 using a sequence of instructions.
JBS: https://bugs.openjdk.java.net/browse/JDK-8261542

The PerfSliceOrigin.java jmh test attached to the JBS shows the following performance on AVX2 platform.

Before:
Benchmark                                 (size)   Mode  Cnt      Score     Error   Units
PerfSliceOrigin.vectorSliceOrigin           1024  thrpt    5     18.887 ±   1.128  ops/ms
PerfSliceOrigin.vectorSliceUnsliceOrigin    1024  thrpt    5      9.374 ±   0.370  ops/ms

After:
Benchmark                                 (size)   Mode  Cnt      Score     Error   Units
PerfSliceOrigin.vectorSliceOrigin           1024  thrpt    5  13861.420 ±  19.071  ops/ms
PerfSliceOrigin.vectorSliceUnsliceOrigin    1024  thrpt    5   7895.199 ± 142.580  ops/ms

-------------

Commit messages:
 - 8261542: X86 slice and unslice intrinsics for 256-bit byte/short vectors

Changes: https://git.openjdk.java.net/jdk/pull/2520/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2520&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8261542
  Stats: 119 lines in 7 files changed: 99 ins; 5 del; 15 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2520.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2520/head:pull/2520

PR: https://git.openjdk.java.net/jdk/pull/2520

From redestad at openjdk.java.net  Thu Feb 11 14:30:37 2021
From: redestad at openjdk.java.net (Claes Redestad)
Date: Thu, 11 Feb 2021 14:30:37 GMT
Subject: RFR: 8261553: Efficient mask generation using BMI2 BZHI instruction [v2]
In-Reply-To:
References:
Message-ID:

On Thu, 11 Feb 2021 13:52:09 GMT, Jatin Bhateja wrote:

> Hi Claes, This could be a run to run variation, in general we are now having fewer number of instructions (one shift operation saved per mask computation) compared to previous masked generation sequence and thus it will always offer better execution latencies.

Run-to-run variation would be easy to rule out by running more forks and more iterations to attain statistically significant results.
While the instruction manuals suggest latency should be better for this instruction on all CPUs where it's supported, it would be good if there was some clear proof - such as a significant benchmark win - to motivate the added complexity.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2522

From mdoerr at openjdk.java.net  Thu Feb 11 15:25:50 2021
From: mdoerr at openjdk.java.net (Martin Doerr)
Date: Thu, 11 Feb 2021 15:25:50 GMT
Subject: RFR: 8261522: [PPC64] AES intrinsics write beyond the destination array
Message-ID:

I'd like to replace the read-modify-write implementation in the aescrypt_encryptBlock / aescrypt_decryptBlock stubs. It can cause severe problems (see bug description).

-------------

Commit messages:
 - 8261522: [PPC64] AES intrinsics write beyond the destination array

Changes: https://git.openjdk.java.net/jdk/pull/2514/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2514&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8261522
  Stats: 48 lines in 1 file changed: 10 ins; 16 del; 22 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2514.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2514/head:pull/2514

PR: https://git.openjdk.java.net/jdk/pull/2514

From roland at openjdk.java.net  Thu Feb 11 15:42:56 2021
From: roland at openjdk.java.net (Roland Westrelin)
Date: Thu, 11 Feb 2021 15:42:56 GMT
Subject: RFR: 8259937: guarantee(loc != NULL) failed: missing saved register with native invoker
Message-ID:

We spotted this issue with Shenandoah and I managed to write a simple test case that reproduces it reliably with Shenandoah but the issue is independent of the GC.

The loop in the test case calls a native invoker with an oop live in rbp. rbp is saved in the native invoker stub's frame. A safepoint is triggered from the safepoint check in the native invoker. The stack walking code sees that rbp contains an oop but can't find where that oop is stored.
That's because stack walking updates the caller's frame with the location of rbp in the callee on calls to frame::sender(). But the current code sets the last java frame to be the compiled frame where rbp is live. So there's no call to frame::sender() to update the location rbp. The fix I propose is that the frame of the native invoker be visible by stack walking. On a safepoint, stack walking starts from the native invoker thread, then calls frame::sender() to move to the compiled frame. That causes rbp to be properly recorded with its location in the native invoker frame. Same problem affects both x86 and aarch64. I've tested this patch with: make run-test TEST="java/foreign" TEST_VM_OPTS="-Xcomp" JTREG="TIMEOUT_FACTOR=10" on both platforms. ------------- Commit messages: - whitespaces - fix & test Changes: https://git.openjdk.java.net/jdk/pull/2528/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2528&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8259937 Stats: 395 lines in 16 files changed: 264 ins; 53 del; 78 mod Patch: https://git.openjdk.java.net/jdk/pull/2528.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2528/head:pull/2528 PR: https://git.openjdk.java.net/jdk/pull/2528 From vladimir.kozlov at oracle.com Thu Feb 11 18:05:30 2021 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 11 Feb 2021 10:05:30 -0800 Subject: [External] : Re: SuperWord loop optimization lost after method inlining In-Reply-To: References: Message-ID: <87dcb670-622f-6b05-7c06-289f9b4f3634@oracle.com> Changing wide mailing list to JIT compiler only. This deoptimization is normal in Tiered Compilation - it switched from profiling code (level='3') generated by C1 compiler to new code generated by C2 (level='4') which does loop optimizations. 
Thank you for posting inlining information:

  @ 17   com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus (69 bytes)   inline (hot)
   \-> TypeProfile (14054/14054 counts) = com/qfs/vector/binding/impl/ArrayFloatToArrayFloatVectorBinding

I thought before that maybe the call site was not hot, but that is not the case.

You can do another experiment: collect the log with Tiered Compilation disabled (only C2 is used):

-XX:-TieredCompilation

Also print the assembler code (as you did before) for the final compilation to see whether the loop is still not vectorized.

Is it possible to post the log file (on GitHub?) for me to look at?

Thanks,
Vladimir K

On 2/11/21 6:28 AM, Nicolas Heutte wrote:
> Hi Vladimir,
>
> Thank you for your help.
>
> I'm currently running Java 11.0.9, and I did not use any VM flag of note.
>
> I checked the content of the compilation log, and it seems that ArrayFloatToArrayFloatVectorBinding::plus() was
> deoptimized in order to allow AVector::plus() to be compiled:
>
> count='916' iicount='916' level='3' stamp='7394.056' comment='tiered' hot_count='896'/>
>
> The last compilation entry for AVector::plus() is:
>
> relocation_offset='376' insts_offset='432' stub_offset='1040' scopes_data_offset='1152' scopes_pcs_offset='1592'
> dependencies_offset='1880' nul_chk_table_offset='1896' oops_offset='1064' metadata_offset='1072'
> method='com.qfs.vector.impl.AVector plus (Lcom/qfs/vector/IVector;)V' bytes='23' count='172425' iicount='172425'
> stamp='7394.199'/>
>
>   @ 1   com.qfs.vector.array.impl.ArrayFloatVector::getBindingId (4 bytes)   inline (hot)
>    \-> TypeProfile (14552/14552 counts) = com/qfs/vector/array/impl/ArrayFloatVector
>   @ 7   com.qfs.vector.array.impl.ArrayFloatVector::getBindingId (4 bytes)   inline (hot)
>    \-> TypeProfile (14150/14150 counts) = com/qfs/vector/array/impl/ArrayFloatVector
>   @ 10   com.qfs.vector.binding.impl.VectorBindings::getBinding (9 bytes)   inline (hot)
>     @ 5   com.qfs.vector.binding.impl.VectorBindings$VectorBindingsProvider::getBinding (22 bytes)   inline (hot)
>       @ 3   com.qfs.vector.binding.impl.VectorBindings$VectorBindingsProvider::hasBinding (34 bytes)   inline (hot)
>   @ 17   com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus (69 bytes)   inline (hot)
>    \-> TypeProfile (14054/14054 counts) = com/qfs/vector/binding/impl/ArrayFloatToArrayFloatVectorBinding
>     @ 12   com.qfs.vector.array.impl.ArrayFloatVector::size (6 bytes)   inline (hot)
>     @ 22   com.qfs.vector.impl.AVector::checkIndex (37 bytes)   inline (hot)
>       @ 6   com.qfs.vector.array.impl.ArrayFloatVector::size (6 bytes)   inline (hot)
>     @ 27   com.qfs.vector.array.impl.ArrayFloatVector::getUnderlying (5 bytes)   accessor
>     @ 34   com.qfs.vector.array.impl.ArrayFloatVector::getUnderlying (5 bytes)   accessor
>
> Unfortunately, I do not have access to a debug VM build, so I cannot run the second test you recommend.
>
> Best regards,
> Nicolas Heutte
>
> On Wed, Feb 10, 2021 at 7:36 PM Vladimir Kozlov wrote:
>
> > Hi, Nicolas
> >
> > Looks like, when inlined, the loop from ArrayFloatToArrayFloatVectorBinding::plus() was not optimized at all: it is not
> > unrolled and has range checks. Such loops are not vectorized (you need unrolling and no checks).
> >
> > What Java version you are running? What HotSpot VM flags you are using when running application?
> >
> > Run application with -XX:+LogCompilation and look on compilation data in hotspot_pid.log file for caller
> > AVector::plus().
> >
> > VM also has several flags to trace loop optimizations but they are only available in debug VM build. If you have access
> > to such build run with -XX:+PrintCompilation -XX:+TraceLoopOpts flags.
> >
> > Thanks,
> > Vladimir K
> >
> > On 2/10/21 9:24 AM, Nicolas Heutte wrote:
> > > Hi all,
> > >
> > > I am encountering a performance issue caused by the interaction between
> > > method inlining and automatic vectorization.
> > >
> > > Our application aggregates arrays intensively using a method named
> > > ArrayFloatToArrayFloatVectorBinding.plus() with the following code:
> > >
> > >     for (int i = 0; i < srcLen; ++i) {
> > >         dstArray[i] += srcArray[i];
> > >     }
> > >
> > > When we microbenchmark this method we observe fast performance close to the
> > > practical memory bandwidth and when we print the assembly code we observe
> > > loop unrolling and automatic vectorization with SIMD instructions.
> > >
> > >   0x000001ef4600abf0: vmovdqu 0x10(%r14,%r13,4),%ymm0
> > >   0x000001ef4600abf7: vaddps 0x10(%rcx,%r13,4),%ymm0,%ymm0
> > >   0x000001ef4600abfe: vmovdqu %ymm0,0x10(%r14,%r13,4)
> > >   0x000001ef4600ac05: movslq %r13d,%r11
> > >   0x000001ef4600ac08: vmovdqu 0x30(%r14,%r11,4),%ymm0
> > >   0x000001ef4600ac0f: vaddps 0x30(%rcx,%r11,4),%ymm0,%ymm0
> > >   0x000001ef4600ac16: vmovdqu %ymm0,0x30(%r14,%r11,4)
> > >   0x000001ef4600ac1d: vmovdqu 0x50(%r14,%r11,4),%ymm0
> > >   0x000001ef4600ac24: vaddps 0x50(%rcx,%r11,4),%ymm0,%ymm0
> > >   0x000001ef4600ac2b: vmovdqu %ymm0,0x50(%r14,%r11,4)
> > >   0x000001ef4600ac32: vmovdqu 0x70(%r14,%r11,4),%ymm0
> > >   0x000001ef4600ac39: vaddps 0x70(%rcx,%r11,4),%ymm0,%ymm0
> > >   0x000001ef4600ac40: vmovdqu %ymm0,0x70(%r14,%r11,4)
> > >   0x000001ef4600ac47: vmovdqu 0x90(%r14,%r11,4),%ymm0
> > >   0x000001ef4600ac51: vaddps 0x90(%rcx,%r11,4),%ymm0,%ymm0
> > >   0x000001ef4600ac5b: vmovdqu %ymm0,0x90(%r14,%r11,4)
> > >   0x000001ef4600ac65: vmovdqu 0xb0(%r14,%r11,4),%ymm0
> > >   0x000001ef4600ac6f: vaddps 0xb0(%rcx,%r11,4),%ymm0,%ymm0
> > >   0x000001ef4600ac79: vmovdqu %ymm0,0xb0(%r14,%r11,4)
> > >   0x000001ef4600ac83: vmovdqu 0xd0(%r14,%r11,4),%ymm0
> > >   0x000001ef4600ac8d: vaddps 0xd0(%rcx,%r11,4),%ymm0,%ymm0
> > >   0x000001ef4600ac97: vmovdqu %ymm0,0xd0(%r14,%r11,4)
> > >   0x000001ef4600aca1: vmovdqu 0xf0(%r14,%r11,4),%ymm0
> > >   0x000001ef4600acab: vaddps 0xf0(%rcx,%r11,4),%ymm0,%ymm0
> > >   0x000001ef4600acb5: vmovdqu %ymm0,0xf0(%r14,%r11,4)  ;*fastore {reexecute=0 rethrow=0 return_oop=0}
> > >                       ; - com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus@61 (line 41)
> > >   0x000001ef4600acbf: add    $0x40,%r13d        ;*iinc {reexecute=0 rethrow=0 return_oop=0}
> > >                       ; - com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus@62 (line 40)
> > >   0x000001ef4600acc3: cmp    %eax,%r13d
> > >   0x000001ef4600acc6: jl     0x000001ef4600abf0  ;*goto {reexecute=0 rethrow=0 return_oop=0}
> > >                       ; - com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus@65 (line 40)
> > >
> > > In the real application, this method is actually inlined in a higher level
> > > method named AVector.plus(). Unfortunately, the inlined version of the
> > > aggregation code is not vectorized anymore:
> > >
> > >   0x000001ef460180a0: cmp    %ebx,%r11d
> > >   0x000001ef460180a3: jae    0x000001ef460180e6
> > >   0x000001ef460180a5: vmovss 0x10(%r8,%r11,4),%xmm1  ;*faload {reexecute=0 rethrow=0 return_oop=0}
> > >                       ; - com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus@54 (line 41)
> > >                       ; - com.qfs.vector.impl.AVector::plus@17 (line 204)
> > >   0x000001ef460180ac: cmp    %ecx,%r11d
> > >   0x000001ef460180af: jae    0x000001ef46018104
> > >   0x000001ef460180b1: vaddss 0x10(%r9,%r11,4),%xmm1,%xmm1
> > >   0x000001ef460180b8: vmovss %xmm1,0x10(%r8,%r11,4)  ;*fastore {reexecute=0 rethrow=0 return_oop=0}
> > >                       ; - com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus@61 (line 41)
> > >                       ; - com.qfs.vector.impl.AVector::plus@17 (line 204)
> > >   0x000001ef460180bf: inc    %r11d              ;*iinc {reexecute=0 rethrow=0 return_oop=0}
> > >                       ; - com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus@62 (line 40)
> > >                       ; - com.qfs.vector.impl.AVector::plus@17 (line 204)
> > >   0x000001ef460180c2: cmp    %r10d,%r11d
> > >   0x000001ef460180c5: jl     0x000001ef460180a0  ;*goto {reexecute=0 rethrow=0 return_oop=0}
> > >                       ; - com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus@65 (line 40)
> > >                       ; - com.qfs.vector.impl.AVector::plus@17 (line 204)
> > >
> > > This causes a significant performance drop, compared to a run where we
> > > explicitly disable the inlining and observe automatically vectorized code
> > > again (
> > > -XX:CompileCommand=dontinline,com/qfs/vector/binding/impl/ArrayFloatToArrayFloatVectorBinding.plus
> > > ).
> > >
> > > How do you guys explain that behavior of the JIT compiler? Is this a known
> > > and tracked issue, could it be fixed in the JVM? Can we do something in the
> > > java code to prevent this from happening?
> > > > > > Best regards, > > > > Nicolas Heutte > > > From vladimir.kozlov at oracle.com Thu Feb 11 18:18:11 2021 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 11 Feb 2021 10:18:11 -0800 Subject: [11u] RFR(S): 8261022: Fix incorrect result of Math.abs() with char type In-Reply-To: References: Message-ID: <6e591fa9-27ba-4b69-8576-2540b5824471@oracle.com> Backport patch looks good. Thanks, Vladimir K On 2/11/21 5:09 AM, Pengfei Li wrote: > Resend > >> Hi, >> >> I'd like to backport JDK-8261022 to jdk11u. >> >> Original JBS: https://bugs.openjdk.java.net/browse/JDK-8261022 >> Modified webrev: http://cr.openjdk.java.net/~pli/rfr/8261022/backport11u/ >> >> This issue causes vectorized abs generate incorrect result when the argument >> has char type. Root cause is that the vector abs operation is not specially >> handled in computing vector element types after we enabled that in JDK- >> 8222074 in jdk13. As JDK-8222074 was backported to jdk11u, jdk11u is also >> affected. >> >> The patch to fix this is in jdk17 now. The fix does not apply to jdk11u cleanly, >> as VectorNode::is_shift_opcode() is not defined in jdk11u. I have modified >> the patch a little bit to fit this difference. >> >> Tested jtreg hotspot::tier1 and the newly added jtreg case. No failure after >> the modified patch. >> >> -- >> Thanks, >> Pengfei > From kvn at openjdk.java.net Thu Feb 11 18:52:39 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 11 Feb 2021 18:52:39 GMT Subject: RFR: 8259937: guarantee(loc != NULL) failed: missing saved register with native invoker In-Reply-To: References: Message-ID: On Thu, 11 Feb 2021 15:31:11 GMT, Roland Westrelin wrote: > We spotted this issue with Shenandoah and I managed to write a simple > test case that reproduces it reliably with Shenandoah but the issue is > independent of the GC. > > The loop in the test case calls a native invoker with an oop live in > rbp. rbp is saved in the native invoker stub's frame. 
A safepoint is > triggered from the safepoint check in the native invoker. The stack > walking code sees that rbp contains an oop but can't find where that > oop is stored. That's because stack walking updates the caller's frame > with the location of rbp in the callee on calls to > frame::sender(). But the current code sets the last java frame to be > the compiled frame where rbp is live. So there's no call to > frame::sender() to update the location rbp. The fix I propose is that > the frame of the native invoker be visible by stack walking. On a > safepoint, stack walking starts from the native invoker thread, then > calls frame::sender() to move to the compiled frame. That causes rbp > to be properly recorded with its location in the native invoker frame. > > Same problem affects both x86 and aarch64. I've tested this patch with: > > make run-test TEST="java/foreign" TEST_VM_OPTS="-Xcomp" JTREG="TIMEOUT_FACTOR=10" > > on both platforms. Seems reasonable. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2528 From iveresov at openjdk.java.net Fri Feb 12 02:59:39 2021 From: iveresov at openjdk.java.net (Igor Veresov) Date: Fri, 12 Feb 2021 02:59:39 GMT Subject: RFR: 8261137: Optimization of Box nodes in uncommon_trap [v4] In-Reply-To: References: <8Riu9VCQLM7_vDp5DOMtLZK3yMLQzAkwlIKo4ab0F7Q=.662dbffe-c320-47ea-bc67-508e2c382b12@github.com> Message-ID: On Wed, 10 Feb 2021 23:18:26 GMT, Vladimir Kozlov wrote: > > The improvement you are proposing is not specific to uncommon traps, but can be generalized to any debug usage at safepoints. > > The downside is that, in general, rematerialization logic has to use the corresponding pure function in order to materialize the eliminated instance. In this particular case (primitive boxing), it has to take into account the caching effects of primitive box factories. Otherwise, user code can encounter identity paradoxes with rematerialized primitive box instances. 
> > I don't see how the scalarization logic you propose preserves identity constraints imposed by `valueOf` factories. > > Yes, it seems this optimization introduces the issue we had with Graal (8223320): > "C2 doesn't model Integer.valueOf() as anything special. It just inlines it. So the check that determines whether to allocate a new Integer or take one from the cache always happens at runtime. Graal models it as a BoxNode. It is correctly lowered, however, if it needs to be present in a JVM state, it is described as an allocation. So the decision whether to allocate or take the cached value has to happen during the deopt." > There is code in deoptimizer for JVMCI which looks for cached Boxed values. We may need to adopt it for C2 EA for this optimization to work. In addition to using the logic that we already have there for Graal (see ```Deoptimization::get_cached_box()```), you need to track where the box came from. If it comes as a result of ```valueOf()``` then it has to come from the caches, if it's something that the user allocates with ```new Integer(x)```, then it should be just a normal scalarized object. ------------- PR: https://git.openjdk.java.net/jdk/pull/2401 From iveresov at openjdk.java.net Fri Feb 12 04:50:40 2021 From: iveresov at openjdk.java.net (Igor Veresov) Date: Fri, 12 Feb 2021 04:50:40 GMT Subject: RFR: 8261137: Optimization of Box nodes in uncommon_trap [v4] In-Reply-To: References: <8Riu9VCQLM7_vDp5DOMtLZK3yMLQzAkwlIKo4ab0F7Q=.662dbffe-c320-47ea-bc67-508e2c382b12@github.com> Message-ID: On Fri, 12 Feb 2021 02:54:31 GMT, Igor Veresov wrote: >>> The improvement you are proposing is not specific to uncommon traps, but can be generalized to any debug usage at safepoints. >>> >>> The downside is that, in general, rematerialization logic has to use the corresponding pure function in order to materialize the eliminated instance. 
In this particular case (primitive boxing), it has to take into account the caching effects of primitive box factories. Otherwise, user code can encounter identity paradoxes with rematerialized primitive box instances. >>> >>> I don't see how the scalarization logic you propose preserves identity constraints imposed by `valueOf` factories. >> >> Yes, it seems this optimization introduces the issue we had with Graal (8223320): >> "C2 doesn't model Integer.valueOf() as anything special. It just inlines it. So the check that determines whether to allocate a new Integer or take one from the cache always happens at runtime. Graal models it as a BoxNode. It is correctly lowered, however, if it needs to be present in a JVM state, it is described as an allocation. So the decision whether to allocate or take the cached value has to happen during the deopt." >> There is code in deoptimizer for JVMCI which looks for cached Boxed values. We may need to adopt it for C2 EA for this optimization to work. > >> > The improvement you are proposing is not specific to uncommon traps, but can be generalized to any debug usage at safepoints. >> > The downside is that, in general, rematerialization logic has to use the corresponding pure function in order to materialize the eliminated instance. In this particular case (primitive boxing), it has to take into account the caching effects of primitive box factories. Otherwise, user code can encounter identity paradoxes with rematerialized primitive box instances. >> > I don't see how the scalarization logic you propose preserves identity constraints imposed by `valueOf` factories. >> >> Yes, it seems this optimization introduces the issue we had with Graal (8223320): >> "C2 doesn't model Integer.valueOf() as anything special. It just inlines it. So the check that determines whether to allocate a new Integer or take one from the cache always happens at runtime. Graal models it as a BoxNode. 
It is correctly lowered, however, if it needs to be present in a JVM state, it is described as an allocation. So the decision whether to allocate or take the cached value has to happen during the deopt." >> There is code in deoptimizer for JVMCI which looks for cached Boxed values. We may need to adopt it for C2 EA for this optimization to work. > > In addition to using the logic that we already have there for Graal (see ```Deoptimization::get_cached_box()```), you need to track where the box came from. If it comes as a result of ```valueOf()``` then it has to come from the caches, if it's something that the user allocates with ```new Integer(x)```, then it should be just a normal scalarized object. Also, I don't know if EA can handle a case when an Integer that is coming from an allocation and from a valueOf() are both inputs to a phi. Integer i = p ? new Integer(i1) : Integer.valueOf(i2); deopt(); int use = i.intValue(); If it does, then we'd need to force materialization. ------------- PR: https://git.openjdk.java.net/jdk/pull/2401 From kvn at openjdk.java.net Fri Feb 12 05:36:39 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 12 Feb 2021 05:36:39 GMT Subject: RFR: 8261137: Optimization of Box nodes in uncommon_trap [v4] In-Reply-To: References: <8Riu9VCQLM7_vDp5DOMtLZK3yMLQzAkwlIKo4ab0F7Q=.662dbffe-c320-47ea-bc67-508e2c382b12@github.com> Message-ID: On Fri, 12 Feb 2021 04:48:06 GMT, Igor Veresov wrote: > Also, I don't know if EA can handle a case when an Integer that is coming from an allocation and from a valueOf() are both inputs to a phi. > > ``` > Integer i = p ? new Integer(i1) : Integer.valueOf(i2); > deopt(); > int use = i.intValue(); > ``` > > If it does, then we'd need to force materialization. Currently C2 EA can't scalarize merged allocations. So this is not an issue for now but may be in a future. Also C2 does not replace valueOf() with scalar node - it inlines it. As result you have branches with allocation and load from cache. 
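
The identity constraint discussed in this thread can be observed from plain Java: `Integer.valueOf` is specified to return a cached instance for values from -128 to 127, so two calls with the same small value yield the same object. A minimal sketch follows. The names `BoxIdentity` and `sameBox` are illustrative only, and the result for values outside that range is deliberately left unstated because the specification permits, but does not require, caching there.

```java
final class BoxIdentity {
    // True when two valueOf() calls for the same value return the same object.
    static boolean sameBox(int value) {
        Integer a = Integer.valueOf(value);
        Integer b = Integer.valueOf(value);
        return a == b;   // reference comparison on purpose
    }

    public static void main(String[] args) {
        System.out.println(sameBox(100));   // true: inside the mandatory cache range
        System.out.println(sameBox(1000));  // unspecified: may or may not be cached
    }
}
```

A box that is rematerialized at deoptimization as a fresh allocation would break the first case, which is the kind of identity paradox raised earlier in the thread.
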
Only this patch propose to use scalar node for valueOf() for the first time in C2. But it is replaced only in case it is directly referenced by debug info (no Phi or other nodes in between). So I think it is safe if we use AutoBoxObjectValue for it. ------------- PR: https://git.openjdk.java.net/jdk/pull/2401 From jbhateja at openjdk.java.net Fri Feb 12 06:01:37 2021 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Fri, 12 Feb 2021 06:01:37 GMT Subject: RFR: 8261553: Efficient mask generation using BMI2 BZHI instruction [v2] In-Reply-To: References: Message-ID: On Thu, 11 Feb 2021 14:28:01 GMT, Claes Redestad wrote: > > Hi Claes, This could be a run to run variation, in general we are now having fewer number of instructions (one shift operation saved per mask computation) compared to previous masked generation sequence and thus it will always offer better execution latencies. > > Run-to-run variation would be easy to rule out by running more forks and more iterations to attain statistically significant results. While the instruction manuals suggest latency should be better for this instruction on all CPUs where it's supported, it would be good if there was some clear proof - such as a significant benchmark win - to motivate the added complexity. 
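
As a scalar reference for the two mask computations compared in this thread, here is a sketch in plain Java. `maskWithShift` mirrors the shift-and-decrement form, and `bzhi` models the documented behaviour of the 64-bit BMI2 instruction (bits at and above the index are cleared, and an index of 64 or more leaves the source unchanged). All names are illustrative and none of this is HotSpot code.

```java
final class MaskModel {
    // Shift-based mask: the low n bits are set, valid for n in [0, 63].
    static long maskWithShift(int n) {
        return (1L << n) - 1;
    }

    // Scalar model of 64-bit BZHI: clear the bits at positions >= index.
    static long bzhi(long src, int index) {
        int idx = index & 0xFF;               // only the low 8 bits of the index are used
        return idx >= 64 ? src : src & ((1L << idx) - 1);
    }

    // Mask generation as BZHI applied to an all-ones source.
    static long maskWithBzhi(int n) {
        return bzhi(-1L, n);
    }

    public static void main(String[] args) {
        for (int n = 0; n < 64; n++) {
            if (maskWithShift(n) != maskWithBzhi(n)) {
                throw new AssertionError("mismatch at " + n);
            }
        }
        System.out.println(Long.toHexString(maskWithBzhi(12)));  // fff
        System.out.println(Long.toHexString(maskWithBzhi(64)));  // ffffffffffffffff
    }
}
```

The two forms agree for every length from 0 to 63. They differ only at 64, where the shift count wraps around and the shift-based form yields 0.
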
BASELINE:

Result "org.openjdk.bench.java.lang.ArrayCopyUnalignedSrc.testLong":
  61.037 ns/op

Secondary result "org.openjdk.bench.java.lang.ArrayCopyUnalignedSrc.testLong:·perf":
Perf stats:
--------------------------------------------------

         19,739.21 msec task-clock                #    0.389 CPUs utilized
               646      context-switches          #    0.033 K/sec
                12      cpu-migrations            #    0.001 K/sec
               150      page-faults               #    0.008 K/sec
   74,59,83,59,139      cycles                    #    3.779 GHz                      (30.73%)
 1,78,78,79,19,117      instructions              #    2.40  insn per cycle           (38.48%)
   24,79,81,63,651      branches                  # 1256.289 M/sec                    (38.55%)
      32,24,89,924      branch-misses             #    1.30% of all branches          (38.62%)
   52,56,88,28,472      L1-dcache-loads           # 2663.167 M/sec                    (38.65%)
         39,00,969      L1-dcache-load-misses     #    0.01% of all L1-dcache hits    (38.57%)
          3,74,131      LLC-loads                 #    0.019 M/sec                    (30.77%)
            22,315      LLC-load-misses           #    5.96% of all LL-cache hits     (30.72%)
                        L1-icache-loads
         17,49,997      L1-icache-load-misses                                         (30.72%)
   52,91,41,70,636      dTLB-loads                # 2680.663 M/sec                    (30.69%)
             3,315      dTLB-load-misses          #    0.00% of all dTLB cache hits   (30.67%)
             4,674      iTLB-loads                #    0.237 K/sec                    (30.65%)
            33,746      iTLB-load-misses          #  721.99% of all iTLB cache hits   (30.63%)
                        L1-dcache-prefetches
                        L1-dcache-prefetch-misses

      50.723759146 seconds time elapsed

      51.447054000 seconds user
       0.189949000 seconds sys

WITH OPT:

Result "org.openjdk.bench.java.lang.ArrayCopyUnalignedSrc.testLong":
  74.356 ns/op

Secondary result "org.openjdk.bench.java.lang.ArrayCopyUnalignedSrc.testLong:·perf":
Perf stats:
--------------------------------------------------

         19,741.09 msec task-clock                #    0.389 CPUs utilized
               641      context-switches          #    0.032 K/sec
                17      cpu-migrations            #    0.001 K/sec
               164      page-faults               #    0.008 K/sec
   74,40,40,48,513      cycles                    #    3.769 GHz                      (30.81%)
 1,45,66,22,06,797      instructions              #    1.96  insn per cycle           (38.56%)
   20,31,28,43,577      branches                  # 1028.963 M/sec                    (38.65%)
         14,11,419      branch-misses             #    0.01% of all branches          (38.69%)
   43,07,86,33,662      L1-dcache-loads           # 2182.182 M/sec                    (38.72%)
         37,06,744      L1-dcache-load-misses     #    0.01% of all L1-dcache hits    (38.56%)
          1,34,292      LLC-loads                 #    0.007 M/sec                    (30.72%)
            30,627      LLC-load-misses           #   22.81% of all LL-cache hits     (30.68%)
                        L1-icache-loads
         14,49,145      L1-icache-load-misses                                         (30.65%)
   43,44,86,27,516      dTLB-loads                # 2200.924 M/sec                    (30.63%)
               218      dTLB-load-misses          #    0.00% of all dTLB cache hits   (30.63%)
             2,445      iTLB-loads                #    0.124 K/sec                    (30.63%)
            28,624      iTLB-load-misses          # 1170.72% of all iTLB cache hits   (30.63%)
                        L1-dcache-prefetches
                        L1-dcache-prefetch-misses

      50.716083931 seconds time elapsed

      51.467300000 seconds user
       0.200390000 seconds sys

JMH perf data for ArrayCopyUnalignedSrc.testLong with a copy length of 1200 shows degradation in L1D accesses; it seems the benchmark got displaced from its sweet spot. But there is a significant reduction in instruction count, and cycles are almost comparable. We are saving one shift per mask computation.

OLD Sequence:
  0x00007f7fc1030ead: movabs $0x1,%rax
  0x00007f7fc1030eb7: shlx   %r8,%rax,%rax
  0x00007f7fc1030ebc: dec    %rax
  0x00007f7fc1030ebf: kmovq  %rax,%k2

NEW Sequence:
  0x00007f775d030d51: movabs $0xffffffffffffffff,%rax
  0x00007f775d030d5b: bzhi   %r8,%rax,%rax
  0x00007f775d030d60: kmovq  %rax,%k2

-------------

PR: https://git.openjdk.java.net/jdk/pull/2522

From rcastanedalo at openjdk.java.net  Fri Feb 12 09:36:02 2021
From: rcastanedalo at openjdk.java.net (Roberto Castañeda Lozano)
Date: Fri, 12 Feb 2021 09:36:02 GMT
Subject: RFR: 8261336: IGV: enhance default filters
Message-ID:

Redesign the filters shown by default in the "Filters" window:

- Add filters to color the graph by node category and execution frequency (if applicable), and to hide subgraphs or edges only by _category_. The category of a node can be one of {`data`, `memory`, `control`, `mixed`, `other`}, and is solely determined by its type. `mixed` nodes are those with a tuple type that has different categories, such as `CallStaticJavaNode`. The category of an edge is that of its source node.
- Instrument C2 to include the category and estimated execution frequency (if available) of each node in the graph dumps produced by `-XX:PrintIdealGraphLevel=N` (only in debug builds).

- Remove filters which depend on properties never emitted by C2 (e.g. 'Remove State') or which appear to be unused ('C2 Matcher Flags Coloring' and 'C2 Register Coloring'). Also remove the subsumed 'C2 Basic Coloring' filter.

- Merge 'C2 Remove Filter' and 'C2 Structural' into a single filter with a clearer name ('Simplify graph').

### Screenshots

"Filters" window before (left) and after (right) the proposed change:

![filters-window](https://user-images.githubusercontent.com/8792647/107749859-7a664780-6d1b-11eb-84ba-fd43e13abd0e.png)

Default color scheme before (left) and after (right) the proposed change:

![color-scheme](https://user-images.githubusercontent.com/8792647/107517355-f3479100-6bad-11eb-9b0b-a71c18961dd8.png)

Examples of the new 'Color by execution frequency' filter:

![color-by-frequency-2](https://user-images.githubusercontent.com/8792647/107518492-5980e380-6baf-11eb-9e01-992b211d06e3.png)

![color-by-frequency-1](https://user-images.githubusercontent.com/8792647/107518477-54bc2f80-6baf-11eb-8c7b-7eb7c1d85cf7.png)

Example of the new 'Hide X subgraph' filters, where only the data subgraph is shown:

![hide-all-but-data-subgraphs](https://user-images.githubusercontent.com/8792647/107750398-42133900-6d1c-11eb-8b27-264086b32bea.png)

Example of the new 'Hide X edges' filters, where all nodes remain in their position but only the memory edges are shown:

![hide-all-but-memory-edges](https://user-images.githubusercontent.com/8792647/107751137-49871200-6d1d-11eb-9df0-79747bf8d16e.png)

Tested IGV manually on a few graphs. Tested C2 instrumentation by running `hs-tier1` with `-Xbatch -XX:PrintIdealGraphLevel=3 -XX:PrintIdealGraphFile=graph.xml` on windows-x64, linux-x64, linux-aarch64, and macosx-x64 (all debug).
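
The categorization rule stated in the description (a tuple whose fields fall into different categories is `mixed`, and an edge takes the category of its source node) can be sketched as follows. This is only an illustration of the rule as worded here: the names `NodeCategories`, `ofTuple` and `ofEdge` are hypothetical, the treatment of an empty tuple is an assumption, and how C2 actually derives categories from its types is not shown in this message.

```java
import java.util.EnumSet;
import java.util.List;

final class NodeCategories {
    enum Category { DATA, MEMORY, CONTROL, MIXED, OTHER }

    // Category of a tuple-typed node, given the categories of its fields.
    static Category ofTuple(List<Category> fields) {
        if (fields.isEmpty()) {
            return Category.OTHER;               // assumption: nothing to classify
        }
        EnumSet<Category> distinct = EnumSet.copyOf(fields);
        return distinct.size() == 1 ? fields.get(0) : Category.MIXED;
    }

    // An edge inherits the category of its source node.
    static Category ofEdge(Category sourceNode) {
        return sourceNode;
    }

    public static void main(String[] args) {
        System.out.println(ofTuple(List.of(Category.CONTROL, Category.MEMORY, Category.DATA)));  // MIXED
        System.out.println(ofTuple(List.of(Category.DATA, Category.DATA)));                      // DATA
    }
}
```
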
------------- Commit messages: - Use 'subgraph' instead of 'nodes' in filter names, for clarity - Add filters to hide edges by category - Update year in headers - Adjust frequency precision - Improve color scale in 'Color by execution frequency' filter - Add filter to color nodes by estimated execution frequency - Lift simplification filter - Merge simplification filters into a single, more intuitive one - Add filters to hide nodes by category - Reorder filters to preserve the behavior of 'Show control flow only' - ... and 4 more: https://git.openjdk.java.net/jdk/compare/db0ca2b9...3ed1e3de Changes: https://git.openjdk.java.net/jdk/pull/2499/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2499&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261336 Stats: 359 lines in 25 files changed: 284 ins; 40 del; 35 mod Patch: https://git.openjdk.java.net/jdk/pull/2499.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2499/head:pull/2499 PR: https://git.openjdk.java.net/jdk/pull/2499 From roland at openjdk.java.net Fri Feb 12 09:37:02 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Fri, 12 Feb 2021 09:37:02 GMT Subject: RFR: 8259937: guarantee(loc != NULL) failed: missing saved register with native invoker [v2] In-Reply-To: References: Message-ID: > We spotted this issue with Shenandoah and I managed to write a simple > test case that reproduces it reliably with Shenandoah but the issue is > independent of the GC. > > The loop in the test case calls a native invoker with an oop live in > rbp. rbp is saved in the native invoker stub's frame. A safepoint is > triggered from the safepoint check in the native invoker. The stack > walking code sees that rbp contains an oop but can't find where that > oop is stored. That's because stack walking updates the caller's frame > with the location of rbp in the callee on calls to > frame::sender(). But the current code sets the last java frame to be > the compiled frame where rbp is live. 
So there's no call to > frame::sender() to update the location rbp. The fix I propose is that > the frame of the native invoker be visible by stack walking. On a > safepoint, stack walking starts from the native invoker thread, then > calls frame::sender() to move to the compiled frame. That causes rbp > to be properly recorded with its location in the native invoker frame. > > Same problem affects both x86 and aarch64. I've tested this patch with: > > make run-test TEST="java/foreign" TEST_VM_OPTS="-Xcomp" JTREG="TIMEOUT_FACTOR=10" > > on both platforms. Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: broken build ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2528/files - new: https://git.openjdk.java.net/jdk/pull/2528/files/88eb13d0..5b9dfff7 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2528&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2528&range=00-01 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/2528.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2528/head:pull/2528 PR: https://git.openjdk.java.net/jdk/pull/2528 From chagedorn at openjdk.java.net Fri Feb 12 10:08:56 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Fri, 12 Feb 2021 10:08:56 GMT Subject: RFR: 8261235: C1 compilation fails with assert(res->vreg_number() == index) failed: conversion check Message-ID: <-uBsLA86fiUezxtSxF_GW9oT62EezgztzXPwSta8EAw=.cb63031d-3ab4-468c-9351-1f1e10c377a9@github.com> The assertion is hit because we run out of virtual registers in the linear scan in C1 and do not handle it. I fixed it by applying the same bailout as in `LIRGenerator::new_register()`. There is also a second issue that `LIR_OprDesc::vreg_max` is too big. It is only used in this bailout code. `OprBits::vreg_max` is defined over `OprBits::data_bits` which uses `OprBits::non_data_bits`. 
But `OprBits::non_data_bits` does not consider `OprBits::pointer_bits`, which results in too large a value for `LIR_OprDesc::vreg_max`, and the assertion is hit because we don't bail out yet. This needs to be fixed as well. Thanks, Christian ------------- Commit messages: - 8261235: C1 compilation fails with assert(res->vreg_number() == index) failed: conversion check Changes: https://git.openjdk.java.net/jdk/pull/2543/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2543&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261235 Stats: 4151 lines in 6 files changed: 4137 ins; 2 del; 12 mod Patch: https://git.openjdk.java.net/jdk/pull/2543.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2543/head:pull/2543 PR: https://git.openjdk.java.net/jdk/pull/2543 From vlivanov at openjdk.java.net Fri Feb 12 10:39:40 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Fri, 12 Feb 2021 10:39:40 GMT Subject: RFR: 8261336: IGV: enhance default filters In-Reply-To: References: Message-ID: On Wed, 10 Feb 2021 10:00:00 GMT, Roberto Castañeda Lozano wrote: > Redesign the filters shown by default in the "Filters" window: > > - Add filters to color the graph by node category and execution frequency (if applicable), and to hide subgraphs or edges only by _category_. The category of a node can be one of {`data`, `memory`, `control`, `mixed`, `other`}, and is solely determined by its type. `mixed` nodes are those with a tuple type that has different categories, such as `CallStaticJavaNode`. The category of an edge is that of its source node. > > - Instrument C2 to include the category and estimated execution frequency (if available) of each node in the graph dumps produced by `-XX:PrintIdealGraphLevel=N` (only in debug builds). > > - Remove filters which depend on properties never emitted by C2 (e.g. 'Remove State') or which appear to be unused ('C2 Matcher Flags Coloring' and 'C2 Register Coloring').
Also remove the subsumed 'C2 Basic Coloring' filter. > > - Merge 'C2 Remove Filter' and 'C2 Structural' into a single filter with a clearer name ('Simplify graph'). > > ### Screenshots > > "Filters" window before (left) and after (right) the proposed change: > ![filters-window](https://user-images.githubusercontent.com/8792647/107749859-7a664780-6d1b-11eb-84ba-fd43e13abd0e.png) > Default color scheme before (left) and after (right) the proposed change: > ![color-scheme](https://user-images.githubusercontent.com/8792647/107517355-f3479100-6bad-11eb-9b0b-a71c18961dd8.png) > Examples of the new 'Color by execution frequency' filter: > ![color-by-frequency-2](https://user-images.githubusercontent.com/8792647/107518492-5980e380-6baf-11eb-9e01-992b211d06e3.png) > ![color-by-frequency-1](https://user-images.githubusercontent.com/8792647/107518477-54bc2f80-6baf-11eb-8c7b-7eb7c1d85cf7.png) > Example of the new 'Hide X subgraph'' filters, where only the data subgraph is shown: > ![hide-all-but-data-subgraphs](https://user-images.githubusercontent.com/8792647/107750398-42133900-6d1c-11eb-8b27-264086b32bea.png) > Example of the new 'Hide X edges' filters, where all nodes remain in their position but only the memory edges are shown: > ![hide-all-but-memory-edges](https://user-images.githubusercontent.com/8792647/107751137-49871200-6d1d-11eb-9df0-79747bf8d16e.png) > > > Tested IGV manually on a few graphs. Tested C2 instrumentation by running `hs-tier1` with `-Xbatch -XX:PrintIdealGraphLevel=3 -XX:PrintIdealGraphFile=graph.xml` on windows-x64, linux-x64, linux-aarch64, and macosx-x64 (all debug). Very nice! ------------- Marked as reviewed by vlivanov (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/2499 From rcastanedalo at openjdk.java.net Fri Feb 12 10:39:41 2021 From: rcastanedalo at openjdk.java.net (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 12 Feb 2021 10:39:41 GMT Subject: RFR: 8261336: IGV: enhance default filters In-Reply-To: References: Message-ID: <0GoV80nCM_IyK5msCNKbEGVt6czX5ClV8-m1vd-tVNI=.dbc0eeb2-ada6-4ff9-b25a-6e5a87b58393@github.com> On Fri, 12 Feb 2021 10:34:59 GMT, Vladimir Ivanov wrote: > Very nice! Thanks Vladimir! ------------- PR: https://git.openjdk.java.net/jdk/pull/2499 From adinn at openjdk.java.net Fri Feb 12 10:43:43 2021 From: adinn at openjdk.java.net (Andrew Dinn) Date: Fri, 12 Feb 2021 10:43:43 GMT Subject: RFR: 8259937: guarantee(loc != NULL) failed: missing saved register with native invoker [v2] In-Reply-To: References: Message-ID: On Thu, 11 Feb 2021 18:49:48 GMT, Vladimir Kozlov wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> broken build > > Seems reasonable. The code here looks ok. I'm slightly concerned about the consequences of adding a new stack frame visible to stack walking code. Does this have the potential to break serviceability code that reports and/or analyzes stack frames (whether that's code in OpenJDK or 3rd party code)? ------------- PR: https://git.openjdk.java.net/jdk/pull/2528 From roland at openjdk.java.net Fri Feb 12 12:22:39 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Fri, 12 Feb 2021 12:22:39 GMT Subject: RFR: 8259937: guarantee(loc != NULL) failed: missing saved register with native invoker [v2] In-Reply-To: References: Message-ID: <2Ud0NpZTLE6UXcQ3S_kz6nK3N5-XAjYnwczmmpgB-Io=.09af4289-8ce5-4886-9cd5-7f5888f3772f@github.com> On Fri, 12 Feb 2021 10:41:18 GMT, Andrew Dinn wrote: > The code here looks ok. I'm slightly concerned about the consequences of adding a new stack frame visible to stack walking code. 
Does this have the potential to break serviceability code that reports and/or analyzes stack frames (whether that's code in OpenJDK or 3rd party code)? Thanks for the review. The native invoker code is new in JDK 16, so it's unlikely that tools already rely on some specific layout. @iwanowww are you the author of the native invoker code? What do you think? ------------- PR: https://git.openjdk.java.net/jdk/pull/2528 From chagedorn at openjdk.java.net Fri Feb 12 12:51:43 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Fri, 12 Feb 2021 12:51:43 GMT Subject: RFR: 8261336: IGV: enhance default filters In-Reply-To: References: Message-ID: <0Fwnnxu5gJTYdq6zVX8pzH5U4H-daq9fFnSGX3hgV0E=.1d812d3d-cdcd-4c78-9130-bf2d7b795580@github.com> On Wed, 10 Feb 2021 10:00:00 GMT, Roberto Castañeda Lozano wrote: > Redesign the filters shown by default in the "Filters" window: > > - Add filters to color the graph by node category and execution frequency (if applicable), and to hide subgraphs or edges only by _category_. The category of a node can be one of {`data`, `memory`, `control`, `mixed`, `other`}, and is solely determined by its type. `mixed` nodes are those with a tuple type that has different categories, such as `CallStaticJavaNode`. The category of an edge is that of its source node. > > - Instrument C2 to include the category and estimated execution frequency (if available) of each node in the graph dumps produced by `-XX:PrintIdealGraphLevel=N` (only in debug builds). > > - Remove filters which depend on properties never emitted by C2 (e.g. 'Remove State') or which appear to be unused ('C2 Matcher Flags Coloring' and 'C2 Register Coloring'). Also remove the subsumed 'C2 Basic Coloring' filter. > > - Merge 'C2 Remove Filter' and 'C2 Structural' into a single filter with a clearer name ('Simplify graph').
> > ### Screenshots > > "Filters" window before (left) and after (right) the proposed change: > ![filters-window](https://user-images.githubusercontent.com/8792647/107749859-7a664780-6d1b-11eb-84ba-fd43e13abd0e.png) > Default color scheme before (left) and after (right) the proposed change: > ![color-scheme](https://user-images.githubusercontent.com/8792647/107517355-f3479100-6bad-11eb-9b0b-a71c18961dd8.png) > Examples of the new 'Color by execution frequency' filter: > ![color-by-frequency-2](https://user-images.githubusercontent.com/8792647/107518492-5980e380-6baf-11eb-9e01-992b211d06e3.png) > ![color-by-frequency-1](https://user-images.githubusercontent.com/8792647/107518477-54bc2f80-6baf-11eb-8c7b-7eb7c1d85cf7.png) > Example of the new 'Hide X subgraph'' filters, where only the data subgraph is shown: > ![hide-all-but-data-subgraphs](https://user-images.githubusercontent.com/8792647/107750398-42133900-6d1c-11eb-8b27-264086b32bea.png) > Example of the new 'Hide X edges' filters, where all nodes remain in their position but only the memory edges are shown: > ![hide-all-but-memory-edges](https://user-images.githubusercontent.com/8792647/107751137-49871200-6d1d-11eb-9df0-79747bf8d16e.png) > > > Tested IGV manually on a few graphs. Tested C2 instrumentation by running `hs-tier1` with `-Xbatch -XX:PrintIdealGraphLevel=3 -XX:PrintIdealGraphFile=graph.xml` on windows-x64, linux-x64, linux-aarch64, and macosx-x64 (all debug). Otherwise, very nice improvement! I applied the patch and played around with it in the IGV. Everything seems to work as expected. I like the new default filters and the new colors for the new categories. That makes it much more useful. 
src/utils/IdealGraphVisualizer/ServerCompiler/src/com/sun/hotspot/igv/servercompiler/filters/onlyControlFlow.filter line 23: > 21: ) > 22: ); > 23: f.addRule(new RemoveFilter.RemoveRule(new MatcherSelector(new Properties.RegexpPropertyMatcher("name", "Phi|CreateEx|Cast.*|Load.|Store.")))); I tried the filter out on a graph and noticed that there are still some data and memory nodes shown. How about utilizing the newly added categories? I played around a bit and this seems to work quite nicely. What do you think? var f = new RemoveFilter("Show only control flow"); f.addRule( new RemoveFilter.RemoveRule( new OrSelector( new InvertSelector( new MatcherSelector( new Properties.RegexpPropertyMatcher("category", "control|mixed|other") ) ), new MatcherSelector( new Properties.StringPropertyMatcher("type", "abIO") ) ), false ) ); f.addRule(new RemoveFilter.RemoveRule(new MatcherSelector(new Properties.RegexpPropertyMatcher("name", "Phi|Root|Con")))); f.apply(graph); src/hotspot/share/opto/idealGraphPrinter.hpp line 95: > 93: bool _traverse_outs; > 94: Compile *C; > 95: double max_freq; Fields should have a leading underscore. src/hotspot/share/opto/type.cpp line 1120: > 1118: Type::CATEGORY Type::category() const { > 1119: const TypeTuple* tuple; > 1120: switch (base()) { Might be more readable if switch cases are indented. src/hotspot/share/opto/type.hpp line 374: > 372: CatControl, > 373: CatOther, // {Type::Top, Type::Abio, Type::Bottom}. > 374: CatUndef // {Type::Bad, Type::lastype}, for completeness. Is "Cat" required when the enum is already called "CATEGORY"? src/hotspot/share/opto/idealGraphPrinter.cpp line 394: > 392: } > 393: > 394: switch (t->category()) { Might be more readable if switch cases are indented. src/hotspot/share/opto/type.cpp line 1178: > 1176: } > 1177: assert(false, "unmatched base type"); > 1178: return CatUndef; You could add that to an explicit default case to the above switch statement to make it more clear. 
------------- Changes requested by chagedorn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2499 From rcastanedalo at openjdk.java.net Fri Feb 12 15:37:41 2021 From: rcastanedalo at openjdk.java.net (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 12 Feb 2021 15:37:41 GMT Subject: RFR: 8261336: IGV: enhance default filters In-Reply-To: <0Fwnnxu5gJTYdq6zVX8pzH5U4H-daq9fFnSGX3hgV0E=.1d812d3d-cdcd-4c78-9130-bf2d7b795580@github.com> References: <0Fwnnxu5gJTYdq6zVX8pzH5U4H-daq9fFnSGX3hgV0E=.1d812d3d-cdcd-4c78-9130-bf2d7b795580@github.com> Message-ID: On Fri, 12 Feb 2021 12:48:50 GMT, Christian Hagedorn wrote: > Otherwise, very nice improvement! I applied the patch and played around with it in the IGV. Everything seems to work as expected. I like the new default filters and the new colors for the new categories. That makes it much more useful. Thanks for reviewing Christian! I will have a look at your comments on Monday. ------------- PR: https://git.openjdk.java.net/jdk/pull/2499 From dcubed at openjdk.java.net Fri Feb 12 16:08:44 2021 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Fri, 12 Feb 2021 16:08:44 GMT Subject: RFR: 8261659: JDK-8261027 causes a Tier1 validate-source failure Message-ID: A trivial fix to make Tier1 'validate-source' happy again. 
------------- Commit messages: - 8261659: JDK-8261027 causes a Tier1 validate-source failure Changes: https://git.openjdk.java.net/jdk/pull/2551/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2551&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261659 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/2551.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2551/head:pull/2551 PR: https://git.openjdk.java.net/jdk/pull/2551 From dcubed at openjdk.java.net Fri Feb 12 16:08:44 2021 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Fri, 12 Feb 2021 16:08:44 GMT Subject: RFR: 8261659: JDK-8261027 causes a Tier1 validate-source failure In-Reply-To: References: Message-ID: On Fri, 12 Feb 2021 16:03:57 GMT, Daniel D. Daugherty wrote: > A trivial fix to make Tier1 'validate-source' happy again. @theRealAph - Just a heads up that I had to tweak one of your files (copyright header only). ------------- PR: https://git.openjdk.java.net/jdk/pull/2551 From iignatyev at openjdk.java.net Fri Feb 12 16:13:37 2021 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Fri, 12 Feb 2021 16:13:37 GMT Subject: RFR: 8261659: JDK-8261027 causes a Tier1 validate-source failure In-Reply-To: References: Message-ID: On Fri, 12 Feb 2021 16:03:57 GMT, Daniel D. Daugherty wrote: > A trivial fix to make Tier1 'validate-source' happy again. ?? ------------- Marked as reviewed by iignatyev (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2551 From bpb at openjdk.java.net Fri Feb 12 16:13:39 2021 From: bpb at openjdk.java.net (Brian Burkhalter) Date: Fri, 12 Feb 2021 16:13:39 GMT Subject: RFR: 8261659: JDK-8261027 causes a Tier1 validate-source failure In-Reply-To: References: Message-ID: On Fri, 12 Feb 2021 16:03:57 GMT, Daniel D. Daugherty wrote: > A trivial fix to make Tier1 'validate-source' happy again. Marked as reviewed by bpb (Reviewer). 
------------- PR: https://git.openjdk.java.net/jdk/pull/2551 From dcubed at openjdk.java.net Fri Feb 12 16:20:41 2021 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Fri, 12 Feb 2021 16:20:41 GMT Subject: Integrated: 8261659: JDK-8261027 causes a Tier1 validate-source failure In-Reply-To: References: Message-ID: On Fri, 12 Feb 2021 16:03:57 GMT, Daniel D. Daugherty wrote: > A trivial fix to make Tier1 'validate-source' happy again. This pull request has now been integrated. Changeset: 33fcd325 Author: Daniel D. Daugherty URL: https://git.openjdk.java.net/jdk/commit/33fcd325 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod 8261659: JDK-8261027 causes a Tier1 validate-source failure Reviewed-by: iignatyev, bpb ------------- PR: https://git.openjdk.java.net/jdk/pull/2551 From dcubed at openjdk.java.net Fri Feb 12 16:20:40 2021 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Fri, 12 Feb 2021 16:20:40 GMT Subject: RFR: 8261659: JDK-8261027 causes a Tier1 validate-source failure In-Reply-To: References: Message-ID: <80idU4OfeAmNoSjd4k3g7PKCEav3mI5DG7ddt45KLnQ=.6a2650bd-b7e1-4d45-bc4f-cd4773685f8f@github.com> On Fri, 12 Feb 2021 16:10:20 GMT, Igor Ignatyev wrote: >> A trivial fix to make Tier1 'validate-source' happy again. > > ?? @iignatev and @bplb - Thanks for the fast reviews! ------------- PR: https://git.openjdk.java.net/jdk/pull/2551 From mdoerr at openjdk.java.net Fri Feb 12 16:33:52 2021 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Fri, 12 Feb 2021 16:33:52 GMT Subject: RFR: 8261657: [PPC64] Cleanup StoreCM nodes after CMS removal Message-ID: We only need one StoreCM node after CMS removal. CMS StoreStore barriers were already removed at other places. 
------------- Commit messages: - 8261657: [PPC64] Cleanup StoreCM nodes after CMS removal Changes: https://git.openjdk.java.net/jdk/pull/2554/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2554&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261657 Stats: 55 lines in 1 file changed: 0 ins; 52 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/2554.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2554/head:pull/2554 PR: https://git.openjdk.java.net/jdk/pull/2554 From github.com+300291+eirbjo at openjdk.java.net Fri Feb 12 17:17:39 2021 From: github.com+300291+eirbjo at openjdk.java.net (Eirik Bjorsnos) Date: Fri, 12 Feb 2021 17:17:39 GMT Subject: RFR: 8261235: C1 compilation fails with assert(res->vreg_number() == index) failed: conversion check In-Reply-To: <-uBsLA86fiUezxtSxF_GW9oT62EezgztzXPwSta8EAw=.cb63031d-3ab4-468c-9351-1f1e10c377a9@github.com> References: <-uBsLA86fiUezxtSxF_GW9oT62EezgztzXPwSta8EAw=.cb63031d-3ab4-468c-9351-1f1e10c377a9@github.com> Message-ID: On Fri, 12 Feb 2021 10:03:25 GMT, Christian Hagedorn wrote: > The assertion is hit because we run out of virtual registers in the linear scan in C1 and do not handle it. I fixed it by applying the same bailout as in `LIRGenerator::new_register()`. > > There is also a second issue that `LIR_OprDesc::vreg_max` is too big. It is only used in this bailout code. `OprBits::vreg_max` is defined over `OprBits::data_bits` which uses `OprBits::non_data_bits`. But `OprBits::non_data_bits` does not consider `OprBits::pointer_bits` which results in a too large value for `LIR_OprDesc::vreg_max` and the assertion is hit because we don't bail out, yet. This needs to be fixed as well. > > Thanks, > Christian I tested this branch on my reproducer and it works like a charm: 776 302 3 org.jaxen.saxpath.base.Verifier::isXMLLetter (6130 bytes) [...] 
compilation bailout: out of virtual registers in linear scan 4717 302 3 org.jaxen.saxpath.base.Verifier::isXMLLetter (6130 bytes) COMPILE SKIPPED: out of virtual registers in linear scan (retry at different tier) 4718 334 4 org.jaxen.saxpath.base.Verifier::isXMLLetter (6130 bytes) Is it expected that the bailout is logged twice? First, a log line says: "compilation bailout: out of virtual registers in linear scan", then at the end of the next line: "COMPILE SKIPPED: out of virtual registers in linear scan (retry at different tier)" ? ------------- PR: https://git.openjdk.java.net/jdk/pull/2543 From xliu at openjdk.java.net Sat Feb 13 01:56:53 2021 From: xliu at openjdk.java.net (Xin Liu) Date: Sat, 13 Feb 2021 01:56:53 GMT Subject: RFR: 8261675: ObjectValue::set_visited(bool) sets _visited false Message-ID: <3qsdFKCRBP1rIh_Q_9JUQzd-LP43WZchNrFxajKp1lY=.eb22d8dd-dd57-4108-8e64-a7bedace6c99@github.com> The setter is error-prone. it unconditionally sets _visited false. this patch stores the argument to it. ------------- Commit messages: - 8261675: ObjectValue::set_visited(bool) sets _visited false Changes: https://git.openjdk.java.net/jdk/pull/2560/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2560&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261675 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/2560.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2560/head:pull/2560 PR: https://git.openjdk.java.net/jdk/pull/2560 From github.com+670087+jrziviani at openjdk.java.net Sat Feb 13 02:22:38 2021 From: github.com+670087+jrziviani at openjdk.java.net (Ziviani) Date: Sat, 13 Feb 2021 02:22:38 GMT Subject: RFR: 8261522: [PPC64] AES intrinsics write beyond the destination array In-Reply-To: References: Message-ID: On Wed, 10 Feb 2021 17:05:44 GMT, Martin Doerr wrote: > I'd like to replace the read-modify-write implementation from aescrypt_encryptBlock / aescrypt_decryptBlock stubs. 
It can cause severe problems (see bug description). Very good. +1 Tested on P8, P9, and P10. With this patch I don't reproduce the problem. (tested LE only) ------------- Marked as reviewed by jrziviani at github.com (no known OpenJDK username). PR: https://git.openjdk.java.net/jdk/pull/2514 From kvn at openjdk.java.net Sat Feb 13 06:36:38 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Sat, 13 Feb 2021 06:36:38 GMT Subject: RFR: 8261675: ObjectValue::set_visited(bool) sets _visited false In-Reply-To: <3qsdFKCRBP1rIh_Q_9JUQzd-LP43WZchNrFxajKp1lY=.eb22d8dd-dd57-4108-8e64-a7bedace6c99@github.com> References: <3qsdFKCRBP1rIh_Q_9JUQzd-LP43WZchNrFxajKp1lY=.eb22d8dd-dd57-4108-8e64-a7bedace6c99@github.com> Message-ID: On Sat, 13 Feb 2021 01:51:39 GMT, Xin Liu wrote: > The setter is error-prone. it unconditionally sets _visited false. > this patch stores the argument to it. Wow. This goes back to the original changes for the first implementation of Escape Analysis (6558600). But it works because the only place where `set_visited()` is called passes `false`: [debugInfoRec.cpp#L358](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/code/debugInfoRec.cpp#L358) `is_visited()` is not called at all - the `_visited` field is accessed directly in only one place: [debugInfo.cpp#L161](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/code/debugInfo.cpp#L161) Would be nice to clean up this mess. ------------- PR: https://git.openjdk.java.net/jdk/pull/2560 From kvn at openjdk.java.net Sat Feb 13 08:12:37 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Sat, 13 Feb 2021 08:12:37 GMT Subject: RFR: 8261675: ObjectValue::set_visited(bool) sets _visited false In-Reply-To: References: <3qsdFKCRBP1rIh_Q_9JUQzd-LP43WZchNrFxajKp1lY=.eb22d8dd-dd57-4108-8e64-a7bedace6c99@github.com> Message-ID: On Sat, 13 Feb 2021 06:33:36 GMT, Vladimir Kozlov wrote: >> The setter is error-prone. it unconditionally sets _visited false. >> this patch stores the argument to it. > > Wow.
This was original changes for first implementation of Escape Analysis 6558600. > > But it works because the only place where `set_visited()` is called it sets to `false`: > [debugInfoRec.cpp#L358](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/code/debugInfoRec.cpp#L358) > > `is_visited()` is not called at all - `_visited` field is accessed directly only in one place: > [debugInfo.cpp#L161](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/code/debugInfo.cpp#L161) > > Would be nice to clean up this mess. In addition to your fix you can consider next changes to avoid dumping unneeded data in debug info (deoptimizer reads objects data only from top frame [deoptimization.cpp#L190](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/deoptimization.cpp#L190): src/hotspot/share/opto/output.cpp // We dump the object pool first, since deoptimization reads it in first. - C->debug_info()->dump_object_pool(objs); + C->debug_info()->dump_object_pool(objs, (depth < max_depth)); src/hotspot/share/code/debugInfoRec.hpp - void dump_object_pool(GrowableArray* objects); + void dump_object_pool(GrowableArray* objects, bool visited = false); src/hotspot/share/code/debugInfoRec.cpp -void DebugInformationRecorder::dump_object_pool(GrowableArray* objects) { +void DebugInformationRecorder::dump_object_pool(GrowableArray* objects, bool visited) { guarantee( _pcs_length > 0, "safepoint must exist before describing scopes"); PcDesc* last_pd = &_pcs[_pcs_length-1]; if (objects != NULL) { for (int i = objects->length() - 1; i >= 0; i--) { - objects->at(i)->as_ObjectValue()->set_visited(false); + objects->at(i)->as_ObjectValue()->set_visited(visited); } } ------------- PR: https://git.openjdk.java.net/jdk/pull/2560 From xliu at openjdk.java.net Sat Feb 13 21:24:38 2021 From: xliu at openjdk.java.net (Xin Liu) Date: Sat, 13 Feb 2021 21:24:38 GMT Subject: RFR: 8261675: ObjectValue::set_visited(bool) sets _visited false In-Reply-To: References: 
<3qsdFKCRBP1rIh_Q_9JUQzd-LP43WZchNrFxajKp1lY=.eb22d8dd-dd57-4108-8e64-a7bedace6c99@github.com> Message-ID: On Sat, 13 Feb 2021 08:10:22 GMT, Vladimir Kozlov wrote: > In addition to your fix you can consider next changes to avoid dumping unneeded data in debug info (deoptimizer reads objects data only from top frame [deoptimization.cpp#L190](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/deoptimization.cpp#L190): > > ``` > src/hotspot/share/opto/output.cpp > // We dump the object pool first, since deoptimization reads it in first. > - C->debug_info()->dump_object_pool(objs); > + C->debug_info()->dump_object_pool(objs, (depth < max_depth)); > ``` Hi Vladimir, Thank you for reviewing this patch. I don't understand why it's `depth < max_depth` instead of `<=`. Further, the effect of this optimization seems limited. This statement is in a loop like the one below, so `depth < max_depth` is almost always true. for (int depth = 1; depth <= max_depth; depth++) { ... C->debug_info()->dump_object_pool(objs, (depth < max_depth)); ... } But I get your point. It seems that `Process_OopMap_Node` may dump identical objects in the loop. I am not sure whether there are objects that overlap different JVM states. Let me check and create another issue if so. ------------- PR: https://git.openjdk.java.net/jdk/pull/2560 From xliu at openjdk.java.net Sat Feb 13 21:46:56 2021 From: xliu at openjdk.java.net (Xin Liu) Date: Sat, 13 Feb 2021 21:46:56 GMT Subject: RFR: 8261675: ObjectValue::set_visited(bool) sets _visited false [v2] In-Reply-To: <3qsdFKCRBP1rIh_Q_9JUQzd-LP43WZchNrFxajKp1lY=.eb22d8dd-dd57-4108-8e64-a7bedace6c99@github.com> References: <3qsdFKCRBP1rIh_Q_9JUQzd-LP43WZchNrFxajKp1lY=.eb22d8dd-dd57-4108-8e64-a7bedace6c99@github.com> Message-ID: > The setter is error-prone. it unconditionally sets _visited false. > this patch stores the argument to it.
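The defect described in the quoted summary is easy to see in isolation. The sketch below contrasts a setter that ignores its argument with one that stores it. `ObjectValueBefore` and `ObjectValueAfter` are hypothetical, heavily simplified stand-ins used only for illustration; they are not the HotSpot `ObjectValue` class, which carries much more state.

```cpp
// Hypothetical, simplified stand-ins; not the actual HotSpot class.
class ObjectValueBefore {
  bool _visited = false;

 public:
  bool is_visited() const { return _visited; }
  // Bug pattern: the parameter is ignored and the field is always cleared.
  void set_visited(bool /*visited*/) { _visited = false; }
};

class ObjectValueAfter {
  bool _visited = false;

 public:
  bool is_visited() const { return _visited; }
  // Fixed pattern: the setter stores the value it is given.
  void set_visited(bool visited) { _visited = visited; }
};
```

As noted earlier in the thread, the old behavior went unnoticed because the only caller passes `false`, so both variants behave identically for that call site; they diverge only once some caller passes `true`.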
Xin Liu has updated the pull request incrementally with one additional commit since the last revision: 8261675: ObjectValue::set_visited(bool) sets _visited false use getter and setter of _visited. update the year of copyright. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2560/files - new: https://git.openjdk.java.net/jdk/pull/2560/files/2d512adc..dce5bbf2 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2560&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2560&range=00-01 Stats: 3 lines in 2 files changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/2560.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2560/head:pull/2560 PR: https://git.openjdk.java.net/jdk/pull/2560 From xliu at openjdk.java.net Sun Feb 14 05:44:51 2021 From: xliu at openjdk.java.net (Xin Liu) Date: Sun, 14 Feb 2021 05:44:51 GMT Subject: RFR: 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set [v4] In-Reply-To: References: Message-ID: > Add a flag _suppress_cr to outputStream. outstream objects won't emit any CR if it's set. > Correct TypeInstPtr::dump2 to make sure it only emits klass name once. > Remove the comment because Klass::oop_print_on() has emitted the address of oop. > > Before: > 689 ConP === 0 [[ 821 ]] Oop:java/lang/Stringjava.lang.String > {0x000000010159d7c8} - klass: public final synchronized 'java/lang/String' > - string: "a" > :Constant:exact * > > After: > 689 ConP === 0 [[ 821 ]] Oop:java.lang.String {0x000000010159d7c8} - klass: public final synchronized 'java/lang/String' - string: "a":Constant:exact * Xin Liu has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: - 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set reimplement this feature. withdraw my intrusive change in outputStream. use stringStream only for the constant OopPtr. 
after oop->print_on(st), delete all appearances of '\n' - Merge branch 'master' into JDK-8260198 - 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set fix merge conflict. - Merge branch 'master' into JDK-8260198 - 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set Add a flag _suppress_cr to outputStream. outstream objects won't emit any CR if it's set. Correct TypeInstPtr::dump2 to make sure it only emits klass name once. Remove the comment because Klass::oop_print_on() has emitted the address of oop. Before: 689 ConP === 0 [[ 821 ]] Oop:java/lang/Stringjava.lang.String {0x000000010159d7c8} - klass: public final synchronized 'java/lang/String' - string: "a" :Constant:exact * After: 689 ConP === 0 [[ 821 ]] Oop:java.lang.String {0x000000010159d7c8} - klass: public final synchronized 'java/lang/String' - string: "a":Constant:exact * ------------- Changes: https://git.openjdk.java.net/jdk/pull/2178/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2178&range=03 Stats: 51 lines in 4 files changed: 45 ins; 1 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/2178.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2178/head:pull/2178 PR: https://git.openjdk.java.net/jdk/pull/2178 From xliu at openjdk.java.net Sun Feb 14 05:53:40 2021 From: xliu at openjdk.java.net (Xin Liu) Date: Sun, 14 Feb 2021 05:53:40 GMT Subject: RFR: 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set In-Reply-To: References: <1lfc5MjMIFUx-Q19CkfKJvP4yYoM6B5APuRsvevpuk8=.3ba9c333-74b6-4ba4-bb3c-66c4583701fb@github.com> Message-ID: On Thu, 28 Jan 2021 18:03:43 GMT, Xin Liu wrote: >> The result your are trying to achieve is good, but I'm not sure pushing supress_cr into outputstream is the right thing. I would like to just not emit the cr's instead - but do also I see that isn't simple, because adding an extra bool to print_on would cascade into the entire codebase. > > @neliasso Thanks for reviewing this. > Exactly. 
The first reason is that I am not familiar with the oops/ codebase; I guess some clients expect to see multiple lines. The second reason is that there are [many places](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/oops/klass.cpp#L783) that emit a CR. I am not sure I can clean them up completely.
>
> That's why I modified outputStream and gave it a 'suppress_cr' mode. May I ask hotspot-dev's advice? /cc hotspot-dev

I reimplemented this feature using stringStream. After the change, a ConP of a constant OopPtr becomes a one-liner, e.g.:

279 ConP === 0 [[ 1105 ]] Oop:java/lang/String java.lang.String {0x000000010100e3d0} - klass: public final synchronized 'java/lang/String' - string: "":Constant:exact *

Please note that I keep "Oop:java/lang/String"; it is the output of klass()->print_name_on(st). The remaining part, "java.lang.String {0x000000010100e3d0} - klass: public final synchronized 'java/lang/String' - string: "":Constant:exact *", is the output of oop->print_on(st). Because it is only printed with `-XX:+Verbose`, I think it's okay for it to be a bit verbose.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2178

From xliu at openjdk.java.net Mon Feb 15 09:26:56 2021
From: xliu at openjdk.java.net (Xin Liu)
Date: Mon, 15 Feb 2021 09:26:56 GMT
Subject: RFR: 8261731: shallow copy the internal buffer of a scalar-replaced java.lang.String object
Message-ID:

There are 3 nodes involved in the construction of a java.lang.String object.
1. Allocate of the String itself, aka. alloc
2. AllocateArray of a byte array, which is value:byte[], aka. aa
3. ArrayCopyNode which copies in the contents of value, aka. ac

Lemma
When a String object `alloc` is scalar replaced, C2 can eliminate `aa` and `ac`.

Because `alloc` is scalar replaced, it must be non-escaped. The field value:byte[] of j.l.String cannot be seen by the external world, therefore it must not be global escaped. Because the buffer is marked as stable, it is safe to assume its contents are whatever `ac` copies in. Because all public java.lang.String constructors clone the incoming array, the source of `ac` is stable as well.

It is possible to rewire `aa` to the source of `ac` with the correct offset. That is to say, we can replace both `aa` and `ac` with a "shallow copy" of the source of `ac`. It's safe if C2 keeps a reference to the source oop for all safepoints.

-------------

Commit messages:
 - fix regression for x86-32
 - add a statistical counter for OptimizeTempArray.
 - [SIM-JVM-450] support deoptimization v2
 - add a unit test for deoptimization
 - [SIM-JVM-450] support deoptimization part2
 - enable OptimizeTempArray by default
 - Merge branch 'master' into optimize_substring
 - Revert "8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set"
 - Revert "add a new bucket afterea_late_inlines"
 - [SIM-JVM-450] support deoptimization
 - ... and 25 more: https://git.openjdk.java.net/jdk/compare/4619f372...fd9ca4b8

Changes: https://git.openjdk.java.net/jdk/pull/2570/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2570&range=00
Issue: https://bugs.openjdk.java.net/browse/JDK-8261731
Stats: 861 lines in 16 files changed: 844 ins; 2 del; 15 mod
Patch: https://git.openjdk.java.net/jdk/pull/2570.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/2570/head:pull/2570

PR: https://git.openjdk.java.net/jdk/pull/2570

From chagedorn at openjdk.java.net Mon Feb 15 09:57:40 2021
From: chagedorn at openjdk.java.net (Christian Hagedorn)
Date: Mon, 15 Feb 2021 09:57:40 GMT
Subject: RFR: 8261235: C1 compilation fails with assert(res->vreg_number() == index) failed: conversion check
In-Reply-To:
References: <-uBsLA86fiUezxtSxF_GW9oT62EezgztzXPwSta8EAw=.cb63031d-3ab4-468c-9351-1f1e10c377a9@github.com>
Message-ID: <20tIOLHtwmZjPQSDLZgq_6hjIxaz1Z-6c6JX13ecUYc=.f2677c8a-a6f9-4f77-8ba2-a7e3bc05ff91@github.com>

On Fri, 12 Feb 2021 17:15:01 GMT, Eirik Bjorsnos wrote:

>> The assertion is hit because we run out of virtual registers in the linear scan in C1 and do not handle it.
I fixed it by applying the same bailout as in `LIRGenerator::new_register()`.
>>
>> There is also a second issue that `LIR_OprDesc::vreg_max` is too big. It is only used in this bailout code. `OprBits::vreg_max` is defined over `OprBits::data_bits` which uses `OprBits::non_data_bits`. But `OprBits::non_data_bits` does not consider `OprBits::pointer_bits` which results in a too large value for `LIR_OprDesc::vreg_max` and the assertion is hit because we don't bail out, yet. This needs to be fixed as well.
>>
>> Thanks,
>> Christian
>
> I tested this branch on my reproducer and it works like a charm:
>
> 776 302 3 org.jaxen.saxpath.base.Verifier::isXMLLetter (6130 bytes)
> [...]
> compilation bailout: out of virtual registers in linear scan
> 4717 302 3 org.jaxen.saxpath.base.Verifier::isXMLLetter (6130 bytes) COMPILE SKIPPED: out of virtual registers in linear scan (retry at different tier)
> 4718 334 4 org.jaxen.saxpath.base.Verifier::isXMLLetter (6130 bytes)
>
> Is it expected that the bailout is logged twice?
>
> First, a log line says: "compilation bailout: out of virtual registers in linear scan", then at the end of the next line: "COMPILE SKIPPED: out of virtual registers in linear scan (retry at different tier)" ?

Thanks for verifying it!

> Is it expected that the bailout is logged twice?
>
> First, a log line says: "compilation bailout: out of virtual registers in linear scan", then at the end of the next line: "COMPILE SKIPPED: out of virtual registers in linear scan (retry at different tier)" ?

Yes, that is expected. We first log the bailout alone here:
https://github.com/openjdk/jdk/blob/3882fda83bebf4a8c8100fd59c37f04610491ce7/src/hotspot/share/c1/c1_Compilation.cpp#L620-L627

And then later, we log the entire failed task with the failure reason (in this case the bailout) here:
https://github.com/openjdk/jdk/blob/3882fda83bebf4a8c8100fd59c37f04610491ce7/src/hotspot/share/compiler/compileBroker.cpp#L2347-L2352

-------------

PR: https://git.openjdk.java.net/jdk/pull/2543

From rcastanedalo at openjdk.java.net Mon Feb 15 11:15:11 2021
From: rcastanedalo at openjdk.java.net (Roberto Castañeda Lozano)
Date: Mon, 15 Feb 2021 11:15:11 GMT
Subject: RFR: 8261336: IGV: enhance default filters [v2]
In-Reply-To:
References:
Message-ID:

> Redesign the filters shown by default in the "Filters" window:
>
> - Add filters to color the graph by node category and execution frequency (if applicable), and to hide subgraphs or edges only by _category_. The category of a node can be one of {`data`, `memory`, `control`, `mixed`, `other`}, and is solely determined by its type. `mixed` nodes are those with a tuple type that has different categories, such as `CallStaticJavaNode`. The category of an edge is that of its source node.
>
> - Instrument C2 to include the category and estimated execution frequency (if available) of each node in the graph dumps produced by `-XX:PrintIdealGraphLevel=N` (only in debug builds).
>
> - Remove filters which depend on properties never emitted by C2 (e.g. 'Remove State') or which appear to be unused ('C2 Matcher Flags Coloring' and 'C2 Register Coloring'). Also remove the subsumed 'C2 Basic Coloring' filter.
>
> - Merge 'C2 Remove Filter' and 'C2 Structural' into a single filter with a clearer name ('Simplify graph').
>
> ### Screenshots
>
> "Filters" window before (left) and after (right) the proposed change:
> ![filters-window](https://user-images.githubusercontent.com/8792647/107749859-7a664780-6d1b-11eb-84ba-fd43e13abd0e.png)
> Default color scheme before (left) and after (right) the proposed change:
> ![color-scheme](https://user-images.githubusercontent.com/8792647/107517355-f3479100-6bad-11eb-9b0b-a71c18961dd8.png)
> Examples of the new 'Color by execution frequency' filter:
> ![color-by-frequency-2](https://user-images.githubusercontent.com/8792647/107518492-5980e380-6baf-11eb-9e01-992b211d06e3.png)
> ![color-by-frequency-1](https://user-images.githubusercontent.com/8792647/107518477-54bc2f80-6baf-11eb-8c7b-7eb7c1d85cf7.png)
> Example of the new 'Hide X subgraph' filters, where only the data subgraph is shown:
> ![hide-all-but-data-subgraphs](https://user-images.githubusercontent.com/8792647/107750398-42133900-6d1c-11eb-8b27-264086b32bea.png)
> Example of the new 'Hide X edges' filters, where all nodes remain in their position but only the memory edges are shown:
> ![hide-all-but-memory-edges](https://user-images.githubusercontent.com/8792647/107751137-49871200-6d1d-11eb-9df0-79747bf8d16e.png)
>
>
> Tested IGV manually on a few graphs. Tested C2 instrumentation by running `hs-tier1` with `-Xbatch -XX:PrintIdealGraphLevel=3 -XX:PrintIdealGraphFile=graph.xml` on windows-x64, linux-x64, linux-aarch64, and macosx-x64 (all debug).
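
The category rules in the first bullet of the description above (a node's category follows from its type, a tuple type with differing categories is `mixed`, and an edge takes the category of its source node) can be sketched as follows. This is only an illustration in plain Java: the `Category` enum, `CategorySketch` and its methods are hypothetical names, not HotSpot or IGV code, and the rule for combining tuple fields is an assumption read off the one-sentence description.

```java
import java.util.List;

// Hypothetical model of the five categories named in the PR description.
enum Category { DATA, MEMORY, CONTROL, MIXED, OTHER }

final class CategorySketch {
    // Category of a tuple type: the shared category if every field agrees,
    // MIXED as soon as two fields differ (assumed combination rule).
    static Category ofTuple(List<Category> fieldCategories) {
        if (fieldCategories.isEmpty()) {
            return Category.OTHER; // assumption: an empty tuple has no category
        }
        Category first = fieldCategories.get(0);
        for (Category c : fieldCategories) {
            if (c != first) {
                return Category.MIXED;
            }
        }
        return first;
    }

    // "The category of an edge is that of its source node."
    static Category ofEdge(Category sourceNodeCategory) {
        return sourceNodeCategory;
    }

    public static void main(String[] args) {
        // A call-like tuple with control, memory and data projections.
        System.out.println(ofTuple(List.of(Category.CONTROL, Category.MEMORY, Category.DATA)));
        // A tuple whose fields all carry data.
        System.out.println(ofTuple(List.of(Category.DATA, Category.DATA)));
    }
}
```

Under these assumptions the first call reports a mixed node, which matches the `CallStaticJavaNode` example given in the description.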
Roberto Castañeda Lozano has updated the pull request incrementally with five additional commits since the last revision:

 - Rewrite 'Show control flow only' filter using categories
 - Add leading underscore field
 - Move assertion to a default switch case
 - Indent switch statements
 - Use a scoped enum for type categories (as per the HotSpot style guide)

-------------

Changes:
 - all: https://git.openjdk.java.net/jdk/pull/2499/files
 - new: https://git.openjdk.java.net/jdk/pull/2499/files/3ed1e3de..adcb9181

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2499&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2499&range=00-01

Stats: 105 lines in 5 files changed: 9 ins; 5 del; 91 mod
Patch: https://git.openjdk.java.net/jdk/pull/2499.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/2499/head:pull/2499

PR: https://git.openjdk.java.net/jdk/pull/2499

From rcastanedalo at openjdk.java.net Mon Feb 15 11:15:12 2021
From: rcastanedalo at openjdk.java.net (Roberto Castañeda Lozano)
Date: Mon, 15 Feb 2021 11:15:12 GMT
Subject: RFR: 8261336: IGV: enhance default filters [v2]
In-Reply-To: <0Fwnnxu5gJTYdq6zVX8pzH5U4H-daq9fFnSGX3hgV0E=.1d812d3d-cdcd-4c78-9130-bf2d7b795580@github.com>
References: <0Fwnnxu5gJTYdq6zVX8pzH5U4H-daq9fFnSGX3hgV0E=.1d812d3d-cdcd-4c78-9130-bf2d7b795580@github.com>
Message-ID: <7ttppJD2MhW_0CDz-JWjd_n8Cv5dVIvSm-YZPXUGmXY=.7241d924-d261-4f57-a289-9f0b2177a85b@github.com>

On Fri, 12 Feb 2021 12:14:53 GMT, Christian Hagedorn wrote:

>> Roberto Castañeda Lozano has updated the pull request incrementally with five additional commits since the last revision:
>>
>> - Rewrite 'Show control flow only' filter using categories
>> - Add leading underscore field
>> - Move assertion to a default switch case
>> - Indent switch statements
>> - Use a scoped enum for type categories (as per the HotSpot style guide)
>
> src/hotspot/share/opto/idealGraphPrinter.hpp line 95:
>
>> 93: bool _traverse_outs;
>> 94: Compile *C;
>> 95: double max_freq;
>
> Fields should have a leading underscore.

Done.

> src/hotspot/share/opto/type.cpp line 1120:
>
>> 1118: Type::CATEGORY Type::category() const {
>> 1119: const TypeTuple* tuple;
>> 1120: switch (base()) {
>
> Might be more readable if switch cases are indented.

Done.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2499

From rcastanedalo at openjdk.java.net Mon Feb 15 11:17:42 2021
From: rcastanedalo at openjdk.java.net (Roberto Castañeda Lozano)
Date: Mon, 15 Feb 2021 11:17:42 GMT
Subject: RFR: 8261336: IGV: enhance default filters [v2]
In-Reply-To: <0Fwnnxu5gJTYdq6zVX8pzH5U4H-daq9fFnSGX3hgV0E=.1d812d3d-cdcd-4c78-9130-bf2d7b795580@github.com>
References: <0Fwnnxu5gJTYdq6zVX8pzH5U4H-daq9fFnSGX3hgV0E=.1d812d3d-cdcd-4c78-9130-bf2d7b795580@github.com>
Message-ID:

On Fri, 12 Feb 2021 12:18:35 GMT, Christian Hagedorn wrote:

>> Roberto Castañeda Lozano has updated the pull request incrementally with five additional commits since the last revision:
>>
>> - Rewrite 'Show control flow only' filter using categories
>> - Add leading underscore field
>> - Move assertion to a default switch case
>> - Indent switch statements
>> - Use a scoped enum for type categories (as per the HotSpot style guide)
>
> src/hotspot/share/opto/type.hpp line 374:
>
>> 372: CatControl,
>> 373: CatOther, // {Type::Top, Type::Abio, Type::Bottom}.
>> 374: CatUndef // {Type::Bad, Type::lastype}, for completeness.
>
> Is "Cat" required when the enum is already called "CATEGORY"?

Right, it was required in a C-style unscoped-enum because of collisions with other enums in type.hpp. But I found in the HotSpot Style Guide that scoped-enums are actually encouraged, so I changed to that.

> src/hotspot/share/opto/idealGraphPrinter.cpp line 394:
>
>> 392: }
>> 393:
>> 394: switch (t->category()) {
>
> Might be more readable if switch cases are indented.

Done.

> src/hotspot/share/opto/type.cpp line 1178:
>
>> 1176: }
>> 1177: assert(false, "unmatched base type");
>> 1178: return CatUndef;
>
> You could add that to an explicit default case to the above switch statement to make it more clear.

Done.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2499

From rcastanedalo at openjdk.java.net Mon Feb 15 11:23:40 2021
From: rcastanedalo at openjdk.java.net (Roberto Castañeda Lozano)
Date: Mon, 15 Feb 2021 11:23:40 GMT
Subject: RFR: 8261336: IGV: enhance default filters [v2]
In-Reply-To: <0Fwnnxu5gJTYdq6zVX8pzH5U4H-daq9fFnSGX3hgV0E=.1d812d3d-cdcd-4c78-9130-bf2d7b795580@github.com>
References: <0Fwnnxu5gJTYdq6zVX8pzH5U4H-daq9fFnSGX3hgV0E=.1d812d3d-cdcd-4c78-9130-bf2d7b795580@github.com>
Message-ID:

On Fri, 12 Feb 2021 12:13:13 GMT, Christian Hagedorn wrote:

>> Roberto Castañeda Lozano has updated the pull request incrementally with five additional commits since the last revision:
>>
>> - Rewrite 'Show control flow only' filter using categories
>> - Add leading underscore field
>> - Move assertion to a default switch case
>> - Indent switch statements
>> - Use a scoped enum for type categories (as per the HotSpot style guide)
>
> src/utils/IdealGraphVisualizer/ServerCompiler/src/com/sun/hotspot/igv/servercompiler/filters/onlyControlFlow.filter line 23:
>
>> 21: )
>> 22: );
>> 23: f.addRule(new RemoveFilter.RemoveRule(new MatcherSelector(new Properties.RegexpPropertyMatcher("name", "Phi|CreateEx|Cast.*|Load.|Store."))));
>
> I tried the filter out on a graph and noticed that there are still some data and memory nodes shown. How about utilizing the newly added categories? I played around a bit and this seems to work quite nicely. What do you think?
>
> var f = new RemoveFilter("Show only control flow");
> f.addRule(
>   new RemoveFilter.RemoveRule(
>     new OrSelector(
>       new InvertSelector(
>         new MatcherSelector(
>           new Properties.RegexpPropertyMatcher("category", "control|mixed|other")
>         )
>       ),
>       new MatcherSelector(
>         new Properties.StringPropertyMatcher("type", "abIO")
>       )
>     ), false
>   )
> );
> f.addRule(new RemoveFilter.RemoveRule(new MatcherSelector(new Properties.RegexpPropertyMatcher("name", "Phi|Root|Con"))));
> f.apply(graph);

Thanks for the suggestion, I adopted it with some variations to avoid having to explicitly list pinned nodes like "Phi".

-------------

PR: https://git.openjdk.java.net/jdk/pull/2499

From rcastanedalo at openjdk.java.net Mon Feb 15 11:23:38 2021
From: rcastanedalo at openjdk.java.net (Roberto Castañeda Lozano)
Date: Mon, 15 Feb 2021 11:23:38 GMT
Subject: RFR: 8261336: IGV: enhance default filters [v2]
In-Reply-To:
References: <0Fwnnxu5gJTYdq6zVX8pzH5U4H-daq9fFnSGX3hgV0E=.1d812d3d-cdcd-4c78-9130-bf2d7b795580@github.com>
Message-ID:

On Fri, 12 Feb 2021 15:34:51 GMT, Roberto Castañeda Lozano wrote:

> > Otherwise, very nice improvement! I applied the patch and played around with it in the IGV. Everything seems to work as expected. I like the new default filters and the new colors for the new categories. That makes it much more useful.
>
> Thanks for reviewing Christian! I will have a look at your comments on Monday.

I have addressed them now, please re-review.
-------------

PR: https://git.openjdk.java.net/jdk/pull/2499

From chagedorn at openjdk.java.net Mon Feb 15 13:57:39 2021
From: chagedorn at openjdk.java.net (Christian Hagedorn)
Date: Mon, 15 Feb 2021 13:57:39 GMT
Subject: RFR: 8261336: IGV: enhance default filters [v2]
In-Reply-To:
References:
Message-ID:

On Mon, 15 Feb 2021 11:15:11 GMT, Roberto Castañeda Lozano wrote:

>> Redesign the filters shown by default in the "Filters" window:
>>
>> - Add filters to color the graph by node category and execution frequency (if applicable), and to hide subgraphs or edges only by _category_. The category of a node can be one of {`data`, `memory`, `control`, `mixed`, `other`}, and is solely determined by its type. `mixed` nodes are those with a tuple type that has different categories, such as `CallStaticJavaNode`. The category of an edge is that of its source node.
>>
>> - Instrument C2 to include the category and estimated execution frequency (if available) of each node in the graph dumps produced by `-XX:PrintIdealGraphLevel=N` (only in debug builds).
>>
>> - Remove filters which depend on properties never emitted by C2 (e.g. 'Remove State') or which appear to be unused ('C2 Matcher Flags Coloring' and 'C2 Register Coloring'). Also remove the subsumed 'C2 Basic Coloring' filter.
>>
>> - Merge 'C2 Remove Filter' and 'C2 Structural' into a single filter with a clearer name ('Simplify graph').
>>
>> ### Screenshots
>>
>> "Filters" window before (left) and after (right) the proposed change:
>> ![filters-window](https://user-images.githubusercontent.com/8792647/107749859-7a664780-6d1b-11eb-84ba-fd43e13abd0e.png)
>> Default color scheme before (left) and after (right) the proposed change:
>> ![color-scheme](https://user-images.githubusercontent.com/8792647/107517355-f3479100-6bad-11eb-9b0b-a71c18961dd8.png)
>> Examples of the new 'Color by execution frequency' filter:
>> ![color-by-frequency-2](https://user-images.githubusercontent.com/8792647/107518492-5980e380-6baf-11eb-9e01-992b211d06e3.png)
>> ![color-by-frequency-1](https://user-images.githubusercontent.com/8792647/107518477-54bc2f80-6baf-11eb-8c7b-7eb7c1d85cf7.png)
>> Example of the new 'Hide X subgraph' filters, where only the data subgraph is shown:
>> ![hide-all-but-data-subgraphs](https://user-images.githubusercontent.com/8792647/107750398-42133900-6d1c-11eb-8b27-264086b32bea.png)
>> Example of the new 'Hide X edges' filters, where all nodes remain in their position but only the memory edges are shown:
>> ![hide-all-but-memory-edges](https://user-images.githubusercontent.com/8792647/107751137-49871200-6d1d-11eb-9df0-79747bf8d16e.png)
>>
>>
>> Tested IGV manually on a few graphs. Tested C2 instrumentation by running `hs-tier1` with `-Xbatch -XX:PrintIdealGraphLevel=3 -XX:PrintIdealGraphFile=graph.xml` on windows-x64, linux-x64, linux-aarch64, and macosx-x64 (all debug).
>
> Roberto Castañeda Lozano has updated the pull request incrementally with five additional commits since the last revision:
>
> - Rewrite 'Show control flow only' filter using categories
> - Add leading underscore field
> - Move assertion to a default switch case
> - Indent switch statements
> - Use a scoped enum for type categories (as per the HotSpot style guide)

Thanks for addressing my suggestions. I tested it again - looks good!

-------------

Marked as reviewed by chagedorn (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/2499

From chagedorn at openjdk.java.net Mon Feb 15 13:57:41 2021
From: chagedorn at openjdk.java.net (Christian Hagedorn)
Date: Mon, 15 Feb 2021 13:57:41 GMT
Subject: RFR: 8261336: IGV: enhance default filters [v2]
In-Reply-To:
References: <0Fwnnxu5gJTYdq6zVX8pzH5U4H-daq9fFnSGX3hgV0E=.1d812d3d-cdcd-4c78-9130-bf2d7b795580@github.com>
Message-ID:

On Mon, 15 Feb 2021 11:19:04 GMT, Roberto Castañeda Lozano wrote:

>> src/utils/IdealGraphVisualizer/ServerCompiler/src/com/sun/hotspot/igv/servercompiler/filters/onlyControlFlow.filter line 23:
>>
>>> 21: )
>>> 22: );
>>> 23: f.addRule(new RemoveFilter.RemoveRule(new MatcherSelector(new Properties.RegexpPropertyMatcher("name", "Phi|CreateEx|Cast.*|Load.|Store."))));
>>
>> I tried the filter out on a graph and noticed that there are still some data and memory nodes shown. How about utilizing the newly added categories? I played around a bit and this seems to work quite nicely. What do you think?
>>
>> var f = new RemoveFilter("Show only control flow");
>> f.addRule(
>>   new RemoveFilter.RemoveRule(
>>     new OrSelector(
>>       new InvertSelector(
>>         new MatcherSelector(
>>           new Properties.RegexpPropertyMatcher("category", "control|mixed|other")
>>         )
>>       ),
>>       new MatcherSelector(
>>         new Properties.StringPropertyMatcher("type", "abIO")
>>       )
>>     ), false
>>   )
>> );
>> f.addRule(new RemoveFilter.RemoveRule(new MatcherSelector(new Properties.RegexpPropertyMatcher("name", "Phi|Root|Con"))));
>> f.apply(graph);
>
> Thanks for the suggestion, I adopted it with some variations to avoid having to explicitly list pinned nodes like "Phi".

That's even better! Thanks for adopting that suggestion.
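
For readers unfamiliar with IGV selectors, the boolean structure of the filter quoted above can be restated as a plain predicate. This is a hedged sketch, not the IGV API: `ControlFlowFilterSketch`, the property maps and the sample `type` values are hypothetical, and it assumes that the regular expressions must match the whole property value.

```java
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical restatement of the quoted filter as a predicate over node
// properties. Only the and/or/not structure is taken from the thread.
final class ControlFlowFilterSketch {
    private static final Pattern KEPT_CATEGORIES = Pattern.compile("control|mixed|other");
    private static final Pattern REMOVED_NAMES = Pattern.compile("Phi|Root|Con");

    // First rule: OrSelector(InvertSelector(category matches), type is "abIO").
    static boolean removedByFirstRule(Map<String, String> node) {
        boolean keptCategory =
            KEPT_CATEGORIES.matcher(node.getOrDefault("category", "")).matches();
        boolean isAbIO = "abIO".equals(node.get("type"));
        return !keptCategory || isAbIO;
    }

    // Second rule: remove nodes by name (whole-string match is an assumption).
    static boolean removedBySecondRule(Map<String, String> node) {
        return REMOVED_NAMES.matcher(node.getOrDefault("name", "")).matches();
    }

    static boolean removed(Map<String, String> node) {
        return removedByFirstRule(node) || removedBySecondRule(node);
    }

    public static void main(String[] args) {
        // Sample property values are made up for illustration.
        Map<String, String> ifNode = Map.of("name", "If", "category", "control", "type", "tuple");
        Map<String, String> addNode = Map.of("name", "AddI", "category", "data", "type", "int");
        System.out.println("If removed:   " + removed(ifNode));
        System.out.println("AddI removed: " + removed(addNode));
    }
}
```

Selecting by category means a data or memory node is dropped without its name appearing in a hand-maintained list, which is the gap the review comment pointed out in the original name-based rule.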
-------------

PR: https://git.openjdk.java.net/jdk/pull/2499

From lucy at openjdk.java.net Mon Feb 15 17:47:38 2021
From: lucy at openjdk.java.net (Lutz Schmidt)
Date: Mon, 15 Feb 2021 17:47:38 GMT
Subject: RFR: 8261657: [PPC64] Cleanup StoreCM nodes after CMS removal
In-Reply-To:
References:
Message-ID: <9llRP8rEfThe6mIPc57HbANvm7iyeyB2zJ73FNbWbNo=.1b07a60c-7722-4db4-bae8-4629ee3cba07@github.com>

On Fri, 12 Feb 2021 16:29:36 GMT, Martin Doerr wrote:

> We only need one StoreCM node after CMS removal. CMS StoreStore barriers were already removed at other places.

Changes look good to me. Thanks for cleaning up!

-------------

Marked as reviewed by lucy (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/2554

From jbhateja at openjdk.java.net Mon Feb 15 18:55:03 2021
From: jbhateja at openjdk.java.net (Jatin Bhateja)
Date: Mon, 15 Feb 2021 18:55:03 GMT
Subject: RFR: 8261553: Efficient mask generation using BMI2 BZHI instruction [v3]
In-Reply-To:
References:
Message-ID:

On Fri, 12 Feb 2021 05:59:01 GMT, Jatin Bhateja wrote:

>>> Hi Claes, This could be a run to run variation, in general we are now having fewer number of instructions (one shift operation saved per mask computation) compared to previous masked generation sequence and thus it will always offer better execution latencies.
>>
>> Run-to-run variation would be easy to rule out by running more forks and more iterations to attain statistically significant results. While the instruction manuals suggest latency should be better for this instruction on all CPUs where it's supported, it would be good if there was some clear proof - such as a significant benchmark win - to motivate the added complexity.
>
>> > Hi Claes, This could be a run to run variation, in general we are now having fewer number of instructions (one shift operation saved per mask computation) compared to previous masked generation sequence and thus it will always offer better execution latencies.
>>
>> Run-to-run variation would be easy to rule out by running more forks and more iterations to attain statistically significant results. While the instruction manuals suggest latency should be better for this instruction on all CPUs where it's supported, it would be good if there was some clear proof - such as a significant benchmark win - to motivate the added complexity.
>
> BASELINE:
> Result "org.openjdk.bench.java.lang.ArrayCopyUnalignedSrc.testLong":
> 61.037 ns/op
>
> Secondary result "org.openjdk.bench.java.lang.ArrayCopyUnalignedSrc.testLong:·perf":
> Perf stats:
> --------------------------------------------------
>
> 19,739.21 msec task-clock # 0.389 CPUs utilized
> 646 context-switches # 0.033 K/sec
> 12 cpu-migrations # 0.001 K/sec
> 150 page-faults # 0.008 K/sec
> 74,59,83,59,139 cycles # 3.779 GHz (30.73%)
> 1,78,78,79,19,117 instructions # 2.40 insn per cycle (38.48%)
> 24,79,81,63,651 branches # 1256.289 M/sec (38.55%)
> 32,24,89,924 branch-misses # 1.30% of all branches (38.62%)
> 52,56,88,28,472 L1-dcache-loads # 2663.167 M/sec (38.65%)
> 39,00,969 L1-dcache-load-misses # 0.01% of all L1-dcache hits (38.57%)
> 3,74,131 LLC-loads # 0.019 M/sec (30.77%)
> 22,315 LLC-load-misses # 5.96% of all LL-cache hits (30.72%)
> L1-icache-loads
> 17,49,997 L1-icache-load-misses (30.72%)
> 52,91,41,70,636 dTLB-loads # 2680.663 M/sec (30.69%)
> 3,315 dTLB-load-misses # 0.00% of all dTLB cache hits (30.67%)
> 4,674 iTLB-loads # 0.237 K/sec (30.65%)
> 33,746 iTLB-load-misses # 721.99% of all iTLB cache hits (30.63%)
> L1-dcache-prefetches
> L1-dcache-prefetch-misses
>
> 50.723759146 seconds time elapsed
>
> 51.447054000 seconds user
> 0.189949000 seconds sys
>
>
> WITH OPT:
> Result "org.openjdk.bench.java.lang.ArrayCopyUnalignedSrc.testLong":
> 74.356 ns/op
>
> Secondary result "org.openjdk.bench.java.lang.ArrayCopyUnalignedSrc.testLong:·perf":
> Perf stats:
> --------------------------------------------------
>
> 19,741.09 msec task-clock # 0.389 CPUs utilized
> 641 context-switches # 0.032 K/sec
> 17 cpu-migrations # 0.001 K/sec
> 164 page-faults # 0.008 K/sec
> 74,40,40,48,513 cycles # 3.769 GHz (30.81%)
> 1,45,66,22,06,797 instructions # 1.96 insn per cycle (38.56%)
> 20,31,28,43,577 branches # 1028.963 M/sec (38.65%)
> 14,11,419 branch-misses # 0.01% of all branches (38.69%)
> 43,07,86,33,662 L1-dcache-loads # 2182.182 M/sec (38.72%)
> 37,06,744 L1-dcache-load-misses # 0.01% of all L1-dcache hits (38.56%)
> 1,34,292 LLC-loads # 0.007 M/sec (30.72%)
> 30,627 LLC-load-misses # 22.81% of all LL-cache hits (30.68%)
> L1-icache-loads
> 14,49,145 L1-icache-load-misses (30.65%)
> 43,44,86,27,516 dTLB-loads # 2200.924 M/sec (30.63%)
> 218 dTLB-load-misses # 0.00% of all dTLB cache hits (30.63%)
> 2,445 iTLB-loads # 0.124 K/sec (30.63%)
> 28,624 iTLB-load-misses # 1170.72% of all iTLB cache hits (30.63%)
> L1-dcache-prefetches
> L1-dcache-prefetch-misses
>
> 50.716083931 seconds time elapsed
>
> 51.467300000 seconds user
> 0.200390000 seconds sys
>
>
> JMH perf data for ArrayCopyUnalignedSrc.testLong with copy length of 1200 shows degradation in LID accesses, it seems the benchmark got displaced from its sweet spot.
>
> But, there is a significant reduction in instruction count and cycles are almost comparable. We are saving one shift per mask computation.
>
> OLD Sequence:
> 0x00007f7fc1030ead: movabs $0x1,%rax
> 0x00007f7fc1030eb7: shlx %r8,%rax,%rax
> 0x00007f7fc1030ebc: dec %rax
> 0x00007f7fc1030ebf: kmovq %rax,%k2
> NEW Sequence:
> 0x00007f775d030d51: movabs $0xffffffffffffffff,%rax
> 0x00007f775d030d5b: bzhi %r8,%rax,%rax
> 0x00007f775d030d60: kmovq %rax,%k2

Further analysis of the perf degradation revealed that, with the new optimized instruction pattern, code alignment got disturbed. This led to an increase in LSD misses and also reduced uop caching in the DSB. Aligning copy loops at a 32-byte boundary prevents any adverse impact on uop caching.
NOPs used for padding add to the instruction count and thus may overshadow the code size gains from the new mask generation sequence in copy stubs.

Baseline:
ArrayCopyAligned.testLong Length : 1200 61 ns/op (approx)
1,93,44,43,11,622 cycles
4,59,57,99,78,727 instructions # 2.38 insn per cycle
1,83,68,75,68,255 idq.dsb_uops
2,08,32,43,71,906 lsd.uops
37,12,54,60,211 idq.mite_uops

With Opt:
ArrayCopyAligned.testLong Length : 1200 74 ns/op (approx)
1,93,51,25,94,766 cycles
3,75,11,57,91,917 instructions # 1.94 insn per cycle
48,67,58,25,566 idq.dsb_uops
19,46,13,236 lsd.uops
2,87,42,95,74,280 idq.mite_uops

With Opt + main loop alignment(nop): 61 ns/op (approx)
ArrayCopyAligned.testLong Length : 1200
1,93,52,15,90,080 cycles
4,60,89,14,06,528 instructions # 2.38 insn per cycle
1,78,76,10,34,991 idq.dsb_uops
2,09,16,15,84,313 lsd.uops
46,25,31,92,101 idq.mite_uops

While computing the mask for partial in-lining of small copy calls (currently enabled for sub-word types with copy length less than 32/64 bytes), the new optimized sequence should always offer a lower instruction count and a lower-latency path.

Baseline:
ArrayCopyAligned.testByte Length : 20 avgt 2 2.635 ns/op
1,97,76,75,18,052 cycles
8,96,00,37,11,803 instructions # 4.53 insn per cycle
2,71,83,79,035 idq.dsb_uops
7,54,82,43,63,409 lsd.uops
3,92,55,74,395 idq.mite_uops

ArrayCopyAligned.testByte Length : 31 avgt 2 2.635 ns/op
1,97,79,16,56,787 cycles
8,96,13,15,69,780 instructions # 4.53 insn per cycle
2,69,07,11,691 idq.dsb_uops
7,54,95,63,77,683 lsd.uops
3,90,19,10,747 idq.mite_uops

WithOpt:
ArrayCopyAligned.testByte Length : 20 avgt 2 2.635 ns/op
1,97,66,64,62,541 cycles
8,92,03,95,00,236 instructions # 4.51 insn per cycle
2,72,38,56,205 idq.dsb_uops
7,50,87,50,60,591 lsd.uops
3,89,15,02,954 idq.mite_uops

ArrayCopyAligned.testByte Length : 31 avgt 2 2.635 ns/op
1,97,54,21,61,110 cycles
8,91,46,64,23,754 instructions # 4.51 insn per cycle
2,78,12,19,544 idq.dsb_uops
7,50,35,88,95,843 lsd.uops
3,90,41,97,276 idq.mite_uops

Following are the links to updated JMH perf data:
http://cr.openjdk.java.net/~jbhateja/8261553/JMH_PERF_CLX_BASELINE.txt
http://cr.openjdk.java.net/~jbhateja/8261553/JMH_PERF_CLX_WITH_OPTS_LOOP_ALIGN.txt

In general, gains are not significant in the case of copy stubs, but the new sequence offers an optimal latency path for the mask computation sequence.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2522

From jbhateja at openjdk.java.net Mon Feb 15 18:55:02 2021
From: jbhateja at openjdk.java.net (Jatin Bhateja)
Date: Mon, 15 Feb 2021 18:55:02 GMT
Subject: RFR: 8261553: Efficient mask generation using BMI2 BZHI instruction [v3]
In-Reply-To:
References:
Message-ID:

> BMI2 BZHI instruction can be used to optimize the instruction sequence
> used for mask generation at various places in array copy stubs and partial in-lining for small copy operations.

Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:

 8261553 : Aligning main copy loop to prevent any penalty due to LSD and DSB misses.
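
The OLD and NEW mask-generation sequences compared in this thread can be written out in plain Java to see that they yield the same mask. This is a sketch of the arithmetic only, not the stub assembler code: `MaskSketch` and its method names are made up, and `bzhiMask` emulates BZHI by hand under the instruction's documented rule that an index of 64 or more leaves the source unchanged.

```java
// Plain-Java sketch of the two mask computations; names are hypothetical.
final class MaskSketch {
    // OLD sequence: movabs $1; shlx; dec  ->  (1 << n) - 1.
    // Java masks a long shift count to its low 6 bits, as shlx does for a
    // 64-bit operand, so n == 64 gives 0 here.
    static long shiftMask(int n) {
        return (1L << n) - 1;
    }

    // NEW sequence: movabs $-1; bzhi  ->  clear every bit at index n and above.
    // BZHI reads the low 8 bits of the index and keeps the source intact
    // when that index is 64 or more.
    static long bzhiMask(int n) {
        int index = n & 0xFF;
        if (index >= 64) {
            return -1L;
        }
        return -1L & ((1L << index) - 1);
    }

    public static void main(String[] args) {
        // The two forms agree for every element count from 0 to 63.
        for (int n = 0; n < 64; n++) {
            if (shiftMask(n) != bzhiMask(n)) {
                throw new IllegalStateException("mismatch at n=" + n);
            }
        }
        System.out.println(Long.toHexString(bzhiMask(5)));
        System.out.println("n=64: shift " + Long.toHexString(shiftMask(64))
            + ", bzhi " + Long.toHexString(bzhiMask(64)));
    }
}
```

The two forms differ only at a count of exactly 64, where the shift form wraps to an empty mask and the BZHI form yields all ones. Whether the copy stubs can reach that count is not stated in the thread.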
-------------

Changes:
 - all: https://git.openjdk.java.net/jdk/pull/2522/files
 - new: https://git.openjdk.java.net/jdk/pull/2522/files/84c9c2da..7012eed0

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2522&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2522&range=01-02

Stats: 8 lines in 1 file changed: 4 ins; 0 del; 4 mod
Patch: https://git.openjdk.java.net/jdk/pull/2522.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/2522/head:pull/2522

PR: https://git.openjdk.java.net/jdk/pull/2522

From xliu at openjdk.java.net Mon Feb 15 19:00:57 2021
From: xliu at openjdk.java.net (Xin Liu)
Date: Mon, 15 Feb 2021 19:00:57 GMT
Subject: RFR: 8261731: shallow copy the internal buffer of a scalar-replaced java.lang.String object [v2]
In-Reply-To:
References:
Message-ID: <3JNae6rXuxc_Q6YoALCH8Ku510Zne5ftqf1z8OCGkHQ=.2ebf5cd0-27e2-44e3-adf7-065179cc9ffd@github.com>

> There are 3 nodes involved in the construction of a java.lang.String object.
> 1. Allocate of the String itself, aka. alloc
> 2. AllocateArray of a byte array, which is value:byte[], aka. aa
> 3. ArrayCopyNode which copies in the contents of value, aka. ac
>
> Lemma
> When a String object `alloc` is scalar replaced, C2 can eliminate `aa` and `ac`.
>
> Because `alloc` is scalar replaced, it must be non-escaped. The field value:byte[] of j.l.String cannot be seen by the external world, therefore it must not be global escaped. Because the buffer is marked as stable, it is safe to assume its contents are whatever `ac` copies in. Because all public java.lang.String constructors clone the incoming array, the source of `ac` is stable as well.
>
> It is possible to rewire `aa` to the source of `ac` with the correct offset. That is to say, we can replace both `aa` and `ac` with a "shallow copy" of the source of `ac`. It's safe if C2 keeps a reference to the source oop for all safepoints.

Xin Liu has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 36 commits:

 - Merge branch 'master' into optimize_substring
 - fix regression for x86-32

   if LP64 is off, the offset of AddP must be I instead of L.
   x86 also doesn't emit encodeP/storeN. it uses storeP instead.
 - add a statistical counter for OptimizeTempArray.

   -XX:+PrintOptoStatistics shows it
 - [SIM-JVM-450] support deoptimization v2

   because the src oop of scobj may be another scobj, deoptimization sorts all objects in topological order.
   separate creation of dst oop and reassignment of it.
 - add a unit test for deoptimization
 - [SIM-JVM-450] support deoptimization part2

   if OptimizeTempArray eliminates an AllocateArrayNode, scalar replacement will create a nested SafePointScalarObjectNode for the field value:byte[] of j.l.String.
   we use the nested sobj and an ObjectValue as an envelope. it consists of 3 fields: 1. src 2. src_position 3. length.
   deoptimization recognizes this ad-hoc ObjectValue and re-allocates an arrayOop for the String object.
 - enable OptimizeTempArray by default
 - Merge branch 'master' into optimize_substring
 - Revert "8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set"

   This reverts commit a49e34688d7d7c9d3c0d9c824d33f359613c2fc1.
 - Revert "add a new bucket afterea_late_inlines"

   afterea_late_inlines bucket is not useful. revert it and its relevant changes
 - ... and 26 more: https://git.openjdk.java.net/jdk/compare/849f4c0f...21693ddd

-------------

Changes: https://git.openjdk.java.net/jdk/pull/2570/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2570&range=01
Stats: 861 lines in 16 files changed: 844 ins; 2 del; 15 mod
Patch: https://git.openjdk.java.net/jdk/pull/2570.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/2570/head:pull/2570

PR: https://git.openjdk.java.net/jdk/pull/2570

From mdoerr at openjdk.java.net Mon Feb 15 19:47:54 2021
From: mdoerr at openjdk.java.net (Martin Doerr)
Date: Mon, 15 Feb 2021 19:47:54 GMT
Subject: RFR: 8261522: [PPC64] AES intrinsics write beyond the destination array [v2]
In-Reply-To:
References:
Message-ID:

> I'd like to replace the read-modify-write implementation from aescrypt_encryptBlock / aescrypt_decryptBlock stubs. It can cause severe problems (see bug description).

Martin Doerr has updated the pull request incrementally with one additional commit since the last revision:

 Add comment and key length assertions. Minor improvement.
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2514/files - new: https://git.openjdk.java.net/jdk/pull/2514/files/d933c2f7..725dd8c7 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2514&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2514&range=00-01 Stats: 42 lines in 1 file changed: 26 ins; 8 del; 8 mod Patch: https://git.openjdk.java.net/jdk/pull/2514.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2514/head:pull/2514 PR: https://git.openjdk.java.net/jdk/pull/2514 From nhe at activeviam.com Mon Feb 15 13:19:05 2021 From: nhe at activeviam.com (Nicolas Heutte) Date: Mon, 15 Feb 2021 14:19:05 +0100 Subject: [External] : Re: SuperWord loop optimization lost after method inlining In-Reply-To: <87dcb670-622f-6b05-7c06-289f9b4f3634@oracle.com> References: <87dcb670-622f-6b05-7c06-289f9b4f3634@oracle.com> Message-ID: Hi Vladimir, I've tried disabling tiered compilation, as you requested. It seems that the inlining was performed slightly differently, but the issue remains. 
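
For anyone who wants to experiment with the loop discussed in this thread, a standalone sketch might look like the following. The class name `PlusLoopSketch`, the array size and the iteration count are assumptions chosen for illustration and do not come from the application; whether C2 unrolls and vectorizes the loop depends on the JDK build and the hardware.

```java
import java.util.Arrays;

class PlusLoopSketch {
    // Same shape as the loop quoted in this thread: dst[i] += src[i].
    static void plus(float[] dstArray, float[] srcArray, int srcLen) {
        for (int i = 0; i < srcLen; ++i) {
            dstArray[i] += srcArray[i];
        }
    }

    public static void main(String[] args) {
        float[] dst = new float[1024];
        float[] src = new float[1024];
        Arrays.fill(src, 1.0f);
        // Call the method often enough that the JIT is likely to compile it.
        for (int iter = 0; iter < 20_000; iter++) {
            plus(dst, src, src.length);
        }
        System.out.println(dst[0]);
    }
}
```

Running it once as is and once with `-XX:CompileCommand=dontinline,PlusLoopSketch.plus` (the same kind of flag used earlier in this thread), optionally with `-XX:-TieredCompilation`, gives two compilations whose assembly can be compared. A sketch this small may not reproduce the problem, since the inlining context in the real application is deeper.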
As you can see in this excerpt, the main loop isn't properly vectorized: 0x00000254b0d4bf54: cmp %r11d,%r8d 0x00000254b0d4bf57: jae 0x00000254b0d4c19e 0x00000254b0d4bf5d: vmovss 0x10(%rcx,%r8,4),%xmm9 ;*faload {reexecute=0 rethrow=0 return_oop=0} ; - com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 54 (line 41) ; - com.qfs.vector.impl.AVector::plus at 17 (line 204) ; - com.qfs.agg.impl.SumVectorAggregationBinding::plus at 2 (line 103) ; - com.qfs.agg.impl.SumVectorAggregationBinding::safeVectorAggregate at 70 (line 66) ; - com.qfs.agg.impl.AVectorAggregationBinding::safeAggregate at 27 (line 118) 0x00000254b0d4bf64: cmp %ebx,%r8d 0x00000254b0d4bf67: jae 0x00000254b0d4c1ec 0x00000254b0d4bf6d: vaddss 0x10(%rdi,%r8,4),%xmm9,%xmm9 0x00000254b0d4bf74: vmovss %xmm9,0x10(%rcx,%r8,4) ;*fastore {reexecute=0 rethrow=0 return_oop=0} ; - com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 61 (line 41) ; - com.qfs.vector.impl.AVector::plus at 17 (line 204) ; - com.qfs.agg.impl.SumVectorAggregationBinding::plus at 2 (line 103) ; - com.qfs.agg.impl.SumVectorAggregationBinding::safeVectorAggregate at 70 (line 66) ; - com.qfs.agg.impl.AVectorAggregationBinding::safeAggregate at 27 (line 118) 0x00000254b0d4bf7b: inc %r8d ;*iinc {reexecute=0 rethrow=0 return_oop=0} ; - com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 62 (line 40) ; - com.qfs.vector.impl.AVector::plus at 17 (line 204) ; - com.qfs.agg.impl.SumVectorAggregationBinding::plus at 2 (line 103) ; - com.qfs.agg.impl.SumVectorAggregationBinding::safeVectorAggregate at 70 (line 66) ; - com.qfs.agg.impl.AVectorAggregationBinding::safeAggregate at 27 (line 118) 0x00000254b0d4bf7e: cmp %r9d,%r8d 0x00000254b0d4bf81: jl 0x00000254b0d4bf54 ;*goto {reexecute=0 rethrow=0 return_oop=0} ; - com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 65 (line 40) ; - com.qfs.vector.impl.AVector::plus at 17 (line 204) ; - 
com.qfs.agg.impl.SumVectorAggregationBinding::plus at 2 (line 103) ; - com.qfs.agg.impl.SumVectorAggregationBinding::safeVectorAggregate at 70 (line 66) ; - com.qfs.agg.impl.AVectorAggregationBinding::safeAggregate at 27 (line 118) Here is the link to the full log, should you want to take a look at it: https://drive.google.com/file/d/1KQU7wI8NjeElFv6RrQmUsUPRMnAefzhb/view?usp=sharing Best regards, Nicolas Heutte On Thu, Feb 11, 2021 at 7:05 PM Vladimir Kozlov wrote: > Changing wide mailing list to JIT compiler only. > > This deoptimization is normal in Tiered Compilation - it switched from > profiling code (level='3') generated by C1 > compiler to new code generated by C2 (level='4') which does loop > optimizations. > > Thank you for posting inlining information: > > @ 17 > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus (69 > bytes) inline (hot) > \-> TypeProfile (14054/14054 counts) = > com/qfs/vector/binding/impl/ArrayFloatToArrayFloatVectorBinding > > I thought before that may be call site is not hot but it is not the case. > > You can do an other experiment to collect log with disabled Tiered > Compilation (only C2 is used): -XX:-TieredCompilation > Also print assembler code (as you did before) for final compilation to see > if loop is still not vectorized. > > Is it possible to post log file (on GitHub?) for me to look? > > Thanks, > Vladimir K > > On 2/11/21 6:28 AM, Nicolas Heutte wrote: > > Hi Vladimir, > > > > Thank you for your help. > > > > I'm currently running Java 11.0.9, and I did not use any VM flag of note. 
> > > > I checked the content of the compilation log, and it seems that > ArrayFloatToArrayFloatVectorBinding::plus() was > > deoptimized in order to allow AVector::plus() to be compiled: > > > > > > > count='916' iicount='916' level='3' stamp='7394.056' comment='tiered' > hot_count='896'/> > > > > compile_id='17257' compiler='c1' level='3'> > > method='com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding > plus > > (Lcom/qfs/vector/IVector;Lcom/qfs/vector/IVector;)V' bytes='69' > count='909' backedge_count='155602' iicount='910'/> > > > > > > The last compilation entry for AVector::plus() is: > > > > > > entry='0x00000296d6af32c0' size='1960' address='0x00000296d6af3110' > > relocation_offset='376' insts_offset='432' stub_offset='1040' > scopes_data_offset='1152' scopes_pcs_offset='1592' > > dependencies_offset='1880' nul_chk_table_offset='1896' > oops_offset='1064' metadata_offset='1072' > > method='com.qfs.vector.impl.AVector plus (Lcom/qfs/vector/IVector;)V' > bytes='23' count='172425' iicount='172425' > > stamp='7394.199'/> > > level='2' stamp='7394.199'/> > > @ 1 > com.qfs.vector.array.impl.ArrayFloatVector::getBindingId (4 bytes) inline > (hot) > > \-> TypeProfile (14552/14552 counts) > = com/qfs/vector/array/impl/ArrayFloatVector > > @ 7 > com.qfs.vector.array.impl.ArrayFloatVector::getBindingId (4 bytes) inline > (hot) > > \-> TypeProfile (14150/14150 counts) > = com/qfs/vector/array/impl/ArrayFloatVector > > @ 10 > com.qfs.vector.binding.impl.VectorBindings::getBinding (9 bytes) inline > (hot) > > @ 5 > com.qfs.vector.binding.impl.VectorBindings$VectorBindingsProvider::getBinding > (22 > > bytes) inline (hot) > > @ 3 > com.qfs.vector.binding.impl.VectorBindings$VectorBindingsProvider::hasBinding > > > (34 bytes) inline (hot) > > @ 17 > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus (69 > bytes) > > inline (hot) > > \-> TypeProfile (14054/14054 counts) > = > > com/qfs/vector/binding/impl/ArrayFloatToArrayFloatVectorBinding 
> > @ 12 > com.qfs.vector.array.impl.ArrayFloatVector::size (6 bytes) inline (hot) > > @ 22 > com.qfs.vector.impl.AVector::checkIndex (37 bytes) inline (hot) > > @ 6 > com.qfs.vector.array.impl.ArrayFloatVector::size (6 bytes) inline (hot) > > @ 27 > com.qfs.vector.array.impl.ArrayFloatVector::getUnderlying (5 bytes) > accessor > > @ 34 > com.qfs.vector.array.impl.ArrayFloatVector::getUnderlying (5 bytes) > accessor > > > > > > Unfortunately, I do not have access to a debug VM build, so I cannot run > the second test you recommend. > > > > Best regards, > > Nicolas Heutte > > > > On Wed, Feb 10, 2021 at 7:36 PM Vladimir Kozlov < > vladimir.kozlov at oracle.com > wrote: > > > > Hi, Nicolas > > > > Looks like, when inlined, the loop from > ArrayFloatToArrayFloatVectorBinding::plus() was not optimized at all: it is > not > > unrolled and has range checks. Such loops are not vectorized (you > need unrolling and no checks). > > > > What Java version you are running? What HotSpot VM flags you are > using when running application? > > > > Run application with -XX:+LogCompilation and look on compilation > data in hotspot_pid.log file for caller > > AVector::plus(). > > > > VM also has several flags to trace loop optimizations but they are > only available in debug VM build. If you have access > > to such build run with -XX:+PrintCompilation -XX:+TraceLoopOpts > flags. > > > > Thanks, > > Vladimir K > > > > On 2/10/21 9:24 AM, Nicolas Heutte wrote: > > > Hi all, > > > > > > I am encountering a performance issue caused by the interaction > between > > > method inlining and automatic vectorization. 
> > > > > > Our application aggregates arrays intensively using a method named > > > ArrayFloatToArrayFloatVectorBinding.plus() with the following > code: > > > > > > for (int i = 0; i < srcLen; ++i) { > > > > > > dstArray[i] += srcArray[i]; > > > > > > } > > > > > > When we microbenchmark this method we observe fast performance > close to the > > > practical memory bandwidth and when we print the assembly code we > observe > > > loop unrolling and automatic vectorization with SIMD instructions. > > > > > > 0x000001ef4600abf0: vmovdqu 0x10(%r14,%r13,4),%ymm0 > > > > > > 0x000001ef4600abf7: vaddps 0x10(%rcx,%r13,4),%ymm0,%ymm0 > > > > > > 0x000001ef4600abfe: vmovdqu %ymm0,0x10(%r14,%r13,4) > > > > > > 0x000001ef4600ac05: movslq %r13d,%r11 > > > > > > 0x000001ef4600ac08: vmovdqu 0x30(%r14,%r11,4),%ymm0 > > > > > > 0x000001ef4600ac0f: vaddps 0x30(%rcx,%r11,4),%ymm0,%ymm0 > > > > > > 0x000001ef4600ac16: vmovdqu %ymm0,0x30(%r14,%r11,4) > > > > > > 0x000001ef4600ac1d: vmovdqu 0x50(%r14,%r11,4),%ymm0 > > > > > > 0x000001ef4600ac24: vaddps 0x50(%rcx,%r11,4),%ymm0,%ymm0 > > > > > > 0x000001ef4600ac2b: vmovdqu %ymm0,0x50(%r14,%r11,4) > > > > > > 0x000001ef4600ac32: vmovdqu 0x70(%r14,%r11,4),%ymm0 > > > > > > 0x000001ef4600ac39: vaddps 0x70(%rcx,%r11,4),%ymm0,%ymm0 > > > > > > 0x000001ef4600ac40: vmovdqu %ymm0,0x70(%r14,%r11,4) > > > > > > 0x000001ef4600ac47: vmovdqu 0x90(%r14,%r11,4),%ymm0 > > > > > > 0x000001ef4600ac51: vaddps 0x90(%rcx,%r11,4),%ymm0,%ymm0 > > > > > > 0x000001ef4600ac5b: vmovdqu %ymm0,0x90(%r14,%r11,4) > > > > > > 0x000001ef4600ac65: vmovdqu 0xb0(%r14,%r11,4),%ymm0 > > > > > > 0x000001ef4600ac6f: vaddps 0xb0(%rcx,%r11,4),%ymm0,%ymm0 > > > > > > 0x000001ef4600ac79: vmovdqu %ymm0,0xb0(%r14,%r11,4) > > > > > > 0x000001ef4600ac83: vmovdqu 0xd0(%r14,%r11,4),%ymm0 > > > > > > 0x000001ef4600ac8d: vaddps 0xd0(%rcx,%r11,4),%ymm0,%ymm0 > > > > > > 0x000001ef4600ac97: vmovdqu %ymm0,0xd0(%r14,%r11,4) > > > > > > 0x000001ef4600aca1: vmovdqu 0xf0(%r14,%r11,4),%ymm0 > > > 
> > > 0x000001ef4600acab: vaddps 0xf0(%rcx,%r11,4),%ymm0,%ymm0 > > > > > > 0x000001ef4600acb5: vmovdqu %ymm0,0xf0(%r14,%r11,4) ;*fastore > > > {reexecute=0 rethrow=0 return_oop=0} > > > > > > ; - > > > > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 61 > > > (line 41) > > > > > > 0x000001ef4600acbf: add $0x40,%r13d ;*iinc > {reexecute=0 > > > rethrow=0 return_oop=0} > > > > > > ; - > > > > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 62 > > > (line 40) > > > > > > 0x000001ef4600acc3: cmp %eax,%r13d > > > > > > 0x000001ef4600acc6: jl 0x000001ef4600abf0 ;*goto > {reexecute=0 > > > rethrow=0 return_oop=0} > > > > > > ; - > > > > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 65 > > > (line 40) > > > > > > > > > > > > In the real application, this method is actually inlined in a > higher level > > > method named AVector.plus(). Unfortunately, the inlined version > of the > > > aggregation code is not vectorized anymore: > > > > > > > > > > > > 0x000001ef460180a0: cmp %ebx,%r11d > > > > > > 0x000001ef460180a3: jae 0x000001ef460180e6 > > > > > > 0x000001ef460180a5: vmovss 0x10(%r8,%r11,4),%xmm1 ;*faload > {reexecute=0 > > > rethrow=0 return_oop=0} > > > > > > ; - > > > > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 54 > > > (line 41) > > > > > > ; - > > > com.qfs.vector.impl.AVector::plus at 17 (line 204) > > > > > > 0x000001ef460180ac: cmp %ecx,%r11d > > > > > > 0x000001ef460180af: jae 0x000001ef46018104 > > > > > > 0x000001ef460180b1: vaddss 0x10(%r9,%r11,4),%xmm1,%xmm1 > > > > > > 0x000001ef460180b8: vmovss %xmm1,0x10(%r8,%r11,4) ;*fastore > {reexecute=0 > > > rethrow=0 return_oop=0} > > > > > > ; - > > > > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 61 > > > (line 41) > > > > > > ; - > > > com.qfs.vector.impl.AVector::plus at 17 (line 204) > > > > > > 0x000001ef460180bf: inc %r11d ;*iinc > {reexecute=0 > > > rethrow=0 return_oop=0} 
> > > > > > ; - > > > > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 62 > > > (line 40) > > > > > > ; - > > > com.qfs.vector.impl.AVector::plus at 17 (line 204) > > > > > > 0x000001ef460180c2: cmp %r10d,%r11d > > > > > > 0x000001ef460180c5: jl 0x000001ef460180a0 ;*goto > {reexecute=0 > > > rethrow=0 return_oop=0} > > > > > > ; - > > > > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 65 > > > (line 40) > > > > > > ; - > > > com.qfs.vector.impl.AVector::plus at 17 (line 204) > > > > > > > > > > > > This causes a significant performance drop, compared to a run > where we > > > explicitly disable the inlining and observe automatically > vectorized code > > > again ( > > > > -XX:CompileCommand=dontinline,com/qfs/vector/binding/impl/ArrayFloatToArrayFloatVectorBinding.plus > > > ). > > > > > > > > > How do you guys explain that behavior of the JIT compiler? Is > this a known > > > and tracked issue, could it be fixed in the JVM? Can we do > something in the > > > java code to prevent this from happening? > > > > > > > > > Best regards, > > > > > > Nicolas Heutte > > > > > > From duke at openjdk.java.net Tue Feb 16 01:29:44 2021 From: duke at openjdk.java.net (duke) Date: Tue, 16 Feb 2021 01:29:44 GMT Subject: Withdrawn: 8253757: Add LLVM-based backend for hsdis In-Reply-To: <91erxiMDb4ftvSomuJYHPi9SX-v8Z2VLD2qEwCbz5tk=.b9ed01b5-f0e0-4ed7-9c1a-b06bc0e64640@github.com> References: <91erxiMDb4ftvSomuJYHPi9SX-v8Z2VLD2qEwCbz5tk=.b9ed01b5-f0e0-4ed7-9c1a-b06bc0e64640@github.com> Message-ID: On Tue, 29 Sep 2020 04:36:16 GMT, Ludovic Henry wrote: > When bringing up Hotspot onto new platforms, it is not always possible to compile hsdis because gcc is not yet available. For example, for Windows-AArch64 and macOS-AArch64. > > For some such platforms, it is possible to use LLVM as an alternative backend as it also supports a disassembler feature. This pull request has been closed without being integrated. 
-------------

PR: https://git.openjdk.java.net/jdk/pull/392

From rrich at openjdk.java.net  Tue Feb 16 08:17:43 2021
From: rrich at openjdk.java.net (Richard Reingruber)
Date: Tue, 16 Feb 2021 08:17:43 GMT
Subject: RFR: 8261731: shallow copy the internal buffer of a scalar-replaced java.lang.String object
In-Reply-To:
References:
Message-ID:

On Mon, 15 Feb 2021 09:22:00 GMT, Xin Liu wrote:

> There are 3 nodes involved in the construction of a java.lang.String object.
> 1. Allocate of itself, aka. alloc
> 2. AllocateArray of a byte array, which is value:byte[], aka. aa
> 3. ArrayCopyNode which copies in the contents of value, aka. ac
>
> Lemma
> When a String object `alloc` is scalar replaced, C2 can eliminate `aa` and `ac`.
>
> Because `alloc` is scalar replaced, it must be non-escaped. The field value:byte[] of j.l.String cannot be seen by the external world, therefore it must not be global escaped. Because the buffer is marked as stable, it is safe to assume its contents are whatever `ac` copies in. Because all public java.lang.String constructors clone the incoming array, the source of `ac` is stable as well.
>
> It is possible to rewire `aa` to the source of `ac` with the correct offset. That is to say, we can replace both `aa` and `ac` with a "shallow copy" of the source of `ac`. It's safe if C2 keeps a reference to the source oop for all safepoints.

Hi,

this is a smart optimization.

> There are 3 nodes involved in the construction of a java.lang.String object.
>
> 1. Allocate of itself, aka. alloc
>
> 2. AllocateArray of a byte array, which is value:byte[], aka. aa
>
> 3. ArrayCopyNode which copies in the contents of value, aka. ac
>
> Lemma
> When a String object `alloc` is scalar replaced, C2 can eliminate `aa` and `ac`.
>
> Because `alloc` is scalar replaced, it must be non-escaped. The field value:byte[] of j.l.String cannot be seen by the external world, therefore it must not be global escaped.
Because the buffer is marked as stable, it is safe to assume its contents are whatever ac copies in. Because all public java.lang.String constructors clone the incoming array, the source of `ac` is stable as well. Are you saying that the source of `ac` cannot be accessed by another thread because of the cloning in the constructor? But the resulting string instance which is used to construct the non-escape instance can be GlobalEscape and then the source of `ac` is accessible to other threads, isn't it? ------------- PR: https://git.openjdk.java.net/jdk/pull/2570 From thartmann at openjdk.java.net Tue Feb 16 11:35:55 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 16 Feb 2021 11:35:55 GMT Subject: RFR: 8261336: IGV: enhance default filters [v2] In-Reply-To: References: Message-ID: On Mon, 15 Feb 2021 11:15:11 GMT, Roberto Casta?eda Lozano wrote: >> Redesign the filters shown by default in the "Filters" window: >> >> - Add filters to color the graph by node category and execution frequency (if applicable), and to hide subgraphs or edges only by _category_. The category of a node can be one of {`data`, `memory`, `control`, `mixed`, `other`}, and is solely determined by its type. `mixed` nodes are those with a tuple type that has different categories, such as `CallStaticJavaNode`. The category of an edge is that of its source node. >> >> - Instrument C2 to include the category and estimated execution frequency (if available) of each node in the graph dumps produced by `-XX:PrintIdealGraphLevel=N` (only in debug builds). >> >> - Remove filters which depend on properties never emitted by C2 (e.g. 'Remove State') or which appear to be unused ('C2 Matcher Flags Coloring' and 'C2 Register Coloring'). Also remove the subsumed 'C2 Basic Coloring' filter. >> >> - Merge 'C2 Remove Filter' and 'C2 Structural' into a single filter with a clearer name ('Simplify graph'). 
>> >> ### Screenshots >> >> "Filters" window before (left) and after (right) the proposed change: >> ![filters-window](https://user-images.githubusercontent.com/8792647/107749859-7a664780-6d1b-11eb-84ba-fd43e13abd0e.png) >> Default color scheme before (left) and after (right) the proposed change: >> ![color-scheme](https://user-images.githubusercontent.com/8792647/107517355-f3479100-6bad-11eb-9b0b-a71c18961dd8.png) >> Examples of the new 'Color by execution frequency' filter: >> ![color-by-frequency-2](https://user-images.githubusercontent.com/8792647/107518492-5980e380-6baf-11eb-9e01-992b211d06e3.png) >> ![color-by-frequency-1](https://user-images.githubusercontent.com/8792647/107518477-54bc2f80-6baf-11eb-8c7b-7eb7c1d85cf7.png) >> Example of the new 'Hide X subgraph'' filters, where only the data subgraph is shown: >> ![hide-all-but-data-subgraphs](https://user-images.githubusercontent.com/8792647/107750398-42133900-6d1c-11eb-8b27-264086b32bea.png) >> Example of the new 'Hide X edges' filters, where all nodes remain in their position but only the memory edges are shown: >> ![hide-all-but-memory-edges](https://user-images.githubusercontent.com/8792647/107751137-49871200-6d1d-11eb-9df0-79747bf8d16e.png) >> >> >> Tested IGV manually on a few graphs. Tested C2 instrumentation by running `hs-tier1` with `-Xbatch -XX:PrintIdealGraphLevel=3 -XX:PrintIdealGraphFile=graph.xml` on windows-x64, linux-x64, linux-aarch64, and macosx-x64 (all debug). > > Roberto Casta?eda Lozano has updated the pull request incrementally with five additional commits since the last revision: > > - Rewrite 'Show control flow only' filter using categories > - Add leading underscore field > - Move assertion to a default switch case > - Indent switch statements > - Use a scoped enum for type categories (as per the HotSpot style guide) This is awesome. Thanks a lot for taking the time to improve the IGV. Looks good to me! ------------- Marked as reviewed by thartmann (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/2499 From chagedorn at openjdk.java.net Tue Feb 16 11:36:14 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Tue, 16 Feb 2021 11:36:14 GMT Subject: RFR: 8261235: C1 compilation fails with assert(res->vreg_number() == index) failed: conversion check [v2] In-Reply-To: <-uBsLA86fiUezxtSxF_GW9oT62EezgztzXPwSta8EAw=.cb63031d-3ab4-468c-9351-1f1e10c377a9@github.com> References: <-uBsLA86fiUezxtSxF_GW9oT62EezgztzXPwSta8EAw=.cb63031d-3ab4-468c-9351-1f1e10c377a9@github.com> Message-ID: > The assertion is hit because we run out of virtual registers in the linear scan in C1 and do not handle it. I fixed it by applying the same bailout as in `LIRGenerator::new_register()`. > > There is also a second issue that `LIR_OprDesc::vreg_max` is too big. It is only used in this bailout code. `OprBits::vreg_max` is defined over `OprBits::data_bits` which uses `OprBits::non_data_bits`. But `OprBits::non_data_bits` does not consider `OprBits::pointer_bits` which results in a too large value for `LIR_OprDesc::vreg_max` and the assertion is hit because we don't bail out, yet. This needs to be fixed as well. 
> > Thanks, > Christian Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: Fix order of non_data_bits ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2543/files - new: https://git.openjdk.java.net/jdk/pull/2543/files/5fd1b911..82df324a Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2543&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2543&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/2543.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2543/head:pull/2543 PR: https://git.openjdk.java.net/jdk/pull/2543 From thartmann at openjdk.java.net Tue Feb 16 11:36:15 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 16 Feb 2021 11:36:15 GMT Subject: RFR: 8261235: C1 compilation fails with assert(res->vreg_number() == index) failed: conversion check [v2] In-Reply-To: References: <-uBsLA86fiUezxtSxF_GW9oT62EezgztzXPwSta8EAw=.cb63031d-3ab4-468c-9351-1f1e10c377a9@github.com> Message-ID: <8KAbnjufy-zo-KSGC3R8TebVYBQeAL7-M4u32Fe_bSQ=.fbc29b15-c1ff-4270-b5d4-3d0687cf4031@github.com> On Tue, 16 Feb 2021 11:07:14 GMT, Christian Hagedorn wrote: >> The assertion is hit because we run out of virtual registers in the linear scan in C1 and do not handle it. I fixed it by applying the same bailout as in `LIRGenerator::new_register()`. >> >> There is also a second issue that `LIR_OprDesc::vreg_max` is too big. It is only used in this bailout code. `OprBits::vreg_max` is defined over `OprBits::data_bits` which uses `OprBits::non_data_bits`. But `OprBits::non_data_bits` does not consider `OprBits::pointer_bits` which results in a too large value for `LIR_OprDesc::vreg_max` and the assertion is hit because we don't bail out, yet. This needs to be fixed as well. 
>> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Fix order of non_data_bits Looks good to me. src/hotspot/share/c1/c1_LIR.hpp line 235: > 233: , is_fpu_stack_offset_bits = 1 // used in assertion checking on x86 for FPU stack slot allocation > 234: , non_data_bits = kind_bits + type_bits + size_bits + destroys_bits + last_use_bits + > 235: is_fpu_stack_offset_bits + virtual_bits + is_xmm_bits + pointer_bits Would be nice to have this in the same order as the enum values. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2543 From chagedorn at openjdk.java.net Tue Feb 16 11:36:15 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Tue, 16 Feb 2021 11:36:15 GMT Subject: RFR: 8261235: C1 compilation fails with assert(res->vreg_number() == index) failed: conversion check [v2] In-Reply-To: <8KAbnjufy-zo-KSGC3R8TebVYBQeAL7-M4u32Fe_bSQ=.fbc29b15-c1ff-4270-b5d4-3d0687cf4031@github.com> References: <-uBsLA86fiUezxtSxF_GW9oT62EezgztzXPwSta8EAw=.cb63031d-3ab4-468c-9351-1f1e10c377a9@github.com> <8KAbnjufy-zo-KSGC3R8TebVYBQeAL7-M4u32Fe_bSQ=.fbc29b15-c1ff-4270-b5d4-3d0687cf4031@github.com> Message-ID: On Tue, 16 Feb 2021 10:45:38 GMT, Tobias Hartmann wrote: >> Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix order of non_data_bits > > Looks good to me. Thank you Tobias for your review! 
-------------

PR: https://git.openjdk.java.net/jdk/pull/2543

From rcastanedalo at openjdk.java.net  Tue Feb 16 12:45:40 2021
From: rcastanedalo at openjdk.java.net (Roberto Castañeda Lozano)
Date: Tue, 16 Feb 2021 12:45:40 GMT
Subject: RFR: 8261336: IGV: enhance default filters [v2]
In-Reply-To:
References:
Message-ID:

On Tue, 16 Feb 2021 10:33:10 GMT, Tobias Hartmann wrote:

>> Roberto Castañeda Lozano has updated the pull request incrementally with five additional commits since the last revision:
>>
>>  - Rewrite 'Show control flow only' filter using categories
>>  - Add leading underscore field
>>  - Move assertion to a default switch case
>>  - Indent switch statements
>>  - Use a scoped enum for type categories (as per the HotSpot style guide)
>
> This is awesome. Thanks a lot for taking the time to improve the IGV. Looks good to me!

Thanks Christian and Tobias for reviewing!

-------------

PR: https://git.openjdk.java.net/jdk/pull/2499

From rcastanedalo at openjdk.java.net  Tue Feb 16 12:50:51 2021
From: rcastanedalo at openjdk.java.net (Roberto Castañeda Lozano)
Date: Tue, 16 Feb 2021 12:50:51 GMT
Subject: RFR: 8261336: IGV: enhance default filters [v2]
In-Reply-To:
References:
Message-ID:

On Tue, 16 Feb 2021 12:42:45 GMT, Roberto Castañeda Lozano wrote:

>> This is awesome. Thanks a lot for taking the time to improve the IGV. Looks good to me!
>
> Thanks Christian and Tobias for reviewing!

Add filters to color and hide parts of the graph based on node categories or estimated execution frequency, and simplify remaining filters.
------------- PR: https://git.openjdk.java.net/jdk/pull/2499 From rcastanedalo at openjdk.java.net Tue Feb 16 12:50:52 2021 From: rcastanedalo at openjdk.java.net (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 16 Feb 2021 12:50:52 GMT Subject: Integrated: 8261336: IGV: enhance default filters In-Reply-To: References: Message-ID: <8dk6qu4lZzEthqpitvtkSzLwGFuf0SH80kMSK1kPP5I=.b2a802d1-1a9c-4b05-ba6e-e00bb18d936c@github.com> On Wed, 10 Feb 2021 10:00:00 GMT, Roberto Casta?eda Lozano wrote: > Redesign the filters shown by default in the "Filters" window: > > - Add filters to color the graph by node category and execution frequency (if applicable), and to hide subgraphs or edges only by _category_. The category of a node can be one of {`data`, `memory`, `control`, `mixed`, `other`}, and is solely determined by its type. `mixed` nodes are those with a tuple type that has different categories, such as `CallStaticJavaNode`. The category of an edge is that of its source node. > > - Instrument C2 to include the category and estimated execution frequency (if available) of each node in the graph dumps produced by `-XX:PrintIdealGraphLevel=N` (only in debug builds). > > - Remove filters which depend on properties never emitted by C2 (e.g. 'Remove State') or which appear to be unused ('C2 Matcher Flags Coloring' and 'C2 Register Coloring'). Also remove the subsumed 'C2 Basic Coloring' filter. > > - Merge 'C2 Remove Filter' and 'C2 Structural' into a single filter with a clearer name ('Simplify graph'). 
> > ### Screenshots > > "Filters" window before (left) and after (right) the proposed change: > ![filters-window](https://user-images.githubusercontent.com/8792647/107749859-7a664780-6d1b-11eb-84ba-fd43e13abd0e.png) > Default color scheme before (left) and after (right) the proposed change: > ![color-scheme](https://user-images.githubusercontent.com/8792647/107517355-f3479100-6bad-11eb-9b0b-a71c18961dd8.png) > Examples of the new 'Color by execution frequency' filter: > ![color-by-frequency-2](https://user-images.githubusercontent.com/8792647/107518492-5980e380-6baf-11eb-9e01-992b211d06e3.png) > ![color-by-frequency-1](https://user-images.githubusercontent.com/8792647/107518477-54bc2f80-6baf-11eb-8c7b-7eb7c1d85cf7.png) > Example of the new 'Hide X subgraph'' filters, where only the data subgraph is shown: > ![hide-all-but-data-subgraphs](https://user-images.githubusercontent.com/8792647/107750398-42133900-6d1c-11eb-8b27-264086b32bea.png) > Example of the new 'Hide X edges' filters, where all nodes remain in their position but only the memory edges are shown: > ![hide-all-but-memory-edges](https://user-images.githubusercontent.com/8792647/107751137-49871200-6d1d-11eb-9df0-79747bf8d16e.png) > > > Tested IGV manually on a few graphs. Tested C2 instrumentation by running `hs-tier1` with `-Xbatch -XX:PrintIdealGraphLevel=3 -XX:PrintIdealGraphFile=graph.xml` on windows-x64, linux-x64, linux-aarch64, and macosx-x64 (all debug). This pull request has now been integrated. Changeset: 16bd7d38 Author: Roberto Casta?eda Lozano URL: https://git.openjdk.java.net/jdk/commit/16bd7d38 Stats: 375 lines in 25 files changed: 292 ins; 44 del; 39 mod 8261336: IGV: enhance default filters Add filters to color and hide parts of the graph based on node categories or estimated execution frequency, and simplify remaining filters. 
Co-authored-by: Christian Hagedorn Reviewed-by: vlivanov, chagedorn, thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/2499 From github.com+670087+jrziviani at openjdk.java.net Tue Feb 16 12:57:41 2021 From: github.com+670087+jrziviani at openjdk.java.net (Ziviani) Date: Tue, 16 Feb 2021 12:57:41 GMT Subject: RFR: 8261522: [PPC64] AES intrinsics write beyond the destination array [v2] In-Reply-To: References: Message-ID: <9RiCsBAS3URyQbPTCdAyZAIb-2gdFR4ttzyNhNC0Mpc=.4da8f3b2-d685-4d41-a921-5d0318467771@github.com> On Mon, 15 Feb 2021 19:47:54 GMT, Martin Doerr wrote: >> I'd like to replace the read-modify-write implementation from aescrypt_encryptBlock / aescrypt_decryptBlock stubs. It can cause severe problems (see bug description). > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Add comment and key length assertions. Minor improvement. +1 ------------- Marked as reviewed by jrziviani at github.com (no known OpenJDK username). PR: https://git.openjdk.java.net/jdk/pull/2514 From lucy at openjdk.java.net Tue Feb 16 13:58:40 2021 From: lucy at openjdk.java.net (Lutz Schmidt) Date: Tue, 16 Feb 2021 13:58:40 GMT Subject: RFR: 8261522: [PPC64] AES intrinsics write beyond the destination array [v2] In-Reply-To: References: Message-ID: On Mon, 15 Feb 2021 19:47:54 GMT, Martin Doerr wrote: >> I'd like to replace the read-modify-write implementation from aescrypt_encryptBlock / aescrypt_decryptBlock stubs. It can cause severe problems (see bug description). > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Add comment and key length assertions. Minor improvement. Hi Martin, the changes look good to me. Thanks for tracking that down and fixing it. I know that wasn't trivial. One minor thing - it made me believe for a second I had missed something: in line 2631, the reference should be to vRet, not vsRet. 
Same is true further down in decryptBlock. Lutz ------------- Marked as reviewed by lucy (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2514 From mdoerr at openjdk.java.net Tue Feb 16 15:20:58 2021 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Tue, 16 Feb 2021 15:20:58 GMT Subject: RFR: 8261522: [PPC64] AES intrinsics write beyond the destination array [v3] In-Reply-To: References: Message-ID: > I'd like to replace the read-modify-write implementation from aescrypt_encryptBlock / aescrypt_decryptBlock stubs. It can cause severe problems (see bug description). Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Fix typo in comment. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2514/files - new: https://git.openjdk.java.net/jdk/pull/2514/files/725dd8c7..69188df5 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2514&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2514&range=01-02 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/2514.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2514/head:pull/2514 PR: https://git.openjdk.java.net/jdk/pull/2514 From mdoerr at openjdk.java.net Tue Feb 16 15:20:58 2021 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Tue, 16 Feb 2021 15:20:58 GMT Subject: RFR: 8261522: [PPC64] AES intrinsics write beyond the destination array [v2] In-Reply-To: References: Message-ID: On Tue, 16 Feb 2021 13:56:19 GMT, Lutz Schmidt wrote: >> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: >> >> Add comment and key length assertions. Minor improvement. > > Hi Martin, > the changes look good to me. Thanks for tracking that down and fixing it. I know that wasn't trivial. > > One minor thing - it made me believe for a second I had missed something: in line 2631, the reference should be to vRet, not vsRet. 
Same is true further down in decryptBlock. > > Lutz Thanks a lot for the reviews! I've fixed the typo in the comments as suggested by Lutz. ------------- PR: https://git.openjdk.java.net/jdk/pull/2514 From mdoerr at openjdk.java.net Tue Feb 16 15:24:43 2021 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Tue, 16 Feb 2021 15:24:43 GMT Subject: RFR: 8261657: [PPC64] Cleanup StoreCM nodes after CMS removal In-Reply-To: <9llRP8rEfThe6mIPc57HbANvm7iyeyB2zJ73FNbWbNo=.1b07a60c-7722-4db4-bae8-4629ee3cba07@github.com> References: <9llRP8rEfThe6mIPc57HbANvm7iyeyB2zJ73FNbWbNo=.1b07a60c-7722-4db4-bae8-4629ee3cba07@github.com> Message-ID: <49i6dtfgYn-S4lkLqm3zUihbjSjmScLNVu1I1klciPY=.b00d70f7-3264-4b6c-894d-fb5a11ad3be6@github.com> On Mon, 15 Feb 2021 17:45:02 GMT, Lutz Schmidt wrote: >> We only need one StoreCM node after CMS removal. CMS StoreStore barriers were already removed at other places. > > Changes look good to me. > Thanks for cleaning up! Thanks for the review! I guess this can be considered as trivial because we basically remove dead code. ------------- PR: https://git.openjdk.java.net/jdk/pull/2554 From github.com+168222+mgkwill at openjdk.java.net Tue Feb 16 16:19:01 2021 From: github.com+168222+mgkwill at openjdk.java.net (Marcus G K Williams) Date: Tue, 16 Feb 2021 16:19:01 GMT Subject: RFR: 8261671: X86 I2L conversion can be skipped for certain masked positive values Message-ID: <9PGauj5SuzWCyX7aTFHTUCEsK5kVH2ApKAurSK3o_To=.f20e9936-352f-47f0-a561-98dcdb7c842e@github.com> 8261671: X86 I2L conversion can be skipped for certain masked positive values For the following expression: (long)(value & mask) Where value is of int type and mask is constant (power of two ? 1), we can directly generate bzhi instruction to zero the upper bits instead of doing an andl, followed by movslq Before: Benchmark Mode Cnt Score Error Units SkipIntToLongCast.skipMaskedSmallPositiveCast avgt 15 10.679 ? 
1.496 ns/op After: Benchmark Mode Cnt Score Error Units SkipIntToLongCast.skipMaskedSmallPositiveCast avgt 15 7.870 ? 0.067 ns/op Signed-off-by: Marcus G K Williams ------------- Commit messages: - 8261671: Skip unnecessary Int2L conversions Changes: https://git.openjdk.java.net/jdk/pull/2590/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2590&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261671 Stats: 37 lines in 3 files changed: 37 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/2590.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2590/head:pull/2590 PR: https://git.openjdk.java.net/jdk/pull/2590 From redestad at openjdk.java.net Tue Feb 16 16:36:40 2021 From: redestad at openjdk.java.net (Claes Redestad) Date: Tue, 16 Feb 2021 16:36:40 GMT Subject: RFR: 8261553: Efficient mask generation using BMI2 BZHI instruction [v3] In-Reply-To: References: Message-ID: On Mon, 15 Feb 2021 18:51:58 GMT, Jatin Bhateja wrote: >>> > Hi Claes, This could be a run to run variation, in general we are now having fewer number of instructions (one shift operation saved per mask computation) compared to previous masked generation sequence and thus it will always offer better execution latencies. >>> >>> Run-to-run variation would be easy to rule out by running more forks and more iterations to attain statistically significant results. While the instruction manuals suggest latency should be better for this instruction on all CPUs where it's supported, it would be good if there was some clear proof - such as a significant benchmark win - to motivate the added complexity. 
>> >> BASELINE: >> Result "org.openjdk.bench.java.lang.ArrayCopyUnalignedSrc.testLong": >> 61.037 ns/op >> >> Secondary result "org.openjdk.bench.java.lang.ArrayCopyUnalignedSrc.testLong:??perf": >> Perf stats: >> -------------------------------------------------- >> >> 19,739.21 msec task-clock # 0.389 CPUs utilized >> 646 context-switches # 0.033 K/sec >> 12 cpu-migrations # 0.001 K/sec >> 150 page-faults # 0.008 K/sec >> 74,59,83,59,139 cycles # 3.779 GHz (30.73%) >> 1,78,78,79,19,117 instructions # 2.40 insn per cycle (38.48%) >> 24,79,81,63,651 branches # 1256.289 M/sec (38.55%) >> 32,24,89,924 branch-misses # 1.30% of all branches (38.62%) >> 52,56,88,28,472 L1-dcache-loads # 2663.167 M/sec (38.65%) >> 39,00,969 L1-dcache-load-misses # 0.01% of all L1-dcache hits (38.57%) >> 3,74,131 LLC-loads # 0.019 M/sec (30.77%) >> 22,315 LLC-load-misses # 5.96% of all LL-cache hits (30.72%) >> L1-icache-loads >> 17,49,997 L1-icache-load-misses (30.72%) >> 52,91,41,70,636 dTLB-loads # 2680.663 M/sec (30.69%) >> 3,315 dTLB-load-misses # 0.00% of all dTLB cache hits (30.67%) >> 4,674 iTLB-loads # 0.237 K/sec (30.65%) >> 33,746 iTLB-load-misses # 721.99% of all iTLB cache hits (30.63%) >> L1-dcache-prefetches >> L1-dcache-prefetch-misses >> >> 50.723759146 seconds time elapsed >> >> 51.447054000 seconds user >> 0.189949000 seconds sys >> >> >> WITH OPT: >> Result "org.openjdk.bench.java.lang.ArrayCopyUnalignedSrc.testLong": >> 74.356 ns/op >> >> Secondary result "org.openjdk.bench.java.lang.ArrayCopyUnalignedSrc.testLong:??perf": >> Perf stats: >> -------------------------------------------------- >> >> 19,741.09 msec task-clock # 0.389 CPUs utilized >> 641 context-switches # 0.032 K/sec >> 17 cpu-migrations # 0.001 K/sec >> 164 page-faults # 0.008 K/sec >> 74,40,40,48,513 cycles # 3.769 GHz (30.81%) >> 1,45,66,22,06,797 instructions # 1.96 insn per cycle (38.56%) >> 20,31,28,43,577 branches # 1028.963 M/sec (38.65%) >> 14,11,419 branch-misses # 0.01% of all branches 
(38.69%) >> 43,07,86,33,662 L1-dcache-loads # 2182.182 M/sec (38.72%) >> 37,06,744 L1-dcache-load-misses # 0.01% of all L1-dcache hits (38.56%) >> 1,34,292 LLC-loads # 0.007 M/sec (30.72%) >> 30,627 LLC-load-misses # 22.81% of all LL-cache hits (30.68%) >> L1-icache-loads >> 14,49,145 L1-icache-load-misses (30.65%) >> 43,44,86,27,516 dTLB-loads # 2200.924 M/sec (30.63%) >> 218 dTLB-load-misses # 0.00% of all dTLB cache hits (30.63%) >> 2,445 iTLB-loads # 0.124 K/sec (30.63%) >> 28,624 iTLB-load-misses # 1170.72% of all iTLB cache hits (30.63%) >> L1-dcache-prefetches >> L1-dcache-prefetch-misses >> >> 50.716083931 seconds time elapsed >> >> 51.467300000 seconds user >> 0.200390000 seconds sys >> >> >> JMH perf data for ArrayCopyUnalignedSrc.testLong with copy length of 1200 shows degradation in L1D accesses, it seems the benchmark got displaced from its sweet spot. >> >> But, there is a significant reduction in instruction count and cycles are almost comparable. We are saving one shift per mask computation. >> >> OLD Sequence: >> 0x00007f7fc1030ead: movabs $0x1,%rax >> 0x00007f7fc1030eb7: shlx %r8,%rax,%rax >> 0x00007f7fc1030ebc: dec %rax >> 0x00007f7fc1030ebf: kmovq %rax,%k2 >> NEW Sequence: >> 0x00007f775d030d51: movabs $0xffffffffffffffff,%rax >> 0x00007f775d030d5b: bzhi %r8,%rax,%rax >> 0x00007f775d030d60: kmovq %rax,%k2 > > Further analysis of the perf degradation revealed that with the new optimized instruction pattern, code alignment got disturbed. This led to an increase in LSD misses, and it also reduced UOP caching in the DSB. > Aligning copy loops at a 32 byte boundary prevents any adverse impact on UOP caching. > NOPs used for padding add to the instruction count and thus may overshadow the code size gains due to the new mask generation sequence in copy stubs.
> > Baseline: > ArrayCopyAligned.testLong Length : 1200 61 ns/op (approx) > 1,93,44,43,11,622 cycles > 4,59,57,99,78,727 instructions # 2.38 insn per cycle > 1,83,68,75,68,255 idq.dsb_uops > 2,08,32,43,71,906 lsd.uops > 37,12,54,60,211 idq.mite_uops > > With Opt: > ArrayCopyAligned.testLong Length : 1200 74 ns/op (approx) > 1,93,51,25,94,766 cycles > 3,75,11,57,91,917 instructions # 1.94 insn per cycle > 48,67,58,25,566 idq.dsb_uops > 19,46,13,236 lsd.uops > 2,87,42,95,74,280 idq.mite_uops > > With Opt + main loop alignment(nop): 61 ns/op (approx) > ArrayCopyAligned.testLong Length : 1200 > 1,93,52,15,90,080 cycles > 4,60,89,14,06,528 instructions # 2.38 insn per cycle > 1,78,76,10,34,991 idq.dsb_uops > 2,09,16,15,84,313 lsd.uops > 46,25,31,92,101 idq.mite_uops > > > While computing the mask for partial in-lining of small copy calls ( currently enabled for sub-word types with copy length less than 32/64 bytes), new optimized sequence should always offer lower instruction count and latency path. 
> > > Baseline: > ArrayCopyAligned.testByte Length : 20 avgt 2 2.635 ns/op > 1,97,76,75,18,052 cycles > 8,96,00,37,11,803 instructions # 4.53 insn per cycle > 2,71,83,79,035 idq.dsb_uops > 7,54,82,43,63,409 lsd.uops > 3,92,55,74,395 idq.mite_uops > > ArrayCopyAligned.testByte Length : 31 avgt 2 2.635 ns/op > 1,97,79,16,56,787 cycles > 8,96,13,15,69,780 instructions # 4.53 insn per cycle > 2,69,07,11,691 idq.dsb_uops > 7,54,95,63,77,683 lsd.uops > 3,90,19,10,747 idq.mite_uops > > WithOpt: > ArrayCopyAligned.testByte Length : 20 avgt 2 2.635 ns/op > 1,97,66,64,62,541 cycles > 8,92,03,95,00,236 instructions # 4.51 insn per cycle > 2,72,38,56,205 idq.dsb_uops > 7,50,87,50,60,591 lsd.uops > 3,89,15,02,954 idq.mite_uops > > ArrayCopyAligned.testByte Length : 31 avgt 2 2.635 ns/op > 1,97,54,21,61,110 cycles > 8,91,46,64,23,754 instructions # 4.51 insn per cycle > 2,78,12,19,544 idq.dsb_uops > 7,50,35,88,95,843 lsd.uops > 3,90,41,97,276 idq.mite_uops > > > Following are the links to updated JMH perf data: > http://cr.openjdk.java.net/~jbhateja/8261553/JMH_PERF_CLX_BASELINE.txt > http://cr.openjdk.java.net/~jbhateja/8261553/JMH_PERF_CLX_WITH_OPTS_LOOP_ALIGN.txt > > In general gains are not significant in case of copy stubs, but new sequence offers a optimal latency path for mask computation sequence. Thanks for getting to the bottom of that regression. 
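For readers following the thread, the OLD and NEW mask-generation sequences quoted above can be compared in plain Java. This is only a sketch: `MaskSketch`, `maskShiftDec`, `bzhiModel` and `maskBzhi` are names made up for illustration, and `bzhiModel` is a software model of BZHI as I understand the instruction, not HotSpot stub code. It checks that both sequences yield the same mask for lengths 0 to 63.

```java
public class MaskSketch {
    // OLD sequence from the thread: movabs $1; shlx; dec  ->  (1 << n) - 1
    static long maskShiftDec(int n) {
        return (1L << n) - 1;
    }

    // Assumed software model of BZHI (not HotSpot code): keep the low n bits
    // of src, and return src unchanged when n is 64 or more.
    static long bzhiModel(long src, int n) {
        int index = n & 0xFF;
        return index >= 64 ? src : src & ~(-1L << index);
    }

    // NEW sequence from the thread: movabs $-1; bzhi
    static long maskBzhi(int n) {
        return bzhiModel(-1L, n);
    }

    public static void main(String[] args) {
        // Both sequences should agree for every length below the register width.
        for (int n = 0; n < 64; n++) {
            if (maskShiftDec(n) != maskBzhi(n)) {
                throw new AssertionError("mismatch at n=" + n);
            }
        }
        System.out.println(Long.toHexString(maskBzhi(20))); // prints "fffff"
    }
}
```

The model says nothing about latency or code alignment, which is what the perf numbers above are about; it only shows why dropping the `dec` is safe for the mask values involved.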
------------- PR: https://git.openjdk.java.net/jdk/pull/2522 From fweimer at openjdk.java.net Tue Feb 16 17:10:44 2021 From: fweimer at openjdk.java.net (Florian Weimer) Date: Tue, 16 Feb 2021 17:10:44 GMT Subject: RFR: 8261671: X86 I2L conversion can be skipped for certain masked positive values In-Reply-To: <9PGauj5SuzWCyX7aTFHTUCEsK5kVH2ApKAurSK3o_To=.f20e9936-352f-47f0-a561-98dcdb7c842e@github.com> References: <9PGauj5SuzWCyX7aTFHTUCEsK5kVH2ApKAurSK3o_To=.f20e9936-352f-47f0-a561-98dcdb7c842e@github.com> Message-ID: On Tue, 16 Feb 2021 16:13:48 GMT, Marcus G K Williams wrote: > 8261671: X86 I2L conversion can be skipped for certain masked positive values > > For the following expression: > (long)(value & mask) > Where value is of int type and mask is constant (power of two - 1), we can directly generate bzhi instruction to zero the upper bits instead of doing an andl, followed by movslq > > Before: > Benchmark Mode Cnt Score Error Units > SkipIntToLongCast.skipMaskedSmallPositiveCast avgt 15 10.679 ± 1.496 ns/op > > > After: > Benchmark Mode Cnt Score Error Units > SkipIntToLongCast.skipMaskedSmallPositiveCast avgt 15 7.870 ± 0.067 ns/op > > Signed-off-by: Marcus G K Williams src/hotspot/cpu/x86/x86_64.ad line 9172: > 9170: instruct convI2LAndI_reg_immIbitmask(rRegL dst, rRegI src, immI_bitmask mask, rRegI tmp, rFlagsReg cr) > 9171: %{ > 9172: predicate(VM_Version::supports_bmi2()); Agner's optimization guide says that BZHI uses microcode on Zen 2 and earlier, so perhaps the predicate should reflect that?
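As background for the PR description quoted above, here is a small Java sketch of why the sign-extending int-to-long conversion is redundant when the mask is a power of two minus 1. The class and method names (`I2LSketch`, `andThenSignExtend`, `zeroUpperBits`) are hypothetical and only illustrate the arithmetic; they say nothing about how the `.ad` rule is written.

```java
public class I2LSketch {
    // The expression from the PR description: and, then sign-extending I2L.
    static long andThenSignExtend(int value, int mask) {
        return (long) (value & mask);
    }

    // What zeroing every bit from position k upward yields.
    static long zeroUpperBits(int value, int k) {
        return ((long) value) & ((1L << k) - 1);
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 42, Integer.MIN_VALUE, Integer.MAX_VALUE, 0xDEADBEEF};
        // For masks 2^k - 1 with 1 <= k <= 31 the masked int is never negative,
        // so sign extension and zeroing the upper bits give the same long.
        for (int k = 1; k <= 31; k++) {
            int mask = (1 << k) - 1;
            for (int v : samples) {
                if (andThenSignExtend(v, mask) != zeroUpperBits(v, k)) {
                    throw new AssertionError("mismatch for k=" + k + ", v=" + v);
                }
            }
        }
        System.out.println(andThenSignExtend(0xDEADBEEF, 0xFF)); // prints 239
    }
}
```

The equivalence breaks once the mask has bit 31 set, which is presumably why the description restricts itself to masked positive values.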
------------- PR: https://git.openjdk.java.net/jdk/pull/2590 From github.com+168222+mgkwill at openjdk.java.net Tue Feb 16 17:43:08 2021 From: github.com+168222+mgkwill at openjdk.java.net (Marcus G K Williams) Date: Tue, 16 Feb 2021 17:43:08 GMT Subject: RFR: 8261671: X86 I2L conversion can be skipped for certain masked positive values [v2] In-Reply-To: <9PGauj5SuzWCyX7aTFHTUCEsK5kVH2ApKAurSK3o_To=.f20e9936-352f-47f0-a561-98dcdb7c842e@github.com> References: <9PGauj5SuzWCyX7aTFHTUCEsK5kVH2ApKAurSK3o_To=.f20e9936-352f-47f0-a561-98dcdb7c842e@github.com> Message-ID: <2ATJREylgmEnepqmA5TyGCwHG926B_0bkZoNMkNuKqI=.d596fb17-06a0-4761-a4bb-c32dae085ae2@github.com> > 8261671: X86 I2L conversion can be skipped for certain masked positive values > > For the following expression: > (long)(value & mask) > Where value is of int type and mask is constant (power of two ? 1), we can directly generate bzhi instruction to zero the upper bits instead of doing an andl, followed by movslq > > Before: > Benchmark Mode Cnt Score Error Units > SkipIntToLongCast.skipMaskedSmallPositiveCast avgt 15 10.679 ? 1.496 ns/op > > > After: > Benchmark Mode Cnt Score Error Units > SkipIntToLongCast.skipMaskedSmallPositiveCast avgt 15 7.870 ? 
0.067 ns/op > > Signed-off-by: Marcus G K Williams Marcus G K Williams has updated the pull request incrementally with one additional commit since the last revision: Update convI2LAndI_reg_immIbitmask w/ is_Intel Signed-off-by: Marcus G K Williams ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2590/files - new: https://git.openjdk.java.net/jdk/pull/2590/files/28cbd6cb..5dc976e2 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2590&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2590&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/2590.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2590/head:pull/2590 PR: https://git.openjdk.java.net/jdk/pull/2590 From github.com+168222+mgkwill at openjdk.java.net Tue Feb 16 17:43:09 2021 From: github.com+168222+mgkwill at openjdk.java.net (Marcus G K Williams) Date: Tue, 16 Feb 2021 17:43:09 GMT Subject: RFR: 8261671: X86 I2L conversion can be skipped for certain masked positive values [v2] In-Reply-To: References: <9PGauj5SuzWCyX7aTFHTUCEsK5kVH2ApKAurSK3o_To=.f20e9936-352f-47f0-a561-98dcdb7c842e@github.com> Message-ID: <9XBe7BRG8tFJ3ZHqmLa-MjmZWrJS0nIGKJzsjViGVoI=.007b2404-c93f-408e-a28e-c5491eeceb4e@github.com> On Tue, 16 Feb 2021 17:08:13 GMT, Florian Weimer wrote: >> Marcus G K Williams has updated the pull request incrementally with one additional commit since the last revision: >> >> Update convI2LAndI_reg_immIbitmask w/ is_Intel >> >> Signed-off-by: Marcus G K Williams > > src/hotspot/cpu/x86/x86_64.ad line 9172: > >> 9170: instruct convI2LAndI_reg_immIbitmask(rRegL dst, rRegI src, immI_bitmask mask, rRegI tmp, rFlagsReg cr) >> 9171: %{ >> 9172: predicate(VM_Version::supports_bmi2()); > > Agner's optimization guide says that BZHI uses microcode on Zen 2 and earlier, so perhaps the predicate should reflect that? 
Added `predicate(VM_Version::supports_bmi2() && VM_Version::is_intel());` ------------- PR: https://git.openjdk.java.net/jdk/pull/2590 From github.com+2249648+johntortugo at openjdk.java.net Tue Feb 16 18:02:58 2021 From: github.com+2249648+johntortugo at openjdk.java.net (John Tortugo) Date: Tue, 16 Feb 2021 18:02:58 GMT Subject: RFR: 8241502: Migrate x86_64.ad to MacroAssembler [v4] In-Reply-To: References: Message-ID: > Relates to: https://bugs.openjdk.java.net/browse/JDK-8241502 > Tested on: Linux tier1, 2 and 3 > > Can you please take a look whether these changes are going in the direction expected or not? If it is, I'll continue working on the `JDK-8241502` but I'd like to split it in a few PRs since it's a lot of changes. John Tortugo has updated the pull request incrementally with one additional commit since the last revision: Some div and shift instructions. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2420/files - new: https://git.openjdk.java.net/jdk/pull/2420/files/1e8361cc..0e72dfe0 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2420&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2420&range=02-03 Stats: 419 lines in 3 files changed: 221 ins; 136 del; 62 mod Patch: https://git.openjdk.java.net/jdk/pull/2420.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2420/head:pull/2420 PR: https://git.openjdk.java.net/jdk/pull/2420 From headius at headius.com Tue Feb 16 18:48:11 2021 From: headius at headius.com (Charles Oliver Nutter) Date: Tue, 16 Feb 2021 12:48:11 -0600 Subject: Intermittent JRuby json issue related to tiered or G1 Message-ID: Hello again folks, we seem to have stumbled on an issue with either tiered compilation or G1 or both. https://github.com/jruby/jruby/issues/6554 We have been trying to track down a sporadic parse error in JRuby's json library, and it now seems likely that it is an issue at the JVM level. 
The issue above describes the problem that at least two of our users have been seeing. The issue manifests as JRuby's json library suddenly failing to parse code that worked earlier in the run. The main user contributing details here, "Freaky", has been able to confirm that either turning off tiered compilation or switching away from G1 (to Parallel or Shenandoah) makes the problem go away, and has provided a small reproduction: https://github.com/Freaky/jruby-issue-6554 Freaky has also been trying to correlate JIT and GC logs to the time of failure and has posted his findings on the JRuby issue above. Freaky has been running the latest Java 15 to reproduce, but other users are seeing the same issue on 11. I wanted to get some feedback here before opening an issue, mostly because it seems to require both tiered and G1 to trigger. Anyone have some cycles to help us investigate? - Charlie From shade at redhat.com Tue Feb 16 19:10:52 2021 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 16 Feb 2021 20:10:52 +0100 Subject: Intermittent JRuby json issue related to tiered or G1 In-Reply-To: References: Message-ID: <1bac4d6a-e53d-075d-0209-6b737b7a2e77@redhat.com> On 2/16/21 7:48 PM, Charles Oliver Nutter wrote: > I wanted to get some feedback here before opening an issue, mostly > because it seems to require both tiered and G1 to trigger. Anyone have > some cycles to help us investigate? I am struggling to reproduce this, reproducer requires fiddling with the actual JRuby installation path, installing Bundler, installing gems, etc. It would help if you can provide the reproducer that one can able to run from a blank machine. As usual, running with fastdebug builds is the first step here: maybe VM would assert meaningfully, and then you can search JIRA for the assert message. 
My own fastdebug builds are here: https://builds.shipilev.net/ -- Thanks, -Aleksey From shade at redhat.com Tue Feb 16 19:50:46 2021 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 16 Feb 2021 20:50:46 +0100 Subject: Intermittent JRuby json issue related to tiered or G1 In-Reply-To: <1bac4d6a-e53d-075d-0209-6b737b7a2e77@redhat.com> References: <1bac4d6a-e53d-075d-0209-6b737b7a2e77@redhat.com> Message-ID: On 2/16/21 8:10 PM, Aleksey Shipilev wrote: > As usual, running with fastdebug builds is the first step here: maybe VM would assert meaningfully, > and then you can search JIRA for the assert message. My own fastdebug builds are here: > https://builds.shipilev.net/ Look: $ export JAVA_HOME=./jdk15-fastdebug/; export PATH=$JAVA_HOME/bin:$PATH $ while true; do ~/Install/jruby-9.2.14.0/bin/jruby -J-Xmx512m ./test.rb; done Warming up -------------------------------------- JSON gem# To suppress the following error report, specify this argument # after -XX: or in .hotspotrc: SuppressErrorAt=/ifnode.cpp:952 # # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (/home/buildbot/worker/build-jdk15u-linux/build/src/hotspot/share/opto/ifnode.cpp:952), pid=3795216, tid=3795432 # assert(this_bool->_test.is_less() && !fail->_con) failed: incorrect test # # JRE version: OpenJDK Runtime Environment (15.0.2) (fastdebug build 15.0.2-testing+0-builds.shipilev.net-openjdk-jdk15-b50-20210206) # Java VM: OpenJDK 64-Bit Server VM (fastdebug 15.0.2-testing+0-builds.shipilev.net-openjdk-jdk15-b50-20210206, mixed mode, sharing, tiered, compressed oops, g1 gc, linux-amd64) # Problematic frame: # V [libjvm.so+0xbf59ce] IfNode::fold_compares_helper(ProjNode*, ProjNode*, ProjNode*, PhaseIterGVN*)+0x66e # # Core dump will be written. 
Default location: Core dumps may be processed with "/usr/share/apport/apport %p %s %c %d %P %E" (or dumping to /home/shade/temp/jruby/jruby-issue-6554/core.3795216) # # An error report file with more information is saved as: # /home/shade/temp/jruby/jruby-issue-6554/hs_err_pid3795216.log # # Compiler replay data is saved as: # /home/shade/temp/jruby/jruby-issue-6554/replay_pid3795216.log # # If you would like to submit a bug report, please visit: # https://bugreport.java.com/bugreport/crash.jsp ------ This reliably reproduces with latest JDK 15 binaries from here: https://builds.shipilev.net/openjdk-jdk15/ There is a single hit in JIRA, for the bug that was fixed in JDK 9. But! Neither JDK 16 nor JDK 17 fastdebug assert, which gives us an opportunity to reverse-bisect which change had fixed it. -- Thanks, -Aleksey From headius at headius.com Tue Feb 16 20:00:51 2021 From: headius at headius.com (Charles Oliver Nutter) Date: Tue, 16 Feb 2021 14:00:51 -0600 Subject: Intermittent JRuby json issue related to tiered or G1 In-Reply-To: References: <1bac4d6a-e53d-075d-0209-6b737b7a2e77@redhat.com> Message-ID: On Tue, Feb 16, 2021 at 1:50 PM Aleksey Shipilev wrote: > > On 2/16/21 8:10 PM, Aleksey Shipilev wrote: > > As usual, running with fastdebug builds is the first step here: maybe VM would assert meaningfully, > > and then you can search JIRA for the assert message. My own fastdebug builds are here: > > https://builds.shipilev.net/ > > Look: > > $ export JAVA_HOME=./jdk15-fastdebug/; export PATH=$JAVA_HOME/bin:$PATH > $ while true; do ~/Install/jruby-9.2.14.0/bin/jruby -J-Xmx512m ./test.rb; done > Warming up -------------------------------------- > JSON gem# To suppress the following error report, specify this argument > # after -XX: or in .hotspotrc: SuppressErrorAt=/ifnode.cpp:952 > # > # A fatal error has been detected by the Java Runtime Environment: Thanks for digging into this Aleksey... 
I will try to remember to throw a fastdebug build at these issues in the future. This one has largely been driven by our users since up until last week we had no way to reproduce. Those users are also monitoring this thread. I am a bit confused about your JDK9 reference. If it was fixed in 9 why does it reliably reproduce in 15? Perhaps I am misunderstanding the lineage of the fix you are referring to. - Charlie From shade at redhat.com Tue Feb 16 20:05:07 2021 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 16 Feb 2021 21:05:07 +0100 Subject: Intermittent JRuby json issue related to tiered or G1 In-Reply-To: References: <1bac4d6a-e53d-075d-0209-6b737b7a2e77@redhat.com> Message-ID: On 2/16/21 9:00 PM, Charles Oliver Nutter wrote: > I am a bit confused about your JDK9 reference. If it was fixed in 9 > why does it reliably reproduce in 15? Perhaps I am misunderstanding > the lineage of the fix you are referring to. I am saying that there are no direct JIRA hits that could explain why this is happening. The only hit I got is for fix already in JDK 9, so it should not happen again. I am (slowly) bisecting between JDK 15 and JDK 16 to see which fix directly or accidentally fixed it. Then we would know what we are dealing with. -- Thanks, -Aleksey From xliu at openjdk.java.net Tue Feb 16 20:59:41 2021 From: xliu at openjdk.java.net (Xin Liu) Date: Tue, 16 Feb 2021 20:59:41 GMT Subject: RFR: 8261731: shallow copy the internal buffer of a scalar-replaced java.lang.String object In-Reply-To: References: Message-ID: On Tue, 16 Feb 2021 08:14:22 GMT, Richard Reingruber wrote: > Hi, > > this is a smart optimization. > > > There are 3 nodes involving in the construction of a java.lang.String object. > > ``` > > 1. Allocate of itself, aka. alloc > > > > 2. AllocateArray of a byte array, which is value:byte[], aka. aa > > > > 3. ArrayCopyNode which copys in the contents of value, aka. 
ac > > ``` > > > > > > Lemma > > When a String object `alloc` is scalar replaced, C2 can eliminate `aa` and `ac`. > > Because `alloc` is scalar replaced, it must be non-escaped. The field value:byte[] of j.l.String cannot be seen by external world, therefore it must not be global escaped. Because the buffer is marked as stable, it is safe to assume its contents are whatever ac copies in. Because all public java.lang.String constructors clone the incoming array, the source of `ac` is stable as well. > > Are you saying that the source of `ac` cannot be accessed by another thread because of the cloning in the constructor? But the resulting string instance which is used to construct the non-escape instance can be GlobalEscape and then the source of `ac` is accessible to other threads, isn't it? Hi, @reinrich You might not know, but I learned how to reallocate an object during deoptimization from your previous patches. Thank you! You are right. The source of ac (`src`) might have escaped. I didn't say other threads can't access it. I said we need to guarantee that `src` is stable, or that it's a "frozen" array in the sense of JDK-8261007. To be honest, I didn't check whether src is frozen. In practice, I have only seen ArrayCopy in construction [here](https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/StringLatin1.java#L769). This method creates a new substring from an established array, so its value is `stable`. After I read your comment, I went through String.java. I did find one open-ended constructor. Yes, it's a problem! https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/String.java#L395 If the frozen attribute is not present, here is what I can come up with: an array is "stable" if 1) it has the annotation "stable", or 2) it's non-escaped and 3) no store nodes can be found along its memory stream. What do you think?
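To make the concern about the source of `ac` concrete, here is a minimal Java sketch of the behaviour any shallow-copy optimization would have to preserve. `StableSourceSketch` and `buildThenMutate` are illustrative names only, and the example uses the public `String(byte[], Charset)` constructor as a stand-in, not the specific constructor linked above.

```java
import java.nio.charset.StandardCharsets;

public class StableSourceSketch {
    // Builds a String from a caller-owned array, then writes to that array.
    // The String must not observe the later store.
    static String buildThenMutate() {
        byte[] src = {'a', 'b', 'c'};
        String s = new String(src, StandardCharsets.ISO_8859_1);
        src[0] = 'X'; // store to the source after the copy was taken
        return s;
    }

    public static void main(String[] args) {
        System.out.println(buildThenMutate()); // prints "abc"
    }
}
```

If the compiler let the String alias `src` instead of copying it, the store after construction would become visible through the String, so the optimization is only valid when no such store can exist.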
------------- PR: https://git.openjdk.java.net/jdk/pull/2570 From neliasso at openjdk.java.net Tue Feb 16 21:44:46 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Tue, 16 Feb 2021 21:44:46 GMT Subject: RFR: 8261553: Efficient mask generation using BMI2 BZHI instruction [v2] In-Reply-To: <7AXx69j_wscf8ENt98_apHTd1OKbeO80nNVpU68Z794=.7d776104-0f08-4e92-b229-bc210123fc4e@github.com> References: <7AXx69j_wscf8ENt98_apHTd1OKbeO80nNVpU68Z794=.7d776104-0f08-4e92-b229-bc210123fc4e@github.com> Message-ID: <8ShwNGfn1vuWGuS_kjH2zXFGyW3pObvxBZt3KivpwYU=.55fb12d3-10bf-41a6-88a4-2fa665c892ea@github.com> On Thu, 11 Feb 2021 12:25:53 GMT, Jatin Bhateja wrote: >> BMI2 BZHI instruction can be used to optimize the instruction sequence >> used for mask generation at various places in array copy stubs and partial in-lining for small copy operations. > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > 8261553: Adding BMI2 missing check for partial in-lining. Looks good. ------------- Marked as reviewed by neliasso (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2522 From kvn at openjdk.java.net Tue Feb 16 22:10:52 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 16 Feb 2021 22:10:52 GMT Subject: RFR: 8261235: C1 compilation fails with assert(res->vreg_number() == index) failed: conversion check [v2] In-Reply-To: References: <-uBsLA86fiUezxtSxF_GW9oT62EezgztzXPwSta8EAw=.cb63031d-3ab4-468c-9351-1f1e10c377a9@github.com> Message-ID: On Tue, 16 Feb 2021 11:36:14 GMT, Christian Hagedorn wrote: >> The assertion is hit because we run out of virtual registers in the linear scan in C1 and do not handle it. I fixed it by applying the same bailout as in `LIRGenerator::new_register()`. >> >> There is also a second issue that `LIR_OprDesc::vreg_max` is too big. It is only used in this bailout code. `OprBits::vreg_max` is defined over `OprBits::data_bits` which uses `OprBits::non_data_bits`.
But `OprBits::non_data_bits` does not consider `OprBits::pointer_bits` which results in a too large value for `LIR_OprDesc::vreg_max` and the assertion is hit because we don't bail out, yet. This needs to be fixed as well.
>>
>> Thanks,
>> Christian
>
> Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision:
>
>   Fix order of non_data_bits

Good.

-------------

Marked as reviewed by kvn (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/2543

From vladimir.kozlov at oracle.com Tue Feb 16 22:34:55 2021
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 16 Feb 2021 14:34:55 -0800
Subject: [External] : Re: SuperWord loop optimization lost after method inlining
In-Reply-To:
References: <87dcb670-622f-6b05-7c06-289f9b4f3634@oracle.com>
Message-ID: <08ce2121-bbb4-3ea5-8f05-efeb33df7b74@oracle.com>

Hi Nicolas,

The file you shared has only assembler code. Yes, it shows that when ArrayFloatToArrayFloatVectorBinding::plus() is inlined into AVector::plus() it is not vectorized.

But I asked for an other file (hotspot_pid.log) which is generated when you run app with -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation flags. It should start with:

Java HotSpot(TM) 64-Bit Server VM 11.0.9+7-LTS

Thanks,
Vladimir K

On 2/15/21 5:19 AM, Nicolas Heutte wrote:
> Hi Vladimir,
>
> I've tried disabling tiered compilation, as you requested. It seems that the inlining was performed slightly
> differently, but the issue remains. As you can see in this excerpt, the main loop isn't properly vectorized:
>
>   0x00000254b0d4bf54: cmp    %r11d,%r8d
>   0x00000254b0d4bf57: jae    0x00000254b0d4c19e
>   0x00000254b0d4bf5d: vmovss 0x10(%rcx,%r8,4),%xmm9  ;*faload {reexecute=0 rethrow=0 return_oop=0}
>                       ; - com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus@54 (line 41)
>                       ; - com.qfs.vector.impl.AVector::plus@17 (line 204)
>                       ; - com.qfs.agg.impl.SumVectorAggregationBinding::plus@2 (line 103)
>                       ; - com.qfs.agg.impl.SumVectorAggregationBinding::safeVectorAggregate@70 (line 66)
>                       ; - com.qfs.agg.impl.AVectorAggregationBinding::safeAggregate@27 (line 118)
>
>   0x00000254b0d4bf64: cmp    %ebx,%r8d
>   0x00000254b0d4bf67: jae    0x00000254b0d4c1ec
>   0x00000254b0d4bf6d: vaddss 0x10(%rdi,%r8,4),%xmm9,%xmm9
>   0x00000254b0d4bf74: vmovss %xmm9,0x10(%rcx,%r8,4)  ;*fastore {reexecute=0 rethrow=0 return_oop=0}
>                       ; - com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus@61 (line 41)
>                       ; - com.qfs.vector.impl.AVector::plus@17 (line 204)
>                       ; - com.qfs.agg.impl.SumVectorAggregationBinding::plus@2 (line 103)
>                       ; - com.qfs.agg.impl.SumVectorAggregationBinding::safeVectorAggregate@70 (line 66)
>                       ; - com.qfs.agg.impl.AVectorAggregationBinding::safeAggregate@27 (line 118)
>
>   0x00000254b0d4bf7b: inc    %r8d               ;*iinc {reexecute=0 rethrow=0 return_oop=0}
>                       ; - com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus@62 (line 40)
>                       ; - com.qfs.vector.impl.AVector::plus@17 (line 204)
>                       ; - com.qfs.agg.impl.SumVectorAggregationBinding::plus@2 (line 103)
>                       ; - com.qfs.agg.impl.SumVectorAggregationBinding::safeVectorAggregate@70 (line 66)
>                       ; - com.qfs.agg.impl.AVectorAggregationBinding::safeAggregate@27 (line 118)
>
>   0x00000254b0d4bf7e: cmp    %r9d,%r8d
>   0x00000254b0d4bf81: jl     0x00000254b0d4bf54  ;*goto {reexecute=0 rethrow=0 return_oop=0}
>                       ; - com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus@65 (line 40)
>                       ; - com.qfs.vector.impl.AVector::plus@17 (line 204)
>                       ; - com.qfs.agg.impl.SumVectorAggregationBinding::plus@2 (line 103)
>                       ; - com.qfs.agg.impl.SumVectorAggregationBinding::safeVectorAggregate@70 (line 66)
>                       ; - com.qfs.agg.impl.AVectorAggregationBinding::safeAggregate@27 (line 118)
>
> Here is the link to the full log, should you want to take a look at it:
> https://drive.google.com/file/d/1KQU7wI8NjeElFv6RrQmUsUPRMnAefzhb/view?usp=sharing
>
> Best regards,
> Nicolas Heutte
>
> On Thu, Feb 11, 2021 at 7:05 PM Vladimir Kozlov wrote:
>
> Changing wide mailing list to JIT compiler only.
>
> This deoptimization is normal in Tiered Compilation - it switched from profiling code (level='3') generated by C1
> compiler to new code generated by C2 (level='4') which does loop optimizations.
>
> Thank you for posting inlining information:
>
>     @ 17   com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus (69 bytes) inline (hot)
>      \-> TypeProfile (14054/14054 counts) = com/qfs/vector/binding/impl/ArrayFloatToArrayFloatVectorBinding
>
> I thought before that may be call site is not hot but it is not the case.
>
> You can do an other experiment to collect log with disabled Tiered Compilation (only C2 is used): -XX:-TieredCompilation
> Also print assembler code (as you did before) for final compilation to see if loop is still not vectorized.
>
> Is it possible to post log file (on GitHub?) for me to look?
>
> Thanks,
> Vladimir K
>
> On 2/11/21 6:28 AM, Nicolas Heutte wrote:
> > Hi Vladimir,
> >
> > Thank you for your help.
> >
> > I'm currently running Java 11.0.9, and I did not use any VM flag of note.
> >
> > I checked the content of the compilation log, and it seems that ArrayFloatToArrayFloatVectorBinding::plus() was
> > deoptimized in order to allow AVector::plus() to be compiled:
> >
> > count='916' iicount='916' level='3' stamp='7394.056' comment='tiered' hot_count='896'/>
> >
> > The last compilation entry for AVector::plus() is:
> >
> > address='0x00000296d6af3110'
> > relocation_offset='376' insts_offset='432' stub_offset='1040' scopes_data_offset='1152' scopes_pcs_offset='1592'
> > dependencies_offset='1880' nul_chk_table_offset='1896' oops_offset='1064' metadata_offset='1072'
> > method='com.qfs.vector.impl.AVector plus (Lcom/qfs/vector/IVector;)V' bytes='23' count='172425' iicount='172425'
> > stamp='7394.199'/>
> >
> >     @ 1   com.qfs.vector.array.impl.ArrayFloatVector::getBindingId (4 bytes)   inline (hot)
> >      \-> TypeProfile (14552/14552 counts) = com/qfs/vector/array/impl/ArrayFloatVector
> >     @ 7   com.qfs.vector.array.impl.ArrayFloatVector::getBindingId (4 bytes)   inline (hot)
> >      \-> TypeProfile (14150/14150 counts) = com/qfs/vector/array/impl/ArrayFloatVector
> >     @ 10   com.qfs.vector.binding.impl.VectorBindings::getBinding (9 bytes)   inline (hot)
> >       @ 5   com.qfs.vector.binding.impl.VectorBindings$VectorBindingsProvider::getBinding (22 bytes)   inline (hot)
> >         @ 3   com.qfs.vector.binding.impl.VectorBindings$VectorBindingsProvider::hasBinding (34 bytes)   inline (hot)
> >     @ 17   com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus (69 bytes)   inline (hot)
> >      \-> TypeProfile (14054/14054 counts) = com/qfs/vector/binding/impl/ArrayFloatToArrayFloatVectorBinding
> >       @ 12   com.qfs.vector.array.impl.ArrayFloatVector::size (6 bytes)   inline (hot)
> >       @ 22   com.qfs.vector.impl.AVector::checkIndex (37 bytes)   inline (hot)
> >         @ 6   com.qfs.vector.array.impl.ArrayFloatVector::size (6 bytes)   inline (hot)
> >       @ 27   com.qfs.vector.array.impl.ArrayFloatVector::getUnderlying (5 bytes)   accessor
> >       @ 34   com.qfs.vector.array.impl.ArrayFloatVector::getUnderlying (5 bytes)   accessor
> >
> > Unfortunately, I do not have access to a debug VM build, so I cannot run the second test you recommend.
> >
> > Best regards,
> > Nicolas Heutte
> >
> > On Wed, Feb 10, 2021 at 7:36 PM Vladimir Kozlov wrote:
> >
> >     Hi, Nicolas
> >
> >     Looks like, when inlined, the loop from ArrayFloatToArrayFloatVectorBinding::plus() was not optimized at all: it is not
> >     unrolled and has range checks. Such loops are not vectorized (you need unrolling and no checks).
> >
> >     What Java version you are running? What HotSpot VM flags you are using when running application?
> >
> >     Run application with -XX:+LogCompilation and look on compilation data in hotspot_pid.log file for caller
> >     AVector::plus().
> >
> >     VM also has several flags to trace loop optimizations but they are only available in debug VM build. If you have access
> >     to such build run with -XX:+PrintCompilation -XX:+TraceLoopOpts flags.
> >
> >     Thanks,
> >     Vladimir K
> >
> >     On 2/10/21 9:24 AM, Nicolas Heutte wrote:
> >      > Hi all,
> >      >
> >      > I am encountering a performance issue caused by the interaction between
> >      > method inlining and automatic vectorization.
> >      >
> >      > Our application aggregates arrays intensively using a method named
> >      > ArrayFloatToArrayFloatVectorBinding.plus() with the following code:
> >      >
> >      >      for (int i = 0; i < srcLen; ++i) {
> >      >          dstArray[i] += srcArray[i];
> >      >      }
> >      >
> >      > When we microbenchmark this method we observe fast performance close to the
> >      > practical memory bandwidth and when we print the assembly code we observe
> >      > loop unrolling and automatic vectorization with SIMD instructions.
> >      >
> >      >    0x000001ef4600abf0: vmovdqu 0x10(%r14,%r13,4),%ymm0
> >      >    0x000001ef4600abf7: vaddps 0x10(%rcx,%r13,4),%ymm0,%ymm0
> >      >    0x000001ef4600abfe: vmovdqu %ymm0,0x10(%r14,%r13,4)
> >      >    0x000001ef4600ac05: movslq %r13d,%r11
> >      >    0x000001ef4600ac08: vmovdqu 0x30(%r14,%r11,4),%ymm0
> >      >    0x000001ef4600ac0f: vaddps 0x30(%rcx,%r11,4),%ymm0,%ymm0
> >      >    0x000001ef4600ac16: vmovdqu %ymm0,0x30(%r14,%r11,4)
> >      >    0x000001ef4600ac1d: vmovdqu 0x50(%r14,%r11,4),%ymm0
> >      >    0x000001ef4600ac24: vaddps 0x50(%rcx,%r11,4),%ymm0,%ymm0
> >      >    0x000001ef4600ac2b: vmovdqu %ymm0,0x50(%r14,%r11,4)
> >      >    0x000001ef4600ac32: vmovdqu 0x70(%r14,%r11,4),%ymm0
> >      >    0x000001ef4600ac39: vaddps 0x70(%rcx,%r11,4),%ymm0,%ymm0
> >      >    0x000001ef4600ac40: vmovdqu %ymm0,0x70(%r14,%r11,4)
> >      >    0x000001ef4600ac47: vmovdqu 0x90(%r14,%r11,4),%ymm0
> >      >    0x000001ef4600ac51: vaddps 0x90(%rcx,%r11,4),%ymm0,%ymm0
> >      >    0x000001ef4600ac5b: vmovdqu %ymm0,0x90(%r14,%r11,4)
> >      >    0x000001ef4600ac65: vmovdqu 0xb0(%r14,%r11,4),%ymm0
> >      >    0x000001ef4600ac6f: vaddps 0xb0(%rcx,%r11,4),%ymm0,%ymm0
> >      >    0x000001ef4600ac79: vmovdqu %ymm0,0xb0(%r14,%r11,4)
> >      >    0x000001ef4600ac83: vmovdqu 0xd0(%r14,%r11,4),%ymm0
> >      >    0x000001ef4600ac8d: vaddps 0xd0(%rcx,%r11,4),%ymm0,%ymm0
> >      >    0x000001ef4600ac97: vmovdqu %ymm0,0xd0(%r14,%r11,4)
> >      >    0x000001ef4600aca1: vmovdqu 0xf0(%r14,%r11,4),%ymm0
> >      >    0x000001ef4600acab: vaddps 0xf0(%rcx,%r11,4),%ymm0,%ymm0
> >      >    0x000001ef4600acb5: vmovdqu %ymm0,0xf0(%r14,%r11,4)  ;*fastore {reexecute=0 rethrow=0 return_oop=0}
> >      >                        ; - com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus@61 (line 41)
> >      >    0x000001ef4600acbf: add    $0x40,%r13d        ;*iinc {reexecute=0 rethrow=0 return_oop=0}
> >      >                        ; - com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus@62 (line 40)
> >      >    0x000001ef4600acc3: cmp    %eax,%r13d
> >      >    0x000001ef4600acc6: jl     0x000001ef4600abf0  ;*goto {reexecute=0 rethrow=0 return_oop=0}
> >      >                        ; - com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus@65 (line 40)
> >      >
> >      > In the real application, this method is actually inlined in a higher level
> >      > method named AVector.plus(). Unfortunately, the inlined version of the
> >      > aggregation code is not vectorized anymore:
> >      >
> >      >    0x000001ef460180a0: cmp    %ebx,%r11d
> >      >    0x000001ef460180a3: jae    0x000001ef460180e6
> >      >    0x000001ef460180a5: vmovss 0x10(%r8,%r11,4),%xmm1  ;*faload {reexecute=0 rethrow=0 return_oop=0}
> >      >                        ; - com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus@54 (line 41)
> >      >                        ; - com.qfs.vector.impl.AVector::plus@17 (line 204)
> >      >    0x000001ef460180ac: cmp    %ecx,%r11d
> >      >    0x000001ef460180af: jae    0x000001ef46018104
> >      >    0x000001ef460180b1: vaddss 0x10(%r9,%r11,4),%xmm1,%xmm1
> >      >    0x000001ef460180b8: vmovss %xmm1,0x10(%r8,%r11,4)  ;*fastore {reexecute=0 rethrow=0 return_oop=0}
> >      >                        ; - com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus@61 (line 41)
> >      >                        ; - com.qfs.vector.impl.AVector::plus@17 (line 204)
> >      >    0x000001ef460180bf: inc    %r11d              ;*iinc {reexecute=0 rethrow=0 return_oop=0}
> >      >                        ; - com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus@62 (line 40)
> >      >                        ; - com.qfs.vector.impl.AVector::plus@17 (line 204)
> >      >    0x000001ef460180c2: cmp    %r10d,%r11d
> >      >    0x000001ef460180c5: jl     0x000001ef460180a0  ;*goto {reexecute=0 rethrow=0 return_oop=0}
> >      >                        ; - com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus@65 (line 40)
> >      >                        ; - com.qfs.vector.impl.AVector::plus@17 (line 204)
> >      >
> >      > This causes a significant performance drop, compared to a run where we
> >      > explicitly disable the inlining and observe automatically vectorized code
> >      > again (
> >      > -XX:CompileCommand=dontinline,com/qfs/vector/binding/impl/ArrayFloatToArrayFloatVectorBinding.plus
> >      > ).
> >      >
> >      > How do you guys explain that behavior of the JIT compiler? Is this a known
> >      > and tracked issue, could it be fixed in the JVM? Can we do something in the
> >      > java code to prevent this from happening?
> >      >
> >      > Best regards,
> >      >
> >      > Nicolas Heutte
> >      >
> >

From kvn at openjdk.java.net Tue Feb 16 22:42:39 2021
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Tue, 16 Feb 2021 22:42:39 GMT
Subject: RFR: 8261675: ObjectValue::set_visited(bool) sets _visited false [v2]
In-Reply-To:
References: <3qsdFKCRBP1rIh_Q_9JUQzd-LP43WZchNrFxajKp1lY=.eb22d8dd-dd57-4108-8e64-a7bedace6c99@github.com>
Message-ID:

On Sat, 13 Feb 2021 21:46:56 GMT, Xin Liu wrote:

>> The setter is error-prone. it unconditionally sets _visited false.
>> this patch stores the argument to it.
>
> Xin Liu has updated the pull request incrementally with one additional commit since the last revision:
>
>   8261675: ObjectValue::set_visited(bool) sets _visited false
>
>   use getter and setter of _visited.
>   update the year of copyright.

I am fine with your current change (getter and setter). Let push it.
And I will work on suggested optimization/clean in separate RFE (change will need comments to avoid confusion, as you have).

-------------

Marked as reviewed by kvn (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/2560 From sdlin at google.com Tue Feb 16 22:51:47 2021 From: sdlin at google.com (Spencer Lin) Date: Tue, 16 Feb 2021 14:51:47 -0800 Subject: Intermittent JRuby json issue related to tiered or G1 Message-ID: Hi Aleksey, What was the JDK9 issue you were referring to? Thanks, Spencer From xliu at openjdk.java.net Tue Feb 16 23:58:50 2021 From: xliu at openjdk.java.net (Xin Liu) Date: Tue, 16 Feb 2021 23:58:50 GMT Subject: RFR: 8261675: ObjectValue::set_visited(bool) sets _visited false [v2] In-Reply-To: References: <3qsdFKCRBP1rIh_Q_9JUQzd-LP43WZchNrFxajKp1lY=.eb22d8dd-dd57-4108-8e64-a7bedace6c99@github.com> Message-ID: On Tue, 16 Feb 2021 22:40:01 GMT, Vladimir Kozlov wrote: >> Xin Liu has updated the pull request incrementally with one additional commit since the last revision: >> >> 8261675: ObjectValue::set_visited(bool) sets _visited false >> >> use getter and setter of _visited. >> update the year of copyright. > > I am fine with your current change (getter and setter). Let push it. > And I will work on suggested optimization/clean in separate RFE (change will need comments to avoid confusion, as you have). Hi, Vladimir, > I am fine with your current change (getter and setter). Let push it. > And I will work on suggested optimization/clean in separate RFE (change will need comments to avoid confusion, as you have). Sounds good. Thank you for sponsoring it. ------------- PR: https://git.openjdk.java.net/jdk/pull/2560 From kvn at openjdk.java.net Wed Feb 17 00:27:51 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 17 Feb 2021 00:27:51 GMT Subject: RFR: 8261675: ObjectValue::set_visited(bool) sets _visited false [v2] In-Reply-To: References: <3qsdFKCRBP1rIh_Q_9JUQzd-LP43WZchNrFxajKp1lY=.eb22d8dd-dd57-4108-8e64-a7bedace6c99@github.com> Message-ID: On Tue, 16 Feb 2021 23:56:18 GMT, Xin Liu wrote: >> I am fine with your current change (getter and setter). Let push it. 
>> And I will work on suggested optimization/clean in separate RFE (change will need comments to avoid confusion, as you have). > > Hi, Vladimir, > > >> I am fine with your current change (getter and setter). Let push it. >> And I will work on suggested optimization/clean in separate RFE (change will need comments to avoid confusion, as you have). > > Sounds good. Thank you for sponsoring it. Note, I think it is trivial change and one review is enough. ------------- PR: https://git.openjdk.java.net/jdk/pull/2560 From xliu at openjdk.java.net Wed Feb 17 00:27:52 2021 From: xliu at openjdk.java.net (Xin Liu) Date: Wed, 17 Feb 2021 00:27:52 GMT Subject: Integrated: 8261675: ObjectValue::set_visited(bool) sets _visited false In-Reply-To: <3qsdFKCRBP1rIh_Q_9JUQzd-LP43WZchNrFxajKp1lY=.eb22d8dd-dd57-4108-8e64-a7bedace6c99@github.com> References: <3qsdFKCRBP1rIh_Q_9JUQzd-LP43WZchNrFxajKp1lY=.eb22d8dd-dd57-4108-8e64-a7bedace6c99@github.com> Message-ID: On Sat, 13 Feb 2021 01:51:39 GMT, Xin Liu wrote: > The setter is error-prone. it unconditionally sets _visited false. > this patch stores the argument to it. This pull request has now been integrated. Changeset: 2677f6f4 Author: Xin Liu Committer: Vladimir Kozlov URL: https://git.openjdk.java.net/jdk/commit/2677f6f4 Stats: 4 lines in 2 files changed: 0 ins; 0 del; 4 mod 8261675: ObjectValue::set_visited(bool) sets _visited false Reviewed-by: kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/2560 From github.com+2249648+johntortugo at openjdk.java.net Wed Feb 17 04:10:58 2021 From: github.com+2249648+johntortugo at openjdk.java.net (John Tortugo) Date: Wed, 17 Feb 2021 04:10:58 GMT Subject: RFR: 8241502: Migrate x86_64.ad to MacroAssembler [v5] In-Reply-To: References: Message-ID: > Relates to: https://bugs.openjdk.java.net/browse/JDK-8241502 > Tested on: Linux tier1, 2 and 3 > > Can you please take a look whether these changes are going in the direction expected or not? 
If it is, I'll continue working on the `JDK-8241502` but I'd like to split it in a few PRs since it's a lot of changes. John Tortugo has updated the pull request incrementally with one additional commit since the last revision: More shifts; logic operations and movs. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2420/files - new: https://git.openjdk.java.net/jdk/pull/2420/files/0e72dfe0..b776b9f6 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2420&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2420&range=03-04 Stats: 117 lines in 3 files changed: 68 ins; 7 del; 42 mod Patch: https://git.openjdk.java.net/jdk/pull/2420.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2420/head:pull/2420 PR: https://git.openjdk.java.net/jdk/pull/2420 From fweimer at openjdk.java.net Wed Feb 17 05:58:39 2021 From: fweimer at openjdk.java.net (Florian Weimer) Date: Wed, 17 Feb 2021 05:58:39 GMT Subject: RFR: 8261671: X86 I2L conversion can be skipped for certain masked positive values [v2] In-Reply-To: <9XBe7BRG8tFJ3ZHqmLa-MjmZWrJS0nIGKJzsjViGVoI=.007b2404-c93f-408e-a28e-c5491eeceb4e@github.com> References: <9PGauj5SuzWCyX7aTFHTUCEsK5kVH2ApKAurSK3o_To=.f20e9936-352f-47f0-a561-98dcdb7c842e@github.com> <9XBe7BRG8tFJ3ZHqmLa-MjmZWrJS0nIGKJzsjViGVoI=.007b2404-c93f-408e-a28e-c5491eeceb4e@github.com> Message-ID: On Tue, 16 Feb 2021 17:39:34 GMT, Marcus G K Williams wrote: >> src/hotspot/cpu/x86/x86_64.ad line 9172: >> >>> 9170: instruct convI2LAndI_reg_immIbitmask(rRegL dst, rRegI src, immI_bitmask mask, rRegI tmp, rFlagsReg cr) >>> 9171: %{ >>> 9172: predicate(VM_Version::supports_bmi2()); >> >> Agner's optimization guide says that BZHI uses microcode on Zen 2 and earlier, so perhaps the predicate should reflect that? > > Added `predicate(VM_Version::supports_bmi2() && VM_Version::is_intel());` I'm so sorry, I confused this with `PDEP` and `PEXT`. `BZHI` is actually fine on Zen. So the original code is correct. 
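
A minimal Java sketch of the operation discussed above, for illustration only. `bzhi` below is a software model of the x86 BZHI instruction (keep the low n bits of the source, clear the rest); it is not HotSpot code, and the class and method names are made up here. The link to the `convI2LAndI_reg_immIbitmask` rule is an assumption based on its name and the PR title: for a mask of the form 2^n - 1 with n <= 31, `x & mask` is never negative, so widening it to long needs no sign extension.

```java
public final class BzhiSketch {
    /** Software model of BZHI: keep the low `index` bits of src, clear the rest. */
    public static long bzhi(long src, int index) {
        int n = index & 0xFF;          // only bits 7:0 of the index operand are used
        if (n >= 64) {
            return src;                // index beyond the operand width: source is unchanged
        }
        return src & ((1L << n) - 1);  // n == 0 yields 0
    }

    /** Hypothetical Java-level shape of the pattern: int AND with a low-bit mask, widened to long. */
    public static long maskedI2L(int x, int lowBits) {
        int mask = (1 << lowBits) - 1; // assumes 1 <= lowBits <= 31, so (x & mask) is non-negative
        return (long) (x & mask);
    }

    public static void main(String[] args) {
        int x = -1;
        // Both sides keep only the low 8 bits of x, so the comparison holds.
        System.out.println(maskedI2L(x, 8) == bzhi(x, 8));
    }
}
```

Under these assumptions the widening conversion and the masking collapse into one bit-clearing operation, which is why a single instruction can stand in for the AND plus the I2L.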
------------- PR: https://git.openjdk.java.net/jdk/pull/2590 From jbhateja at openjdk.java.net Wed Feb 17 08:31:39 2021 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Wed, 17 Feb 2021 08:31:39 GMT Subject: RFR: 8261553: Efficient mask generation using BMI2 BZHI instruction [v2] In-Reply-To: <8ShwNGfn1vuWGuS_kjH2zXFGyW3pObvxBZt3KivpwYU=.55fb12d3-10bf-41a6-88a4-2fa665c892ea@github.com> References: <7AXx69j_wscf8ENt98_apHTd1OKbeO80nNVpU68Z794=.7d776104-0f08-4e92-b229-bc210123fc4e@github.com> <8ShwNGfn1vuWGuS_kjH2zXFGyW3pObvxBZt3KivpwYU=.55fb12d3-10bf-41a6-88a4-2fa665c892ea@github.com> Message-ID: On Tue, 16 Feb 2021 21:40:58 GMT, Nils Eliasson wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> 8261553: Adding BMI2 missing check for partial in-lining. > > Looks good. > Thanks for getting to the bottom of that regression. Thanks, since there is not a significant impact on performance, but having an optimum instruction sequence will still reduce the complexity. Should it be ok to check this in ? We have one reviewer consent. ------------- PR: https://git.openjdk.java.net/jdk/pull/2522 From chagedorn at openjdk.java.net Wed Feb 17 09:03:40 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Wed, 17 Feb 2021 09:03:40 GMT Subject: RFR: 8261235: C1 compilation fails with assert(res->vreg_number() == index) failed: conversion check [v2] In-Reply-To: References: <-uBsLA86fiUezxtSxF_GW9oT62EezgztzXPwSta8EAw=.cb63031d-3ab4-468c-9351-1f1e10c377a9@github.com> Message-ID: On Tue, 16 Feb 2021 22:08:16 GMT, Vladimir Kozlov wrote: >> Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix order of non_data_bits > > Good. Thanks Vladimir for your review! 
------------- PR: https://git.openjdk.java.net/jdk/pull/2543 From chagedorn at openjdk.java.net Wed Feb 17 09:03:43 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Wed, 17 Feb 2021 09:03:43 GMT Subject: Integrated: 8261235: C1 compilation fails with assert(res->vreg_number() == index) failed: conversion check In-Reply-To: <-uBsLA86fiUezxtSxF_GW9oT62EezgztzXPwSta8EAw=.cb63031d-3ab4-468c-9351-1f1e10c377a9@github.com> References: <-uBsLA86fiUezxtSxF_GW9oT62EezgztzXPwSta8EAw=.cb63031d-3ab4-468c-9351-1f1e10c377a9@github.com> Message-ID: <_SpE5MrexsYFrtkHPSClQSROJMMbuXi-NHhZVzuVrKQ=.17cfcad9-10ad-45eb-b694-5a1374e7bae8@github.com> On Fri, 12 Feb 2021 10:03:25 GMT, Christian Hagedorn wrote: > The assertion is hit because we run out of virtual registers in the linear scan in C1 and do not handle it. I fixed it by applying the same bailout as in `LIRGenerator::new_register()`. > > There is also a second issue that `LIR_OprDesc::vreg_max` is too big. It is only used in this bailout code. `OprBits::vreg_max` is defined over `OprBits::data_bits` which uses `OprBits::non_data_bits`. But `OprBits::non_data_bits` does not consider `OprBits::pointer_bits` which results in a too large value for `LIR_OprDesc::vreg_max` and the assertion is hit because we don't bail out, yet. This needs to be fixed as well. > > Thanks, > Christian This pull request has now been integrated. 
Changeset: 84182855 Author: Christian Hagedorn URL: https://git.openjdk.java.net/jdk/commit/84182855 Stats: 4152 lines in 6 files changed: 4137 ins; 2 del; 13 mod 8261235: C1 compilation fails with assert(res->vreg_number() == index) failed: conversion check Reviewed-by: thartmann, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/2543 From rrich at openjdk.java.net Wed Feb 17 09:03:38 2021 From: rrich at openjdk.java.net (Richard Reingruber) Date: Wed, 17 Feb 2021 09:03:38 GMT Subject: RFR: 8261731: shallow copy the internal buffer of a scalar-replaced java.lang.String object In-Reply-To: References: Message-ID: On Tue, 16 Feb 2021 20:56:32 GMT, Xin Liu wrote: >> Hi, >> >> this is a smart optimization. >> >>> >>> >>> There are 3 nodes involving in the construction of a java.lang.String object. >>> >>> 1. Allocate of itself, aka. alloc >>> >>> 2. AllocateArray of a byte array, which is value:byte[], aka. aa >>> >>> 3. ArrayCopyNode which copys in the contents of value, aka. ac >>> >>> >>> Lemma >>> When a String object `alloc` is scalar replaced, C2 can eliminate `aa` and `ac`. >>> >>> Because `alloc` is scalar replaced, it must be non-escaped. The field value:byte[] of j.l.String cannot be seen by external world, therefore it must not be global escaped. Because the buffer is marked as stable, it is safe to assume its contents are whatever ac copies in. Because all public java.lang.String constructors clone the incoming array, the source of `ac` is stable as well. >> >> Are you saying that the source of `ac` cannot be accessed by another thread because of the cloning in the constructor? But the resulting string instance which is used to construct the non-escape instance can be GlobalEscape and then the source of `ac` is accessible to other threads, isn't it? > >> Hi, >> >> this is a smart optimization. >> >> > There are 3 nodes involving in the construction of a java.lang.String object. >> > ``` >> > 1. Allocate of itself, aka. alloc >> > >> > 2. 
AllocateArray of a byte array, which is value:byte[], aka. aa >> > >> > 3. ArrayCopyNode which copys in the contents of value, aka. ac >> > ``` >> > >> > >> > Lemma >> > When a String object `alloc` is scalar replaced, C2 can eliminate `aa` and `ac`. >> > Because `alloc` is scalar replaced, it must be non-escaped. The field value:byte[] of j.l.String cannot be seen by external world, therefore it must not be global escaped. Because the buffer is marked as stable, it is safe to assume its contents are whatever ac copies in. Because all public java.lang.String constructors clone the incoming array, the source of `ac` is stable as well. >> >> Are you saying that the source of `ac` cannot be accessed by another thread because of the cloning in the constructor? But the resulting string instance which is used to construct the non-escape instance can be GlobalEscape and then the source of `ac` is accessible to other threads, isn't it? > > Hi, @reinrich > You might not know, but I learn how to reallocate an object in deoptimization from your previous patches. Thank you! > > You are right. The source of ac (`src`) might be escaped. I didn't say other threads can't access it. I said we need to guarantee the `src` is stable, or it's a "frozen" array in JDK-8261007. > > To be honest, I didn't check the src is frozen. In practice, I only see ArrayCopy in construction [here](https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/StringLatin1.java#L769).This method is creating a new substring from an established array, so its value is `stable`. > > After I read your comment, I went through String.java. I do find one open-ended constructor. yes, it's a problem! > https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/String.java#L395 > > If the frozen attribute is not present, what I can come up with. an array is "stable". What's do you think? 
> 1) has annotation "stable" or > 2) it's non-escaped and > 3) can't find any store nodes along its mem stream. Hi @navyxliu, > You might not know, but I learn how to reallocate an object in deoptimization from your previous patches. Thank you! Oh, great! :) > You are right. The source of ac (`src`) might be escaped. I didn't say other threads can't access it. I said we need to guarantee the `src` is stable, or it's a "frozen" array in JDK-8261007. > > To be honest, I didn't check the src is frozen. In practice, I only see ArrayCopy in construction [here](https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/StringLatin1.java#L769).This method is creating a new substring from an established array, so its value is `stable`. Sorry, I wasn't aware of the @Stable annotation on the value array. https://github.com/openjdk/jdk/blob/d19503353e5c347ce393544a3a30d5caec53d133/src/java.base/share/classes/java/lang/String.java#L154 So my concern was that another thread could for example use reflection to modify the value array but I reckon this is illegal then (I wonder if it is checked in reflection...). Also there is already UseStringDeduplication that relies on the value array being stable. > After I read your comment, I went through String.java. I do find one open-ended constructor. yes, it's a problem! > https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/String.java#L395 I see this one is deprecated. Why do you think there's a problem? > If the frozen attribute is not present, what I can come up with. an array is "stable". What's do you think? > > 1. has annotation "stable" or > > 2. it's non-escaped and > > 3. can't find any store nodes along its mem stream. Currently I'd think this is not necessary. `@Stable` is strong enough. The optimization reminds me a bit of earlier Java versions where String instances had an offset field and the value array could be shared among String instances. 
I guess the new String needs to be scalarized mostly because you cannot take care of the offset otherwise. Coincidentally I'm currently looking at [LoadNode::can_see_arraycopy_value()](https://github.com/openjdk/jdk/blob/b955f85e03bafe8ce39677d0af06bf1ceb7e2cbb/src/hotspot/share/opto/memnode.cpp#L951) which does something similar you want to do. There I don't see any concerns about the src being stable. In fact I don't think this is correct. Just a side mark... These are my (somewhat random) thoughts so far. I'd think it is a legal and useful optimization (though haven't yet looked at the code yet). Thanks, Richard. ------------- PR: https://git.openjdk.java.net/jdk/pull/2570 From dlong at openjdk.java.net Wed Feb 17 09:49:41 2021 From: dlong at openjdk.java.net (Dean Long) Date: Wed, 17 Feb 2021 09:49:41 GMT Subject: RFR: 8241502: Migrate x86_64.ad to MacroAssembler In-Reply-To: References: <0azhJ4pD5Tq_lkpPtYMpQBjokflcSQEdWP2Rz9HBm6k=.c3ece6fd-1ae7-49ea-a6eb-ec88a9fbd54d@github.com> Message-ID: On Thu, 11 Feb 2021 05:12:55 GMT, John Tortugo wrote: >> I wish there was a way for the old and new versions to co-exist at the same time, so we could generate the code the old way and and the new way, then compare, for automatic verification of the MacroAssember version. > > Thank you all for the feedback! > > @iklam - I'll check that and let you know once I make more conversions. > > @dean-long - That would be great. I'm all ears for the best way to test this! Here's one way to test both versions, using loadRange as an example. It's not exactly pretty, but it seems to work. 
[loadRange.txt](https://github.com/openjdk/jdk/files/5994725/loadRange.txt) ------------- PR: https://git.openjdk.java.net/jdk/pull/2420 From mdoerr at openjdk.java.net Wed Feb 17 10:30:52 2021 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Wed, 17 Feb 2021 10:30:52 GMT Subject: Integrated: 8261522: [PPC64] AES intrinsics write beyond the destination array In-Reply-To: References: Message-ID: <5zlfn8FhS8rhNbEsFSjApw6M0f-MEIDWN1U8CB8mWZw=.29ec88e5-b058-42df-9137-9b68844edde2@github.com> On Wed, 10 Feb 2021 17:05:44 GMT, Martin Doerr wrote: > I'd like to replace the read-modify-write implementation from aescrypt_encryptBlock / aescrypt_decryptBlock stubs. It can cause severe problems (see bug description). This pull request has now been integrated. Changeset: 05d59556 Author: Martin Doerr URL: https://git.openjdk.java.net/jdk/commit/05d59556 Stats: 76 lines in 1 file changed: 32 ins; 20 del; 24 mod 8261522: [PPC64] AES intrinsics write beyond the destination array Reviewed-by: lucy ------------- PR: https://git.openjdk.java.net/jdk/pull/2514 From neliasso at openjdk.java.net Wed Feb 17 10:32:59 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Wed, 17 Feb 2021 10:32:59 GMT Subject: RFR: 8260653: Unreachable nodes keep speculative types alive Message-ID: 8260653: Unreachable nodes keep speculative types alive ------------- Commit messages: - Remove useless nodes Changes: https://git.openjdk.java.net/jdk/pull/2606/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2606&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8260653 Stats: 3 lines in 1 file changed: 3 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/2606.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2606/head:pull/2606 PR: https://git.openjdk.java.net/jdk/pull/2606 From redestad at openjdk.java.net Wed Feb 17 11:47:47 2021 From: redestad at openjdk.java.net (Claes Redestad) Date: Wed, 17 Feb 2021 11:47:47 GMT Subject: RFR: 8261553: Efficient mask 
generation using BMI2 BZHI instruction [v3] In-Reply-To: References: Message-ID: On Mon, 15 Feb 2021 18:55:02 GMT, Jatin Bhateja wrote: >> BMI2 BZHI instruction can be used to optimize the instruction sequence >> used for mask generation at various places in array copy stubs and partial in-lining for small copy operations. > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > 8261553 : Aligning main copy loop to prevent any penalty due to LSD and DSB misses. Marked as reviewed by redestad (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/2522 From github.com+42899633+eastig at openjdk.java.net Wed Feb 17 12:09:42 2021 From: github.com+42899633+eastig at openjdk.java.net (Evgeny Astigeevich) Date: Wed, 17 Feb 2021 12:09:42 GMT Subject: RFR: 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set [v4] In-Reply-To: References: Message-ID: On Sun, 14 Feb 2021 05:44:51 GMT, Xin Liu wrote: >> Add a flag _suppress_cr to outputStream. outstream objects won't emit any CR if it's set. >> Correct TypeInstPtr::dump2 to make sure it only emits klass name once. >> Remove the comment because Klass::oop_print_on() has emitted the address of oop. >> >> Before: >> 689 ConP === 0 [[ 821 ]] Oop:java/lang/Stringjava.lang.String >> {0x000000010159d7c8} - klass: public final synchronized 'java/lang/String' >> - string: "a" >> :Constant:exact * >> >> After: >> 689 ConP === 0 [[ 821 ]] Oop:java.lang.String {0x000000010159d7c8} - klass: public final synchronized 'java/lang/String' - string: "a":Constant:exact * > > Xin Liu has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: > > - 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set > > reimplement this feature. withdraw my intrusive change in outputStream. > use stringStream only for the constant OopPtr.
after oop->print_on(st), > delete all appearances of '\n' > - Merge branch 'master' into JDK-8260198 > - 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set > > fix merge conflict. > - Merge branch 'master' into JDK-8260198 > - 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set > > Add a flag _suppress_cr to outputStream. outstream objects won't emit any CR if it's set. > Correct TypeInstPtr::dump2 to make sure it only emits klass name once. > Remove the comment because Klass::oop_print_on() has emitted the address of oop. > > Before: > 689 ConP === 0 [[ 821 ]] Oop:java/lang/Stringjava.lang.String > {0x000000010159d7c8} - klass: public final synchronized 'java/lang/String' > - string: "a" > :Constant:exact * > > After: > 689 ConP === 0 [[ 821 ]] Oop:java.lang.String {0x000000010159d7c8} - klass: public final synchronized 'java/lang/String' - string: "a":Constant:exact * Changes requested by eastig at github.com (no known OpenJDK username). src/hotspot/share/opto/type.cpp line 4049: > 4047: ss.print(" "); > 4048: const_oop()->print_oop(&ss); > 4049: ss.tr_delete('\n'); `tr_delete` is expensive. Also deleting something in a stream does not fit into a concept of streams. I see that the content of `ss` is traversed many times. What about this code: for (const char *str = ss.base(); *str; ) { size_t i = 0; while (str[i] && str[i] != '\n' ) { ++i; } st->print_raw(str, i); str += i; while (*str == '\n') { ++str; } } ------------- PR: https://git.openjdk.java.net/jdk/pull/2178 From neliasso at openjdk.java.net Wed Feb 17 12:43:40 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Wed, 17 Feb 2021 12:43:40 GMT Subject: RFR: 8260653: Unreachable nodes keep speculative types alive In-Reply-To: References: Message-ID: On Wed, 17 Feb 2021 10:27:08 GMT, Nils Eliasson wrote: > The RunThese test fails because after first igvn.optimize() (directly after parsing) there are unreachable nodes cycles left. 
Later when remove_speculative_types() is called - only reachable nodes will have their speculative type removed. At the end of remove_speculative_types() there is an assert that all speculative types have been removed that will fail. > > This problem will not cause crashes in production - it is only a sanity test. > > I suggest adding a call to PhaseRemoveUseless before remove_speculative_types. This will cost a few extra cycles but it is the only way we can guarantee that no unreachable nodes are left. > > When debugging this I experimented with adding a call to verify_graph_edges to check for dead code at the same spot. This triggers failures in a lot of tests. The conclusion is that it is very common that we have dead node cycles - but they very rarely keep speculative types alive. > > A big thank you to Dean Long, who created the reproducer for this bug. > > Please review. One reasonable alternative to adding one more pass of PhaseRemoveUseless is filtering out all non-reachable nodes from NodeHash::check_no_speculative_types. The unreachable nodes only live a short while before being removed in the renumbering phase. ------------- PR: https://git.openjdk.java.net/jdk/pull/2606 From goetz at openjdk.java.net Wed Feb 17 13:10:40 2021 From: goetz at openjdk.java.net (Goetz Lindenmaier) Date: Wed, 17 Feb 2021 13:10:40 GMT Subject: RFR: 8261657: [PPC64] Cleanup StoreCM nodes after CMS removal In-Reply-To: References: Message-ID: On Fri, 12 Feb 2021 16:29:36 GMT, Martin Doerr wrote: > We only need one StoreCM node after CMS removal. CMS StoreStore barriers were already removed at other places. Marked as reviewed by goetz (Reviewer).
------------- PR: https://git.openjdk.java.net/jdk/pull/2554 From goetz at openjdk.java.net Wed Feb 17 13:10:40 2021 From: goetz at openjdk.java.net (Goetz Lindenmaier) Date: Wed, 17 Feb 2021 13:10:40 GMT Subject: RFR: 8261657: [PPC64] Cleanup StoreCM nodes after CMS removal In-Reply-To: References: Message-ID: On Wed, 17 Feb 2021 13:07:19 GMT, Goetz Lindenmaier wrote: >> We only need one StoreCM node after CMS removal. CMS StoreStore barriers were already removed at other places. > > Marked as reviewed by goetz (Reviewer). LGTM ------------- PR: https://git.openjdk.java.net/jdk/pull/2554 From mdoerr at openjdk.java.net Wed Feb 17 13:14:47 2021 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Wed, 17 Feb 2021 13:14:47 GMT Subject: RFR: 8261657: [PPC64] Cleanup StoreCM nodes after CMS removal In-Reply-To: References: Message-ID: On Wed, 17 Feb 2021 13:07:43 GMT, Goetz Lindenmaier wrote: >> Marked as reviewed by goetz (Reviewer). > > LGTM Thanks for the review! ------------- PR: https://git.openjdk.java.net/jdk/pull/2554 From mdoerr at openjdk.java.net Wed Feb 17 13:14:48 2021 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Wed, 17 Feb 2021 13:14:48 GMT Subject: Integrated: 8261657: [PPC64] Cleanup StoreCM nodes after CMS removal In-Reply-To: References: Message-ID: On Fri, 12 Feb 2021 16:29:36 GMT, Martin Doerr wrote: > We only need one StoreCM node after CMS removal. CMS StoreStore barriers were already removed at other places. This pull request has now been integrated. 
Changeset: 9ba2b71a Author: Martin Doerr URL: https://git.openjdk.java.net/jdk/commit/9ba2b71a Stats: 55 lines in 1 file changed: 0 ins; 52 del; 3 mod 8261657: [PPC64] Cleanup StoreCM nodes after CMS removal Reviewed-by: lucy, goetz ------------- PR: https://git.openjdk.java.net/jdk/pull/2554 From rcastanedalo at openjdk.java.net Wed Feb 17 13:31:53 2021 From: rcastanedalo at openjdk.java.net (Roberto Castañeda Lozano) Date: Wed, 17 Feb 2021 13:31:53 GMT Subject: RFR: 8259984: IGV: Crash when drawing control flow before GCM Message-ID: The block formation algorithm applied by IGV to schedule graphs without associated CFG information traverses the graph backwards. This makes it difficult to deal with block projection nodes, leading in some cases to double addition of block nodes and block edges, and ultimately causing assertion failures. This fix replaces the backward traversal with a forward traversal that relies on node category information (introduced in [8261336](https://bugs.openjdk.java.net/browse/JDK-8261336)) to identify control successors. The forward traversal is arguably simpler and, besides avoiding the reported assertion failure, has two advantages: - it places block projection nodes in the same block as their predecessors, and - it numbers basic blocks more naturally. The following screenshots illustrate the improvements (before the fix to the left, after the fix to the right): ![cfgs-before-after](https://user-images.githubusercontent.com/8792647/108204708-6dcf5e00-7124-11eb-956c-fb7f84229b50.png) Tested automatically on tens of thousands of graphs by running `java -Xcomp -XX:-TieredCompilation -XX:PrintIdealGraphLevel=4 ...` on an instrumented version of IGV that schedules graphs eagerly. Checked manually that the CFGs of a few selected graphs (including the reported one) are well-formed. Also checked that the overall IGV graph scheduling time is not affected by the changes.
Thanks to Christian Hagedorn for checking the fix independently. ------------- Commit messages: - Form basic blocks using a forward traversal - Connect orphan/widow CFG nodes to root - Mark CFG nodes before scheduling Changes: https://git.openjdk.java.net/jdk/pull/2607/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2607&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8259984 Stats: 167 lines in 1 file changed: 90 ins; 19 del; 58 mod Patch: https://git.openjdk.java.net/jdk/pull/2607.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2607/head:pull/2607 PR: https://git.openjdk.java.net/jdk/pull/2607 From github.com+10482586+therealeliu at openjdk.java.net Wed Feb 17 13:34:42 2021 From: github.com+10482586+therealeliu at openjdk.java.net (Eric Liu) Date: Wed, 17 Feb 2021 13:34:42 GMT Subject: RFR: 8256438: AArch64: Implement match rules with ROR shift register value [v3] In-Reply-To: References: Message-ID: On Tue, 9 Feb 2021 03:07:38 GMT, Eric Liu wrote: >> All that remains to do is the benchmarks. > > @theRealAph Could you help to take a look ? Ping ------------- PR: https://git.openjdk.java.net/jdk/pull/1858 From jbhateja at openjdk.java.net Wed Feb 17 14:12:39 2021 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Wed, 17 Feb 2021 14:12:39 GMT Subject: Integrated: 8261553: Efficient mask generation using BMI2 BZHI instruction In-Reply-To: References: Message-ID: On Thu, 11 Feb 2021 08:31:40 GMT, Jatin Bhateja wrote: > BMI2 BZHI instruction can be used to optimize the instruction sequence > used for mask generation at various places in array copy stubs and partial in-lining for small copy operations. This pull request has now been integrated.
Changeset: cb84539d Author: Jatin Bhateja URL: https://git.openjdk.java.net/jdk/commit/cb84539d Stats: 39 lines in 6 files changed: 12 ins; 11 del; 16 mod 8261553: Efficient mask generation using BMI2 BZHI instruction Reviewed-by: redestad, neliasso ------------- PR: https://git.openjdk.java.net/jdk/pull/2522 From neliasso at openjdk.java.net Wed Feb 17 14:35:59 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Wed, 17 Feb 2021 14:35:59 GMT Subject: RFR: 8260653: Unreachable nodes keep speculative types alive [v2] In-Reply-To: References: Message-ID: > The RunThese test fails because after first igvn.optimize() (directly after parsing) there are unreachable node cycles left. Later when remove_speculative_types() is called - only reachable nodes will have their speculative type removed. At the end of remove_speculative_types() there is an assert that all speculative types have been removed that will fail. > > This problem will not cause crashes in production - it is only a sanity test. > > I suggest adding a call to PhaseRemoveUseless before remove_speculative_types. This will cost a few extra cycles but it is the only way we can guarantee that no unreachable nodes are left. > > When debugging this I experimented with adding a call to verify_graph_edges to check for dead code at the same spot. This triggers failures in a lot of tests. The conclusion is that it is very common that we have dead node cycles - but they very rarely keep speculative types alive. > > A big thank you to Dean Long, who created the reproducer for this bug. > > Please review.
Nils Eliasson has updated the pull request incrementally with one additional commit since the last revision: Check if node is live before assert ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2606/files - new: https://git.openjdk.java.net/jdk/pull/2606/files/5f279179..82ab1066 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2606&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2606&range=00-01 Stats: 10 lines in 2 files changed: 6 ins; 3 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/2606.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2606/head:pull/2606 PR: https://git.openjdk.java.net/jdk/pull/2606 From shade at redhat.com Wed Feb 17 14:50:44 2021 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 17 Feb 2021 15:50:44 +0100 Subject: Intermittent JRuby json issue related to tiered or G1 In-Reply-To: References: <1bac4d6a-e53d-075d-0209-6b737b7a2e77@redhat.com> Message-ID: <7110e0cf-8c77-1d6d-8854-3160d5c901f8@redhat.com> On 2/16/21 9:05 PM, Aleksey Shipilev wrote: > On 2/16/21 9:00 PM, Charles Oliver Nutter wrote: >> I am a bit confused about your JDK9 reference. If it was fixed in 9 >> why does it reliably reproduce in 15? Perhaps I am misunderstanding >> the lineage of the fix you are referring to. > > I am saying that there are no direct JIRA hits that could explain why this is happening. The only > hit I got is for fix already in JDK 9, so it should not happen again. > > I am (slowly) bisecting between JDK 15 and JDK 16 to see which fix directly or accidentally fixed > it. Then we would know what we are dealing with. This thing is really hairy. Reverse bisects shows that this one: https://bugs.openjdk.java.net/browse/JDK-8257847 ...makes failure in fastdebug much less likely. This explains why I have not seen the failures in JDK 16 and JDK 17 yesterday. 
I have managed to reliably crash the recent JDK by promoting the assert in question into guarantee: diff --git a/src/hotspot/share/opto/ifnode.cpp b/src/hotspot/share/opto/ifnode.cpp index 29624765324..467d8f19276 100644 --- a/src/hotspot/share/opto/ifnode.cpp +++ b/src/hotspot/share/opto/ifnode.cpp @@ -948,7 +948,9 @@ bool IfNode::fold_compares_helper(ProjNode* proj, ProjNode* success, ProjNode* f assert((dom_bool->_test.is_less() && proj->_con) || (dom_bool->_test.is_greater() && !proj->_con), "incorrect test"); // this test was canonicalized - assert(this_bool->_test.is_less() && !fail->_con, "incorrect test"); + guarantee(this_bool->_test.is_less() && !fail->_con, "incorrect test: dom_bool.test=%d proj._con=%d this_bool.test=%d fail._con=%d", + dom_bool->_test._test, proj->_con, + this_bool->_test._test, fail->_con); cond = (hi_test == BoolTest::le || hi_test == BoolTest::gt) ? BoolTest::gt : BoolTest::ge; ...which then fails with: # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (ifnode.cpp:955), pid=2438111, tid=2438182 # guarantee(this_bool->_test.is_less() && !fail->_con) failed: incorrect test: dom_bool.test=3 proj._con=1 this_bool.test=7 fail._con=1 # # JRE version: OpenJDK Runtime Environment (17.0) (build 17-internal+0-adhoc.shade.jdk) # Java VM: OpenJDK 64-Bit Server VM (17-internal+0-adhoc.shade.jdk, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64) # Problematic frame: # V [libjvm.so+0x7fc3ee] IfNode::fold_compares_helper(ProjNode*, ProjNode*, ProjNode*, PhaseIterGVN*) [clone .part.0]+0x19e # # Core dump will be written. 
Default location: Core dumps may be processed with "/usr/share/apport/apport %p %s %c %d %P %E" (or dumping to /home/shade/temp/jruby/jruby-issue-6554/core.2438111) # # An error report file with more information is saved as: # /home/shade/temp/jruby/jruby-issue-6554/hs_err_pid2438111.log # # Compiler replay data is saved as: # /home/shade/temp/jruby/jruby-issue-6554/replay_pid2438111.log "this_bool.test=7" means the test is "GE". The downstream code does not expect this. It expects the test to be canonicalized. This minimal patch bails out when it discovers such a bad test: diff --git a/src/hotspot/share/opto/ifnode.cpp b/src/hotspot/share/opto/ifnode.cpp @@ -971,6 +973,9 @@ bool IfNode::fold_compares_helper(ProjNode* proj, ProjNode* success, ProjNode* f lo = igvn->transform(new AddINode(lo, igvn->intcon(1))); cond = BoolTest::ge; } + } else { + // Safety: something is broken, break away. + return false; } } else { const TypeInt* failtype = filtered_int_type(igvn, n, proj); I think I'll submit two issues: one that codes fold_compares_helper more defensively like in the patch above (this would be backportable), and then a follow-up that addresses the actual problem (why we have an uncanonicalized test). -- Thanks, -Aleksey From github.com+10835776+stsypanov at openjdk.java.net Wed Feb 17 14:51:52 2021 From: github.com+10835776+stsypanov at openjdk.java.net (Сергей Цыпанов) Date: Wed, 17 Feb 2021 14:51:52 GMT Subject: RFR: 8261880: Change nested classes in java.base to static nested classes where possible Message-ID: Non-static nested (inner) classes hold a reference to their enclosing instance, which in many cases can be avoided.
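For readers less familiar with the distinction behind 8261880, here is a minimal sketch of the difference between an inner class and a static nested class. All names (`NestedClassDemo`, `Inner`, `Nested`, `hasOuterReference`) are made up for illustration and do not come from the patch. The reflection check assumes javac's usual behaviour of storing the enclosing instance in a compiler-generated (synthetic) field; newer javac versions may drop that field when the inner class never touches the outer instance, which is why `Inner` deliberately reads an outer field here.

```java
import java.lang.reflect.Field;

// Hypothetical demo class; none of these names come from the JDK patch.
final class NestedClassDemo {
    private final int value;

    NestedClassDemo(int value) {
        this.value = value;
    }

    // Inner (non-static) class: every instance is tied to an enclosing NestedClassDemo.
    class Inner {
        int read() {
            return value; // implicitly NestedClassDemo.this.value
        }
    }

    // Static nested class: no implicit enclosing instance; the outer object is passed explicitly.
    static class Nested {
        int read(NestedClassDemo outer) {
            return outer.value;
        }
    }

    // Returns true if the class declares a compiler-generated field of the outer type.
    // Assumption: javac stores the enclosing instance in such a synthetic field.
    static boolean hasOuterReference(Class<?> c) {
        for (Field f : c.getDeclaredFields()) {
            if (f.isSynthetic() && f.getType() == NestedClassDemo.class) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        NestedClassDemo outer = new NestedClassDemo(7);
        Inner inner = outer.new Inner();   // needs an enclosing instance
        Nested nested = new Nested();      // does not
        System.out.println("same value read: " + (inner.read() == nested.read(outer)));
        System.out.println("Inner keeps outer reference: " + hasOuterReference(Inner.class));
        System.out.println("Nested keeps outer reference: " + hasOuterReference(Nested.class));
    }
}
```

The hidden reference matters mostly because it keeps the enclosing object reachable for as long as the nested object lives, so converting to `static` is only safe where the nested class never uses the outer instance.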
------------- Commit messages: - 8261880: Change nested classes in java.base to static nested classes where possible Changes: https://git.openjdk.java.net/jdk/pull/2589/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2589&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261880 Stats: 20 lines in 16 files changed: 0 ins; 0 del; 20 mod Patch: https://git.openjdk.java.net/jdk/pull/2589.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2589/head:pull/2589 PR: https://git.openjdk.java.net/jdk/pull/2589 From shade at redhat.com Wed Feb 17 15:14:03 2021 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 17 Feb 2021 16:14:03 +0100 Subject: Intermittent JRuby json issue related to tiered or G1 In-Reply-To: <7110e0cf-8c77-1d6d-8854-3160d5c901f8@redhat.com> References: <1bac4d6a-e53d-075d-0209-6b737b7a2e77@redhat.com> <7110e0cf-8c77-1d6d-8854-3160d5c901f8@redhat.com> Message-ID: <18a23eea-03a2-7f69-c4f4-4d22bddb3eb9@redhat.com> On 2/17/21 3:50 PM, Aleksey Shipilev wrote: > I think I'll submit two issues: one that codes fold_compares_helper more defensively like in the > patch above (this would be backportable) This would be: https://bugs.openjdk.java.net/browse/JDK-8261912 > and then the follow-up that targets to address the actual > problem (why do we have uncanonicalized test). And this would be: https://bugs.openjdk.java.net/browse/JDK-8261914 -- Thanks, -Aleksey From enikitin at openjdk.java.net Wed Feb 17 15:46:58 2021 From: enikitin at openjdk.java.net (Evgeny Nikitin) Date: Wed, 17 Feb 2021 15:46:58 GMT Subject: RFR: 8261666: [mlvm] Remove WhiteBoxHelper Message-ID: <4pcln-55Xj11rpaV-DxVnxfjp5m0OAsXS2ribyvM96E=.6f45af64-eacc-4943-a31a-d5e117532b1f@github.com> The mlvm test suite used this class for situations when WhiteBox was not available during the test's build time - therefore, the test tried to find the WhiteBox on runtime. These days, the WhiteBox is always available so this is not the case. 
The said helper is to be removed and its only user adjusted. Testing: test/hotspot/jtreg/vmTestBase/vm/mlvm on linux/mac/windows x64. ------------- Commit messages: - 8261666: [mlvm] Remove WhiteBoxHelper Changes: https://git.openjdk.java.net/jdk/pull/2609/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2609&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261666 Stats: 85 lines in 2 files changed: 3 ins; 80 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/2609.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2609/head:pull/2609 PR: https://git.openjdk.java.net/jdk/pull/2609 From nhe at activeviam.com Wed Feb 17 10:34:04 2021 From: nhe at activeviam.com (Nicolas Heutte) Date: Wed, 17 Feb 2021 11:34:04 +0100 Subject: [External] : Re: SuperWord loop optimization lost after method inlining In-Reply-To: <08ce2121-bbb4-3ea5-8f05-efeb33df7b74@oracle.com> References: <87dcb670-622f-6b05-7c06-289f9b4f3634@oracle.com> <08ce2121-bbb4-3ea5-8f05-efeb33df7b74@oracle.com> Message-ID: Hi Vladimir, I have rerun the test with the appropriate options, the obtained logs are in this folder: https://drive.google.com/drive/folders/1UczOggtTYp6TZ0QnBiwMxwdTBl3zuvqF?usp=sharing Best regards, Nicolas Heutte On Tue, Feb 16, 2021 at 11:35 PM Vladimir Kozlov wrote: > Hi Nicolas, > > The file you shared has only assembler code. Yes, it shows that when > ArrayFloatToArrayFloatVectorBinding::plus() is > inlined into AVector::plus() it is not vectorized. > > But I asked for an other file (hotspot_pid.log) which is generated > when you run app with > -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation flags. It should start > with: > > > > > > Java HotSpot(TM) 64-Bit Server VM > > > 11.0.9+7-LTS > > > Thanks, > Vladimir K > > On 2/15/21 5:19 AM, Nicolas Heutte wrote: > > Hi Vladimir, > > > > I've tried disabling tiered compilation, as you requested. It seems that > the inlining was performed slightly > > differently, but the issue remains. 
As you can see in this excerpt, the > main loop isn't properly vectorized: > > > > 0x00000254b0d4bf54: cmp %r11d,%r8d > > 0x00000254b0d4bf57: jae 0x00000254b0d4c19e > > 0x00000254b0d4bf5d: vmovss 0x10(%rcx,%r8,4),%xmm9 ;*faload > {reexecute=0 rethrow=0 return_oop=0} > > ; - > > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 54 > (line 41) > > ; - > com.qfs.vector.impl.AVector::plus at 17 (line 204) > > ; - > com.qfs.agg.impl.SumVectorAggregationBinding::plus at 2 (line 103) > > ; - > > com.qfs.agg.impl.SumVectorAggregationBinding::safeVectorAggregate at 70 > (line 66) > > ; - > com.qfs.agg.impl.AVectorAggregationBinding::safeAggregate at 27 (line 118) > > > > 0x00000254b0d4bf64: cmp %ebx,%r8d > > 0x00000254b0d4bf67: jae 0x00000254b0d4c1ec > > 0x00000254b0d4bf6d: vaddss 0x10(%rdi,%r8,4),%xmm9,%xmm9 > > 0x00000254b0d4bf74: vmovss %xmm9,0x10(%rcx,%r8,4) ;*fastore > {reexecute=0 rethrow=0 return_oop=0} > > ; - > > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 61 > (line 41) > > ; - > com.qfs.vector.impl.AVector::plus at 17 (line 204) > > ; - > com.qfs.agg.impl.SumVectorAggregationBinding::plus at 2 (line 103) > > ; - > > com.qfs.agg.impl.SumVectorAggregationBinding::safeVectorAggregate at 70 > (line 66) > > ; - > com.qfs.agg.impl.AVectorAggregationBinding::safeAggregate at 27 (line 118) > > > > 0x00000254b0d4bf7b: inc %r8d ;*iinc {reexecute=0 > rethrow=0 return_oop=0} > > ; - > > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 62 > (line 40) > > ; - > com.qfs.vector.impl.AVector::plus at 17 (line 204) > > ; - > com.qfs.agg.impl.SumVectorAggregationBinding::plus at 2 (line 103) > > ; - > > com.qfs.agg.impl.SumVectorAggregationBinding::safeVectorAggregate at 70 > (line 66) > > ; - > com.qfs.agg.impl.AVectorAggregationBinding::safeAggregate at 27 (line 118) > > > > 0x00000254b0d4bf7e: cmp %r9d,%r8d > > 0x00000254b0d4bf81: jl 0x00000254b0d4bf54 ;*goto {reexecute=0 > rethrow=0 return_oop=0} > > ; 
- > > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 65 > (line 40) > > ; - > com.qfs.vector.impl.AVector::plus at 17 (line 204) > > ; - > com.qfs.agg.impl.SumVectorAggregationBinding::plus at 2 (line 103) > > ; - > > com.qfs.agg.impl.SumVectorAggregationBinding::safeVectorAggregate at 70 > (line 66) > > ; - > com.qfs.agg.impl.AVectorAggregationBinding::safeAggregate at 27 (line 118) > > > > > > > > Here is the link to the full log, should you want to take a look at it: > > > https://drive.google.com/file/d/1KQU7wI8NjeElFv6RrQmUsUPRMnAefzhb/view?usp=sharing > > < > https://urldefense.com/v3/__https://drive.google.com/file/d/1KQU7wI8NjeElFv6RrQmUsUPRMnAefzhb/view?usp=sharing__;!!GqivPVa7Brio!PBuP6MfDNWUOTe23SSXA0V5wn_VHjv2sjI8POWRwp6mr0wVdIzFhNoVZANb4FqCYKwzapw$ > > > > > > Best regards, > > Nicolas Heutte > > > > On Thu, Feb 11, 2021 at 7:05 PM Vladimir Kozlov < > vladimir.kozlov at oracle.com > wrote: > > > > Changing wide mailing list to JIT compiler only. > > > > This deoptimization is normal in Tiered Compilation - it switched > from profiling code (level='3') generated by C1 > > compiler to new code generated by C2 (level='4') which does loop > optimizations. > > > > Thank you for posting inlining information: > > > > @ 17 > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus (69 > bytes) inline (hot) > > \-> TypeProfile (14054/14054 counts) = > com/qfs/vector/binding/impl/ArrayFloatToArrayFloatVectorBinding > > > > I thought before that may be call site is not hot but it is not the > case. > > > > You can do an other experiment to collect log with disabled Tiered > Compilation (only C2 is used): -XX:-TieredCompilation > > Also print assembler code (as you did before) for final compilation > to see if loop is still not vectorized. > > > > Is it possible to post log file (on GitHub?) for me to look? 
> > > > Thanks, > > Vladimir K > > > > On 2/11/21 6:28 AM, Nicolas Heutte wrote: > > > Hi Vladimir, > > > > > > Thank you for your help. > > > > > > I'm currently running Java 11.0.9, and I did not use any VM flag > of note. > > > > > > I checked the content of the compilation log, and it seems that > ArrayFloatToArrayFloatVectorBinding::plus() was > > > deoptimized in order to allow AVector::plus() to be compiled: > > > > > > > > > method='com.qfs.vector.impl.AVector plus (Lcom/qfs/vector/IVector;)V' > bytes='23' > > > count='916' iicount='916' level='3' stamp='7394.056' > comment='tiered' hot_count='896'/> > > > > > > pc='0x00000296d0785b94' compile_id='17257' compiler='c1' level='3'> > > > method='com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding plus > > > (Lcom/qfs/vector/IVector;Lcom/qfs/vector/IVector;)V' bytes='69' > count='909' backedge_count='155602' iicount='910'/> > > > > > > > > > The last compilation entry for AVector::plus() is: > > > > > > > > > entry='0x00000296d6af32c0' size='1960' > > address='0x00000296d6af3110' > > > relocation_offset='376' insts_offset='432' stub_offset='1040' > scopes_data_offset='1152' scopes_pcs_offset='1592' > > > dependencies_offset='1880' nul_chk_table_offset='1896' > oops_offset='1064' metadata_offset='1072' > > > method='com.qfs.vector.impl.AVector plus > (Lcom/qfs/vector/IVector;)V' bytes='23' count='172425' iicount='172425' > > > stamp='7394.199'/> > > > level='2' stamp='7394.199'/> > > > @ 1 > com.qfs.vector.array.impl.ArrayFloatVector::getBindingId (4 bytes) inline > > (hot) > > > \-> TypeProfile (14552/14552 > counts) = com/qfs/vector/array/impl/ArrayFloatVector > > > @ 7 > com.qfs.vector.array.impl.ArrayFloatVector::getBindingId (4 bytes) inline > > (hot) > > > \-> TypeProfile (14150/14150 > counts) = com/qfs/vector/array/impl/ArrayFloatVector > > > @ 10 > com.qfs.vector.binding.impl.VectorBindings::getBinding (9 bytes) inline > (hot) > > > @ 5 > > > 
com.qfs.vector.binding.impl.VectorBindings$VectorBindingsProvider::getBinding > (22 > > > bytes) inline (hot) > > > @ 3 > > > com.qfs.vector.binding.impl.VectorBindings$VectorBindingsProvider::hasBinding > > > (34 bytes) inline (hot) > > > @ 17 > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus (69 > > bytes) > > > inline (hot) > > > \-> TypeProfile (14054/14054 > counts) = > > > com/qfs/vector/binding/impl/ArrayFloatToArrayFloatVectorBinding > > > @ 12 > com.qfs.vector.array.impl.ArrayFloatVector::size (6 bytes) inline (hot) > > > @ 22 > com.qfs.vector.impl.AVector::checkIndex (37 bytes) inline (hot) > > > @ 6 > com.qfs.vector.array.impl.ArrayFloatVector::size (6 bytes) inline (hot) > > > @ 27 > com.qfs.vector.array.impl.ArrayFloatVector::getUnderlying (5 bytes) > > accessor > > > @ 34 > com.qfs.vector.array.impl.ArrayFloatVector::getUnderlying (5 bytes) > > accessor > > > > > > > > > Unfortunately, I do not have access to a debug VM build, so I > cannot run the second test you recommend. > > > > > > Best regards, > > > Nicolas Heutte > > > > > > On Wed, Feb 10, 2021 at 7:36 PM Vladimir Kozlov < > vladimir.kozlov at oracle.com > > vladimir.kozlov at oracle.com>>> wrote: > > > > > > Hi, Nicolas > > > > > > Looks like, when inlined, the loop from > ArrayFloatToArrayFloatVectorBinding::plus() was not optimized at all: > > it is not > > > unrolled and has range checks. Such loops are not vectorized > (you need unrolling and no checks). > > > > > > What Java version you are running? What HotSpot VM flags you > are using when running application? > > > > > > Run application with -XX:+LogCompilation and look on > compilation data in hotspot_pid.log file for caller > > > AVector::plus(). > > > > > > VM also has several flags to trace loop optimizations but > they are only available in debug VM build. If you > > have access > > > to such build run with -XX:+PrintCompilation > -XX:+TraceLoopOpts flags. 
> > > > > > Thanks, > > > Vladimir K > > > > > > On 2/10/21 9:24 AM, Nicolas Heutte wrote: > > > > Hi all, > > > > > > > > I am encountering a performance issue caused by the > interaction between > > > > method inlining and automatic vectorization. > > > > > > > > Our application aggregates arrays intensively using a > method named > > > > ArrayFloatToArrayFloatVectorBinding.plus() with the > following code: > > > > > > > > for (int i = 0; i < srcLen; ++i) { > > > > > > > > dstArray[i] += srcArray[i]; > > > > > > > > } > > > > > > > > When we microbenchmark this method we observe fast > performance close to the > > > > practical memory bandwidth and when we print the assembly > code we observe > > > > loop unrolling and automatic vectorization with SIMD > instructions. > > > > > > > > 0x000001ef4600abf0: vmovdqu 0x10(%r14,%r13,4),%ymm0 > > > > > > > > 0x000001ef4600abf7: vaddps 0x10(%rcx,%r13,4),%ymm0,%ymm0 > > > > > > > > 0x000001ef4600abfe: vmovdqu %ymm0,0x10(%r14,%r13,4) > > > > > > > > 0x000001ef4600ac05: movslq %r13d,%r11 > > > > > > > > 0x000001ef4600ac08: vmovdqu 0x30(%r14,%r11,4),%ymm0 > > > > > > > > 0x000001ef4600ac0f: vaddps 0x30(%rcx,%r11,4),%ymm0,%ymm0 > > > > > > > > 0x000001ef4600ac16: vmovdqu %ymm0,0x30(%r14,%r11,4) > > > > > > > > 0x000001ef4600ac1d: vmovdqu 0x50(%r14,%r11,4),%ymm0 > > > > > > > > 0x000001ef4600ac24: vaddps 0x50(%rcx,%r11,4),%ymm0,%ymm0 > > > > > > > > 0x000001ef4600ac2b: vmovdqu %ymm0,0x50(%r14,%r11,4) > > > > > > > > 0x000001ef4600ac32: vmovdqu 0x70(%r14,%r11,4),%ymm0 > > > > > > > > 0x000001ef4600ac39: vaddps 0x70(%rcx,%r11,4),%ymm0,%ymm0 > > > > > > > > 0x000001ef4600ac40: vmovdqu %ymm0,0x70(%r14,%r11,4) > > > > > > > > 0x000001ef4600ac47: vmovdqu 0x90(%r14,%r11,4),%ymm0 > > > > > > > > 0x000001ef4600ac51: vaddps 0x90(%rcx,%r11,4),%ymm0,%ymm0 > > > > > > > > 0x000001ef4600ac5b: vmovdqu %ymm0,0x90(%r14,%r11,4) > > > > > > > > 0x000001ef4600ac65: vmovdqu 0xb0(%r14,%r11,4),%ymm0 > > > > > > > > 0x000001ef4600ac6f: vaddps 
0xb0(%rcx,%r11,4),%ymm0,%ymm0 > > > > > > > > 0x000001ef4600ac79: vmovdqu %ymm0,0xb0(%r14,%r11,4) > > > > > > > > 0x000001ef4600ac83: vmovdqu 0xd0(%r14,%r11,4),%ymm0 > > > > > > > > 0x000001ef4600ac8d: vaddps 0xd0(%rcx,%r11,4),%ymm0,%ymm0 > > > > > > > > 0x000001ef4600ac97: vmovdqu %ymm0,0xd0(%r14,%r11,4) > > > > > > > > 0x000001ef4600aca1: vmovdqu 0xf0(%r14,%r11,4),%ymm0 > > > > > > > > 0x000001ef4600acab: vaddps 0xf0(%rcx,%r11,4),%ymm0,%ymm0 > > > > > > > > 0x000001ef4600acb5: vmovdqu %ymm0,0xf0(%r14,%r11,4) > ;*fastore > > > > {reexecute=0 rethrow=0 return_oop=0} > > > > > > > > ; - > > > > > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 61 > > > > (line 41) > > > > > > > > 0x000001ef4600acbf: add $0x40,%r13d ;*iinc > {reexecute=0 > > > > rethrow=0 return_oop=0} > > > > > > > > ; - > > > > > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 62 > > > > (line 40) > > > > > > > > 0x000001ef4600acc3: cmp %eax,%r13d > > > > > > > > 0x000001ef4600acc6: jl 0x000001ef4600abf0 ;*goto > {reexecute=0 > > > > rethrow=0 return_oop=0} > > > > > > > > ; - > > > > > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 65 > > > > (line 40) > > > > > > > > > > > > > > > > In the real application, this method is actually inlined > in a higher level > > > > method named AVector.plus(). 
Unfortunately, the inlined > version of the > > > > aggregation code is not vectorized anymore: > > > > > > > > > > > > > > > > 0x000001ef460180a0: cmp %ebx,%r11d > > > > > > > > 0x000001ef460180a3: jae 0x000001ef460180e6 > > > > > > > > 0x000001ef460180a5: vmovss 0x10(%r8,%r11,4),%xmm1 > ;*faload {reexecute=0 > > > > rethrow=0 return_oop=0} > > > > > > > > ; - > > > > > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 54 > > > > (line 41) > > > > > > > > ; - > > > > com.qfs.vector.impl.AVector::plus at 17 (line 204) > > > > > > > > 0x000001ef460180ac: cmp %ecx,%r11d > > > > > > > > 0x000001ef460180af: jae 0x000001ef46018104 > > > > > > > > 0x000001ef460180b1: vaddss 0x10(%r9,%r11,4),%xmm1,%xmm1 > > > > > > > > 0x000001ef460180b8: vmovss %xmm1,0x10(%r8,%r11,4) > ;*fastore {reexecute=0 > > > > rethrow=0 return_oop=0} > > > > > > > > ; - > > > > > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 61 > > > > (line 41) > > > > > > > > ; - > > > > com.qfs.vector.impl.AVector::plus at 17 (line 204) > > > > > > > > 0x000001ef460180bf: inc %r11d ;*iinc > {reexecute=0 > > > > rethrow=0 return_oop=0} > > > > > > > > ; - > > > > > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 62 > > > > (line 40) > > > > > > > > ; - > > > > com.qfs.vector.impl.AVector::plus at 17 (line 204) > > > > > > > > 0x000001ef460180c2: cmp %r10d,%r11d > > > > > > > > 0x000001ef460180c5: jl 0x000001ef460180a0 ;*goto > {reexecute=0 > > > > rethrow=0 return_oop=0} > > > > > > > > ; - > > > > > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 65 > > > > (line 40) > > > > > > > > ; - > > > > com.qfs.vector.impl.AVector::plus at 17 (line 204) > > > > > > > > > > > > > > > > This causes a significant performance drop, compared to a > run where we > > > > explicitly disable the inlining and observe automatically > vectorized code > > > > again ( > > > > > 
-XX:CompileCommand=dontinline,com/qfs/vector/binding/impl/ArrayFloatToArrayFloatVectorBinding.plus > > > > ). > > > > > > > > > > > > How do you guys explain that behavior of the JIT compiler? > Is this a known > > > > and tracked issue, could it be fixed in the JVM? Can we do > something in the > > > > java code to prevent this from happening? > > > > > > > > > > > > Best regards, > > > > > > > > Nicolas Heutte > > > > > > > > > > From aph at openjdk.java.net Wed Feb 17 16:16:40 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Wed, 17 Feb 2021 16:16:40 GMT Subject: RFR: 8256438: AArch64: Implement match rules with ROR shift register value [v3] In-Reply-To: References: Message-ID: On Mon, 1 Feb 2021 11:20:20 GMT, Eric Liu wrote: >> This patch transforms '(x >>> rshift) + (x << lshift)' into >> 'RotateRight(x, rshift)' during GVN phase when both the shift exponents >> are constants and their sum equals to the number of bits for the type >> of shift base. >> >> This patch implements some new match rules for AArch64 instructions >> which can take ROR as the optional shift. Such instructions are 'and', >> 'or', 'eor', 'eon', 'bic' and 'orn'. >> >> ror w11, w2, #5 >> eor w0, w1, w11 >> >> With this patch, above code could be optimized to below: >> >> eor w0, w1, w2, ror #5 >> >> Finally, the patch refactors TestRotate.java[1][2]. >> >> Tested jtreg TestRotate.java, hotspot::hotspot_all_no_apps, >> jdk::jdk_core, langtools::tier1. >> >> [1] https://bugs.openjdk.java.net/browse/JDK-8252776 >> [2] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-September/039911.html > > Eric Liu has updated the pull request incrementally with one additional commit since the last revision: > > Add benchmark test > > Change-Id: I63ca51d06070a07e5c20daf4b42d2c8d7237a1da OK. For what it's worth, I doubt that this will be suitable for backporting to 8u or 11u. ------------- Marked as reviewed by aph (Reviewer). 
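For readers following the ROR thread, the identity behind the transform quoted above can be checked in plain Java. This is only a sketch and not part of the patch: the class and method names are made up for illustration, and it covers only the 32-bit case with non-zero shift amounts whose sum is 32.

```java
// Illustrative only: checks that (x >>> r) + (x << (32 - r)) equals a rotate right by r.
final class RotateIdentity {
    // The two shifted halves occupy disjoint bit positions, so '+' behaves like '|'.
    static int rotateViaShifts(int x, int r) {
        return (x >>> r) + (x << (32 - r));
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 0x80000000, 0x12345678, Integer.MAX_VALUE};
        for (int x : samples) {
            // r = 0 is skipped: Java masks the shift count, so x << 32 is x, not 0.
            for (int r = 1; r < 32; r++) {
                if (rotateViaShifts(x, r) != Integer.rotateRight(x, r)) {
                    throw new AssertionError("mismatch for x=" + x + ", r=" + r);
                }
            }
        }
        System.out.println("shift pair matches Integer.rotateRight for all samples");
    }
}
```

The exclusion of `r = 0` is a property of Java's shift semantics; whether the GVN transform itself rejects that case is not stated in the thread.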
PR: https://git.openjdk.java.net/jdk/pull/1858 From shade at openjdk.java.net Wed Feb 17 16:24:52 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 17 Feb 2021 16:24:52 GMT Subject: RFR: 8261912: Code IfNode::fold_compares_helper more defensively Message-ID: As [JDK-8261914](https://bugs.openjdk.java.net/browse/JDK-8261914) indicates, there are cases that break the internal asserts in `IfNode::fold_compares_helper`, code added by JDK-8073480 in JDK 9. Unfortunately, release builds would happily miscompile when that happens. It would be better to code `IfNode::fold_compares_helper` more defensively, so it bails out when the asserts are violated. This implicitly works around the bug in JDK-8261914. The goal for this limited workaround is to be trivially backportable in order to quickly unbreak 11u, 16u and 17. The alternative, instead of the early returns, is to do: lo = NULL; hi = NULL; ...and then wait for the method epilog to handle it. I have no preference for either style, as the blocks this patch affects already have some early returns, and `lo/hi = NULL` is also used.
Additional testing: - [x] Linux x86_64 fastdebug `tier1` - [x] Linux x86_64 fastdebug `tier2` - [x] Failing JRuby reproducer from JDK-8261914, now passing in release mode with hundreds of iterations ------------- Commit messages: - 8261912: Code IfNode::fold_compares_helper more defensively Changes: https://git.openjdk.java.net/jdk/pull/2610/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2610&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261912 Stats: 23 lines in 1 file changed: 15 ins; 4 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/2610.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2610/head:pull/2610 PR: https://git.openjdk.java.net/jdk/pull/2610 From redestad at openjdk.java.net Wed Feb 17 16:27:40 2021 From: redestad at openjdk.java.net (Claes Redestad) Date: Wed, 17 Feb 2021 16:27:40 GMT Subject: RFR: 8261880: Change nested classes in java.base to static nested classes where possible In-Reply-To: References: Message-ID: On Tue, 16 Feb 2021 14:30:58 GMT, ?????? ??????? wrote: > Non-static classes hold a link to their parent classes, which in many cases can be avoided. src/java.base/share/classes/java/lang/invoke/DelegatingMethodHandle.java line 192: > 190: > 191: /* Placeholder class for DelegatingMethodHandles generated ahead of time */ > 192: static final class Holder {} For the four `Holder` classes in `java.lang.invoke`, the class is generated when running jlink via `java.lang.invoke.GenerateJLIClassesHelper`. To stay in sync with the definition you'd have to update that code. Also, since these `Holder` classes will only contain static methods and are never instantiated then I'm not sure it matters whether the classes are static or not. 
------------- PR: https://git.openjdk.java.net/jdk/pull/2589 From vlivanov at openjdk.java.net Wed Feb 17 16:37:44 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Wed, 17 Feb 2021 16:37:44 GMT Subject: RFR: 8260653: Unreachable nodes keep speculative types alive [v2] In-Reply-To: References: Message-ID: On Wed, 17 Feb 2021 14:35:59 GMT, Nils Eliasson wrote: >> The RunThese test fails because after first igvn.optimize() (directly after parsing) there are unreachable nodes cycles left. Later when remove_speculative_types() is called - only reachable nodes will have their speculative type removed. At the end of remove_speculative_types() there is an assert that all speculative types have been removed that will fail. >> >> This problem will not cause crashes in production - it is only a sanity test. >> >> I suggest adding a call to PhaseRemoveUseless before remove_speculative_types. This will cost a few extra cycles but it is the only way we can guarantee that no unreachable nodes are left. >> >> When debugging this I experimented with adding a call to verify_graph_edges to check for dead code at the same spot. This triggers failures in a lot of test. The conclusion is that it is very common that we have dead node cycles - but they very rarely keep speculative types alive. >> >> A big thank you to Dean Long how created the reproducer for this bug. >> >> Please review. > > Nils Eliasson has updated the pull request incrementally with one additional commit since the last revision: > > Check if node is live before assert Looks good. src/hotspot/share/opto/phaseX.cpp line 341: > 339: n != sentinel_node && > 340: n->is_Type() && > 341: n->outcnt() > 0 && `live_nodes.member(n)` makes ` n->outcnt() > 0` redundant, doesn't it? ------------- Marked as reviewed by vlivanov (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/2606 From github.com+10835776+stsypanov at openjdk.java.net Wed Feb 17 16:38:04 2021 From: github.com+10835776+stsypanov at openjdk.java.net (=?UTF-8?B?0KHQtdGA0LPQtdC5?= =?UTF-8?B?IA==?= =?UTF-8?B?0KbRi9C/0LDQvdC+0LI=?=) Date: Wed, 17 Feb 2021 16:38:04 GMT Subject: RFR: 8261880: Change nested classes in java.base to static nested classes where possible [v2] In-Reply-To: References: Message-ID: On Wed, 17 Feb 2021 16:24:46 GMT, Claes Redestad wrote: >> ?????? ??????? has updated the pull request incrementally with one additional commit since the last revision: >> >> 8261880: Remove static from declarations of Holder nested classes > > src/java.base/share/classes/java/lang/invoke/DelegatingMethodHandle.java line 192: > >> 190: >> 191: /* Placeholder class for DelegatingMethodHandles generated ahead of time */ >> 192: static final class Holder {} > > For the four `Holder` classes in `java.lang.invoke`, the class is generated when running jlink via `java.lang.invoke.GenerateJLIClassesHelper`. To stay in sync with the definition you'd have to update that code. Also, since these `Holder` classes will only contain static methods and are never instantiated then I'm not sure it matters whether the classes are static or not. I'll just revert them ------------- PR: https://git.openjdk.java.net/jdk/pull/2589 From github.com+10835776+stsypanov at openjdk.java.net Wed Feb 17 16:38:03 2021 From: github.com+10835776+stsypanov at openjdk.java.net (=?UTF-8?B?0KHQtdGA0LPQtdC5?= =?UTF-8?B?IA==?= =?UTF-8?B?0KbRi9C/0LDQvdC+0LI=?=) Date: Wed, 17 Feb 2021 16:38:03 GMT Subject: RFR: 8261880: Change nested classes in java.base to static nested classes where possible [v2] In-Reply-To: References: Message-ID: > Non-static classes hold a link to their parent classes, which in many cases can be avoided. ?????? ??????? 
has updated the pull request incrementally with one additional commit since the last revision: 8261880: Remove static from declarations of Holder nested classes ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2589/files - new: https://git.openjdk.java.net/jdk/pull/2589/files/5650cce4..c6f9cf6b Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2589&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2589&range=00-01 Stats: 4 lines in 4 files changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/2589.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2589/head:pull/2589 PR: https://git.openjdk.java.net/jdk/pull/2589 From github.com+7806504+liach at openjdk.java.net Wed Feb 17 16:38:04 2021 From: github.com+7806504+liach at openjdk.java.net (liach) Date: Wed, 17 Feb 2021 16:38:04 GMT Subject: RFR: 8261880: Change nested classes in java.base to static nested classes where possible [v2] In-Reply-To: References: Message-ID: On Wed, 17 Feb 2021 16:32:39 GMT, ?????? ??????? wrote: >> src/java.base/share/classes/java/lang/invoke/DelegatingMethodHandle.java line 192: >> >>> 190: >>> 191: /* Placeholder class for DelegatingMethodHandles generated ahead of time */ >>> 192: static final class Holder {} >> >> For the four `Holder` classes in `java.lang.invoke`, the class is generated when running jlink via `java.lang.invoke.GenerateJLIClassesHelper`. To stay in sync with the definition you'd have to update that code. Also, since these `Holder` classes will only contain static methods and are never instantiated then I'm not sure it matters whether the classes are static or not. > > I'll just revert them For static methods, since in java language you cannot declare static method in instance inner classes, I'd say making them static makes more sense language-wise. Also making them static reduces compiler synthetic instance field and constructors. 
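To make the point about the synthetic instance field concrete, here is a small sketch (not taken from the PR; all names are invented for illustration). It counts compiler-generated fields via reflection. The inner class deliberately touches the enclosing instance, because some newer javac versions may omit the hidden reference when it is unused; the exact counts therefore depend on the compiler.

```java
import java.lang.reflect.Field;

// Illustrative only: compares an inner class with a static nested class.
final class NestedVsInner {
    private int counter = 0;

    // Inner (non-static) class: it uses the enclosing instance, so the compiler
    // has to keep a hidden reference to it (typically a synthetic field).
    final class Inner {
        int next() { return ++counter; }
    }

    // Static nested class: no enclosing instance, hence no hidden reference.
    static final class Nested {
        int twice(int v) { return 2 * v; }
    }

    // Counts the fields of a class that the compiler marked as synthetic.
    static long syntheticFieldCount(Class<?> c) {
        long n = 0;
        for (Field f : c.getDeclaredFields()) {
            if (f.isSynthetic()) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println("Inner:  " + syntheticFieldCount(Inner.class));
        System.out.println("Nested: " + syntheticFieldCount(Nested.class));
    }
}
```

This says nothing about the `Holder` placeholder classes specifically; as noted in the thread, those are regenerated at jlink time and are never instantiated.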
------------- PR: https://git.openjdk.java.net/jdk/pull/2589 From redestad at openjdk.java.net Wed Feb 17 17:27:56 2021 From: redestad at openjdk.java.net (Claes Redestad) Date: Wed, 17 Feb 2021 17:27:56 GMT Subject: RFR: 8261880: Change nested classes in java.base to static nested classes where possible [v2] In-Reply-To: References: Message-ID: On Wed, 17 Feb 2021 16:35:02 GMT, liach wrote: >> I'll just revert them > > For static methods, since in java language you cannot declare static method in instance inner classes, I'd say making them static makes more sense language-wise. Also making them static reduces compiler synthetic instance field and constructors. Incidentally, Java-the-language allows static methods in inner instance classes since JDK 16. And I'm not sure this was ever a restriction at the JVMS level since we've been generating static methods (using ASM) into these inner instance classes since at least JDK 9. ------------- PR: https://git.openjdk.java.net/jdk/pull/2589 From headius at headius.com Wed Feb 17 17:28:54 2021 From: headius at headius.com (Charles Oliver Nutter) Date: Wed, 17 Feb 2021 11:28:54 -0600 Subject: Intermittent JRuby json issue related to tiered or G1 In-Reply-To: <18a23eea-03a2-7f69-c4f4-4d22bddb3eb9@redhat.com> References: <1bac4d6a-e53d-075d-0209-6b737b7a2e77@redhat.com> <7110e0cf-8c77-1d6d-8854-3160d5c901f8@redhat.com> <18a23eea-03a2-7f69-c4f4-4d22bddb3eb9@redhat.com> Message-ID: On Wed, Feb 17, 2021 at 9:14 AM Aleksey Shipilev wrote: > > On 2/17/21 3:50 PM, Aleksey Shipilev wrote: > > I think I'll submit two issues: one that codes fold_compares_helper more defensively like in the > > patch above (this would be backportable) > > This would be: > https://bugs.openjdk.java.net/browse/JDK-8261912 > > > and then the follow-up that targets to address the actual > > problem (why do we have uncanonicalized test). 
> > And this would be: > https://bugs.openjdk.java.net/browse/JDK-8261914 This is outstanding sleuthing, thank you Aleksey! I'm not happy that there's a JVM issue but at least we can stop banging our heads against this as a JRuby bug. Given your discoveries, I would have felt safe saying this is purely a tiered JIT issue... but our users have reported that switching away from G1 also eliminates the problem. Were they just lucky? Could there be a separate issue? I'm trying to come up with a short-term workaround with minimal impact. Switching GC may not be in the cards but disabling tiered compilation would probably be acceptable for production environments...if that is sufficient. As always let me know if we can provide any more information. The affected JRuby users and I will continue monitoring. - Charlie From github.com+828220+forax at openjdk.java.net Wed Feb 17 17:44:47 2021 From: github.com+828220+forax at openjdk.java.net (=?UTF-8?B?UsOpbWk=?= Forax) Date: Wed, 17 Feb 2021 17:44:47 GMT Subject: RFR: 8261880: Change nested classes in java.base to static nested classes where possible [v2] In-Reply-To: References: Message-ID: On Wed, 17 Feb 2021 17:24:50 GMT, Claes Redestad wrote: >> For static methods, since in java language you cannot declare static method in instance inner classes, I'd say making them static makes more sense language-wise. Also making them static reduces compiler synthetic instance field and constructors. > > Incidentally, Java-the-language allows static methods in inner instance classes since JDK 16. And I'm not sure this was ever a restriction at the JVMS level since we've been generating static methods (using ASM) into these inner instance classes since at least JDK 9. Inner classes doesn't really exist for the JVM, it's just an attribute (in fact, a pair of attributes) that is read/write by javac (it's very similar to the way generics work). 
So it is OK to have static methods in inner classes since Java 1.1 from the JVM POV, with the caveat that you may not be able to call them from Java-the-language. Also since Java 11, inner classes are also nestmates and those attributes (NestHost/NestMembers) change the behavior of the VM. ------------- PR: https://git.openjdk.java.net/jdk/pull/2589 From simonis at openjdk.java.net Wed Feb 17 17:45:44 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Wed, 17 Feb 2021 17:45:44 GMT Subject: RFR: 8261731: shallow copy the internal buffer of a scalar-replaced java.lang.String object [v2] In-Reply-To: <3JNae6rXuxc_Q6YoALCH8Ku510Zne5ftqf1z8OCGkHQ=.2ebf5cd0-27e2-44e3-adf7-065179cc9ffd@github.com> References: <3JNae6rXuxc_Q6YoALCH8Ku510Zne5ftqf1z8OCGkHQ=.2ebf5cd0-27e2-44e3-adf7-065179cc9ffd@github.com> Message-ID: <5vIeI9zFvxAcG-dUY-8iQwXlXDyWJ3VFMUqKamIL4o4=.746707e1-9d41-4fc0-9162-1ac7ca5589f3@github.com> On Mon, 15 Feb 2021 19:00:57 GMT, Xin Liu wrote: >> There are 3 nodes involved in the construction of a java.lang.String object. >> 1. Allocate of itself, aka. alloc >> 2. AllocateArray of a byte array, which is value:byte[], aka. aa >> 3. ArrayCopyNode which copies in the contents of value, aka. ac >> >> Lemma >> When a String object `alloc` is scalar replaced, C2 can eliminate `aa` and `ac`. >> >> Because `alloc` is scalar replaced, it must be non-escaped. The field value:byte[] of j.l.String cannot be seen by the external world, therefore it must not be global escaped. Because the buffer is marked as stable, it is safe to assume its contents are whatever ac copies in. Because all public java.lang.String constructors clone the incoming array, the source of `ac` is stable as well. >> >> It is possible to rewire `aa` to the source of ac with the correct offset. That is to say, we can replace both `aa` and `ac` with a "shallow copy" of the source of `ac`. It's safe if C2 keeps a reference of the source oop for all safepoints.
> > Xin Liu has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 36 commits: > > - Merge branch 'master' into optimize_substring > - fix regression for x86-32 > > if LP64 is off, the offset of AddP must be I instead of L. > x86 also doesn't emit encodeP/storeN. it use storeP instead. > - add a statistical counter for OptimizeTempArray. > > -XX:+PrintOptoStatistics shows it > - [SIM-JVM-450] support deoptimization v2 > > because the src oop of scobj may be another scobj, deoptimization sort > all objects in topological order. > > separate creation of dst oop and reassignment of it. > - add a unit test for deoptimization > - [SIM-JVM-450] support deoptimization part2 > > if OptimizeTempArray eliminates an AllocateArrayNode, scalar replacement will > create a nested SafePointScalarObjectNode for the field value:byte[] of j.l.String. > we use the nested sobj and an ObjectValue an envelope. it consists of 3 fields: > 1. src 2. src_positio 3. length. > > deoptimizaton recognizes this ad-hoc ObjectValue and re-allocate an arrayOop > for the String object. > - enable OptimizeTempArray by default > - Merge branch 'master' into optimize_substring > - Revert "8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set" > > This reverts commit a49e34688d7d7c9d3c0d9c824d33f359613c2fc1. > - Revert "add a new bucket afterea_late_inlines" > > afterea_late_inlines bucket is not useful. revert it and its relevant changes > - ... 
and 26 more: https://git.openjdk.java.net/jdk/compare/849f4c0f...21693ddd src/hotspot/share/opto/macro.cpp line 1317: > 1315: // > 1316: // > 1317: // EncodeP: delele because we don't need storeN "delete" not "delele" ------------- PR: https://git.openjdk.java.net/jdk/pull/2570 From simonis at openjdk.java.net Wed Feb 17 17:49:40 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Wed, 17 Feb 2021 17:49:40 GMT Subject: RFR: 8261731: shallow copy the internal buffer of a scalar-replaced java.lang.String object In-Reply-To: References: Message-ID: On Wed, 17 Feb 2021 09:00:25 GMT, Richard Reingruber wrote: >>> Hi, >>> >>> this is a smart optimization. >>> >>> > There are 3 nodes involving in the construction of a java.lang.String object. >>> > ``` >>> > 1. Allocate of itself, aka. alloc >>> > >>> > 2. AllocateArray of a byte array, which is value:byte[], aka. aa >>> > >>> > 3. ArrayCopyNode which copys in the contents of value, aka. ac >>> > ``` >>> > >>> > >>> > Lemma >>> > When a String object `alloc` is scalar replaced, C2 can eliminate `aa` and `ac`. >>> > Because `alloc` is scalar replaced, it must be non-escaped. The field value:byte[] of j.l.String cannot be seen by external world, therefore it must not be global escaped. Because the buffer is marked as stable, it is safe to assume its contents are whatever ac copies in. Because all public java.lang.String constructors clone the incoming array, the source of `ac` is stable as well. >>> >>> Are you saying that the source of `ac` cannot be accessed by another thread because of the cloning in the constructor? But the resulting string instance which is used to construct the non-escape instance can be GlobalEscape and then the source of `ac` is accessible to other threads, isn't it? >> >> Hi, @reinrich >> You might not know, but I learn how to reallocate an object in deoptimization from your previous patches. Thank you! >> >> You are right. The source of ac (`src`) might be escaped. 
I didn't say other threads can't access it. I said we need to guarantee the `src` is stable, or it's a "frozen" array in JDK-8261007. >> >> To be honest, I didn't check the src is frozen. In practice, I only see ArrayCopy in construction [here](https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/StringLatin1.java#L769).This method is creating a new substring from an established array, so its value is `stable`. >> >> After I read your comment, I went through String.java. I do find one open-ended constructor. yes, it's a problem! >> https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/String.java#L395 >> >> If the frozen attribute is not present, what I can come up with. an array is "stable". What's do you think? >> 1) has annotation "stable" or >> 2) it's non-escaped and >> 3) can't find any store nodes along its mem stream. > > Hi @navyxliu, > >> You might not know, but I learn how to reallocate an object in deoptimization from your previous patches. Thank you! > > Oh, great! :) > >> You are right. The source of ac (`src`) might be escaped. I didn't say other threads can't access it. I said we need to guarantee the `src` is stable, or it's a "frozen" array in JDK-8261007. >> >> To be honest, I didn't check the src is frozen. In practice, I only see ArrayCopy in construction [here](https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/StringLatin1.java#L769).This method is creating a new substring from an established array, so its value is `stable`. > > Sorry, I wasn't aware of the @Stable annotation on the value array. > > https://github.com/openjdk/jdk/blob/d19503353e5c347ce393544a3a30d5caec53d133/src/java.base/share/classes/java/lang/String.java#L154 > > So my concern was that another thread could for example use reflection to modify > the value array but I reckon this is illegal then (I wonder if it is checked in > reflection...). 
> > Also there is already UseStringDeduplication that relies on the value array being stable. > >> After I read your comment, I went through String.java. I do find one open-ended constructor. yes, it's a problem! >> https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/String.java#L395 > > I see this one is deprecated. Why do you think there's a problem? > >> If the frozen attribute is not present, what I can come up with. an array is "stable". What's do you think? >> >> 1. has annotation "stable" or >> >> 2. it's non-escaped and >> >> 3. can't find any store nodes along its mem stream. > > Currently I'd think this is not necessary. `@Stable` is strong enough. > > The optimization reminds me a bit of earlier Java versions where String > instances had an offset field and the value array could be shared among String > instances. > > I guess the new String needs to be scalarized mostly because you cannot take > care of the offset otherwise. > > Coincidentally I'm currently looking at > [LoadNode::can_see_arraycopy_value()](https://github.com/openjdk/jdk/blob/b955f85e03bafe8ce39677d0af06bf1ceb7e2cbb/src/hotspot/share/opto/memnode.cpp#L951) > which does something similar you want to do. There I don't see any concerns > about the src being stable. In fact I don't think this is correct. Just a side > mark... > > These are my (somewhat random) thoughts so far. I'd think it is a legal and useful optimization (though haven't yet looked at the code yet). > > Thanks, Richard. What about GC? What happens if the original String isn't reachable any more? Do you put a reference to the byte array into the corresponding oop maps to make sure it can't be garbage collected? 
------------- PR: https://git.openjdk.java.net/jdk/pull/2570 From github.com+168222+mgkwill at openjdk.java.net Wed Feb 17 17:51:18 2021 From: github.com+168222+mgkwill at openjdk.java.net (Marcus G K Williams) Date: Wed, 17 Feb 2021 17:51:18 GMT Subject: RFR: 8261671: X86 I2L conversion can be skipped for certain masked positive values [v3] In-Reply-To: <9PGauj5SuzWCyX7aTFHTUCEsK5kVH2ApKAurSK3o_To=.f20e9936-352f-47f0-a561-98dcdb7c842e@github.com> References: <9PGauj5SuzWCyX7aTFHTUCEsK5kVH2ApKAurSK3o_To=.f20e9936-352f-47f0-a561-98dcdb7c842e@github.com> Message-ID: > 8261671: X86 I2L conversion can be skipped for certain masked positive values > > For the following expression: > (long)(value & mask) > Where value is of int type and mask is constant (power of two - 1), we can directly generate bzhi instruction to zero the upper bits instead of doing an andl, followed by movslq > > Before: > Benchmark Mode Cnt Score Error Units > SkipIntToLongCast.skipMaskedSmallPositiveCast avgt 15 10.679 ± 1.496 ns/op > > > After: > Benchmark Mode Cnt Score Error Units > SkipIntToLongCast.skipMaskedSmallPositiveCast avgt 15 7.870 ±
0.067 ns/op > > Signed-off-by: Marcus G K Williams Marcus G K Williams has updated the pull request incrementally with one additional commit since the last revision: Revert to predicate supports_bmi2 Signed-off-by: Marcus G K Williams ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2590/files - new: https://git.openjdk.java.net/jdk/pull/2590/files/5dc976e2..2717297b Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2590&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2590&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/2590.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2590/head:pull/2590 PR: https://git.openjdk.java.net/jdk/pull/2590 From shade at redhat.com Wed Feb 17 17:54:45 2021 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 17 Feb 2021 18:54:45 +0100 Subject: Intermittent JRuby json issue related to tiered or G1 In-Reply-To: References: <1bac4d6a-e53d-075d-0209-6b737b7a2e77@redhat.com> <7110e0cf-8c77-1d6d-8854-3160d5c901f8@redhat.com> <18a23eea-03a2-7f69-c4f4-4d22bddb3eb9@redhat.com> Message-ID: On 2/17/21 6:28 PM, Charles Oliver Nutter wrote: > Given your discoveries, I would have felt safe saying this is purely a > tiered JIT issue... but our users have reported that switching away > from G1 also eliminates the problem. Were they just lucky? Could there > be a separate issue? Having spent a day trying to reproduce the issue, I think the conditions under which this reproduces are very flaky, and minor things can change the reproducibility. The bug is in C2 code, and that code apparently takes a path that leads to the error, under some conditions that are not yet clear to me. Profile taking different shapes I think affects it the most, this is why it lead me to the unsuccessful reverse bisect. That minor profiling code adjustment dropped the incidence for about 10x for me! Argh. 
Maybe switching the GC does indeed lead the application away from the bug, I have no firm evidence about this. Note that all this time JDK 9+ was running with G1 without us knowing about this problem. > I'm trying to come up with a short-term workaround with minimal > impact. Switching GC may not be in the cards but disabling tiered > compilation would probably be acceptable for production > environments...if that is sufficient. The short-term workaround should really come with: https://bugs.openjdk.java.net/browse/JDK-8261914 I plan to propose it for 16u and 11u backports once it hopefully lands in mainline. I cannot see any diagnostic/experimental flag that would disable the problematic path in C2 specifically. So without JDK-8261914, we are left with several nuclear options: a) disabling C2 completely with -XX:TieredStopAtLevel=1, which *would* affect performance; b) falling back to 8u, which does not have this C2 code at all There is the "usual" workaround of disabling the affected method compilation with -XX:CompileCommand=exclude,<...> -- but that kinda hopes that only one method compilation is affected. > As always let me know if we can provide any more information. The > affected JRuby users and I will continue monitoring. I had run thousands of runs of the JRuby reproducer, and it works fine with JDK-8261914 defensive patch. Without it, it fails about 1 time out of 10 with the error that Freaky reported. So I am pretty confident it is the same issue. You might want to test it yourselves today, if you can build mainline JDK. Or wait for patches to land in mainline and pick up EA binaries to test. 
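If one of the flag-based workarounds above is rolled out, it can be handy to confirm from inside the running JVM that the flag actually arrived. The following is a rough sketch under that assumption; the class and helper names are hypothetical, and it only looks at the arguments the runtime MXBean reports, which may not cover every way a flag can be set.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Illustrative only: reports whether the JVM was started with a given flag.
final class JitFlagCheck {
    // Exact-match lookup in the reported JVM arguments.
    static boolean hasFlag(List<String> jvmArgs, String flag) {
        return jvmArgs.contains(flag);
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        // -XX:TieredStopAtLevel=1 is the "disable C2 completely" option mentioned above.
        System.out.println("TieredStopAtLevel=1 present: "
                + hasFlag(jvmArgs, "-XX:TieredStopAtLevel=1"));
    }
}
```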
-- Thanks, -Aleksey From github.com+168222+mgkwill at openjdk.java.net Wed Feb 17 17:59:38 2021 From: github.com+168222+mgkwill at openjdk.java.net (Marcus G K Williams) Date: Wed, 17 Feb 2021 17:59:38 GMT Subject: RFR: 8261671: X86 I2L conversion can be skipped for certain masked positive values [v3] In-Reply-To: References: <9PGauj5SuzWCyX7aTFHTUCEsK5kVH2ApKAurSK3o_To=.f20e9936-352f-47f0-a561-98dcdb7c842e@github.com> <9XBe7BRG8tFJ3ZHqmLa-MjmZWrJS0nIGKJzsjViGVoI=.007b2404-c93f-408e-a28e-c5491eeceb4e@github.com> Message-ID: On Wed, 17 Feb 2021 05:56:04 GMT, Florian Weimer wrote: >> Added `predicate(VM_Version::supports_bmi2() && VM_Version::is_intel());` > > I'm so sorry, I confused this with `PDEP` and `PEXT`. `BZHI` is actually fine on Zen. So the original code is correct. No worries @fweimer. I've reverted to the original. I looked at Agner's Optimization Guide but I couldn't find a place that said Zen 2 and earlier used microcode for BZHI. I guess we know why now ??. Happy to hear any other review comments. ------------- PR: https://git.openjdk.java.net/jdk/pull/2590 From neliasso at openjdk.java.net Wed Feb 17 18:16:02 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Wed, 17 Feb 2021 18:16:02 GMT Subject: RFR: 8260653: Unreachable nodes keep speculative types alive [v3] In-Reply-To: References: Message-ID: > The RunThese test fails because after first igvn.optimize() (directly after parsing) there are unreachable nodes cycles left. Later when remove_speculative_types() is called - only reachable nodes will have their speculative type removed. At the end of remove_speculative_types() there is an assert that all speculative types have been removed that will fail. > > This problem will not cause crashes in production - it is only a sanity test. > > I suggest adding a call to PhaseRemoveUseless before remove_speculative_types. This will cost a few extra cycles but it is the only way we can guarantee that no unreachable nodes are left. 
> > When debugging this I experimented with adding a call to verify_graph_edges to check for dead code at the same spot. This triggers failures in a lot of tests. The conclusion is that it is very common that we have dead node cycles - but they very rarely keep speculative types alive. > > A big thank you to Dean Long who created the reproducer for this bug. > > Please review. Nils Eliasson has updated the pull request incrementally with one additional commit since the last revision: Remove outcnt check ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2606/files - new: https://git.openjdk.java.net/jdk/pull/2606/files/82ab1066..ac23dad8 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2606&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2606&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/2606.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2606/head:pull/2606 PR: https://git.openjdk.java.net/jdk/pull/2606 From neliasso at openjdk.java.net Wed Feb 17 18:16:04 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Wed, 17 Feb 2021 18:16:04 GMT Subject: RFR: 8260653: Unreachable nodes keep speculative types alive [v2] In-Reply-To: References: Message-ID: On Wed, 17 Feb 2021 16:34:43 GMT, Vladimir Ivanov wrote: >> Nils Eliasson has updated the pull request incrementally with one additional commit since the last revision: >> >> Check if node is live before assert > > src/hotspot/share/opto/phaseX.cpp line 341: > >> 339: n != sentinel_node && >> 340: n->is_Type() && >> 341: n->outcnt() > 0 && > > `live_nodes.member(n)` makes `n->outcnt() > 0` redundant, doesn't it? Yes it does. I've updated the patch.
-------------

PR: https://git.openjdk.java.net/jdk/pull/2606

From kvn at openjdk.java.net Wed Feb 17 18:35:49 2021
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Wed, 17 Feb 2021 18:35:49 GMT
Subject: RFR: 8261671: X86 I2L conversion can be skipped for certain masked positive values [v3]
In-Reply-To:
References: <9PGauj5SuzWCyX7aTFHTUCEsK5kVH2ApKAurSK3o_To=.f20e9936-352f-47f0-a561-98dcdb7c842e@github.com>
Message-ID:

On Wed, 17 Feb 2021 17:51:18 GMT, Marcus G K Williams wrote:

>> 8261671: X86 I2L conversion can be skipped for certain masked positive values
>>
>> For the following expression:
>> (long)(value & mask)
>> Where value is of int type and mask is constant (power of two - 1), we can directly generate bzhi instruction to zero the upper bits instead of doing an andl, followed by movslq
>>
>> Before:
>> Benchmark                                      Mode  Cnt   Score   Error  Units
>> SkipIntToLongCast.skipMaskedSmallPositiveCast  avgt   15  10.679 ± 1.496  ns/op
>>
>>
>> After:
>> Benchmark                                      Mode  Cnt   Score   Error  Units
>> SkipIntToLongCast.skipMaskedSmallPositiveCast  avgt   15   7.870 ± 0.067  ns/op
>>
>> Signed-off-by: Marcus G K Williams
>
> Marcus G K Williams has updated the pull request incrementally with one additional commit since the last revision:
>
>   Revert to predicate supports_bmi2
>
>   Signed-off-by: Marcus G K Williams

Please add a test which verifies the result of the generated code for different ranges of mask (especially corner cases). See `compiler//codegen/BMI1.java`
There are also examples of verification of generated assembler in `compiler//intrinsics/bmi/`

-------------

Changes requested by kvn (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/2590 From github.com+2249648+johntortugo at openjdk.java.net Wed Feb 17 18:41:39 2021 From: github.com+2249648+johntortugo at openjdk.java.net (John Tortugo) Date: Wed, 17 Feb 2021 18:41:39 GMT Subject: RFR: 8241502: Migrate x86_64.ad to MacroAssembler In-Reply-To: References: <0azhJ4pD5Tq_lkpPtYMpQBjokflcSQEdWP2Rz9HBm6k=.c3ece6fd-1ae7-49ea-a6eb-ec88a9fbd54d@github.com> Message-ID: <7o74mtnumvPhVoz48udS6UTCoX39kjgfCU3jrogHQeg=.506c281b-38a4-42fa-bdf9-9214e49c5b8a@github.com> On Wed, 17 Feb 2021 09:47:13 GMT, Dean Long wrote: >> Thank you all for the feedback! >> >> @iklam - I'll check that and let you know once I make more conversions. >> >> @dean-long - That would be great. I'm all ears for the best way to test this! > > Here's one way to test both versions, using loadRange as an example. It's not exactly pretty, but it seems to work. > [loadRange.txt](https://github.com/openjdk/jdk/files/5994725/loadRange.txt) @dean-long - I recently hacked something pretty similar - basically, I added a flag to the CodeSection class to make it print emitted bytes whenever the flag was set, then I created two encoding classes in the AD file to toggle the print flag. That way I was able to _visually_ compare the output of the two versions of the code. Your approach with the CRC is much better. Thanks a lot! 
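As an illustration of the checksum idea mentioned above, here is a minimal Java sketch. It is not the attached loadRange.txt patch (its contents are not reproduced in this thread) and it does not touch HotSpot's CodeSection; the class name, method names, and the sample byte values are hypothetical placeholders. It only shows the general technique: instead of printing and visually comparing the bytes emitted by two encodings, compute a CRC over each byte sequence and compare the checksums.

```java
import java.util.zip.CRC32;

// Hypothetical sketch: compare two emitted byte sequences by checksum
// rather than by eye. The byte arrays stand in for the output of the
// old (.ad encoding) and new (MacroAssembler) code paths.
class EmittedBytesCrcSketch {
    // CRC-32 of the whole byte sequence.
    static long crcOf(byte[] emitted) {
        CRC32 crc = new CRC32();
        crc.update(emitted, 0, emitted.length);
        return crc.getValue();
    }

    // Two encodings are treated as equal when length and CRC both match.
    static boolean sameEncoding(byte[] oldBytes, byte[] newBytes) {
        return oldBytes.length == newBytes.length
            && crcOf(oldBytes) == crcOf(newBytes);
    }

    public static void main(String[] args) {
        // Placeholder bytes, not taken from any real compilation.
        byte[] oldBytes = {(byte) 0x8B, 0x47, 0x0C};
        byte[] newBytes = {(byte) 0x8B, 0x47, 0x0C};
        System.out.println("same encoding: " + sameEncoding(oldBytes, newBytes));
    }
}
```

A checksum comparison like this scales to many instructions at once, which is the advantage over a visual diff; a mismatch still needs the raw bytes to diagnose.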
------------- PR: https://git.openjdk.java.net/jdk/pull/2420 From github.com+168222+mgkwill at openjdk.java.net Wed Feb 17 18:57:41 2021 From: github.com+168222+mgkwill at openjdk.java.net (Marcus G K Williams) Date: Wed, 17 Feb 2021 18:57:41 GMT Subject: RFR: 8261671: X86 I2L conversion can be skipped for certain masked positive values [v3] In-Reply-To: References: <9PGauj5SuzWCyX7aTFHTUCEsK5kVH2ApKAurSK3o_To=.f20e9936-352f-47f0-a561-98dcdb7c842e@github.com> Message-ID: On Wed, 17 Feb 2021 18:33:16 GMT, Vladimir Kozlov wrote: > Please, add a test which verifies result of generated code for different ranges of mask (especially corner cases). See `compiler//codegen/BMI1.java` > There are also example of verification of generated assembler in `compiler//intrinsics/bmi/` I'm taking a look at a test to verify generated code like the BMI1 test you suggest above. There is a current test that verifies operation indirectly (both long to int and int to long): test/hotspot/jtreg/compiler/c2/TestSkipLongToIntCast.java Do we need both? A number of similar patches have been merged without, https://hg.openjdk.java.net/jdk/jdk/rev/d0f55423e913 for example. ------------- PR: https://git.openjdk.java.net/jdk/pull/2590 From github.com+42899633+eastig at openjdk.java.net Wed Feb 17 19:20:42 2021 From: github.com+42899633+eastig at openjdk.java.net (Evgeny Astigeevich) Date: Wed, 17 Feb 2021 19:20:42 GMT Subject: RFR: 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set [v4] In-Reply-To: References: Message-ID: On Wed, 17 Feb 2021 12:04:56 GMT, Evgeny Astigeevich wrote: >> Xin Liu has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: >> >> - 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set >> >> reimplement this feature. withdraw my intrusive change in outputStream. >> use stringStream only for the constant OopPtr. 
after oop->print_on(st), >> delete all appearances of '\n' >> - Merge branch 'master' into JDK-8260198 >> - 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set >> >> fix merge conflict. >> - Merge branch 'master' into JDK-8260198 >> - 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set >> >> Add a flag _suppress_cr to outputStream. outstream objects won't emit any CR if it's set. >> Correct TypeInstPtr::dump2 to make sure it only emits klass name once. >> Remove the comment because Klass::oop_print_on() has emitted the address of oop. >> >> Before: >> 689 ConP === 0 [[ 821 ]] Oop:java/lang/Stringjava.lang.String >> {0x000000010159d7c8} - klass: public final synchronized 'java/lang/String' >> - string: "a" >> :Constant:exact * >> >> After: >> 689 ConP === 0 [[ 821 ]] Oop:java.lang.String {0x000000010159d7c8} - klass: public final synchronized 'java/lang/String' - string: "a":Constant:exact * > > src/hotspot/share/opto/type.cpp line 4049: > >> 4047: ss.print(" "); >> 4048: const_oop()->print_oop(&ss); >> 4049: ss.tr_delete('\n'); > > `tr_delete` is expensive. Also deleting something in a stream does not fit into a concept of streams. > I see that the content of `ss` is traversed many times. 
> What about this code:
> for (const char *str = ss.base(); *str; ) {
>   size_t i = 0;
>   while (str[i] && str[i] != '\n' ) {
>     ++i;
>   }
>   st->print_raw(str, i);
>   str += i;
>   while (*str == '\n') {
>     ++str;
>   }
> }

Another option:
class filterStringStream: public stringStream {
 private:
  char ch;
 public:
  filterStringStream(char ch_to_filter, size_t initial_bufsize = 256) : stringStream(initial_bufsize), ch(ch_to_filter) {}

  virtual void write(const char* c, size_t len) override {
    const char* e = c + len;
    while (c != e) {
      size_t i = 0;
      while ((c+i) != e && c[i] != ch ) {
        ++i;
      }
      stringStream::write(c, i);
      c += i;
      while (c != e && *c == ch) {
        ++c;
      }
    }
  }
};

Your code will be:
filterStringStream ss('\n');
ss.print(" ");
const_oop()->print_oop(&ss);
st->print_raw(ss.base(), ss.size());

-------------

PR: https://git.openjdk.java.net/jdk/pull/2178

From github.com+42899633+eastig at openjdk.java.net Wed Feb 17 19:25:40 2021
From: github.com+42899633+eastig at openjdk.java.net (Evgeny Astigeevich)
Date: Wed, 17 Feb 2021 19:25:40 GMT
Subject: RFR: 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set [v4]
In-Reply-To:
References:
Message-ID:

On Wed, 17 Feb 2021 19:16:59 GMT, Evgeny Astigeevich wrote:

>> src/hotspot/share/opto/type.cpp line 4049:
>>
>>> 4047:       ss.print(" ");
>>> 4048:       const_oop()->print_oop(&ss);
>>> 4049:       ss.tr_delete('\n');
>>
>> `tr_delete` is expensive. Also deleting something in a stream does not fit into a concept of streams.
>> I see that the content of `ss` is traversed many times.
>> What about this code:
>> for (const char *str = ss.base(); *str; ) {
>>   size_t i = 0;
>>   while (str[i] && str[i] != '\n' ) {
>>     ++i;
>>   }
>>   st->print_raw(str, i);
>>   str += i;
>>   while (*str == '\n') {
>>     ++str;
>>   }
>> }
>
> Another option:
> class filterStringStream: public stringStream {
>  private:
>   char ch;
>  public:
>   filterStringStream(char ch_to_filter, size_t initial_bufsize = 256) : stringStream(initial_bufsize), ch(ch_to_filter) {}
>
>   virtual void write(const char* c, size_t len) override {
>     const char* e = c + len;
>     while (c != e) {
>       size_t i = 0;
>       while ((c+i) != e && c[i] != ch ) {
>         ++i;
>       }
>       stringStream::write(c, i);
>       c += i;
>       while (c != e && *c == ch) {
>         ++c;
>       }
>     }
>   }
> };
>
> Your code will be:
> filterStringStream ss('\n');
> ss.print(" ");
> const_oop()->print_oop(&ss);
> st->print_raw(ss.base(), ss.size());

> `tr_delete` is expensive. Also deleting something in a stream does not fit into a concept of streams.
> I see that the content of `ss` is traversed many times.
> What about this code:
>
> ```
> for (const char *str = ss.base(); *str; ) {
>   size_t i = 0;
>   while (str[i] && str[i] != '\n' ) {
>     ++i;
>   }
>   st->print_raw(str, i);
>   str += i;
>   while (*str == '\n') {
>     ++str;
>   }
> }
> ```

You can put this code in a function like `print_filtering_ch(char, const stringStream&, outputStream*)`

-------------

PR: https://git.openjdk.java.net/jdk/pull/2178

From vladimir.kozlov at oracle.com Wed Feb 17 20:05:13 2021
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 17 Feb 2021 12:05:13 -0800
Subject: [External] : Re: SuperWord loop optimization lost after method inlining
In-Reply-To:
References: <87dcb670-622f-6b05-7c06-289f9b4f3634@oracle.com> <08ce2121-bbb4-3ea5-8f05-efeb33df7b74@oracle.com>
Message-ID: <09a53249-00de-91c1-d88b-6c816ce6dd14@oracle.com>

Unfortunately it is still not the file I am looking for.

First, remove the -XX:+PrintAssembly flag from the command line. I already have files with the assembler code.
Second, I see link to the file I am looking for: If you still have it, please send it. If application stopped before normal exit that file is not merged into hotspot_pid.log file. If you don't have it - do an other run with -XX:CICompilerCount=1 to use only one C2 compiler thread with Tiered off. It will simplify ordering of log. You can also do an other experiment without collecting log. Run app with next flags to disable loop strip minning optimization: -XX:-UseCountedLoopSafepoints -XX:LoopStripMiningIter=0 Thanks, Vladimir K On 2/17/21 2:34 AM, Nicolas Heutte wrote: > Hi Vladimir, > > I have rerun the test with the appropriate options, the obtained logs are in this folder: > https://drive.google.com/drive/folders/1UczOggtTYp6TZ0QnBiwMxwdTBl3zuvqF?usp=sharing > > > Best regards, > Nicolas Heutte > > On Tue, Feb 16, 2021 at 11:35 PM Vladimir Kozlov > wrote: > > Hi Nicolas, > > The file you shared has only assembler code. Yes, it shows that when ArrayFloatToArrayFloatVectorBinding::plus() is > inlined into AVector::plus() it is not vectorized. > > But I asked for an other file (hotspot_pid.log) which is generated when you run app with > -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation flags. It should start with: > > > > > > Java HotSpot(TM) 64-Bit Server VM > > > 11.0.9+7-LTS > > > Thanks, > Vladimir K > > On 2/15/21 5:19 AM, Nicolas Heutte wrote: > > Hi Vladimir, > > > > I've tried disabling tiered compilation, as you requested. It seems that the inlining was performed slightly > > differently, but the issue remains. As you can see in this excerpt, the main loop isn't properly vectorized: > > > >? ? 0x00000254b0d4bf54: cmp ? ?%r11d,%r8d > >? ? 0x00000254b0d4bf57: jae ? ?0x00000254b0d4c19e > >? ? 0x00000254b0d4bf5d: vmovss 0x10(%rcx,%r8,4),%xmm9 ?;*faload {reexecute=0 rethrow=0 return_oop=0} > >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ; - > > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 54 (line 41) > >? ? ? ? ? ? ? ? ? ? ? 
? ? ? ? ? ? ? ? ? ? ? ? ? ? ; - com.qfs.vector.impl.AVector::plus at 17 (line 204) > >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ; - com.qfs.agg.impl.SumVectorAggregationBinding::plus at 2 (line 103) > >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ; - > > com.qfs.agg.impl.SumVectorAggregationBinding::safeVectorAggregate at 70 (line 66) > >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ; - com.qfs.agg.impl.AVectorAggregationBinding::safeAggregate at 27 > (line 118) > > > >? ? 0x00000254b0d4bf64: cmp ? ?%ebx,%r8d > >? ? 0x00000254b0d4bf67: jae ? ?0x00000254b0d4c1ec > >? ? 0x00000254b0d4bf6d: vaddss 0x10(%rdi,%r8,4),%xmm9,%xmm9 > >? ? 0x00000254b0d4bf74: vmovss %xmm9,0x10(%rcx,%r8,4) ?;*fastore {reexecute=0 rethrow=0 return_oop=0} > >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ; - > > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 61 (line 41) > >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ; - com.qfs.vector.impl.AVector::plus at 17 (line 204) > >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ; - com.qfs.agg.impl.SumVectorAggregationBinding::plus at 2 (line 103) > >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ; - > > com.qfs.agg.impl.SumVectorAggregationBinding::safeVectorAggregate at 70 (line 66) > >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ; - com.qfs.agg.impl.AVectorAggregationBinding::safeAggregate at 27 > (line 118) > > > >? ? 0x00000254b0d4bf7b: inc ? ?%r8d ? ? ? ? ? ? ? ;*iinc {reexecute=0 rethrow=0 return_oop=0} > >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ; - > > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 62 (line 40) > >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ; - com.qfs.vector.impl.AVector::plus at 17 (line 204) > >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ; - com.qfs.agg.impl.SumVectorAggregationBinding::plus at 2 (line 103) > >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? 
; - > > com.qfs.agg.impl.SumVectorAggregationBinding::safeVectorAggregate at 70 (line 66) > >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ; - com.qfs.agg.impl.AVectorAggregationBinding::safeAggregate at 27 > (line 118) > > > >? ? 0x00000254b0d4bf7e: cmp ? ?%r9d,%r8d > >? ? 0x00000254b0d4bf81: jl ? ? 0x00000254b0d4bf54 ?;*goto {reexecute=0 rethrow=0 return_oop=0} > >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ; - > > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 65 (line 40) > >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ; - com.qfs.vector.impl.AVector::plus at 17 (line 204) > >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ; - com.qfs.agg.impl.SumVectorAggregationBinding::plus at 2 (line 103) > >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ; - > > com.qfs.agg.impl.SumVectorAggregationBinding::safeVectorAggregate at 70 (line 66) > >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ; - com.qfs.agg.impl.AVectorAggregationBinding::safeAggregate at 27 > (line 118) > > > > > > > > Here is the link to the full log, should you want to take a look at it: > > https://drive.google.com/file/d/1KQU7wI8NjeElFv6RrQmUsUPRMnAefzhb/view?usp=sharing > > > > > > > > > > Best regards, > > Nicolas Heutte > > > > On Thu, Feb 11, 2021 at 7:05 PM Vladimir Kozlov > >> wrote: > > > >? ? ?Changing wide mailing list to JIT compiler only. > > > >? ? ?This deoptimization is normal in Tiered Compilation - it switched from profiling code (level='3') generated by C1 > >? ? ?compiler to new code generated by C2 (level='4') which does loop optimizations. > > > >? ? ?Thank you for posting inlining information: > > > >? ? ? ? ? ?@ 17? ?com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus (69 bytes) inline (hot) > >? ? ? ? ? ? ? \-> TypeProfile (14054/14054 counts) = com/qfs/vector/binding/impl/ArrayFloatToArrayFloatVectorBinding > > > >? ? ?I thought before that may be call site is not hot but it is not the case. > > > >? ? 
?You can do an other experiment to collect log with disabled Tiered Compilation (only C2 is used): > -XX:-TieredCompilation > >? ? ?Also print assembler code (as you did before) for final compilation to see if loop is still not vectorized. > > > >? ? ?Is it possible to post log file (on GitHub?) for me to look? > > > >? ? ?Thanks, > >? ? ?Vladimir K > > > >? ? ?On 2/11/21 6:28 AM, Nicolas Heutte wrote: > >? ? ? > Hi?Vladimir, > >? ? ? > > >? ? ? > Thank you for your help. > >? ? ? > > >? ? ? > I'm currently running Java 11.0.9, and I did not use any VM flag of note. > >? ? ? > > >? ? ? > I checked the content of the compilation log, and it seems that > ArrayFloatToArrayFloatVectorBinding::plus() was > >? ? ? > deoptimized in order to allow AVector::plus() to be compiled: > >? ? ? > > >? ? ? > > >? ? ? > bytes='23' > >? ? ? > count='916' iicount='916' level='3' stamp='7394.056' comment='tiered' hot_count='896'/> > >? ? ? > > >? ? ? > level='3'> > >? ? ? > iicount='910'/> > >? ? ? > > >? ? ? > > >? ? ? > The last compilation entry for AVector::plus() is: > >? ? ? > > >? ? ? > > >? ? ? > >? ? ?address='0x00000296d6af3110' > >? ? ? > relocation_offset='376' insts_offset='432' stub_offset='1040' scopes_data_offset='1152' > scopes_pcs_offset='1592' > >? ? ? > dependencies_offset='1880' nul_chk_table_offset='1896' oops_offset='1064' metadata_offset='1072' > >? ? ? > method='com.qfs.vector.impl.AVector plus (Lcom/qfs/vector/IVector;)V' bytes='23' count='172425' > iicount='172425' > >? ? ? > stamp='7394.199'/> > >? ? ? > > >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? @ 1 ? com.qfs.vector.array.impl.ArrayFloatVector::getBindingId (4 bytes) > inline > >? ? ?(hot) > >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?\-> TypeProfile (14552/14552 counts) = > com/qfs/vector/array/impl/ArrayFloatVector > >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? @ 7 ? com.qfs.vector.array.impl.ArrayFloatVector::getBindingId (4 bytes) > inline > >? ? ?(hot) > >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? 
?\-> TypeProfile (14150/14150 counts) = > com/qfs/vector/array/impl/ArrayFloatVector > >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? @ 10 ? com.qfs.vector.binding.impl.VectorBindings::getBinding (9 bytes) > inline (hot) > >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? @ 5 > >? ? ?com.qfs.vector.binding.impl.VectorBindings$VectorBindingsProvider::getBinding (22 > >? ? ? > bytes) ? inline (hot) > >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? @ 3 > >? ? ?com.qfs.vector.binding.impl.VectorBindings$VectorBindingsProvider::hasBinding > >? ? ? > (34 bytes) ? inline (hot) > >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? @ 17 > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus (69 > >? ? ?bytes) > >? ? ? > inline (hot) > >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?\-> TypeProfile (14054/14054 counts) = > >? ? ? > com/qfs/vector/binding/impl/ArrayFloatToArrayFloatVectorBinding > >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? @ 12 ? com.qfs.vector.array.impl.ArrayFloatVector::size (6 bytes) > inline (hot) > >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? @ 22 ? com.qfs.vector.impl.AVector::checkIndex (37 bytes) ? inline (hot) > >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? @ 6 ? com.qfs.vector.array.impl.ArrayFloatVector::size (6 bytes) > inline (hot) > >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? @ 27 ? com.qfs.vector.array.impl.ArrayFloatVector::getUnderlying (5 bytes) > >? ? ?accessor > >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? @ 34 ? com.qfs.vector.array.impl.ArrayFloatVector::getUnderlying (5 bytes) > >? ? ?accessor > >? ? ? > > >? ? ? > > >? ? ? > Unfortunately, I do not have access to a debug VM build, so I cannot run the second test you recommend. > >? ? ? > > >? ? ? > Best regards, > >? ? ? > Nicolas Heutte > >? ? ? > > >? ? ? > On Wed, Feb 10, 2021 at 7:36 PM Vladimir Kozlov > > >? ? ? >>> wrote: > >? ? ? > > >? ? ? >? ? ?Hi, Nicolas > >? ? ? > > >? ? ? >? ? ?Looks like, when inlined, the loop from ArrayFloatToArrayFloatVectorBinding::plus() was not optimized > at all: > >? ? 
?it is not > >? ? ? >? ? ?unrolled and has range checks. Such loops are not vectorized (you need unrolling and no checks). > >? ? ? > > >? ? ? >? ? ?What Java version you are running? What HotSpot VM flags you are using when running application? > >? ? ? > > >? ? ? >? ? ?Run application with -XX:+LogCompilation and look on compilation data in hotspot_pid.log file for > caller > >? ? ? >? ? ?AVector::plus(). > >? ? ? > > >? ? ? >? ? ?VM also has several flags to trace loop optimizations but they are only available in debug VM build. > If you > >? ? ?have access > >? ? ? >? ? ?to such build run with -XX:+PrintCompilation -XX:+TraceLoopOpts flags. > >? ? ? > > >? ? ? >? ? ?Thanks, > >? ? ? >? ? ?Vladimir K > >? ? ? > > >? ? ? >? ? ?On 2/10/21 9:24 AM, Nicolas Heutte wrote: > >? ? ? >? ? ? > Hi all, > >? ? ? >? ? ? > > >? ? ? >? ? ? > I am encountering a performance issue caused by the interaction between > >? ? ? >? ? ? > method inlining and automatic vectorization. > >? ? ? >? ? ? > > >? ? ? >? ? ? > Our application aggregates arrays intensively using a method named > >? ? ? >? ? ? > ArrayFloatToArrayFloatVectorBinding.plus() with the following code: > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? for (int i = 0; i < srcLen; ++i) { > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? ? ? ? ? dstArray[i] += srcArray[i]; > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? } > >? ? ? >? ? ? > > >? ? ? >? ? ? > When we microbenchmark this method we observe fast performance close to the > >? ? ? >? ? ? > practical memory bandwidth and when we print the assembly code we observe > >? ? ? >? ? ? > loop unrolling and automatic vectorization with SIMD instructions. > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? 0x000001ef4600abf0: vmovdqu 0x10(%r14,%r13,4),%ymm0 > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? 0x000001ef4600abf7: vaddps 0x10(%rcx,%r13,4),%ymm0,%ymm0 > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? 0x000001ef4600abfe: vmovdqu %ymm0,0x10(%r14,%r13,4) > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? 
0x000001ef4600ac05: movslq %r13d,%r11 > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? 0x000001ef4600ac08: vmovdqu 0x30(%r14,%r11,4),%ymm0 > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? 0x000001ef4600ac0f: vaddps 0x30(%rcx,%r11,4),%ymm0,%ymm0 > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? 0x000001ef4600ac16: vmovdqu %ymm0,0x30(%r14,%r11,4) > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? 0x000001ef4600ac1d: vmovdqu 0x50(%r14,%r11,4),%ymm0 > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? 0x000001ef4600ac24: vaddps 0x50(%rcx,%r11,4),%ymm0,%ymm0 > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? 0x000001ef4600ac2b: vmovdqu %ymm0,0x50(%r14,%r11,4) > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? 0x000001ef4600ac32: vmovdqu 0x70(%r14,%r11,4),%ymm0 > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? 0x000001ef4600ac39: vaddps 0x70(%rcx,%r11,4),%ymm0,%ymm0 > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? 0x000001ef4600ac40: vmovdqu %ymm0,0x70(%r14,%r11,4) > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? 0x000001ef4600ac47: vmovdqu 0x90(%r14,%r11,4),%ymm0 > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? 0x000001ef4600ac51: vaddps 0x90(%rcx,%r11,4),%ymm0,%ymm0 > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? 0x000001ef4600ac5b: vmovdqu %ymm0,0x90(%r14,%r11,4) > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? 0x000001ef4600ac65: vmovdqu 0xb0(%r14,%r11,4),%ymm0 > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? 0x000001ef4600ac6f: vaddps 0xb0(%rcx,%r11,4),%ymm0,%ymm0 > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? 0x000001ef4600ac79: vmovdqu %ymm0,0xb0(%r14,%r11,4) > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? 0x000001ef4600ac83: vmovdqu 0xd0(%r14,%r11,4),%ymm0 > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? 0x000001ef4600ac8d: vaddps 0xd0(%rcx,%r11,4),%ymm0,%ymm0 > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? 0x000001ef4600ac97: vmovdqu %ymm0,0xd0(%r14,%r11,4) > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? 0x000001ef4600aca1: vmovdqu 0xf0(%r14,%r11,4),%ymm0 > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? 0x000001ef4600acab: vaddps 0xf0(%rcx,%r11,4),%ymm0,%ymm0 > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? 0x000001ef4600acb5: vmovdqu %ymm0,0xf0(%r14,%r11,4)? ;*fastore > >? 
? ? >? ? ? > {reexecute=0 rethrow=0 return_oop=0} > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ; - > >? ? ? >? ? ? > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 61 > >? ? ? >? ? ? > (line 41) > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? 0x000001ef4600acbf: add? ? $0x40,%r13d? ? ? ? ;*iinc {reexecute=0 > >? ? ? >? ? ? > rethrow=0 return_oop=0} > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ; - > >? ? ? >? ? ? > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 62 > >? ? ? >? ? ? > (line 40) > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? 0x000001ef4600acc3: cmp? ? %eax,%r13d > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? 0x000001ef4600acc6: jl? ? ?0x000001ef4600abf0? ;*goto {reexecute=0 > >? ? ? >? ? ? > rethrow=0 return_oop=0} > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ; - > >? ? ? >? ? ? > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 65 > >? ? ? >? ? ? > (line 40) > >? ? ? >? ? ? > > >? ? ? >? ? ? > > >? ? ? >? ? ? > > >? ? ? >? ? ? > In the real application, this method is actually inlined in a higher level > >? ? ? >? ? ? > method named AVector.plus(). Unfortunately, the inlined version of the > >? ? ? >? ? ? > aggregation code is not vectorized anymore: > >? ? ? >? ? ? > > >? ? ? >? ? ? > > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? 0x000001ef460180a0: cmp? ? %ebx,%r11d > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? 0x000001ef460180a3: jae? ? 0x000001ef460180e6 > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? 0x000001ef460180a5: vmovss 0x10(%r8,%r11,4),%xmm1? ;*faload {reexecute=0 > >? ? ? >? ? ? > rethrow=0 return_oop=0} > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ; - > >? ? ? >? ? ? > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 54 > >? ? ? >? ? ? > (line 41) > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? 
? ; - > >? ? ? >? ? ? > com.qfs.vector.impl.AVector::plus at 17 (line 204) > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? 0x000001ef460180ac: cmp? ? %ecx,%r11d > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? 0x000001ef460180af: jae? ? 0x000001ef46018104 > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? 0x000001ef460180b1: vaddss 0x10(%r9,%r11,4),%xmm1,%xmm1 > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? 0x000001ef460180b8: vmovss %xmm1,0x10(%r8,%r11,4)? ;*fastore {reexecute=0 > >? ? ? >? ? ? > rethrow=0 return_oop=0} > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ; - > >? ? ? >? ? ? > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 61 > >? ? ? >? ? ? > (line 41) > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ; - > >? ? ? >? ? ? > com.qfs.vector.impl.AVector::plus at 17 (line 204) > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? 0x000001ef460180bf: inc? ? %r11d? ? ? ? ? ? ? ;*iinc {reexecute=0 > >? ? ? >? ? ? > rethrow=0 return_oop=0} > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ; - > >? ? ? >? ? ? > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 62 > >? ? ? >? ? ? > (line 40) > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ; - > >? ? ? >? ? ? > com.qfs.vector.impl.AVector::plus at 17 (line 204) > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? 0x000001ef460180c2: cmp? ? %r10d,%r11d > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? 0x000001ef460180c5: jl? ? ?0x000001ef460180a0? ;*goto {reexecute=0 > >? ? ? >? ? ? > rethrow=0 return_oop=0} > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ; - > >? ? ? >? ? ? > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 65 > >? ? ? >? ? ? > (line 40) > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ; - > >? ? ? >? ? ? > com.qfs.vector.impl.AVector::plus at 17 (line 204) > >? ? ? >? ? ? > > >? ? ? >? ? 
? > > >? ? ? >? ? ? > > >? ? ? >? ? ? > This causes a significant performance drop, compared to a run where we > >? ? ? >? ? ? > explicitly disable the inlining and observe automatically vectorized code > >? ? ? >? ? ? > again ( > >? ? ? >? ? ? > -XX:CompileCommand=dontinline,com/qfs/vector/binding/impl/ArrayFloatToArrayFloatVectorBinding.plus > >? ? ? >? ? ? > ). > >? ? ? >? ? ? > > >? ? ? >? ? ? > > >? ? ? >? ? ? > How do you guys explain that behavior of the JIT compiler? Is this a known > >? ? ? >? ? ? > and tracked issue, could it be fixed in the JVM? Can we do something in the > >? ? ? >? ? ? > java code to prevent this from happening? > >? ? ? >? ? ? > > >? ? ? >? ? ? > > >? ? ? >? ? ? > Best regards, > >? ? ? >? ? ? > > >? ? ? >? ? ? > Nicolas Heutte > >? ? ? >? ? ? > > >? ? ? > > > > From kvn at openjdk.java.net Wed Feb 17 20:41:39 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 17 Feb 2021 20:41:39 GMT Subject: RFR: 8261671: X86 I2L conversion can be skipped for certain masked positive values [v3] In-Reply-To: References: <9PGauj5SuzWCyX7aTFHTUCEsK5kVH2ApKAurSK3o_To=.f20e9936-352f-47f0-a561-98dcdb7c842e@github.com> Message-ID: On Wed, 17 Feb 2021 18:54:27 GMT, Marcus G K Williams wrote: > > Please, add a test which verifies result of generated code for different ranges of mask (especially corner cases). See `compiler//codegen/BMI1.java` > > There are also example of verification of generated assembler in `compiler//intrinsics/bmi/` > > I'm taking a look at a test to verify generated code like the BMI1 test you suggest above. > > There is a current test that verifies operation indirectly (both long to int and int to long): test/hotspot/jtreg/compiler/c2/TestSkipLongToIntCast.java It does not cover your case which have `AND` operation. But I am fine if you add your case to it (or to other test, like BMI1.java) instead of creating new test file. > > Do we need both? 
A number of similar patches have been merged without, https://hg.openjdk.java.net/jdk/jdk/rev/d0f55423e913 for example.

Missing tests in previous changes does not mean we should never create one.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2590

From kvn at openjdk.java.net Wed Feb 17 20:58:39 2021
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Wed, 17 Feb 2021 20:58:39 GMT
Subject: RFR: 8261912: Code IfNode::fold_compares_helper more defensively
In-Reply-To:
References:
Message-ID: <7HF5Z1YIB9wuETTkYCSUT390U6dmwodQWGM7_ieT8Jw=.c1aab00e-970d-4c26-8657-44785b3434b4@github.com>

On Wed, 17 Feb 2021 16:11:39 GMT, Aleksey Shipilev wrote:

> As [JDK-8261914](https://bugs.openjdk.java.net/browse/JDK-8261914) indicates, there are cases that break the internal asserts in `IfNode::fold_compares_helper`, code added by JDK-8073480 in JDK 9. Unfortunately, release builds would happily miscompile when that happens. It would be better to code `IfNode::fold_compares_helper` more defensively, so it bails when asserts are violated. This implicitly works around the bug in JDK-8261914. The goal for this limited workaround is to be trivially backportable in order to quickly unbreak 11u, 16u and 17.
>
> The alternative, instead of the early returns, is to do:
>
>     lo = NULL;
>     hi = NULL;
>
> ...and then wait for the method epilog to handle it. I have no preference for either style, as the blocks this patch affects already have some early returns, and `lo/hi = NULL` are also used.
>
> Additional testing:
>  - [x] Linux x86_64 fastdebug `tier1`
>  - [x] Linux x86_64 fastdebug `tier2`
>  - [x] Failing JRuby reproducer from JDK-8261914, now passing in release mode with hundreds of iterations

I am fine with this defensive change.

-------------

Marked as reviewed by kvn (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/2610 From github.com+168222+mgkwill at openjdk.java.net Wed Feb 17 21:07:39 2021 From: github.com+168222+mgkwill at openjdk.java.net (Marcus G K Williams) Date: Wed, 17 Feb 2021 21:07:39 GMT Subject: RFR: 8261671: X86 I2L conversion can be skipped for certain masked positive values [v3] In-Reply-To: References: <9PGauj5SuzWCyX7aTFHTUCEsK5kVH2ApKAurSK3o_To=.f20e9936-352f-47f0-a561-98dcdb7c842e@github.com> Message-ID: On Wed, 17 Feb 2021 20:38:45 GMT, Vladimir Kozlov wrote: > > > Please, add a test which verifies result of generated code for different ranges of mask (especially corner cases). See `compiler//codegen/BMI1.java` > > > There are also example of verification of generated assembler in `compiler//intrinsics/bmi/` > > > > > > I'm taking a look at a test to verify generated code like the BMI1 test you suggest above. > > There is a current test that verifies operation indirectly (both long to int and int to long): test/hotspot/jtreg/compiler/c2/TestSkipLongToIntCast.java > > It does not cover your case which have `AND` operation. > But I am fine if you add your case to it (or to other test, like BMI1.java) instead of creating new test file. > > > Do we need both? A number of similar patches have been merged without, https://hg.openjdk.java.net/jdk/jdk/rev/d0f55423e913 for example. > > Missing tests in previous changes does not mean we should never create one. Thanks @vnkozlov. In the process of testing both a BMI2.java and compiler/intrinsics/bmi/verifycode/BzhiTest.java addition I have written to test this change. 
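For readers following the review request, the property such a test needs to establish can be sketched in plain Java. This is only an illustration under stated assumptions: the class and method names are hypothetical, it is not the BMI2.java or BzhiTest.java addition mentioned above, and it checks results only (it does not verify that `bzhi` was actually emitted). It compares `(long)(value & mask)` with an independently computed reference for every mask of the form 2^k - 1, including the corner cases 0 and Integer.MAX_VALUE.

```java
// Hypothetical result-only check for the (long)(value & mask) pattern.
class MaskedI2LSketch {
    // The expression shape the PR optimizes.
    static long maskedCast(int value, int mask) {
        return (long) (value & mask);
    }

    // Independent reference: zero-extend both operands to long, then AND.
    static long reference(int value, int mask) {
        return (value & 0xFFFFFFFFL) & (mask & 0xFFFFFFFFL);
    }

    // Returns the number of (value, mask) pairs where the two disagree.
    static int countMismatches() {
        int[] values = {0, 1, -1, 42, Integer.MIN_VALUE, Integer.MAX_VALUE,
                        0x12345678, 0xCAFEBABE};
        int mismatches = 0;
        for (int k = 0; k <= 31; k++) {
            // Masks 0, 1, 3, 7, ..., Integer.MAX_VALUE (all non-negative).
            int mask = (int) ((1L << k) - 1);
            for (int v : values) {
                if (maskedCast(v, mask) != reference(v, mask)) {
                    mismatches++;
                }
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + countMismatches());
    }
}
```

In a real jtreg test the checked method would have to be hot enough to be compiled by C2 (for example by calling it in a warm-up loop), otherwise only the interpreter is exercised.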
------------- PR: https://git.openjdk.java.net/jdk/pull/2590 From github.com+10482586+therealeliu at openjdk.java.net Thu Feb 18 02:02:39 2021 From: github.com+10482586+therealeliu at openjdk.java.net (Eric Liu) Date: Thu, 18 Feb 2021 02:02:39 GMT Subject: RFR: 8256438: AArch64: Implement match rules with ROR shift register value [v3] In-Reply-To: References: Message-ID: On Wed, 17 Feb 2021 16:13:41 GMT, Andrew Haley wrote: > OK. For what it's worth, I doubt that this will be suitable for backporting to 8u or 11u. Thanks for your review, I'd like to backport this. BTW, is there anyone who could review the trivial shared-code changes? ------------- PR: https://git.openjdk.java.net/jdk/pull/1858 From thartmann at openjdk.java.net Thu Feb 18 07:19:41 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 18 Feb 2021 07:19:41 GMT Subject: RFR: 8261912: Code IfNode::fold_compares_helper more defensively In-Reply-To: References: Message-ID: On Wed, 17 Feb 2021 16:11:39 GMT, Aleksey Shipilev wrote: > As [JDK-8261914](https://bugs.openjdk.java.net/browse/JDK-8261914) indicates, there are cases that break the internal asserts in `IfNode::fold_compares_helper`, code added by JDK-8073480 in JDK 9. Unfortunately, release builds would happily miscompile when that happens. It would be better to code `IfNode::fold_compares_helper` more defensively, so it bails when asserts are violated. This implicitly works around the bug in JDK-8261914. The goal for this limited workaround is to be trivially backportable in order to quickly unbreak 11u, 16u and 17. > > The alternative, instead of the early returns, is to do: > > lo = NULL; > hi = NULL; > > ...and then wait for the method epilog to handle it. I have no preference for either style, as the blocks this patch affects already have some early returns, and `lo/hi = NULL` are also used.
> > Additional testing: > - [x] Linux x86_64 fastdebug `tier1` > - [x] Linux x86_64 fastdebug `tier2` > - [x] Failing JRuby reproducer from JDK-8261914, now passing in release mode with hundreds of iterations Looks good. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2610 From dongbo at openjdk.java.net Thu Feb 18 07:57:25 2021 From: dongbo at openjdk.java.net (Dong Bo) Date: Thu, 18 Feb 2021 07:57:25 GMT Subject: RFR: 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width [v6] In-Reply-To: References: Message-ID: <45gEz_9Vli1Mvby8SbtMsoEx65KL11RQnI-qKy6zfgo=.42933345-0086-466b-ae38-3add53fc1816@github.com> > In vectorAPI, when right-shifting a vector with a shift equals to the element width, the shift is transformed to zero, > see `src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorOperators.java`: > /** Produce {@code a>>>(n&(ESIZE*8-1))}. Integral only. */ > public static final /*bitwise*/ Binary LSHR = binary("LSHR", ">>>", VectorSupport.VECTOR_OP_URSHIFT, VO_SHIFT); > > The aarch64 assembler generates wrong or illegal instructions in this case, e.g. for the JAVA code below on aarch64, > assembler call `__ ushr(dst, __ T8B, src, 0)`, the instruction generated is not `ushr dst.8B, src.8B, 0`, but `ushr dst.4H, src.4H, 16` instead. > According to local tests, JVM gives wrong results for byte/short and crashes with SIGILL for integer/long. > ByteVector vba = ByteVector.fromArray(byte64SPECIES, bytesA, 8 * i); > vbb.lanewise(VectorOperators.ASHR, 8).intoArray(arrBytes, 8 * i); > > The legal right shift amount should be in the range 1 to the element width in bits on aarch64: > https://developer.arm.com/documentation/dui0801/f/A64-SIMD-Vector-Instructions/USHR--vector-?lang=en > > This fix handles zero shift separately. If the shift is zero, it generates `orr` for right shift, `addv` for right shift and accumulate. 
> Verified with linux-aarch64-server-fastdebug, tier1. Also created a jtreg to reproduce the issue and for regression tests. Dong Bo has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: - add tests in aarch64-asmtest.py - fix windows operator precedence error and cleanup testcase - Merge branch 'master' into aarch64_vector_api_shift - fix windows build failure - generate add if shift == 0 for accumulation and fix some test code - back out AD modifications and handle zero shift in assembler ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2472/files - new: https://git.openjdk.java.net/jdk/pull/2472/files/d75ee99e..d746f209 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2472&range=05 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2472&range=04-05 Stats: 22519 lines in 709 files changed: 13758 ins; 4768 del; 3993 mod Patch: https://git.openjdk.java.net/jdk/pull/2472.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2472/head:pull/2472 PR: https://git.openjdk.java.net/jdk/pull/2472 From dongbo at openjdk.java.net Thu Feb 18 08:10:40 2021 From: dongbo at openjdk.java.net (Dong Bo) Date: Thu, 18 Feb 2021 08:10:40 GMT Subject: RFR: 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width [v2] In-Reply-To: <0wUxJ4QUIzC-Hg4qSPtf8nFP0ov9J69nA3gjaoEJcWY=.ede23adb-4010-460d-8ac4-d560ace8ffc0@github.com> References: <_ach7OekIqkqmFRW3JqA5h4Q_HQUbRni0vkFzx5q3MA=.536a9faa-98c9-4dd9-9798-dcc794e23cd0@github.com> <0wUxJ4QUIzC-Hg4qSPtf8nFP0ov9J69nA3gjaoEJcWY=.ede23adb-4010-460d-8ac4-d560ace8ffc0@github.com> Message-ID: On Wed, 10 Feb 2021 02:59:24 GMT, Dong Bo wrote: >> src/hotspot/cpu/aarch64/aarch64_neon_ad.m4 line 2057: >> >>> 2055: as_FloatRegister($src$$reg), 
as_FloatRegister($src$$reg)); >>> 2056: } else {ifelse($4, B,` >>> 2057: if (sh >= 8) sh = 7; >> >> I think it would be possible to move some of this logic from the AD file into MacroAssembler, with macros to generate the appropriate instruction based on their arguments. This might be cleaner: the logic here is very hard to follow. > > I backed out the modifications of `aarch64_neon.ad` and `aarch64_neon_ad.m4`. > The `shift == 0` case is handled by the assembler now. Verified with the regression tests. > I think it would be possible to move some of this logic from the AD file into MacroAssembler, with macros to generate the appropriate instruction based on their arguments. This might be cleaner: the logic here is very hard to follow. Hi, I moved the logic to the assembler. The assembler now generates different instructions based on the value of `shift`. If `shift == 0` and no accumulation is needed, it generates a `mov`. If `shift == 0` and accumulation is needed, it generates an `add`. Also added tests in `aarch64-asmtest.py` to verify the assembler modifications. ------------- PR: https://git.openjdk.java.net/jdk/pull/2472 From thartmann at openjdk.java.net Thu Feb 18 08:12:40 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 18 Feb 2021 08:12:40 GMT Subject: RFR: 8260653: Unreachable nodes keep speculative types alive [v3] In-Reply-To: References: Message-ID: On Wed, 17 Feb 2021 18:16:02 GMT, Nils Eliasson wrote: >> The RunThese test fails because after the first igvn.optimize() (directly after parsing) there are unreachable node cycles left. Later when remove_speculative_types() is called - only reachable nodes will have their speculative type removed. At the end of remove_speculative_types() there is an assert that all speculative types have been removed that will fail. >> >> This problem will not cause crashes in production - it is only a sanity test. >> >> I suggest adding a call to PhaseRemoveUseless before remove_speculative_types.
This will cost a few extra cycles but it is the only way we can guarantee that no unreachable nodes are left. >> >> When debugging this I experimented with adding a call to verify_graph_edges to check for dead code at the same spot. This triggers failures in a lot of test. The conclusion is that it is very common that we have dead node cycles - but they very rarely keep speculative types alive. >> >> A big thank you to Dean Long how created the reproducer for this bug. >> >> Please review. > > Nils Eliasson has updated the pull request incrementally with one additional commit since the last revision: > > Remove outcnt check Otherwise looks good. src/hotspot/share/opto/phaseX.cpp line 338: > 336: for (uint i = 0; i < max; ++i) { > 337: Node *n = at(i); > 338: if(n != NULL && Missing whitespace between `if` and `(` ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2606 From neliasso at openjdk.java.net Thu Feb 18 08:36:07 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Thu, 18 Feb 2021 08:36:07 GMT Subject: RFR: 8260653: Unreachable nodes keep speculative types alive [v4] In-Reply-To: References: Message-ID: > The RunThese test fails because after first igvn.optimize() (directly after parsing) there are unreachable nodes cycles left. Later when remove_speculative_types() is called - only reachable nodes will have their speculative type removed. At the end of remove_speculative_types() there is an assert that all speculative types have been removed that will fail. > > This problem will not cause crashes in production - it is only a sanity test. > > I suggest adding a call to PhaseRemoveUseless before remove_speculative_types. This will cost a few extra cycles but it is the only way we can guarantee that no unreachable nodes are left. > > When debugging this I experimented with adding a call to verify_graph_edges to check for dead code at the same spot. This triggers failures in a lot of test. 
The conclusion is that it is very common that we have dead node cycles - but they very rarely keep speculative types alive. > > A big thank you to Dean Long how created the reproducer for this bug. > > Please review. Nils Eliasson has updated the pull request incrementally with one additional commit since the last revision: Add missing ws ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2606/files - new: https://git.openjdk.java.net/jdk/pull/2606/files/ac23dad8..ae048cbc Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2606&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2606&range=02-03 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/2606.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2606/head:pull/2606 PR: https://git.openjdk.java.net/jdk/pull/2606 From aph at openjdk.java.net Thu Feb 18 09:37:43 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Thu, 18 Feb 2021 09:37:43 GMT Subject: RFR: 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width [v6] In-Reply-To: <45gEz_9Vli1Mvby8SbtMsoEx65KL11RQnI-qKy6zfgo=.42933345-0086-466b-ae38-3add53fc1816@github.com> References: <45gEz_9Vli1Mvby8SbtMsoEx65KL11RQnI-qKy6zfgo=.42933345-0086-466b-ae38-3add53fc1816@github.com> Message-ID: On Thu, 18 Feb 2021 07:57:25 GMT, Dong Bo wrote: >> In vectorAPI, when right-shifting a vector with a shift equals to the element width, the shift is transformed to zero, >> see `src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorOperators.java`: >> /** Produce {@code a>>>(n&(ESIZE*8-1))}. Integral only. */ >> public static final /*bitwise*/ Binary LSHR = binary("LSHR", ">>>", VectorSupport.VECTOR_OP_URSHIFT, VO_SHIFT); >> >> The aarch64 assembler generates wrong or illegal instructions in this case, e.g. 
for the JAVA code below on aarch64, >> assembler call `__ ushr(dst, __ T8B, src, 0)`, the instruction generated is not `ushr dst.8B, src.8B, 0`, but `ushr dst.4H, src.4H, 16` instead. >> According to local tests, JVM gives wrong results for byte/short and crashes with SIGILL for integer/long. >> ByteVector vba = ByteVector.fromArray(byte64SPECIES, bytesA, 8 * i); >> vbb.lanewise(VectorOperators.ASHR, 8).intoArray(arrBytes, 8 * i); >> >> The legal right shift amount should be in the range 1 to the element width in bits on aarch64: >> https://developer.arm.com/documentation/dui0801/f/A64-SIMD-Vector-Instructions/USHR--vector-?lang=en >> >> This fix handles zero shift separately. If the shift is zero, it generates `orr` for right shift, `addv` for right shift and accumulate. >> Verified with linux-aarch64-server-fastdebug, tier1. Also created a jtreg to reproduce the issue and for regression tests. > > Dong Bo has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - add tests in aarch64-asmtest.py > - fix windows operator precedence error and cleanup testcase > - Merge branch 'master' into aarch64_vector_api_shift > - fix windows build failure > - generate add if shift == 0 for accumulation and fix some test code > - back out AD modifications and handle zero shift in assembler src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 2709: > 2707: f(encodedShift, 22, 16); f(opc2, 15, 10), rf(Vn, 5), rf(Vd, 0); \ > 2708: } \ > 2709: } Is this correct, according to the definition in the Architecture Reference Manual? It doesn't look like it to me. Assembler methods should generate bit patterns exactly as defined in the Manual. This logic should be in a MacroAssembler method. 
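
To make the root cause in the quoted description concrete, here is a small scalar sketch of the shift-count masking rule cited from `VectorOperators.java`, `a>>>(n&(ESIZE*8-1))`. It is only a model of that formula on plain Java values; the class and method names are hypothetical and nothing here calls the Vector API or the AArch64 assembler.

```java
// Scalar model of the masking rule quoted above; names are hypothetical.
final class ShiftCountSketch {
    // Effective shift count for a lane of esizeBytes bytes: n & (ESIZE*8 - 1).
    static int effectiveShift(int n, int esizeBytes) {
        return n & (esizeBytes * 8 - 1);
    }

    // Logical right shift of one byte lane using the masked count.
    static byte lshrByte(byte a, int n) {
        return (byte) ((a & 0xFF) >>> effectiveShift(n, 1));
    }

    public static void main(String[] args) {
        // A shift equal to the element width is masked down to zero.
        System.out.println(effectiveShift(8, 1));   // byte lanes
        System.out.println(effectiveShift(16, 2));  // short lanes
        System.out.println(effectiveShift(32, 4));  // int lanes
        System.out.println(effectiveShift(64, 8));  // long lanes
        // With a zero effective shift the lane value is unchanged.
        System.out.println(lshrByte((byte) 0x80, 8));
    }
}
```

Because the masked count is zero exactly when the requested shift is a multiple of the element width, the backend has to emit something for a zero right shift, which falls outside the 1-to-element-width range that the description says is legal on aarch64.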
------------- PR: https://git.openjdk.java.net/jdk/pull/2472 From neliasso at openjdk.java.net Thu Feb 18 10:26:40 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Thu, 18 Feb 2021 10:26:40 GMT Subject: RFR: 8260653: Unreachable nodes keep speculative types alive [v3] In-Reply-To: References: Message-ID: On Thu, 18 Feb 2021 08:10:08 GMT, Tobias Hartmann wrote: >> Nils Eliasson has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove outcnt check > > Otherwise looks good. Thanks for the reviews Vladimir and Tobias! ------------- PR: https://git.openjdk.java.net/jdk/pull/2606 From neliasso at openjdk.java.net Thu Feb 18 10:26:41 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Thu, 18 Feb 2021 10:26:41 GMT Subject: Integrated: 8260653: Unreachable nodes keep speculative types alive In-Reply-To: References: Message-ID: On Wed, 17 Feb 2021 10:27:08 GMT, Nils Eliasson wrote: > The RunThese test fails because after first igvn.optimize() (directly after parsing) there are unreachable nodes cycles left. Later when remove_speculative_types() is called - only reachable nodes will have their speculative type removed. At the end of remove_speculative_types() there is an assert that all speculative types have been removed that will fail. > > This problem will not cause crashes in production - it is only a sanity test. > > I suggest adding a call to PhaseRemoveUseless before remove_speculative_types. This will cost a few extra cycles but it is the only way we can guarantee that no unreachable nodes are left. > > When debugging this I experimented with adding a call to verify_graph_edges to check for dead code at the same spot. This triggers failures in a lot of test. The conclusion is that it is very common that we have dead node cycles - but they very rarely keep speculative types alive. > > A big thank you to Dean Long how created the reproducer for this bug. > > Please review. 
This pull request has now been integrated. Changeset: 3a21e1df Author: Nils Eliasson URL: https://git.openjdk.java.net/jdk/commit/3a21e1df Stats: 6 lines in 1 file changed: 5 ins; 0 del; 1 mod 8260653: Unreachable nodes keep speculative types alive Reviewed-by: vlivanov, thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/2606 From chagedorn at openjdk.java.net Thu Feb 18 10:35:38 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Thu, 18 Feb 2021 10:35:38 GMT Subject: RFR: 8259984: IGV: Crash when drawing control flow before GCM In-Reply-To: References: Message-ID: On Wed, 17 Feb 2021 12:23:55 GMT, Roberto Casta?eda Lozano wrote: > The block formation algorithm applied by IGV to schedule graphs without associated CFG information traverses the graph backwards. This makes it difficult to deal with block projection nodes, leading in some cases to double addition of block nodes and block edges, and ultimately causing assertion failures. > > This fix replaces the backward traversal by a forward traversal that relies on node category information (introduced in [8261336](https://bugs.openjdk.java.net/browse/JDK-8261336)) to identify control successors. The forward traversal is arguably simpler and, besides avoiding the reported assertion failure, has two advantages: > > - it places block projection nodes in the same block as their predecessors, and > - it numbers basic blocks more naturally. > > The following screenshots illustrate the improvements (before the fix to the left, after the fix to the right): > > ![cfgs-before-after](https://user-images.githubusercontent.com/8792647/108204708-6dcf5e00-7124-11eb-956c-fb7f84229b50.png) > > Tested automatically on tens of thousands of graphs by running `java -Xcomp -XX:-TieredCompilation -XX:PrintIdealGraphLevel=4 ...` on an instrumented version of IGV that schedules graphs eagerly. Checked manually that the CFGs of a few selected graphs (included the reported one) are well-formed. 
Also checked that the overall IGV graph scheduling time is not affected by the changes. > > Thanks to Christian Hagedorn for checking the fix independently. Looks good to me! I played around with it in the IGV. I regularly got some assertion failure messages before; now they are gone. src/utils/IdealGraphVisualizer/ServerCompiler/src/com/sun/hotspot/igv/servercompiler/ServerCompilerScheduler.java line 634: > 632: } > 633: } > 634: Maybe you could extract the following code into 2 separate methods. ------------- Marked as reviewed by chagedorn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2607 From neliasso at openjdk.java.net Thu Feb 18 10:46:39 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Thu, 18 Feb 2021 10:46:39 GMT Subject: RFR: 8259984: IGV: Crash when drawing control flow before GCM In-Reply-To: References: Message-ID: On Wed, 17 Feb 2021 12:23:55 GMT, Roberto Castañeda Lozano wrote: > The block formation algorithm applied by IGV to schedule graphs without associated CFG information traverses the graph backwards. This makes it difficult to deal with block projection nodes, leading in some cases to double addition of block nodes and block edges, and ultimately causing assertion failures. > > This fix replaces the backward traversal by a forward traversal that relies on node category information (introduced in [8261336](https://bugs.openjdk.java.net/browse/JDK-8261336)) to identify control successors. The forward traversal is arguably simpler and, besides avoiding the reported assertion failure, has two advantages: > > - it places block projection nodes in the same block as their predecessors, and > - it numbers basic blocks more naturally.
> > The following screenshots illustrate the improvements (before the fix to the left, after the fix to the right): > > ![cfgs-before-after](https://user-images.githubusercontent.com/8792647/108204708-6dcf5e00-7124-11eb-956c-fb7f84229b50.png) > > Tested automatically on tens of thousands of graphs by running `java -Xcomp -XX:-TieredCompilation -XX:PrintIdealGraphLevel=4 ...` on an instrumented version of IGV that schedules graphs eagerly. Checked manually that the CFGs of a few selected graphs (included the reported one) are well-formed. Also checked that the overall IGV graph scheduling time is not affected by the changes. > > Thanks to Christian Hagedorn for checking the fix independently. Excellent! ------------- PR: https://git.openjdk.java.net/jdk/pull/2607 From shade at openjdk.java.net Thu Feb 18 10:58:45 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 18 Feb 2021 10:58:45 GMT Subject: RFR: 8261912: Code IfNode::fold_compares_helper more defensively In-Reply-To: References: Message-ID: On Thu, 18 Feb 2021 07:16:56 GMT, Tobias Hartmann wrote: >> As [JDK-8261914](https://bugs.openjdk.java.net/browse/JDK-8261914) indicates, there are cases that break the internal asserts in `IfNode::fold_compares_helper`, code added by JDK-8073480 in JDK 9. Unfortunately, release builds would happily miscompile when that happens. It would be better to code `IfNode::fold_compares_helper` more defensively, so it bails when asserts are violated. This implicitly works around the bug in JDK-8261914. The goal for this limited workaround is to be trivially backportable in order to quickly unbreak 11u, 16u and 17. >> >> The alternative is, instead of the early returns is to do: >> >> lo = NULL; >> hi = NULL; >> >> ...and then wait for for the method epilog to handle it. I have no preference to either style, as the blocks this patch affects already has some early returns, and `lo/hi = NULL` are also used. 
>> >> Additional testing: >> - [x] Linux x86_64 fastdebug `tier1` >> - [x] Linux x86_64 fastdebug `tier2` >> - [x] Failing JRuby reproducer from JDK-8261914, now passing in release mode with hundreds of iterations > > Looks good. The remaining test failure should be resolved with #2614. I'll integrate once this PR is 24 hours old. ------------- PR: https://git.openjdk.java.net/jdk/pull/2610 From chegar at openjdk.java.net Thu Feb 18 11:54:40 2021 From: chegar at openjdk.java.net (Chris Hegarty) Date: Thu, 18 Feb 2021 11:54:40 GMT Subject: RFR: 8261880: Change nested classes in java.base to static nested classes where possible [v2] In-Reply-To: References: Message-ID: <8erqBs2u14oRGdt-KmCGy9IGLZnv5k1tqwm57BSnjBQ=.4e8133b8-f287-43a0-94d3-da136336e56b@github.com> On Wed, 17 Feb 2021 16:38:03 GMT, ?????? ??????? wrote: >> Non-static classes hold a link to their parent classes, which in many cases can be avoided. > > ?????? ??????? has updated the pull request incrementally with one additional commit since the last revision: > > 8261880: Remove static from declarations of Holder nested classes The changes in java/net look ok to me. ------------- PR: https://git.openjdk.java.net/jdk/pull/2589 From rcastanedalo at openjdk.java.net Thu Feb 18 12:06:53 2021 From: rcastanedalo at openjdk.java.net (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 18 Feb 2021 12:06:53 GMT Subject: RFR: 8259984: IGV: Crash when drawing control flow before GCM [v2] In-Reply-To: References: Message-ID: > The block formation algorithm applied by IGV to schedule graphs without associated CFG information traverses the graph backwards. This makes it difficult to deal with block projection nodes, leading in some cases to double addition of block nodes and block edges, and ultimately causing assertion failures. 
> > This fix replaces the backward traversal by a forward traversal that relies on node category information (introduced in [8261336](https://bugs.openjdk.java.net/browse/JDK-8261336)) to identify control successors. The forward traversal is arguably simpler and, besides avoiding the reported assertion failure, has two advantages: > > - it places block projection nodes in the same block as their predecessors, and > - it numbers basic blocks more naturally. > > The following screenshots illustrate the improvements (before the fix to the left, after the fix to the right): > > ![cfgs-before-after](https://user-images.githubusercontent.com/8792647/108204708-6dcf5e00-7124-11eb-956c-fb7f84229b50.png) > > Tested automatically on tens of thousands of graphs by running `java -Xcomp -XX:-TieredCompilation -XX:PrintIdealGraphLevel=4 ...` on an instrumented version of IGV that schedules graphs eagerly. Checked manually that the CFGs of a few selected graphs (included the reported one) are well-formed. Also checked that the overall IGV graph scheduling time is not affected by the changes. > > Thanks to Christian Hagedorn for checking the fix independently. 
Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: Extract new logic out of buildUpGraph() ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2607/files - new: https://git.openjdk.java.net/jdk/pull/2607/files/69987721..9f6444ba Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2607&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2607&range=00-01 Stats: 12 lines in 1 file changed: 6 ins; 0 del; 6 mod Patch: https://git.openjdk.java.net/jdk/pull/2607.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2607/head:pull/2607 PR: https://git.openjdk.java.net/jdk/pull/2607 From rcastanedalo at openjdk.java.net Thu Feb 18 12:06:53 2021 From: rcastanedalo at openjdk.java.net (Roberto Castañeda Lozano) Date: Thu, 18 Feb 2021 12:06:53 GMT Subject: RFR: 8259984: IGV: Crash when drawing control flow before GCM [v2] In-Reply-To: References: Message-ID: On Thu, 18 Feb 2021 10:43:27 GMT, Nils Eliasson wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: >> >> Extract new logic out of buildUpGraph() > > Excellent! Thanks Christian and Nils for reviewing! Christian, I refactored the code as per your suggestion, please have a look again.
------------- PR: https://git.openjdk.java.net/jdk/pull/2607 From rcastanedalo at openjdk.java.net Thu Feb 18 12:06:54 2021 From: rcastanedalo at openjdk.java.net (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 18 Feb 2021 12:06:54 GMT Subject: RFR: 8259984: IGV: Crash when drawing control flow before GCM [v2] In-Reply-To: References: Message-ID: On Thu, 18 Feb 2021 10:29:28 GMT, Christian Hagedorn wrote: >> Roberto Casta?eda Lozano has updated the pull request incrementally with one additional commit since the last revision: >> >> Extract new logic out of buildUpGraph() > > src/utils/IdealGraphVisualizer/ServerCompiler/src/com/sun/hotspot/igv/servercompiler/ServerCompilerScheduler.java line 634: > >> 632: } >> 633: } >> 634: > > Maybe you could extract the following code into 2 separate methods. Thanks, done! ------------- PR: https://git.openjdk.java.net/jdk/pull/2607 From chagedorn at openjdk.java.net Thu Feb 18 12:23:43 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Thu, 18 Feb 2021 12:23:43 GMT Subject: RFR: 8259984: IGV: Crash when drawing control flow before GCM [v2] In-Reply-To: References: Message-ID: <-Qd5a0SApy6Cp07xQK_xyYmpsGEGxZ1vdAzaFQvmVDc=.df55812b-e6f4-4559-be03-9d4e06ed3f6e@github.com> On Thu, 18 Feb 2021 12:06:53 GMT, Roberto Casta?eda Lozano wrote: >> The block formation algorithm applied by IGV to schedule graphs without associated CFG information traverses the graph backwards. This makes it difficult to deal with block projection nodes, leading in some cases to double addition of block nodes and block edges, and ultimately causing assertion failures. >> >> This fix replaces the backward traversal by a forward traversal that relies on node category information (introduced in [8261336](https://bugs.openjdk.java.net/browse/JDK-8261336)) to identify control successors. 
The forward traversal is arguably simpler and, besides avoiding the reported assertion failure, has two advantages: >> >> - it places block projection nodes in the same block as their predecessors, and >> - it numbers basic blocks more naturally. >> >> The following screenshots illustrate the improvements (before the fix to the left, after the fix to the right): >> >> ![cfgs-before-after](https://user-images.githubusercontent.com/8792647/108204708-6dcf5e00-7124-11eb-956c-fb7f84229b50.png) >> >> Tested automatically on tens of thousands of graphs by running `java -Xcomp -XX:-TieredCompilation -XX:PrintIdealGraphLevel=4 ...` on an instrumented version of IGV that schedules graphs eagerly. Checked manually that the CFGs of a few selected graphs (included the reported one) are well-formed. Also checked that the overall IGV graph scheduling time is not affected by the changes. >> >> Thanks to Christian Hagedorn for checking the fix independently. > > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Extract new logic out of buildUpGraph() Looks good! ------------- Marked as reviewed by chagedorn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2607 From neliasso at openjdk.java.net Thu Feb 18 12:41:39 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Thu, 18 Feb 2021 12:41:39 GMT Subject: RFR: 8259984: IGV: Crash when drawing control flow before GCM [v2] In-Reply-To: References: Message-ID: On Thu, 18 Feb 2021 12:06:53 GMT, Roberto Castañeda Lozano wrote: >> The block formation algorithm applied by IGV to schedule graphs without associated CFG information traverses the graph backwards. This makes it difficult to deal with block projection nodes, leading in some cases to double addition of block nodes and block edges, and ultimately causing assertion failures.
>> >> This fix replaces the backward traversal by a forward traversal that relies on node category information (introduced in [8261336](https://bugs.openjdk.java.net/browse/JDK-8261336)) to identify control successors. The forward traversal is arguably simpler and, besides avoiding the reported assertion failure, has two advantages: >> >> - it places block projection nodes in the same block as their predecessors, and >> - it numbers basic blocks more naturally. >> >> The following screenshots illustrate the improvements (before the fix to the left, after the fix to the right): >> >> ![cfgs-before-after](https://user-images.githubusercontent.com/8792647/108204708-6dcf5e00-7124-11eb-956c-fb7f84229b50.png) >> >> Tested automatically on tens of thousands of graphs by running `java -Xcomp -XX:-TieredCompilation -XX:PrintIdealGraphLevel=4 ...` on an instrumented version of IGV that schedules graphs eagerly. Checked manually that the CFGs of a few selected graphs (included the reported one) are well-formed. Also checked that the overall IGV graph scheduling time is not affected by the changes. >> >> Thanks to Christian Hagedorn for checking the fix independently. > > Roberto Casta?eda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Extract new logic out of buildUpGraph() Marked as reviewed by neliasso (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/2607 From shade at openjdk.java.net Thu Feb 18 15:54:40 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 18 Feb 2021 15:54:40 GMT Subject: Integrated: 8261912: Code IfNode::fold_compares_helper more defensively In-Reply-To: References: Message-ID: On Wed, 17 Feb 2021 16:11:39 GMT, Aleksey Shipilev wrote: > As [JDK-8261914](https://bugs.openjdk.java.net/browse/JDK-8261914) indicates, there are cases that break the internal asserts in `IfNode::fold_compares_helper`, code added by JDK-8073480 in JDK 9. 
Unfortunately, release builds would happily miscompile when that happens. It would be better to code `IfNode::fold_compares_helper` more defensively, so it bails when asserts are violated. This implicitly works around the bug in JDK-8261914. The goal for this limited workaround is to be trivially backportable in order to quickly unbreak 11u, 16u and 17. > > The alternative is, instead of the early returns is to do: > > lo = NULL; > hi = NULL; > > ...and then wait for for the method epilog to handle it. I have no preference to either style, as the blocks this patch affects already has some early returns, and `lo/hi = NULL` are also used. > > Additional testing: > - [x] Linux x86_64 fastdebug `tier1` > - [x] Linux x86_64 fastdebug `tier2` > - [x] Failing JRuby reproducer from JDK-8261914, now passing in release mode with hundreds of iterations This pull request has now been integrated. Changeset: e9f3aab7 Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/e9f3aab7 Stats: 23 lines in 1 file changed: 15 ins; 4 del; 4 mod 8261912: Code IfNode::fold_compares_helper more defensively Reviewed-by: kvn, thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/2610 From vlivanov at openjdk.java.net Thu Feb 18 17:25:53 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Thu, 18 Feb 2021 17:25:53 GMT Subject: RFR: 8261954: Dependencies: Improve iteration over class hierarchy under context class Message-ID: Simplify `ClassHierarchyWalker::find_witness_anywhere()` which iterates over class hierarchy under context class searching for witnesses. Current implementation traverses the hierarchy in a breadth-first manner and keeps a stack-allocated array to keep a worklist. But all the subclasses are already part of a singly linked list formed by `Klass::subklass()`/`next_sibling()`/`superklass()`. 
Proposed refactoring gets rid of the explicit worklist and switches to the traversal over the linked list (encapsulated into `ClassHierarchyIterator`). It performs depth-first pre-order hierarchy traversal. (There are some other minor refactorings applied in `ClassHierarchyWalker` along the way.) Testing: - [x] hs-tier1 - hs-tier8 - [x] additional verification that CHA decisions aren't affected ------------- Commit messages: - Refactor ClassHierarchyIterator - Revert verification - Verification - Cleanups in ClassHierarchyWalker::is_witness() - Migrate ClassHierarchyWalker::find_witness_in to ClassHierarchyIterator - ClassHierarchyIterator Changes: https://git.openjdk.java.net/jdk/pull/2630/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2630&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261954 Stats: 169 lines in 2 files changed: 64 ins; 69 del; 36 mod Patch: https://git.openjdk.java.net/jdk/pull/2630.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2630/head:pull/2630 PR: https://git.openjdk.java.net/jdk/pull/2630 From kvn at openjdk.java.net Thu Feb 18 17:58:40 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 18 Feb 2021 17:58:40 GMT Subject: RFR: 8261954: Dependencies: Improve iteration over class hierarchy under context class In-Reply-To: References: Message-ID: On Thu, 18 Feb 2021 17:05:08 GMT, Vladimir Ivanov wrote: > Simplify `ClassHierarchyWalker::find_witness_anywhere()` which iterates over class hierarchy under context class searching for witnesses. > > Current implementation traverses the hierarchy in a breadth-first manner and keeps a stack-allocated array to keep a worklist. > But all the subclasses are already part of a singly linked list formed by `Klass::subklass()`/`next_sibling()`/`superklass()`. > > Proposed refactoring gets rid of the explicit worklist and switches to the traversal over the linked list (encapsulated into `ClassHierarchyIterator`). 
It performs depth-first pre-order hierarchy traversal. > > (There are some other minor refactorings applied in `ClassHierarchyWalker` along the way.) > > Testing: > - [x] hs-tier1 - hs-tier8 > - [x] additional verification that CHA decisions aren't affected Except `do_counts` changes seem fine to me. src/hotspot/share/code/dependencies.cpp line 1329: > 1327: assert_locked_or_safepoint(Compile_lock); > 1328: > 1329: bool do_counts = count_find_witness_calls(); Does count_find_witness_calls() have side effects and required to be called here in product? If not, `do_counts` is used only for not_product code, so consider enclose it into NOT_PRODUCT() here and where it is used. ------------- PR: https://git.openjdk.java.net/jdk/pull/2630 From kvn at openjdk.java.net Thu Feb 18 19:17:44 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 18 Feb 2021 19:17:44 GMT Subject: RFR: 8261542: X86 slice and unslice intrinsics for 256-bit byte/short vectors In-Reply-To: References: Message-ID: On Thu, 11 Feb 2021 02:37:35 GMT, Sandhya Viswanathan wrote: > The slice and unslice intrinsics for 256-bit byte/short vectors can be implemented for x86 platforms supporting AVX2 using a sequence of instructions. > > JBS: https://bugs.openjdk.java.net/browse/JDK-8261542 > > The PerfSliceOrigin.java jmh test attached to the JBS shows the following performance on AVX2 platform. > > Before: > Benchmark (size) Mode Cnt Score Error Units > PerfSliceOrigin.vectorSliceOrigin 1024 thrpt 5 18.887 ± 1.128 ops/ms > PerfSliceOrigin.vectorSliceUnsliceOrigin 1024 thrpt 5 9.374 ± 0.370 ops/ms > > After: > Benchmark (size) Mode Cnt Score Error Units > PerfSliceOrigin.vectorSliceOrigin 1024 thrpt 5 13861.420 ± 19.071 ops/ms > PerfSliceOrigin.vectorSliceUnsliceOrigin 1024 thrpt 5 7895.199 ± 142.580 ops/ms Please, add a test which verifies correctness of results when this code is used. If we don't have it already.
src/hotspot/cpu/x86/x86.ad line 7506: > 7504: instruct rearrangeB_avx(legVec dst, legVec src, vec shuffle, legVec vtmp1, legVec vtmp2, rRegP scratch) %{ > 7505: predicate(vector_element_basic_type(n) == T_BYTE && > 7506: vector_length(n) == 32 && !VM_Version::supports_avx512_vbmi()); Predicate matches bail-out condition in match_rule_supported_vector(). Does it mean this code never used before? So you are implementing it now. Right? src/hotspot/cpu/x86/x86.ad line 7550: > 7548: // only byte shuffle instruction available on these platforms > 7549: int vlen_in_bytes = vector_length_in_bytes(this); > 7550: if (UseAVX == 0) { This code will not be executed with vector length 16 because match_rule_supported_vector() bailout with (size_in_bits == 256 && UseAVX < 2). ------------- Changes requested by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2520 From vlivanov at openjdk.java.net Thu Feb 18 20:45:38 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Thu, 18 Feb 2021 20:45:38 GMT Subject: RFR: 8261954: Dependencies: Improve iteration over class hierarchy under context class In-Reply-To: References: Message-ID: On Thu, 18 Feb 2021 17:53:41 GMT, Vladimir Kozlov wrote: >> Simplify `ClassHierarchyWalker::find_witness_anywhere()` which iterates over class hierarchy under context class searching for witnesses. >> >> Current implementation traverses the hierarchy in a breadth-first manner and keeps a stack-allocated array to keep a worklist. >> But all the subclasses are already part of a singly linked list formed by `Klass::subklass()`/`next_sibling()`/`superklass()`. >> >> Proposed refactoring gets rid of the explicit worklist and switches to the traversal over the linked list (encapsulated into `ClassHierarchyIterator`). It performs depth-first pre-order hierarchy traversal. >> >> (There are some other minor refactorings applied in `ClassHierarchyWalker` along the way.) 
>> >> Testing: >> - [x] hs-tier1 - hs-tier8 >> - [x] additional verification that CHA decisions aren't affected > > src/hotspot/share/code/dependencies.cpp line 1329: > >> 1327: assert_locked_or_safepoint(Compile_lock); >> 1328: >> 1329: bool do_counts = count_find_witness_calls(); > > Does count_find_witness_calls() have side effects and required to be called here in product? If not, > `do_counts` is used only for not_product code, so consider enclose it into NOT_PRODUCT() here and where it is used. `count_find_witness_calls()` is a no-op in product binaries: #ifndef PRODUCT ... static bool count_find_witness_calls() { ... #else #define count_find_witness_calls() (0) #endif //PRODUCT ------------- PR: https://git.openjdk.java.net/jdk/pull/2630 From kvn at openjdk.java.net Thu Feb 18 21:06:38 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 18 Feb 2021 21:06:38 GMT Subject: RFR: 8261954: Dependencies: Improve iteration over class hierarchy under context class In-Reply-To: References: Message-ID: On Thu, 18 Feb 2021 17:05:08 GMT, Vladimir Ivanov wrote: > Simplify `ClassHierarchyWalker::find_witness_anywhere()` which iterates over class hierarchy under context class searching for witnesses. > > Current implementation traverses the hierarchy in a breadth-first manner and keeps a stack-allocated array to keep a worklist. > But all the subclasses are already part of a singly linked list formed by `Klass::subklass()`/`next_sibling()`/`superklass()`. > > Proposed refactoring gets rid of the explicit worklist and switches to the traversal over the linked list (encapsulated into `ClassHierarchyIterator`). It performs depth-first pre-order hierarchy traversal. > > (There are some other minor refactorings applied in `ClassHierarchyWalker` along the way.) > > Testing: > - [x] hs-tier1 - hs-tier8 > - [x] additional verification that CHA decisions aren't affected OK ------------- Marked as reviewed by kvn (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/2630 From sviswanathan at openjdk.java.net Thu Feb 18 21:26:41 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Thu, 18 Feb 2021 21:26:41 GMT Subject: RFR: 8261542: X86 slice and unslice intrinsics for 256-bit byte/short vectors In-Reply-To: References: Message-ID: On Thu, 18 Feb 2021 19:14:37 GMT, Vladimir Kozlov wrote: >> The slice and unslice intrinsics for 256-bit byte/short vectors can be implemented for x86 platforms supporting AVX2 using a sequence of instructions. >> >> JBS: https://bugs.openjdk.java.net/browse/JDK-8261542 >> >> The PerfSliceOrigin.java jmh test attached to the JBS shows the following performance on AVX2 platform. >> >> Before: >> Benchmark (size) Mode Cnt Score Error Units >> PerfSliceOrigin.vectorSliceOrigin 1024 thrpt 5 18.887 ± 1.128 ops/ms >> PerfSliceOrigin.vectorSliceUnsliceOrigin 1024 thrpt 5 9.374 ± 0.370 ops/ms >> >> After: >> Benchmark (size) Mode Cnt Score Error Units >> PerfSliceOrigin.vectorSliceOrigin 1024 thrpt 5 13861.420 ± 19.071 ops/ms >> PerfSliceOrigin.vectorSliceUnsliceOrigin 1024 thrpt 5 7895.199 ± 142.580 ops/ms > > Please, add a test which verifies correctness of results when this code is used. If we don't have it already. @vnkozlov thanks a lot for the review. The tests for slice and unslice are already part of test/jdk/jdk/incubator/vector/Byte256VectorTests.java and Short256VectorTests.java. > src/hotspot/cpu/x86/x86.ad line 7550: > >> 7548: // only byte shuffle instruction available on these platforms >> 7549: int vlen_in_bytes = vector_length_in_bytes(this); >> 7550: if (UseAVX == 0) { > > This code will not be executed with vector length 16 because match_rule_supported_vector() bailout with (size_in_bits == 256 && UseAVX < 2). Yes, you are right, but this code will execute for vector length 16 when UseAVX == 2. It will also execute for vector length 16 when UseAVX == 3 && !VM_Version::supports_avx512bw.
> src/hotspot/cpu/x86/x86.ad line 7506: > >> 7504: instruct rearrangeB_avx(legVec dst, legVec src, vec shuffle, legVec vtmp1, legVec vtmp2, rRegP scratch) %{ >> 7505: predicate(vector_element_basic_type(n) == T_BYTE && >> 7506: vector_length(n) == 32 && !VM_Version::supports_avx512_vbmi()); > > Predicate matches bail-out condition in match_rule_supported_vector(). Does it mean this code never used before? > So you are implementing it now. Right? Yes, this rule was not used before and I am implementing it now. ------------- PR: https://git.openjdk.java.net/jdk/pull/2520 From coleenp at openjdk.java.net Thu Feb 18 22:44:40 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 18 Feb 2021 22:44:40 GMT Subject: RFR: 8261954: Dependencies: Improve iteration over class hierarchy under context class In-Reply-To: References: Message-ID: On Thu, 18 Feb 2021 17:05:08 GMT, Vladimir Ivanov wrote: > Simplify `ClassHierarchyWalker::find_witness_anywhere()` which iterates over class hierarchy under context class searching for witnesses. > > Current implementation traverses the hierarchy in a breadth-first manner and keeps a stack-allocated array to keep a worklist. > But all the subclasses are already part of a singly linked list formed by `Klass::subklass()`/`next_sibling()`/`superklass()`. > > Proposed refactoring gets rid of the explicit worklist and switches to the traversal over the linked list (encapsulated into `ClassHierarchyIterator`). It performs depth-first pre-order hierarchy traversal. > > (There are some other minor refactorings applied in `ClassHierarchyWalker` along the way.) > > Testing: > - [x] hs-tier1 - hs-tier8 > - [x] additional verification that CHA decisions aren't affected This seems like a nice change. Some questions. 
src/hotspot/share/oops/instanceKlass.hpp line 1486: > 1484: } > 1485: }; > 1486: The _subklass and _next_sibling fields and implementation are in Klass (klass.hpp/cpp) but I always wonder why they are not in InstanceKlass (instanceKlass.hpp/cpp). If this new class is in instanceKlass.hpp, these fields should be in the same file. If you agree, we should file an RFE and I will move them. If the fields truly belong in klass.hpp, then this should be there too, i.e. they should match. src/hotspot/share/code/dependencies.cpp line 1345: > 1343: } else if (nof_impls == 1) { // unique implementor > 1344: assert(context_type != context_type->implementor(), "not unique"); > 1345: context_type = InstanceKlass::cast(context_type->implementor()); Is there any reason that implementor() should return Klass* rather than InstanceKlass*? We can also clean that up later to reduce casts. (I thought I already tried to do this once.) ------------- Marked as reviewed by coleenp (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2630 From kvn at openjdk.java.net Thu Feb 18 23:24:43 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 18 Feb 2021 23:24:43 GMT Subject: RFR: 8261542: X86 slice and unslice intrinsics for 256-bit byte/short vectors In-Reply-To: References: Message-ID: On Thu, 18 Feb 2021 19:14:37 GMT, Vladimir Kozlov wrote: >> The slice and unslice intrinsics for 256-bit byte/short vectors can be implemented for x86 platforms supporting AVX2 using a sequence of instructions. >> >> JBS: https://bugs.openjdk.java.net/browse/JDK-8261542 >> >> The PerfSliceOrigin.java jmh test attached to the JBS shows the following performance on AVX2 platform. >> >> Before: >> Benchmark (size) Mode Cnt Score Error Units >> PerfSliceOrigin.vectorSliceOrigin 1024 thrpt 5 18.887 ± 1.128 ops/ms >> PerfSliceOrigin.vectorSliceUnsliceOrigin 1024 thrpt 5 9.374 ± 0.370 ops/ms >> >> After: >> Benchmark (size) Mode Cnt Score Error Units >> PerfSliceOrigin.vectorSliceOrigin 1024 thrpt 5 13861.420 ±
19.071 ops/ms >> PerfSliceOrigin.vectorSliceUnsliceOrigin 1024 thrpt 5 7895.199 ± 142.580 ops/ms > > Please, add a test which verifies correctness of results when this code is used. If we don't have it already. > @vnkozlov thanks a lot for the review. > The tests for slice and unslice are already part of test/jdk/jdk/incubator/vector/Byte256VectorTests.java and Short256VectorTests.java. Good. ------------- PR: https://git.openjdk.java.net/jdk/pull/2520 From kvn at openjdk.java.net Thu Feb 18 23:34:45 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 18 Feb 2021 23:34:45 GMT Subject: RFR: 8261542: X86 slice and unslice intrinsics for 256-bit byte/short vectors In-Reply-To: References: Message-ID: On Thu, 18 Feb 2021 21:21:49 GMT, Sandhya Viswanathan wrote: >> src/hotspot/cpu/x86/x86.ad line 7550: >> >>> 7548: // only byte shuffle instruction available on these platforms >>> 7549: int vlen_in_bytes = vector_length_in_bytes(this); >>> 7550: if (UseAVX == 0) { >> >> This code will not be executed with vector length 16 because match_rule_supported_vector() bailout with (size_in_bits == 256 && UseAVX < 2). > > Yes, you are right, but this code will execute for vector length 16 when UseAVX == 2. > It will also execute for vector length 16 when UseAVX == 3 && > !VM_Version::supports_avx512bw. Next assert checks <= 16 when code is guarded by (UseAVX == 0). It is not (UseAVX ==2). Also } else { case is for UseAVX > 0 which includes AVX=1 but vpaddb() (avx3) is used there. Seems UseAVX checks wrong here.
------------- PR: https://git.openjdk.java.net/jdk/pull/2520 From sviswanathan at openjdk.java.net Fri Feb 19 00:35:54 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Fri, 19 Feb 2021 00:35:54 GMT Subject: RFR: 8261542: X86 slice and unslice intrinsics for 256-bit byte/short vectors [v2] In-Reply-To: References: Message-ID: > The slice and unslice intrinsics for 256-bit byte/short vectors can be implemented for x86 platforms supporting AVX2 using a sequence of instructions. > > JBS: https://bugs.openjdk.java.net/browse/JDK-8261542 > > The PerfSliceOrigin.java jmh test attached to the JBS shows the following performance on AVX2 platform. > > Before: > Benchmark (size) Mode Cnt Score Error Units > PerfSliceOrigin.vectorSliceOrigin 1024 thrpt 5 18.887 ± 1.128 ops/ms > PerfSliceOrigin.vectorSliceUnsliceOrigin 1024 thrpt 5 9.374 ± 0.370 ops/ms > > After: > Benchmark (size) Mode Cnt Score Error Units > PerfSliceOrigin.vectorSliceOrigin 1024 thrpt 5 13861.420 ± 19.071 ops/ms > PerfSliceOrigin.vectorSliceUnsliceOrigin 1024 thrpt 5 7895.199 ±
142.580 ops/ms Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: corrected assert ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2520/files - new: https://git.openjdk.java.net/jdk/pull/2520/files/77324374..55165cb5 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2520&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2520&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/2520.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2520/head:pull/2520 PR: https://git.openjdk.java.net/jdk/pull/2520 From sviswanathan at openjdk.java.net Fri Feb 19 01:23:04 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Fri, 19 Feb 2021 01:23:04 GMT Subject: RFR: 8261542: X86 slice and unslice intrinsics for 256-bit byte/short vectors [v3] In-Reply-To: References: Message-ID: <4TNVM4VqqsezSVCJndS1RpM9EvIfICm5mv1AbZhthgg=.3943a37b-cce4-415e-989f-b9696ad6c008@github.com> > The slice and unslice intrinsics for 256-bit byte/short vectors can be implemented for x86 platforms supporting AVX2 using a sequence of instructions. > > JBS: https://bugs.openjdk.java.net/browse/JDK-8261542 > > The PerfSliceOrigin.java jmh test attached to the JBS shows the following performance on AVX2 platform. > > Before: > Benchmark (size) Mode Cnt Score Error Units > PerfSliceOrigin.vectorSliceOrigin 1024 thrpt 5 18.887 ± 1.128 ops/ms > PerfSliceOrigin.vectorSliceUnsliceOrigin 1024 thrpt 5 9.374 ± 0.370 ops/ms > > After: > Benchmark (size) Mode Cnt Score Error Units > PerfSliceOrigin.vectorSliceOrigin 1024 thrpt 5 13861.420 ± 19.071 ops/ms > PerfSliceOrigin.vectorSliceUnsliceOrigin 1024 thrpt 5 7895.199 ±
142.580 ops/ms Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: corrected assert ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2520/files - new: https://git.openjdk.java.net/jdk/pull/2520/files/55165cb5..fa13679a Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2520&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2520&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/2520.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2520/head:pull/2520 PR: https://git.openjdk.java.net/jdk/pull/2520 From sviswanathan at openjdk.java.net Fri Feb 19 01:30:41 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Fri, 19 Feb 2021 01:30:41 GMT Subject: RFR: 8261542: X86 slice and unslice intrinsics for 256-bit byte/short vectors [v3] In-Reply-To: References: Message-ID: On Thu, 18 Feb 2021 23:31:28 GMT, Vladimir Kozlov wrote: >> Yes, you are right, but this code will execute for vector length 16 when UseAVX == 2. >> It will also execute for vector length 16 when UseAVX == 3 && >> !VM_Version::supports_avx512bw. > > Next assert checks <= 16 when code is guarded by (UseAVX == 0). It is not (UseAVX ==2). > Also } else { case is for UseAVX > 0 which includes AVX=1 but vpaddb() (avx3) is used there. > Seems UseAVX checks wrong here. The assert checks for vlen_in_bytes <= 16 (128 bits) and so is a correct check for UseAVX=0. vpaddb is supported on AVX1/AVX2 as well. vpaddb is supported on AVX1 for up to 128 bits, on AVX2 for up to 256 bits, and on AVX3 (512) for up to 512-bit vectors. I have tested this on UseAVX=0, UseAVX=1, UseAVX=2 and UseAVX=3 platforms. The check is for UseAVX because, with any flavor of AVX, we can use fewer instructions to do this operation. This is because AVX allows the destination to be separate from both the sources. Please let me know if I am missing something.
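The width limits described above can be written down as a small table. The following Java sketch only restates the message: `maxVpaddbBytes` and `avxBranchOk` are hypothetical helpers, not HotSpot functions, and finer-grained CPU feature flags (such as the AVX512BW check mentioned earlier in the thread) are deliberately left out.

```java
final class AvxByteAddWidthSketch {
    // Widest vector, in bytes, for which vpaddb is usable at a given UseAVX level,
    // following the description in the message above. Returns 0 for UseAVX == 0,
    // where the VEX-encoded instruction is not used at all.
    static int maxVpaddbBytes(int useAVX) {
        switch (useAVX) {
            case 0:  return 0;
            case 1:  return 16;  // up to 128 bits
            case 2:  return 32;  // up to 256 bits
            default: return 64;  // up to 512 bits (extra feature flags ignored in this sketch)
        }
    }

    // True when a vector of vlenInBytes fits what the given UseAVX level offers.
    static boolean avxBranchOk(int useAVX, int vlenInBytes) {
        return vlenInBytes <= maxVpaddbBytes(useAVX);
    }

    public static void main(String[] args) {
        for (int level = 0; level <= 3; level++) {
            System.out.println("UseAVX=" + level + " max bytes=" + maxVpaddbBytes(level));
        }
    }
}
```

Under these assumptions, a 16-byte vector is fine at every AVX level from 1 upward, while a 32-byte vector needs at least UseAVX=2.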
------------- PR: https://git.openjdk.java.net/jdk/pull/2520 From kvn at openjdk.java.net Fri Feb 19 01:58:44 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 19 Feb 2021 01:58:44 GMT Subject: RFR: 8261542: X86 slice and unslice intrinsics for 256-bit byte/short vectors [v3] In-Reply-To: <4TNVM4VqqsezSVCJndS1RpM9EvIfICm5mv1AbZhthgg=.3943a37b-cce4-415e-989f-b9696ad6c008@github.com> References: <4TNVM4VqqsezSVCJndS1RpM9EvIfICm5mv1AbZhthgg=.3943a37b-cce4-415e-989f-b9696ad6c008@github.com> Message-ID: On Fri, 19 Feb 2021 01:23:04 GMT, Sandhya Viswanathan wrote: >> The slice and unslice intrinsics for 256-bit byte/short vectors can be implemented for x86 platforms supporting AVX2 using a sequence of instructions. >> >> JBS: https://bugs.openjdk.java.net/browse/JDK-8261542 >> >> The PerfSliceOrigin.java jmh test attached to the JBS shows the following performance on AVX2 platform. >> >> Before: >> Benchmark (size) Mode Cnt Score Error Units >> PerfSliceOrigin.vectorSliceOrigin 1024 thrpt 5 18.887 ± 1.128 ops/ms >> PerfSliceOrigin.vectorSliceUnsliceOrigin 1024 thrpt 5 9.374 ± 0.370 ops/ms >> >> After: >> Benchmark (size) Mode Cnt Score Error Units >> PerfSliceOrigin.vectorSliceOrigin 1024 thrpt 5 13861.420 ± 19.071 ops/ms >> PerfSliceOrigin.vectorSliceUnsliceOrigin 1024 thrpt 5 7895.199 ± 142.580 ops/ms > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > corrected assert src/hotspot/cpu/x86/x86.ad line 1695: > 1693: if(vlen == 2) { > 1694: return false; // Implementation limitation due to how shuffle is loaded > 1695: } else if (size_in_bits == 256 && UseAVX < 2) { Should this be >= 256?
------------- PR: https://git.openjdk.java.net/jdk/pull/2520 From kvn at openjdk.java.net Fri Feb 19 02:07:40 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 19 Feb 2021 02:07:40 GMT Subject: RFR: 8261542: X86 slice and unslice intrinsics for 256-bit byte/short vectors [v3] In-Reply-To: References: Message-ID: On Fri, 19 Feb 2021 01:27:57 GMT, Sandhya Viswanathan wrote: >> Next assert checks <= 16 when code is guarded by (UseAVX == 0). It is not (UseAVX ==2). >> Also } else { case is for UseAVX > 0 which includes AVX=1 but vpaddb() (avx3) is used there. >> Seems UseAVX checks wrong here. > > The assert checks for vlen_in_bytes <= 16 (128 bits) and so is a correct check for UseAVX=0. > vpaddb is supported on AVX1/AVX2 as well. > vpaddb is supported on AVX1 for up to 128 bits, > on AVX2 for up to 256 bits, and > on AVX3 (512) for up to 512-bit vectors. > I have tested this on UseAVX=0, UseAVX=1, UseAVX=2 and UseAVX=3 platforms. > > The check is for UseAVX because, with any flavor of AVX, we can use fewer instructions to do this operation. > This is because AVX allows the destination to be separate from both the sources. > > Please let me know if I am missing something. My bad - I missed that size is in bytes in assert. The assert is correct, as you said. And `} else {` part works for AVX1 because match_rule_supported_vector() bails out in the 256-bit case. Maybe add assert(UseAVX > 1 || vlen_in_bytes <= 16, ).
I only have one question left - about check >= 256 in match_rule_supported_vector() ------------- PR: https://git.openjdk.java.net/jdk/pull/2520 From sviswanathan at openjdk.java.net Fri Feb 19 02:33:40 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Fri, 19 Feb 2021 02:33:40 GMT Subject: RFR: 8261542: X86 slice and unslice intrinsics for 256-bit byte/short vectors [v3] In-Reply-To: References: <4TNVM4VqqsezSVCJndS1RpM9EvIfICm5mv1AbZhthgg=.3943a37b-cce4-415e-989f-b9696ad6c008@github.com> Message-ID: On Fri, 19 Feb 2021 01:56:19 GMT, Vladimir Kozlov wrote: >> Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: >> >> corrected assert > > src/hotspot/cpu/x86/x86.ad line 1695: > >> 1693: if(vlen == 2) { >> 1694: return false; // Implementation limitation due to how shuffle is loaded >> 1695: } else if (size_in_bits == 256 && UseAVX < 2) { > > Should this be >= 256? The general >= 256 part is taken care of early on in match_rule_supported_vector as below: if (!vector_size_supported(bt, vlen)) { return false; } The only additional check that is being done here is for float and double 256 bit vectors that are supported on AVX=1 and will pass the vector_size_supported check. This is because the VectorLoadShuffle cannot be performed for 256 bit vectors on AVX1 platform as it needs "integer" 256 bit instructions which are only available on AVX2. 
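The ordering of checks described above can be modelled in a few lines. This is a hedged Java sketch, not the actual `match_rule_supported_vector` code: `loadShuffleSupported`, `vectorSizeSupported` and the `maxVectorBits` parameter are hypothetical and stand in for the platform-dependent logic.

```java
final class LoadShuffleSupportSketch {
    // Stand-in for the general vector_size_supported(bt, vlen) check; modelled here
    // as a simple cap on the vector size in bits (an assumption of this sketch).
    static boolean vectorSizeSupported(int sizeInBits, int maxVectorBits) {
        return sizeInBits <= maxVectorBits;
    }

    static boolean loadShuffleSupported(int vlen, int sizeInBits, int useAVX, int maxVectorBits) {
        if (!vectorSizeSupported(sizeInBits, maxVectorBits)) {
            return false;  // the general size check runs first and rejects oversized vectors
        }
        if (vlen == 2) {
            return false;  // limitation due to how the shuffle is loaded
        }
        if (sizeInBits == 256 && useAVX < 2) {
            return false;  // 256-bit integer instructions are only available from AVX2 on
        }
        return true;
    }

    public static void main(String[] args) {
        // On an AVX1-like setup with a 256-bit cap, 256-bit shuffles are rejected.
        System.out.println(loadShuffleSupported(8, 256, 1, 256));
        // The same request passes once UseAVX is 2.
        System.out.println(loadShuffleSupported(8, 256, 2, 256));
    }
}
```

Because the general check has already filtered out anything wider than the platform supports, the later test only has to single out the exact 256-bit case on AVX1, which is why an equality comparison is sufficient there.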
------------- PR: https://git.openjdk.java.net/jdk/pull/2520 From dongbo at openjdk.java.net Fri Feb 19 03:13:12 2021 From: dongbo at openjdk.java.net (Dong Bo) Date: Fri, 19 Feb 2021 03:13:12 GMT Subject: RFR: 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width [v7] In-Reply-To: References: Message-ID: > In the Vector API, when right-shifting a vector by a shift equal to the element width, the shift is transformed to zero, > see `src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorOperators.java`: > /** Produce {@code a>>>(n&(ESIZE*8-1))}. Integral only. */ > public static final /*bitwise*/ Binary LSHR = binary("LSHR", ">>>", VectorSupport.VECTOR_OP_URSHIFT, VO_SHIFT); > > The aarch64 assembler generates wrong or illegal instructions in this case, e.g. for the Java code below on aarch64, > the assembler call is `__ ushr(dst, __ T8B, src, 0)`, but the instruction generated is not `ushr dst.8B, src.8B, 0`, but `ushr dst.4H, src.4H, 16` instead. > According to local tests, the JVM gives wrong results for byte/short and crashes with SIGILL for integer/long. > ByteVector vba = ByteVector.fromArray(byte64SPECIES, bytesA, 8 * i); > vba.lanewise(VectorOperators.ASHR, 8).intoArray(arrBytes, 8 * i); > > The legal right shift amount should be in the range 1 to the element width in bits on aarch64: > https://developer.arm.com/documentation/dui0801/f/A64-SIMD-Vector-Instructions/USHR--vector-?lang=en > > This fix handles zero shift separately. If the shift is zero, it generates `orr` for right shift, `addv` for right shift and accumulate. > Verified with linux-aarch64-server-fastdebug, tier1. Also created a jtreg test to reproduce the issue and for regression tests. Dong Bo has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR.
The pull request contains one new commit since the last revision: handle zero shift in macro assembler ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2472/files - new: https://git.openjdk.java.net/jdk/pull/2472/files/d746f209..1aba5629 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2472&range=06 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2472&range=05-06 Stats: 530 lines in 4 files changed: 27 ins; 174 del; 329 mod Patch: https://git.openjdk.java.net/jdk/pull/2472.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2472/head:pull/2472 PR: https://git.openjdk.java.net/jdk/pull/2472 From dongbo at openjdk.java.net Fri Feb 19 03:17:42 2021 From: dongbo at openjdk.java.net (Dong Bo) Date: Fri, 19 Feb 2021 03:17:42 GMT Subject: RFR: 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width [v6] In-Reply-To: References: <45gEz_9Vli1Mvby8SbtMsoEx65KL11RQnI-qKy6zfgo=.42933345-0086-466b-ae38-3add53fc1816@github.com> Message-ID: On Thu, 18 Feb 2021 09:35:07 GMT, Andrew Haley wrote: >> Dong Bo has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. > > src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 2694: > >> 2692: assert((1 << ((T>>1)+3)) > shift, "Invalid Shift value"); \ >> 2693: if (shift == 0) { \ >> 2694: bool accumulate = ((opc2 & 0b100) != 0); \ > > Is this correct, according to the definition in the Architecture Reference Manual? It doesn't look like it to me. Assembler methods should generate bit patterns exactly as defined in the Manual. This logic should be in a MacroAssembler method. Hi, I moved the logic into MacroAssembler. The assert is kept to make sure that we never pass a zero right shift to the assembler.
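For readers following the zero-shift case, the masking rule quoted from `VectorOperators.java` earlier in this thread, `a>>>(n&(ESIZE*8-1))`, can be checked with plain integer arithmetic. The Java sketch below is only an illustration: `effectiveShift` and `needsZeroShiftPath` are hypothetical helper names and do not correspond to anything in the patch.

```java
final class ShiftMaskSketch {
    // Effective shift count after masking, per the rule a >>> (n & (ESIZE*8 - 1)),
    // where elementSizeBytes plays the role of ESIZE.
    static int effectiveShift(int n, int elementSizeBytes) {
        return n & (elementSizeBytes * 8 - 1);
    }

    // A masked count of zero falls outside the 1..element-width range described
    // for the AArch64 right-shift immediates, so it needs separate handling.
    static boolean needsZeroShiftPath(int n, int elementSizeBytes) {
        return effectiveShift(n, elementSizeBytes) == 0;
    }

    public static void main(String[] args) {
        // Shifting byte lanes by 8 masks down to a shift of 0.
        System.out.println(effectiveShift(8, 1));
        // Shifting byte lanes by 3 is unaffected by the mask.
        System.out.println(effectiveShift(3, 1));
    }
}
```

The same masking turns a shift of 16 on short lanes, 32 on int lanes and 64 on long lanes into zero, which matches the set of cases the description above reports as failing.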
------------- PR: https://git.openjdk.java.net/jdk/pull/2472 From sviswanathan at openjdk.java.net Fri Feb 19 03:20:59 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Fri, 19 Feb 2021 03:20:59 GMT Subject: RFR: 8261542: X86 slice and unslice intrinsics for 256-bit byte/short vectors [v4] In-Reply-To: References: Message-ID: <_0LW_Yx-ItDG7f-3kVjOTPa5BbcaN1FMRBbATLMwpmk=.da545e81-59a0-4d80-9f45-0714fe7bfc94@github.com> On Fri, 19 Feb 2021 02:05:02 GMT, Vladimir Kozlov wrote: >> The assert checks for vlen_in_bytes <= 16 (128 bits) and so is a correct check for UseAVX=0. >> vpaddb is supported on AVX1/AVX2 as well. >> vpaddb is supported on AVX1 for up to 128 bits, >> on AVX2 for up to 256 bits, and >> on AVX3 (512) for up to 512-bit vectors. >> I have tested this on UseAVX=0, UseAVX=1, UseAVX=2 and UseAVX=3 platforms. >> >> The check is for UseAVX because, with any flavor of AVX, we can use fewer instructions to do this operation. >> This is because AVX allows the destination to be separate from both the sources. >> >> Please let me know if I am missing something. > > My bad - I missed that size is in bytes in assert. The assert is correct, as you said. > And `} else {` part works for AVX1 because match_rule_supported_vector() bails out in the 256-bit case. > Maybe add assert(UseAVX > 1 || vlen_in_bytes <= 16, ).
> > I only have one question left - about check >= 256 in match_rule_supported_vector() Added the following assert on else path: + assert(UseAVX > 1 || vlen_in_bytes <= 16, "required"); ------------- PR: https://git.openjdk.java.net/jdk/pull/2520 From sviswanathan at openjdk.java.net Fri Feb 19 03:20:59 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Fri, 19 Feb 2021 03:20:59 GMT Subject: RFR: 8261542: X86 slice and unslice intrinsics for 256-bit byte/short vectors [v4] In-Reply-To: References: Message-ID: > The slice and unslice intrinsics for 256-bit byte/short vectors can be implemented for x86 platforms supporting AVX2 using a sequence of instructions. > > JBS: https://bugs.openjdk.java.net/browse/JDK-8261542 > > The PerfSliceOrigin.java jmh test attached to the JBS shows the following performance on AVX2 platform. > > Before: > Benchmark (size) Mode Cnt Score Error Units > PerfSliceOrigin.vectorSliceOrigin 1024 thrpt 5 18.887 ± 1.128 ops/ms > PerfSliceOrigin.vectorSliceUnsliceOrigin 1024 thrpt 5 9.374 ± 0.370 ops/ms > > After: > Benchmark (size) Mode Cnt Score Error Units > PerfSliceOrigin.vectorSliceOrigin 1024 thrpt 5 13861.420 ± 19.071 ops/ms > PerfSliceOrigin.vectorSliceUnsliceOrigin 1024 thrpt 5 7895.199 ±
142.580 ops/ms Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: add assert on else path ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2520/files - new: https://git.openjdk.java.net/jdk/pull/2520/files/fa13679a..ad3ab2b1 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2520&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2520&range=02-03 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/2520.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2520/head:pull/2520 PR: https://git.openjdk.java.net/jdk/pull/2520 From kvn at openjdk.java.net Fri Feb 19 05:52:41 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 19 Feb 2021 05:52:41 GMT Subject: RFR: 8261542: X86 slice and unslice intrinsics for 256-bit byte/short vectors [v3] In-Reply-To: References: <4TNVM4VqqsezSVCJndS1RpM9EvIfICm5mv1AbZhthgg=.3943a37b-cce4-415e-989f-b9696ad6c008@github.com> Message-ID: On Fri, 19 Feb 2021 02:30:58 GMT, Sandhya Viswanathan wrote: >> src/hotspot/cpu/x86/x86.ad line 1695: >> >>> 1693: if(vlen == 2) { >>> 1694: return false; // Implementation limitation due to how shuffle is loaded >>> 1695: } else if (size_in_bits == 256 && UseAVX < 2) { >> >> Should this be >= 256? > > The general >= 256 part is taken care of early on in match_rule_supported_vector as below: > if (!vector_size_supported(bt, vlen)) { > return false; > } > The only additional check that is being done here is for float and double 256 bit vectors that are supported on AVX=1 and will pass the vector_size_supported check. > This is because the VectorLoadShuffle cannot be performed for 256 bit vectors on AVX1 platform as it needs "integer" 256 bit instructions which are only available on AVX2. Okay. 
------------- PR: https://git.openjdk.java.net/jdk/pull/2520 From kvn at openjdk.java.net Fri Feb 19 05:52:39 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 19 Feb 2021 05:52:39 GMT Subject: RFR: 8261542: X86 slice and unslice intrinsics for 256-bit byte/short vectors [v4] In-Reply-To: References: Message-ID: On Fri, 19 Feb 2021 03:20:59 GMT, Sandhya Viswanathan wrote: >> The slice and unslice intrinsics for 256-bit byte/short vectors can be implemented for x86 platforms supporting AVX2 using a sequence of instructions. >> >> JBS: https://bugs.openjdk.java.net/browse/JDK-8261542 >> >> The PerfSliceOrigin.java jmh test attached to the JBS shows the following performance on AVX2 platform. >> >> Before: >> Benchmark (size) Mode Cnt Score Error Units >> PerfSliceOrigin.vectorSliceOrigin 1024 thrpt 5 18.887 ± 1.128 ops/ms >> PerfSliceOrigin.vectorSliceUnsliceOrigin 1024 thrpt 5 9.374 ± 0.370 ops/ms >> >> After: >> Benchmark (size) Mode Cnt Score Error Units >> PerfSliceOrigin.vectorSliceOrigin 1024 thrpt 5 13861.420 ± 19.071 ops/ms >> PerfSliceOrigin.vectorSliceUnsliceOrigin 1024 thrpt 5 7895.199 ± 142.580 ops/ms > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > add assert on else path Marked as reviewed by kvn (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/2520 From kvn at openjdk.java.net Fri Feb 19 05:52:42 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 19 Feb 2021 05:52:42 GMT Subject: RFR: 8261542: X86 slice and unslice intrinsics for 256-bit byte/short vectors [v4] In-Reply-To: <_0LW_Yx-ItDG7f-3kVjOTPa5BbcaN1FMRBbATLMwpmk=.da545e81-59a0-4d80-9f45-0714fe7bfc94@github.com> References: <_0LW_Yx-ItDG7f-3kVjOTPa5BbcaN1FMRBbATLMwpmk=.da545e81-59a0-4d80-9f45-0714fe7bfc94@github.com> Message-ID: On Fri, 19 Feb 2021 03:17:56 GMT, Sandhya Viswanathan wrote: >> My bad - I missed that size is in bytes in assert. The assert is correct, as you said. 
>> And `} else {` part works for AVX1 because match_rule_supported_vector() bails out in the 256-bit case. >> Maybe add assert(UseAVX > 1 || vlen_in_bytes <= 16, ). >> >> I only have one question left - about check >= 256 in match_rule_supported_vector() > > Added the following assert on else path: > + assert(UseAVX > 1 || vlen_in_bytes <= 16, "required"); Good. ------------- PR: https://git.openjdk.java.net/jdk/pull/2520 From rcastanedalo at openjdk.java.net Fri Feb 19 08:21:40 2021 From: rcastanedalo at openjdk.java.net (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 19 Feb 2021 08:21:40 GMT Subject: RFR: 8259984: IGV: Crash when drawing control flow before GCM [v2] In-Reply-To: References: Message-ID: On Thu, 18 Feb 2021 12:38:46 GMT, Nils Eliasson wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: >> >> Extract new logic out of buildUpGraph() > > Marked as reviewed by neliasso (Reviewer). Replace backward traversal in the IGV block formation algorithm by forward traversal guided by node category information. This change addresses the reported assertion failures, places block projection nodes together with their predecessors, and gives a more natural block numbering. ------------- PR: https://git.openjdk.java.net/jdk/pull/2607 From rcastanedalo at openjdk.java.net Fri Feb 19 08:21:40 2021 From: rcastanedalo at openjdk.java.net (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 19 Feb 2021 08:21:40 GMT Subject: RFR: 8259984: IGV: Crash when drawing control flow before GCM [v2] In-Reply-To: <-Qd5a0SApy6Cp07xQK_xyYmpsGEGxZ1vdAzaFQvmVDc=.df55812b-e6f4-4559-be03-9d4e06ed3f6e@github.com> References: <-Qd5a0SApy6Cp07xQK_xyYmpsGEGxZ1vdAzaFQvmVDc=.df55812b-e6f4-4559-be03-9d4e06ed3f6e@github.com> Message-ID: <7RE9-JAx9l_fFB3TuY7N-YQJIjNiH11P3DYyOcI_H7k=.19c907f2-913e-4471-8324-3fa6fdde0565@github.com> On Thu, 18 Feb 2021 12:21:27 GMT, Christian Hagedorn wrote: > Looks good! 
Thanks again, Christian. ------------- PR: https://git.openjdk.java.net/jdk/pull/2607 From rcastanedalo at openjdk.java.net Fri Feb 19 08:21:41 2021 From: rcastanedalo at openjdk.java.net (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 19 Feb 2021 08:21:41 GMT Subject: Integrated: 8259984: IGV: Crash when drawing control flow before GCM In-Reply-To: References: Message-ID: On Wed, 17 Feb 2021 12:23:55 GMT, Roberto Castañeda Lozano wrote: > The block formation algorithm applied by IGV to schedule graphs without associated CFG information traverses the graph backwards. This makes it difficult to deal with block projection nodes, leading in some cases to double addition of block nodes and block edges, and ultimately causing assertion failures. > > This fix replaces the backward traversal by a forward traversal that relies on node category information (introduced in [8261336](https://bugs.openjdk.java.net/browse/JDK-8261336)) to identify control successors. The forward traversal is arguably simpler and, besides avoiding the reported assertion failure, has two advantages: > > - it places block projection nodes in the same block as their predecessors, and > - it numbers basic blocks more naturally. > > The following screenshots illustrate the improvements (before the fix to the left, after the fix to the right): > > ![cfgs-before-after](https://user-images.githubusercontent.com/8792647/108204708-6dcf5e00-7124-11eb-956c-fb7f84229b50.png) > > Tested automatically on tens of thousands of graphs by running `java -Xcomp -XX:-TieredCompilation -XX:PrintIdealGraphLevel=4 ...` on an instrumented version of IGV that schedules graphs eagerly. Checked manually that the CFGs of a few selected graphs (including the reported one) are well-formed. Also checked that the overall IGV graph scheduling time is not affected by the changes. > > Thanks to Christian Hagedorn for checking the fix independently. This pull request has now been integrated. 
Changeset: 61820b74 Author: Roberto Castañeda Lozano URL: https://git.openjdk.java.net/jdk/commit/61820b74 Stats: 169 lines in 1 file changed: 92 ins; 15 del; 62 mod 8259984: IGV: Crash when drawing control flow before GCM Replace backward traversal in the IGV block formation algorithm by forward traversal guided by node category information. This change addresses the reported assertion failures, places block projection nodes together with their predecessors, and gives a more natural block numbering. Reviewed-by: chagedorn, neliasso ------------- PR: https://git.openjdk.java.net/jdk/pull/2607 From neliasso at openjdk.java.net Fri Feb 19 09:31:41 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Fri, 19 Feb 2021 09:31:41 GMT Subject: RFR: 8261542: X86 slice and unslice intrinsics for 256-bit byte/short vectors [v4] In-Reply-To: References: Message-ID: <4j6fwKgy-WwLXNITZYeqzBVreGvoYli08IP4OxpnmUI=.7d07c386-7778-43e3-8245-5c13b411a63d@github.com> On Fri, 19 Feb 2021 03:20:59 GMT, Sandhya Viswanathan wrote: >> The slice and unslice intrinsics for 256-bit byte/short vectors can be implemented for x86 platforms supporting AVX2 using a sequence of instructions. >> >> JBS: https://bugs.openjdk.java.net/browse/JDK-8261542 >> >> The PerfSliceOrigin.java jmh test attached to the JBS shows the following performance on AVX2 platform. >> >> Before: >> Benchmark (size) Mode Cnt Score Error Units >> PerfSliceOrigin.vectorSliceOrigin 1024 thrpt 5 18.887 ± 1.128 ops/ms >> PerfSliceOrigin.vectorSliceUnsliceOrigin 1024 thrpt 5 9.374 ± 0.370 ops/ms >> >> After: >> Benchmark (size) Mode Cnt Score Error Units >> PerfSliceOrigin.vectorSliceOrigin 1024 thrpt 5 13861.420 ± 19.071 ops/ms >> PerfSliceOrigin.vectorSliceUnsliceOrigin 1024 thrpt 5 7895.199 ± 142.580 ops/ms > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > add assert on else path Looks good. 
------------- Marked as reviewed by neliasso (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2520 From dnsimon at openjdk.java.net Fri Feb 19 11:26:03 2021 From: dnsimon at openjdk.java.net (Doug Simon) Date: Fri, 19 Feb 2021 11:26:03 GMT Subject: RFR: 8262011: [JVMCI] allow printing to tty from unattached libgraal thread In-Reply-To: References: Message-ID: On Fri, 19 Feb 2021 10:10:17 GMT, Doug Simon wrote: > Currently, `HotSpotJVMCIRuntime.writeDebugOutput` does nothing if the current thread is not attached to HotSpot (i.e., `Thread::current_or_null() == NULL`). This means crucial debug info can be lost. For reference, an unattached libgraal thread is a thread started from within libgraal that has not yet attached itself to the VM (e.g., before [this line](https://github.com/oracle/graal/blob/e4b9ab931940e1946f96f2015b937ba100384573/compiler/src/org.graalvm.compiler.core/src/org/graalvm/compiler/core/GraalServiceThread.java#L42)) or has already detached itself (e.g., after [this line](https://github.com/oracle/graal/blob/e4b9ab931940e1946f96f2015b937ba100384573/compiler/src/org.graalvm.compiler.core/src/org/graalvm/compiler/core/GraalServiceThread.java#L46)). > > The reason for the current behavior is that `HotSpotJVMCIRuntime.writeDebugOutput` passes a Java byte array to C++ code and the C++ code calls back into Java to decode the byte array into a native buffer. These call backs require the current thread to be attached to the VM. > > This PR moves the Java-to-native-buffer decoding into Java and thus avoids the requirement for the current thread to be attached to the VM. 
> > Tested in libgraal by patching Graal as follows: > diff --git a/compiler/src/org.graalvm.compiler.core/src/org/graalvm/compiler/core/GraalServiceThread.java b/compiler/src/org.graalvm.compiler.core/src/org/graalvm/compiler/core/GraalServiceThread.java > index 36064767c95..352395dd59b 100644 > --- a/compiler/src/org.graalvm.compiler.core/src/org/graalvm/compiler/core/GraalServiceThread.java > +++ b/compiler/src/org.graalvm.compiler.core/src/org/graalvm/compiler/core/GraalServiceThread.java > @@ -43,7 +43,14 @@ public class GraalServiceThread extends Thread { > try { > runnable.run(); > } finally { > + String debug = System.getenv("GraalServiceThread.debug"); > afterRun(); > + if ("true".equals(debug)) { > + throw new InternalError("THROWN AFTER DETACHING"); > + } > } > } > > Running without the changes in this PR: >> env GraalServiceThread.debug=true java -jar dacapo.jar avrora > ===== DaCapo 9.12 avrora starting ===== > ===== DaCapo 9.12 avrora PASSED in 4270 msec ===== > > Running with the changes in this PR: >> env GraalServiceThread.debug=true java -jar dacapo.jar avrora > ===== DaCapo 9.12 avrora starting ===== > Exception in thread "LibGraalHotSpotGraalManagement-init" java.lang.InternalError: THROWN AFTER DETACHING > at org.graalvm.compiler.core.GraalServiceThread.run(GraalServiceThread.java:52) > at com.oracle.svm.core.thread.JavaThreads.threadStartRoutine(JavaThreads.java:519) > at com.oracle.svm.core.posix.thread.PosixJavaThreads.pthreadStartRoutine(PosixJavaThreads.java:192) > ===== DaCapo 9.12 avrora PASSED in 4688 msec ===== src/hotspot/share/jvmci/jvmci.cpp line 220: > 218: Thread* thread = Thread::current_or_null_safe(); > 219: if (thread != NULL) { > 220: events->logv(thread, format, ap); This fixes a bug found while testing this PR. The `StringEventLog::logv` method requires the current thread to not be NULL. 
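
The guard in the quoted hunk can be illustrated with a small stand-alone C++ sketch. Everything in it is hypothetical and exists only for illustration: `Thread`, `g_current_thread`, `current_or_null_safe()`, `EventLog`, and `log_if_attached()` are toy stand-ins, not the HotSpot classes. The point is only the shape of the fix: a logger that requires a non-null thread has to be guarded by its caller when the calling thread may be unattached.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Thread { std::string name; };  // hypothetical stand-in for a VM thread

// Hypothetical stand-in for a "current thread or null" query:
// null while the calling thread is not attached.
thread_local Thread* g_current_thread = nullptr;
inline Thread* current_or_null_safe() { return g_current_thread; }

struct EventLog {
  std::vector<std::string> entries;
  // Like the logv method described above, this requires a non-null thread.
  void log(Thread* thread, const std::string& msg) {
    assert(thread != nullptr && "log requires an attached thread");
    entries.push_back(thread->name + ": " + msg);
  }
};

// Caller-side guard: skip the event log when the thread is unattached.
// Returns true if the message was recorded.
inline bool log_if_attached(EventLog& events, const std::string& msg) {
  Thread* thread = current_or_null_safe();
  if (thread == nullptr) return false;
  events.log(thread, msg);
  return true;
}
```

With `g_current_thread` left null, `log_if_attached` records nothing and returns false instead of tripping the assertion inside `log`.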
test/hotspot/jtreg/compiler/jvmci/compilerToVM/DebugOutputTest.java line 1: > 1: /* DebugOutputTest has been removed since it's redundant with TestHotSpotJVMCIRuntime.writeDebugOutputTest. ------------- PR: https://git.openjdk.java.net/jdk/pull/2640 From dnsimon at openjdk.java.net Fri Feb 19 11:26:02 2021 From: dnsimon at openjdk.java.net (Doug Simon) Date: Fri, 19 Feb 2021 11:26:02 GMT Subject: RFR: 8262011: [JVMCI] allow printing to tty from unattached libgraal thread Message-ID: Currently, `HotSpotJVMCIRuntime.writeDebugOutput` does nothing if the current thread is not attached to HotSpot (i.e., `Thread::current_or_null() == NULL`). This means crucial debug info can be lost. For reference, an unattached libgraal thread is a thread started from within libgraal that has not yet attached itself to the VM (e.g., before [this line](https://github.com/oracle/graal/blob/e4b9ab931940e1946f96f2015b937ba100384573/compiler/src/org.graalvm.compiler.core/src/org/graalvm/compiler/core/GraalServiceThread.java#L42)) or has already detached itself (e.g., after [this line](https://github.com/oracle/graal/blob/e4b9ab931940e1946f96f2015b937ba100384573/compiler/src/org.graalvm.compiler.core/src/org/graalvm/compiler/core/GraalServiceThread.java#L46)). The reason for the current behavior is that `HotSpotJVMCIRuntime.writeDebugOutput` passes a Java byte array to C++ code and the C++ code calls back into Java to decode the byte array into a native buffer. These call backs require the current thread to be attached to the VM. This PR moves the Java-to-native-buffer decoding into Java and thus avoids the requirement for the current thread to be attached to the VM. 
Tested in libgraal by patching Graal as follows: diff --git a/compiler/src/org.graalvm.compiler.core/src/org/graalvm/compiler/core/GraalServiceThread.java b/compiler/src/org.graalvm.compiler.core/src/org/graalvm/compiler/core/GraalServiceThread.java index 36064767c95..352395dd59b 100644 --- a/compiler/src/org.graalvm.compiler.core/src/org/graalvm/compiler/core/GraalServiceThread.java +++ b/compiler/src/org.graalvm.compiler.core/src/org/graalvm/compiler/core/GraalServiceThread.java @@ -43,7 +43,14 @@ public class GraalServiceThread extends Thread { try { runnable.run(); } finally { + String debug = System.getenv("GraalServiceThread.debug"); afterRun(); + if ("true".equals(debug)) { + throw new InternalError("THROWN AFTER DETACHING"); + } } } Running without the changes in this PR: > env GraalServiceThread.debug=true java -jar dacapo.jar avrora ===== DaCapo 9.12 avrora starting ===== ===== DaCapo 9.12 avrora PASSED in 4270 msec ===== Running with the changes in this PR: > env GraalServiceThread.debug=true java -jar dacapo.jar avrora ===== DaCapo 9.12 avrora starting ===== Exception in thread "LibGraalHotSpotGraalManagement-init" java.lang.InternalError: THROWN AFTER DETACHING at org.graalvm.compiler.core.GraalServiceThread.run(GraalServiceThread.java:52) at com.oracle.svm.core.thread.JavaThreads.threadStartRoutine(JavaThreads.java:519) at com.oracle.svm.core.posix.thread.PosixJavaThreads.pthreadStartRoutine(PosixJavaThreads.java:192) ===== DaCapo 9.12 avrora PASSED in 4688 msec ===== ------------- Commit messages: - 8262011: [JVMCI] allow printing to tty from unattached libgraal thread Changes: https://git.openjdk.java.net/jdk/pull/2640/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2640&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8262011 Stats: 322 lines in 8 files changed: 49 ins; 246 del; 27 mod Patch: https://git.openjdk.java.net/jdk/pull/2640.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2640/head:pull/2640 PR: 
https://git.openjdk.java.net/jdk/pull/2640 From vlivanov at openjdk.java.net Fri Feb 19 12:02:39 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Fri, 19 Feb 2021 12:02:39 GMT Subject: RFR: 8261954: Dependencies: Improve iteration over class hierarchy under context class In-Reply-To: References: Message-ID: On Thu, 18 Feb 2021 22:36:37 GMT, Coleen Phillimore wrote: >> Simplify `ClassHierarchyWalker::find_witness_anywhere()` which iterates over class hierarchy under context class searching for witnesses. >> >> Current implementation traverses the hierarchy in a breadth-first manner and keeps a stack-allocated array to keep a worklist. >> But all the subclasses are already part of a singly linked list formed by `Klass::subklass()`/`next_sibling()`/`superklass()`. >> >> Proposed refactoring gets rid of the explicit worklist and switches to the traversal over the linked list (encapsulated into `ClassHierarchyIterator`). It performs depth-first pre-order hierarchy traversal. >> >> (There are some other minor refactorings applied in `ClassHierarchyWalker` along the way.) >> >> Testing: >> - [x] hs-tier1 - hs-tier8 >> - [x] additional verification that CHA decisions aren't affected > > src/hotspot/share/oops/instanceKlass.hpp line 1486: > >> 1484: } >> 1485: }; >> 1486: > > The _subklass and _next_sibling fields and implementation are in Klass (klass.hpp/cpp) but I always wonder why they are not in InstanceKlass (instanceKlass.hpp/cpp). If this new class is in instanceKlass.hpp, these fields should be in the same. If you agree, we should file an RFE and I will move them. If the fields truly belong in klass.hpp, then this should be there too. ie. they should match. I think it's `java.lang.Object` which complicates things. All array classes are rooted at Object, so neither `_subklass` nor `_next_sibling` can be changed to `InstanceKlass`. I put `ClassHierarchyIterator` in `instanceKlass.hpp` primarily because it accepts only `InstanceKlass` root class. 
But I'm fine with putting it into `klass.hpp`. ------------- PR: https://git.openjdk.java.net/jdk/pull/2630 From vlivanov at openjdk.java.net Fri Feb 19 12:10:43 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Fri, 19 Feb 2021 12:10:43 GMT Subject: RFR: 8261954: Dependencies: Improve iteration over class hierarchy under context class In-Reply-To: References: Message-ID: On Thu, 18 Feb 2021 22:40:03 GMT, Coleen Phillimore wrote: >> Simplify `ClassHierarchyWalker::find_witness_anywhere()` which iterates over class hierarchy under context class searching for witnesses. >> >> Current implementation traverses the hierarchy in a breadth-first manner and keeps a stack-allocated array to keep a worklist. >> But all the subclasses are already part of a singly linked list formed by `Klass::subklass()`/`next_sibling()`/`superklass()`. >> >> Proposed refactoring gets rid of the explicit worklist and switches to the traversal over the linked list (encapsulated into `ClassHierarchyIterator`). It performs depth-first pre-order hierarchy traversal. >> >> (There are some other minor refactorings applied in `ClassHierarchyWalker` along the way.) >> >> Testing: >> - [x] hs-tier1 - hs-tier8 >> - [x] additional verification that CHA decisions aren't affected > > src/hotspot/share/code/dependencies.cpp line 1345: > >> 1343: } else if (nof_impls == 1) { // unique implementor >> 1344: assert(context_type != context_type->implementor(), "not unique"); >> 1345: context_type = InstanceKlass::cast(context_type->implementor()); > > There's no reason that implementor() should return Klass* rather than InstanceKlass* ? We can clean that up also later to reduce casts. (I thought I already tried to do this once). Yes, I don't see a compelling reason for `implementor()` to return `Klass*` instead of `InstanceKlass*`. Would be nice to clean it up. Do you want me to file an RFE? 
------------- PR: https://git.openjdk.java.net/jdk/pull/2630 From coleenp at openjdk.java.net Fri Feb 19 13:00:43 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Fri, 19 Feb 2021 13:00:43 GMT Subject: RFR: 8261954: Dependencies: Improve iteration over class hierarchy under context class In-Reply-To: References: Message-ID: On Fri, 19 Feb 2021 12:00:20 GMT, Vladimir Ivanov wrote: >> src/hotspot/share/oops/instanceKlass.hpp line 1486: >> >>> 1484: } >>> 1485: }; >>> 1486: >> >> The _subklass and _next_sibling fields and implementation are in Klass (klass.hpp/cpp) but I always wonder why they are not in InstanceKlass (instanceKlass.hpp/cpp). If this new class is in instanceKlass.hpp, these fields should be in the same. If you agree, we should file an RFE and I will move them. If the fields truly belong in klass.hpp, then this should be there too. ie. they should match. > > I think it's `java.lang.Object` which complicates things. All array classes are rooted at Object, so neither `_subklass` nor `_next_sibling` can be changed to `InstanceKlass`. I put `ClassHierarchyIterator` in `instanceKlass.hpp` primarily because it accepts only `InstanceKlass` root class. But I'm fine with putting it into `klass.hpp`. Do the compiler dependencies use the _subklass lists for array classes? The only other use for this is vtable reinitialization. It is fine to leave ClassHierarchyIterator in instanceKlass.hpp since that's how it's used in this change. I might investigate this more at some later time. 
------------- PR: https://git.openjdk.java.net/jdk/pull/2630 From coleenp at openjdk.java.net Fri Feb 19 13:03:42 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Fri, 19 Feb 2021 13:03:42 GMT Subject: RFR: 8261954: Dependencies: Improve iteration over class hierarchy under context class In-Reply-To: References: Message-ID: On Fri, 19 Feb 2021 12:07:47 GMT, Vladimir Ivanov wrote: >> src/hotspot/share/code/dependencies.cpp line 1345: >> >>> 1343: } else if (nof_impls == 1) { // unique implementor >>> 1344: assert(context_type != context_type->implementor(), "not unique"); >>> 1345: context_type = InstanceKlass::cast(context_type->implementor()); >> >> There's no reason that implementor() should return Klass* rather than InstanceKlass* ? We can clean that up also later to reduce casts. (I thought I already tried to do this once). > > Yes, I don't see a compelling reason for `implementor()` to return `Klass*` instead of `InstanceKlass*`. Would be nice to clean it up. Do you want me to file an RFE? I just filed one. We've been trying to make metadata types more specific for years and must have missed this one. ------------- PR: https://git.openjdk.java.net/jdk/pull/2630 From vlivanov at openjdk.java.net Fri Feb 19 13:26:42 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Fri, 19 Feb 2021 13:26:42 GMT Subject: RFR: 8261954: Dependencies: Improve iteration over class hierarchy under context class In-Reply-To: References: Message-ID: On Fri, 19 Feb 2021 12:57:58 GMT, Coleen Phillimore wrote: > Do the compiler dependencies use the _subklass lists for array classes? 
Arrays aren't interesting in the context of CHA (which `Dependencies` implements), so they are handled only because they are encountered during hierarchy traversal under `java.lang.Object`: bool is_witness(Klass* k) { if (doing_subtype_search()) { return Dependencies::is_concrete_klass(k); } else if (!k->is_instance_klass()) { return false; // no methods to find in an array type } else { ... ------------- PR: https://git.openjdk.java.net/jdk/pull/2630 From vlivanov at openjdk.java.net Fri Feb 19 13:39:40 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Fri, 19 Feb 2021 13:39:40 GMT Subject: RFR: 8261954: Dependencies: Improve iteration over class hierarchy under context class In-Reply-To: References: Message-ID: On Thu, 18 Feb 2021 22:41:22 GMT, Coleen Phillimore wrote: >> Simplify `ClassHierarchyWalker::find_witness_anywhere()` which iterates over class hierarchy under context class searching for witnesses. >> >> Current implementation traverses the hierarchy in a breadth-first manner and keeps a stack-allocated array to keep a worklist. >> But all the subclasses are already part of a singly linked list formed by `Klass::subklass()`/`next_sibling()`/`superklass()`. >> >> Proposed refactoring gets rid of the explicit worklist and switches to the traversal over the linked list (encapsulated into `ClassHierarchyIterator`). It performs depth-first pre-order hierarchy traversal. >> >> (There are some other minor refactorings applied in `ClassHierarchyWalker` along the way.) >> >> Testing: >> - [x] hs-tier1 - hs-tier8 >> - [x] additional verification that CHA decisions aren't affected > > This seems like a nice change. Some questions. Thanks for the reviews, Vladimir and Coleen. 
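
The depth-first pre-order walk described in the quoted summary can be sketched in a few lines of stand-alone C++. This is a toy model under stated assumptions, not the `ClassHierarchyIterator` from the patch: `Node`, `add_child`, `HierarchyIterator`, and `walk` are hypothetical names, and `Node` keeps only the three links mentioned in the thread (first subclass, next sibling, superclass).

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for Klass with only the three hierarchy links.
struct Node {
  std::string name;
  Node* super = nullptr;         // plays the role of superklass()
  Node* subklass = nullptr;      // first child, plays the role of subklass()
  Node* next_sibling = nullptr;  // plays the role of next_sibling()
  explicit Node(std::string n) : name(std::move(n)) {}
};

// Hypothetical helper: link child as the first subclass of parent.
inline void add_child(Node& parent, Node& child) {
  child.super = &parent;
  child.next_sibling = parent.subklass;
  parent.subklass = &child;
}

class HierarchyIterator {
  Node* _root;
  Node* _current;
  bool  _visit_subclasses = true;
 public:
  explicit HierarchyIterator(Node* root) : _root(root), _current(root) {}
  bool  done() const { return _current == nullptr; }
  Node* node() const { return _current; }
  // Ask the next call to next() not to descend below the current node.
  void  skip_subclasses() { _visit_subclasses = false; }

  void next() {
    assert(!done());
    if (_visit_subclasses && _current->subklass != nullptr) {
      _current = _current->subklass;  // go down first (pre-order)
      return;
    }
    _visit_subclasses = true;  // a skip request lasts for one step only
    // Climb until some node on the way up has an unvisited sibling.
    while (_current != _root && _current->next_sibling == nullptr) {
      _current = _current->super;
    }
    _current = (_current == _root) ? nullptr : _current->next_sibling;
  }
};

// Collect names in visiting order; optionally skip below one named node.
inline std::vector<std::string> walk(Node* root, const std::string& skip_below = "") {
  std::vector<std::string> order;
  for (HierarchyIterator it(root); !it.done(); it.next()) {
    order.push_back(it.node()->name);
    if (it.node()->name == skip_below) it.skip_subclasses();
  }
  return order;
}
```

No worklist is needed because the three links already encode where to go next: down if possible, otherwise sideways, otherwise up until a sideways step exists. The walk never leaves the subtree because the climb stops at the root.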
------------- PR: https://git.openjdk.java.net/jdk/pull/2630 From eosterlund at openjdk.java.net Fri Feb 19 14:03:44 2021 From: eosterlund at openjdk.java.net (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Fri, 19 Feb 2021 14:03:44 GMT Subject: RFR: 8261954: Dependencies: Improve iteration over class hierarchy under context class In-Reply-To: References: Message-ID: On Thu, 18 Feb 2021 17:05:08 GMT, Vladimir Ivanov wrote: > Simplify `ClassHierarchyWalker::find_witness_anywhere()` which iterates over class hierarchy under context class searching for witnesses. > > Current implementation traverses the hierarchy in a breadth-first manner and keeps a stack-allocated array to keep a worklist. > But all the subclasses are already part of a singly linked list formed by `Klass::subklass()`/`next_sibling()`/`superklass()`. > > Proposed refactoring gets rid of the explicit worklist and switches to the traversal over the linked list (encapsulated into `ClassHierarchyIterator`). It performs depth-first pre-order hierarchy traversal. > > (There are some other minor refactorings applied in `ClassHierarchyWalker` along the way.) > > Testing: > - [x] hs-tier1 - hs-tier8 > - [x] additional verification that CHA decisions aren't affected It looks like there is a traversal bug lurking here, unless I missed something. Looks great otherwise. src/hotspot/share/oops/instanceKlass.hpp line 1457: > 1455: // Make a step iterating over the class hierarchy under the root class. > 1456: // Skips subclasses if requested. > 1457: void next() { Since this method is a bit involved, I think it might want to move to the cpp file. src/hotspot/share/oops/instanceKlass.hpp line 1461: > 1459: if (_visit_subclasses && _current->subklass() != NULL) { > 1460: _current = _current->subklass(); > 1461: return; // visit next subclass _visit_subclasses is initially true in the constructor. 
That seems to imply that after the first call to next(), we will take this path, just returning the subklass() without walking the siblings. The _visit_subclasses variable is never mutated away from the true state at all if it was already true, until we get to the bottom of the class hierarchy. That seems to imply that the next() iterator won't visit any siblings at all until we get to the bottom of the class hierarchy, essentially breaking the DFS traversal. Therefore, it looks to me like the use of the _visit_subclasses variable needs to be looked over a bit in this function in general, unless I misunderstood something. ------------- Changes requested by eosterlund (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2630 From aph at openjdk.java.net Fri Feb 19 14:46:41 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Fri, 19 Feb 2021 14:46:41 GMT Subject: RFR: 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width [v7] In-Reply-To: References: Message-ID: <1zpEXArPUG-5mAnHqc7E-YBJ_whPIr0KWpSrgqV2mnQ=.d72ed309-938a-4d6a-9bbc-0a8f065c8411@github.com> On Fri, 19 Feb 2021 03:13:12 GMT, Dong Bo wrote: >> In vectorAPI, when right-shifting a vector with a shift equals to the element width, the shift is transformed to zero, >> see `src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorOperators.java`: >> /** Produce {@code a>>>(n&(ESIZE*8-1))}. Integral only. */ >> public static final /*bitwise*/ Binary LSHR = binary("LSHR", ">>>", VectorSupport.VECTOR_OP_URSHIFT, VO_SHIFT); >> >> The aarch64 assembler generates wrong or illegal instructions in this case, e.g. for the JAVA code below on aarch64, >> assembler call `__ ushr(dst, __ T8B, src, 0)`, the instruction generated is not `ushr dst.8B, src.8B, 0`, but `ushr dst.4H, src.4H, 16` instead. >> According to local tests, JVM gives wrong results for byte/short and crashes with SIGILL for integer/long. 
>> ByteVector vba = ByteVector.fromArray(byte64SPECIES, bytesA, 8 * i); >> vbb.lanewise(VectorOperators.ASHR, 8).intoArray(arrBytes, 8 * i); >> >> The legal right shift amount should be in the range 1 to the element width in bits on aarch64: >> https://developer.arm.com/documentation/dui0801/f/A64-SIMD-Vector-Instructions/USHR--vector-?lang=en >> >> This fix handles zero shift separately. If the shift is zero, it generates `orr` for right shift, `addv` for right shift and accumulate. >> Verified with linux-aarch64-server-fastdebug, tier1. Also created a jtreg to reproduce the issue and for regression tests. > > Dong Bo has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > handle zero shift in macro assembler src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp line 554: > 552: > 553: WRAP(usra) WRAP(ssra) > 554: #undef WRAP Are ssra and usra tested by anything? I don't see them accessed in the test case. src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp line 531: > 529: > 530: // NEON shift instructions > 531: #define WRAP(INSN) \ This comment should be // AdvSIMD shift by immediate. // These are "user friendly" variants which allow a shift count of 0. ------------- PR: https://git.openjdk.java.net/jdk/pull/2472 From vlivanov at openjdk.java.net Fri Feb 19 15:23:42 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Fri, 19 Feb 2021 15:23:42 GMT Subject: RFR: 8261954: Dependencies: Improve iteration over class hierarchy under context class In-Reply-To: References: Message-ID: <7QNflPk493_-jk-ajLWMWdFVmqFwS7uLOFramnT7a_U=.de35bc6b-c490-4351-bf76-4a37bcc4397d@github.com> On Fri, 19 Feb 2021 13:52:35 GMT, Erik Österlund wrote: >> Simplify `ClassHierarchyWalker::find_witness_anywhere()` which iterates over class hierarchy under context class searching for witnesses. 
>> >> Current implementation traverses the hierarchy in a breadth-first manner and keeps a stack-allocated array to keep a worklist. >> But all the subclasses are already part of a singly linked list formed by `Klass::subklass()`/`next_sibling()`/`superklass()`. >> >> Proposed refactoring gets rid of the explicit worklist and switches to the traversal over the linked list (encapsulated into `ClassHierarchyIterator`). It performs depth-first pre-order hierarchy traversal. >> >> (There are some other minor refactorings applied in `ClassHierarchyWalker` along the way.) >> >> Testing: >> - [x] hs-tier1 - hs-tier8 >> - [x] additional verification that CHA decisions aren't affected > > src/hotspot/share/oops/instanceKlass.hpp line 1461: > >> 1459: if (_visit_subclasses && _current->subklass() != NULL) { >> 1460: _current = _current->subklass(); >> 1461: return; // visit next subclass > > _visit_subclasses is initially true in the constructor. That seems to imply that after the first call to next(), we will take this path, just returning the subklass() without walking the siblings. The _visit_subclasses variable is never mutated away from the true state at all if it was already true, until we get to the bottom of the class hierarchy. That seems to imply that the next() iterator won't visit any siblings at all until we get to the bottom of the class hierarchy, essentially breaking the DFS traversal. Therefore, it looks to me like the use of the _visit_subclasses variable needs to be looked over a bit in this function in general, unless I misunderstood something. 
`_visit_subclasses` is mutated only from the outside (there's `ClassHierarchyIterator::skip_subclasses()` specifically for that) when user code needs to ignore subclasses: https://github.com/openjdk/jdk/blob/ae78e51e4248c2ccfa73c772fb1db1baad2c2903/src/hotspot/share/code/dependencies.cpp#L1359: for (ClassHierarchyIterator iter(context_type); !iter.done(); iter.next()) { Klass* sub = iter.klass(); // Do not report participant types. if (is_participant(sub)) { // Walk beneath a participant only when it doesn't hide witnesses. if (participants_hide_witnesses) { iter.skip_subclasses(); } And `_visit_subclasses` is cleared to the initial state on the very next call to `next()`. > That seems to imply that the next() iterator won't visit any siblings at all until we get to the bottom of the class hierarchy, essentially breaking the DFS traversal. Yes, that's the intended behavior. Why do you think it doesn't obey depth-first order? `next_sibling()` points to a class on the same level of class hierarchy. > src/hotspot/share/oops/instanceKlass.hpp line 1457: > >> 1455: // Make a step iterating over the class hierarchy under the root class. >> 1456: // Skips subclasses if requested. >> 1457: void next() { > > Since this method is a bit involved, I think it might want to move to the cpp file. Good point. ------------- PR: https://git.openjdk.java.net/jdk/pull/2630 From iveresov at openjdk.java.net Fri Feb 19 15:45:58 2021 From: iveresov at openjdk.java.net (Igor Veresov) Date: Fri, 19 Feb 2021 15:45:58 GMT Subject: RFR: 8261225: TieredStopAtLevel should have no effect if TieredCompilation is disabled Message-ID: Ignore the TieredStopAtLevel flag if TieredCompilation is off, for compatibility with the old compilation policy. Also did some polishing of things that came up in the process. ------------- Commit messages: - Ignore TieredStopAtLevel flag is TieredCompilation is off, polish. 
Changes: https://git.openjdk.java.net/jdk/pull/2647/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2647&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261225 Stats: 137 lines in 3 files changed: 101 ins; 20 del; 16 mod Patch: https://git.openjdk.java.net/jdk/pull/2647.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2647/head:pull/2647 PR: https://git.openjdk.java.net/jdk/pull/2647 From eosterlund at openjdk.java.net Fri Feb 19 15:55:42 2021 From: eosterlund at openjdk.java.net (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Fri, 19 Feb 2021 15:55:42 GMT Subject: RFR: 8261954: Dependencies: Improve iteration over class hierarchy under context class In-Reply-To: <7QNflPk493_-jk-ajLWMWdFVmqFwS7uLOFramnT7a_U=.de35bc6b-c490-4351-bf76-4a37bcc4397d@github.com> References: <7QNflPk493_-jk-ajLWMWdFVmqFwS7uLOFramnT7a_U=.de35bc6b-c490-4351-bf76-4a37bcc4397d@github.com> Message-ID: On Fri, 19 Feb 2021 15:20:50 GMT, Vladimir Ivanov wrote: >> src/hotspot/share/oops/instanceKlass.hpp line 1461: >> >>> 1459: if (_visit_subclasses && _current->subklass() != NULL) { >>> 1460: _current = _current->subklass(); >>> 1461: return; // visit next subclass >> >> _visit_subclasses is initially true in the constructor. That seems to imply that after the first call to next(), we will take this path, just returning the subklass() without walking the siblings. The _visit_subclasses variable is never mutated away from the true state at all if it was already true, until we get to the bottom of the class hierarchy. That seems to imply that the next() iterator won't visit any siblings at all until we get to the bottom of the class hierarchy, essentially breaking the DFS traversal. Therefore, it looks to me like the use of the _visit_subclasses variable needs to be looked over a bit in this function in general, unless I misunderstood something. 
>
> `_visit_subclasses` is mutated only from the outside (there's `ClassHierarchyIterator::skip_subclasses()` specifically for that) when user code needs to ignore subclasses:
>
> https://github.com/openjdk/jdk/blob/ae78e51e4248c2ccfa73c772fb1db1baad2c2903/src/hotspot/share/code/dependencies.cpp#L1359:
>
>     for (ClassHierarchyIterator iter(context_type); !iter.done(); iter.next()) {
>       Klass* sub = iter.klass();
>
>       // Do not report participant types.
>       if (is_participant(sub)) {
>         // Walk beneath a participant only when it doesn't hide witnesses.
>         if (participants_hide_witnesses) {
>           iter.skip_subclasses();
>         }
>
> And `_visit_subclasses` is reset to its initial state (`true`) on the very next call to `next()`.
>
>> That seems to imply that the next() iterator won't visit any siblings at all until we get to the bottom of the class hierarchy, essentially breaking the DFS traversal.
>
> Yes, that's the intended behavior. Why do you think it doesn't obey depth-first order? `next_sibling()` points to a class on the same level of the class hierarchy.

Okay, I see. So you traverse all the way down, following the subclass links until you can't go further, and then traverse the siblings on the way up. That makes sense. Thanks for the explanation.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2630

From eosterlund at openjdk.java.net Fri Feb 19 15:55:40 2021
From: eosterlund at openjdk.java.net (Erik Österlund)
Date: Fri, 19 Feb 2021 15:55:40 GMT
Subject: RFR: 8261954: Dependencies: Improve iteration over class hierarchy under context class
In-Reply-To:
References:
Message-ID:

On Thu, 18 Feb 2021 17:05:08 GMT, Vladimir Ivanov wrote:

> Simplify `ClassHierarchyWalker::find_witness_anywhere()` which iterates over class hierarchy under context class searching for witnesses.
>
> Current implementation traverses the hierarchy in a breadth-first manner and keeps a stack-allocated array to keep a worklist.
> But all the subclasses are already part of a singly linked list formed by `Klass::subklass()`/`next_sibling()`/`superklass()`. > > Proposed refactoring gets rid of the explicit worklist and switches to the traversal over the linked list (encapsulated into `ClassHierarchyIterator`). It performs depth-first pre-order hierarchy traversal. > > (There are some other minor refactorings applied in `ClassHierarchyWalker` along the way.) > > Testing: > - [x] hs-tier1 - hs-tier8 > - [x] additional verification that CHA decisions aren't affected Looks good! (move next() to cpp file if you want to) ------------- Marked as reviewed by eosterlund (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2630 From nhe at activeviam.com Fri Feb 19 16:54:37 2021 From: nhe at activeviam.com (Nicolas Heutte) Date: Fri, 19 Feb 2021 17:54:37 +0100 Subject: [External] : Re: SuperWord loop optimization lost after method inlining In-Reply-To: <09a53249-00de-91c1-d88b-6c816ce6dd14@oracle.com> References: <87dcb670-622f-6b05-7c06-289f9b4f3634@oracle.com> <08ce2121-bbb4-3ea5-8f05-efeb33df7b74@oracle.com> <09a53249-00de-91c1-d88b-6c816ce6dd14@oracle.com> Message-ID: Hello Vladimir, I've added the requested log to the shared folder ( https://drive.google.com/drive/folders/1UczOggtTYp6TZ0QnBiwMxwdTBl3zuvqF?usp=sharing). I've also tried disabling the strip mining optimization as you suggested, but there was no significant performance change. Best regards, Nicolas Heutte On Wed, Feb 17, 2021 at 9:05 PM Vladimir Kozlov wrote: > Unfortunately it is still not the file I am looking for. > > First, remove -XX:+PrintAssembly flag from command line. I have already > files with assembler code. > > Second, I see link to the file I am looking for: > filename='C:\Users\NicolasHeutte\AppData\Local\Temp\\hs_c16812_pid15016.log'/> > > If you still have it, please send it. If application stopped before normal > exit that file is not merged into > hotspot_pid.log file. 
>
> If you don't have it, do another run with -XX:CICompilerCount=1 to use
> only one C2 compiler thread with Tiered off. It will simplify the ordering
> of the log.
>
> You can also do another experiment without collecting the log. Run the app
> with the following flags to disable the loop strip mining optimization:
> -XX:-UseCountedLoopSafepoints -XX:LoopStripMiningIter=0
>
> Thanks,
> Vladimir K
>
> On 2/17/21 2:34 AM, Nicolas Heutte wrote:
> > Hi Vladimir,
> >
> > I have rerun the test with the appropriate options; the obtained logs
> > are in this folder:
> > https://drive.google.com/drive/folders/1UczOggtTYp6TZ0QnBiwMxwdTBl3zuvqF?usp=sharing
> >
> > Best regards,
> > Nicolas Heutte
> >
> > On Tue, Feb 16, 2021 at 11:35 PM Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
> >
> > Hi Nicolas,
> >
> > The file you shared has only assembler code. Yes, it shows that when
> > ArrayFloatToArrayFloatVectorBinding::plus() is inlined into
> > AVector::plus() it is not vectorized.
> >
> > But I asked for another file (hotspot_pid.log), which is generated when
> > you run the app with the
> > -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation flags. It should start with:
> >
> > time_ms='1613514688748'>
> > Java HotSpot(TM) 64-Bit Server VM
> > 11.0.9+7-LTS
> >
> > Thanks,
> > Vladimir K
> >
> > On 2/15/21 5:19 AM, Nicolas Heutte wrote:
> > > Hi Vladimir,
> > >
> > > I've tried disabling tiered compilation, as you requested. It
> > > seems that the inlining was performed slightly
> > > differently, but the issue remains.
As you can see in this > excerpt, the main loop isn't properly vectorized: > > > > > > 0x00000254b0d4bf54: cmp %r11d,%r8d > > > 0x00000254b0d4bf57: jae 0x00000254b0d4c19e > > > 0x00000254b0d4bf5d: vmovss 0x10(%rcx,%r8,4),%xmm9 ;*faload > {reexecute=0 rethrow=0 return_oop=0} > > > ; - > > > > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 54 > (line 41) > > > ; - > com.qfs.vector.impl.AVector::plus at 17 (line 204) > > > ; - > com.qfs.agg.impl.SumVectorAggregationBinding::plus at 2 (line 103) > > > ; - > > > > com.qfs.agg.impl.SumVectorAggregationBinding::safeVectorAggregate at 70 > (line 66) > > > ; - > com.qfs.agg.impl.AVectorAggregationBinding::safeAggregate at 27 > > (line 118) > > > > > > 0x00000254b0d4bf64: cmp %ebx,%r8d > > > 0x00000254b0d4bf67: jae 0x00000254b0d4c1ec > > > 0x00000254b0d4bf6d: vaddss 0x10(%rdi,%r8,4),%xmm9,%xmm9 > > > 0x00000254b0d4bf74: vmovss %xmm9,0x10(%rcx,%r8,4) ;*fastore > {reexecute=0 rethrow=0 return_oop=0} > > > ; - > > > > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 61 > (line 41) > > > ; - > com.qfs.vector.impl.AVector::plus at 17 (line 204) > > > ; - > com.qfs.agg.impl.SumVectorAggregationBinding::plus at 2 (line 103) > > > ; - > > > > com.qfs.agg.impl.SumVectorAggregationBinding::safeVectorAggregate at 70 > (line 66) > > > ; - > com.qfs.agg.impl.AVectorAggregationBinding::safeAggregate at 27 > > (line 118) > > > > > > 0x00000254b0d4bf7b: inc %r8d ;*iinc > {reexecute=0 rethrow=0 return_oop=0} > > > ; - > > > > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 62 > (line 40) > > > ; - > com.qfs.vector.impl.AVector::plus at 17 (line 204) > > > ; - > com.qfs.agg.impl.SumVectorAggregationBinding::plus at 2 (line 103) > > > ; - > > > > com.qfs.agg.impl.SumVectorAggregationBinding::safeVectorAggregate at 70 > (line 66) > > > ; - > com.qfs.agg.impl.AVectorAggregationBinding::safeAggregate at 27 > > (line 118) > > > > > > 0x00000254b0d4bf7e: cmp %r9d,%r8d > > > 
0x00000254b0d4bf81: jl 0x00000254b0d4bf54 ;*goto > {reexecute=0 rethrow=0 return_oop=0} > > > ; - > > > > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 65 > (line 40) > > > ; - > com.qfs.vector.impl.AVector::plus at 17 (line 204) > > > ; - > com.qfs.agg.impl.SumVectorAggregationBinding::plus at 2 (line 103) > > > ; - > > > > com.qfs.agg.impl.SumVectorAggregationBinding::safeVectorAggregate at 70 > (line 66) > > > ; - > com.qfs.agg.impl.AVectorAggregationBinding::safeAggregate at 27 > > (line 118) > > > > > > > > > > > > Here is the link to the full log, should you want to take a look > at it: > > > > https://drive.google.com/file/d/1KQU7wI8NjeElFv6RrQmUsUPRMnAefzhb/view?usp=sharing > > < > https://urldefense.com/v3/__https://drive.google.com/file/d/1KQU7wI8NjeElFv6RrQmUsUPRMnAefzhb/view?usp=sharing__;!!GqivPVa7Brio!PBl1ZdyC5xtmVS0QG3dxZxEen0D1LP-mBM0KnvmRVbQXpL_VPOQ9OD-pVGBqNvuMpg6inQ$ > > > > > > > > > < > https://urldefense.com/v3/__https://drive.google.com/file/d/1KQU7wI8NjeElFv6RrQmUsUPRMnAefzhb/view?usp=sharing__;!!GqivPVa7Brio!PBuP6MfDNWUOTe23SSXA0V5wn_VHjv2sjI8POWRwp6mr0wVdIzFhNoVZANb4FqCYKwzapw$ > > < > https://urldefense.com/v3/__https://drive.google.com/file/d/1KQU7wI8NjeElFv6RrQmUsUPRMnAefzhb/view?usp=sharing__;!!GqivPVa7Brio!PBuP6MfDNWUOTe23SSXA0V5wn_VHjv2sjI8POWRwp6mr0wVdIzFhNoVZANb4FqCYKwzapw$ > >> > > > > > > Best regards, > > > Nicolas Heutte > > > > > > On Thu, Feb 11, 2021 at 7:05 PM Vladimir Kozlov < > vladimir.kozlov at oracle.com > > vladimir.kozlov at oracle.com>>> wrote: > > > > > > Changing wide mailing list to JIT compiler only. > > > > > > This deoptimization is normal in Tiered Compilation - it > switched from profiling code (level='3') generated by C1 > > > compiler to new code generated by C2 (level='4') which does > loop optimizations. 
> > > > > > Thank you for posting inlining information: > > > > > > @ 17 > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus (69 > bytes) inline (hot) > > > \-> TypeProfile (14054/14054 counts) = > com/qfs/vector/binding/impl/ArrayFloatToArrayFloatVectorBinding > > > > > > I thought before that may be call site is not hot but it is > not the case. > > > > > > You can do an other experiment to collect log with disabled > Tiered Compilation (only C2 is used): > > -XX:-TieredCompilation > > > Also print assembler code (as you did before) for final > compilation to see if loop is still not vectorized. > > > > > > Is it possible to post log file (on GitHub?) for me to look? > > > > > > Thanks, > > > Vladimir K > > > > > > On 2/11/21 6:28 AM, Nicolas Heutte wrote: > > > > Hi Vladimir, > > > > > > > > Thank you for your help. > > > > > > > > I'm currently running Java 11.0.9, and I did not use any > VM flag of note. > > > > > > > > I checked the content of the compilation log, and it seems > that > > ArrayFloatToArrayFloatVectorBinding::plus() was > > > > deoptimized in order to allow AVector::plus() to be > compiled: > > > > > > > > > > > > method='com.qfs.vector.impl.AVector plus (Lcom/qfs/vector/IVector;)V' > > bytes='23' > > > > count='916' iicount='916' level='3' stamp='7394.056' > comment='tiered' hot_count='896'/> > > > > > > > > pc='0x00000296d0785b94' compile_id='17257' compiler='c1' > > level='3'> > > > > method='com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding plus > > > > (Lcom/qfs/vector/IVector;Lcom/qfs/vector/IVector;)V' > bytes='69' count='909' backedge_count='155602' > > iicount='910'/> > > > > > > > > > > > > The last compilation entry for AVector::plus() is: > > > > > > > > > > > > entry='0x00000296d6af32c0' size='1960' > > > address='0x00000296d6af3110' > > > > relocation_offset='376' insts_offset='432' > stub_offset='1040' scopes_data_offset='1152' > > scopes_pcs_offset='1592' > > > > dependencies_offset='1880' 
nul_chk_table_offset='1896' > oops_offset='1064' metadata_offset='1072' > > > > method='com.qfs.vector.impl.AVector plus > (Lcom/qfs/vector/IVector;)V' bytes='23' count='172425' > > iicount='172425' > > > > stamp='7394.199'/> > > > > compiler='c1' level='2' stamp='7394.199'/> > > > > @ 1 > com.qfs.vector.array.impl.ArrayFloatVector::getBindingId (4 bytes) > > inline > > > (hot) > > > > \-> TypeProfile > (14552/14552 counts) = > > com/qfs/vector/array/impl/ArrayFloatVector > > > > @ 7 > com.qfs.vector.array.impl.ArrayFloatVector::getBindingId (4 bytes) > > inline > > > (hot) > > > > \-> TypeProfile > (14150/14150 counts) = > > com/qfs/vector/array/impl/ArrayFloatVector > > > > @ 10 > com.qfs.vector.binding.impl.VectorBindings::getBinding (9 bytes) > > inline (hot) > > > > @ 5 > > > > com.qfs.vector.binding.impl.VectorBindings$VectorBindingsProvider::getBinding > (22 > > > > bytes) inline (hot) > > > > @ 3 > > > > com.qfs.vector.binding.impl.VectorBindings$VectorBindingsProvider::hasBinding > > > > (34 bytes) inline (hot) > > > > @ 17 > > > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus (69 > > > bytes) > > > > inline (hot) > > > > \-> TypeProfile > (14054/14054 counts) = > > > > > com/qfs/vector/binding/impl/ArrayFloatToArrayFloatVectorBinding > > > > @ 12 > com.qfs.vector.array.impl.ArrayFloatVector::size (6 bytes) > > inline (hot) > > > > @ 22 > com.qfs.vector.impl.AVector::checkIndex (37 bytes) inline (hot) > > > > @ 6 > com.qfs.vector.array.impl.ArrayFloatVector::size (6 bytes) > > inline (hot) > > > > @ 27 > com.qfs.vector.array.impl.ArrayFloatVector::getUnderlying (5 bytes) > > > accessor > > > > @ 34 > com.qfs.vector.array.impl.ArrayFloatVector::getUnderlying (5 bytes) > > > accessor > > > > > > > > > > > > Unfortunately, I do not have access to a debug VM build, > so I cannot run the second test you recommend. 
> > > > > > > > Best regards, > > > > Nicolas Heutte > > > > > > > > On Wed, Feb 10, 2021 at 7:36 PM Vladimir Kozlov < > vladimir.kozlov at oracle.com > > vladimir.kozlov at oracle.com > > > > vladimir.kozlov at oracle.com> > >>> wrote: > > > > > > > > Hi, Nicolas > > > > > > > > Looks like, when inlined, the loop from > ArrayFloatToArrayFloatVectorBinding::plus() was not optimized > > at all: > > > it is not > > > > unrolled and has range checks. Such loops are not > vectorized (you need unrolling and no checks). > > > > > > > > What Java version you are running? What HotSpot VM > flags you are using when running application? > > > > > > > > Run application with -XX:+LogCompilation and look on > compilation data in hotspot_pid.log file for > > caller > > > > AVector::plus(). > > > > > > > > VM also has several flags to trace loop optimizations > but they are only available in debug VM build. > > If you > > > have access > > > > to such build run with -XX:+PrintCompilation > -XX:+TraceLoopOpts flags. > > > > > > > > Thanks, > > > > Vladimir K > > > > > > > > On 2/10/21 9:24 AM, Nicolas Heutte wrote: > > > > > Hi all, > > > > > > > > > > I am encountering a performance issue caused by the > interaction between > > > > > method inlining and automatic vectorization. > > > > > > > > > > Our application aggregates arrays intensively using > a method named > > > > > ArrayFloatToArrayFloatVectorBinding.plus() with the > following code: > > > > > > > > > > for (int i = 0; i < srcLen; ++i) { > > > > > > > > > > dstArray[i] += srcArray[i]; > > > > > > > > > > } > > > > > > > > > > When we microbenchmark this method we observe fast > performance close to the > > > > > practical memory bandwidth and when we print the > assembly code we observe > > > > > loop unrolling and automatic vectorization with > SIMD instructions. 
> > > > > > > > > > 0x000001ef4600abf0: vmovdqu > 0x10(%r14,%r13,4),%ymm0 > > > > > > > > > > 0x000001ef4600abf7: vaddps > 0x10(%rcx,%r13,4),%ymm0,%ymm0 > > > > > > > > > > 0x000001ef4600abfe: vmovdqu > %ymm0,0x10(%r14,%r13,4) > > > > > > > > > > 0x000001ef4600ac05: movslq %r13d,%r11 > > > > > > > > > > 0x000001ef4600ac08: vmovdqu > 0x30(%r14,%r11,4),%ymm0 > > > > > > > > > > 0x000001ef4600ac0f: vaddps > 0x30(%rcx,%r11,4),%ymm0,%ymm0 > > > > > > > > > > 0x000001ef4600ac16: vmovdqu > %ymm0,0x30(%r14,%r11,4) > > > > > > > > > > 0x000001ef4600ac1d: vmovdqu > 0x50(%r14,%r11,4),%ymm0 > > > > > > > > > > 0x000001ef4600ac24: vaddps > 0x50(%rcx,%r11,4),%ymm0,%ymm0 > > > > > > > > > > 0x000001ef4600ac2b: vmovdqu > %ymm0,0x50(%r14,%r11,4) > > > > > > > > > > 0x000001ef4600ac32: vmovdqu > 0x70(%r14,%r11,4),%ymm0 > > > > > > > > > > 0x000001ef4600ac39: vaddps > 0x70(%rcx,%r11,4),%ymm0,%ymm0 > > > > > > > > > > 0x000001ef4600ac40: vmovdqu > %ymm0,0x70(%r14,%r11,4) > > > > > > > > > > 0x000001ef4600ac47: vmovdqu > 0x90(%r14,%r11,4),%ymm0 > > > > > > > > > > 0x000001ef4600ac51: vaddps > 0x90(%rcx,%r11,4),%ymm0,%ymm0 > > > > > > > > > > 0x000001ef4600ac5b: vmovdqu > %ymm0,0x90(%r14,%r11,4) > > > > > > > > > > 0x000001ef4600ac65: vmovdqu > 0xb0(%r14,%r11,4),%ymm0 > > > > > > > > > > 0x000001ef4600ac6f: vaddps > 0xb0(%rcx,%r11,4),%ymm0,%ymm0 > > > > > > > > > > 0x000001ef4600ac79: vmovdqu > %ymm0,0xb0(%r14,%r11,4) > > > > > > > > > > 0x000001ef4600ac83: vmovdqu > 0xd0(%r14,%r11,4),%ymm0 > > > > > > > > > > 0x000001ef4600ac8d: vaddps > 0xd0(%rcx,%r11,4),%ymm0,%ymm0 > > > > > > > > > > 0x000001ef4600ac97: vmovdqu > %ymm0,0xd0(%r14,%r11,4) > > > > > > > > > > 0x000001ef4600aca1: vmovdqu > 0xf0(%r14,%r11,4),%ymm0 > > > > > > > > > > 0x000001ef4600acab: vaddps > 0xf0(%rcx,%r11,4),%ymm0,%ymm0 > > > > > > > > > > 0x000001ef4600acb5: vmovdqu > %ymm0,0xf0(%r14,%r11,4) ;*fastore > > > > > {reexecute=0 rethrow=0 return_oop=0} > > > > > > > > > > ; - > > > > > > 
com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 61 > > > > > (line 41) > > > > > > > > > > 0x000001ef4600acbf: add $0x40,%r13d > ;*iinc {reexecute=0 > > > > > rethrow=0 return_oop=0} > > > > > > > > > > ; - > > > > > > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 62 > > > > > (line 40) > > > > > > > > > > 0x000001ef4600acc3: cmp %eax,%r13d > > > > > > > > > > 0x000001ef4600acc6: jl 0x000001ef4600abf0 > ;*goto {reexecute=0 > > > > > rethrow=0 return_oop=0} > > > > > > > > > > ; - > > > > > > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 65 > > > > > (line 40) > > > > > > > > > > > > > > > > > > > > In the real application, this method is actually > inlined in a higher level > > > > > method named AVector.plus(). Unfortunately, the > inlined version of the > > > > > aggregation code is not vectorized anymore: > > > > > > > > > > > > > > > > > > > > 0x000001ef460180a0: cmp %ebx,%r11d > > > > > > > > > > 0x000001ef460180a3: jae 0x000001ef460180e6 > > > > > > > > > > 0x000001ef460180a5: vmovss > 0x10(%r8,%r11,4),%xmm1 ;*faload {reexecute=0 > > > > > rethrow=0 return_oop=0} > > > > > > > > > > ; - > > > > > > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 54 > > > > > (line 41) > > > > > > > > > > ; - > > > > > com.qfs.vector.impl.AVector::plus at 17 (line 204) > > > > > > > > > > 0x000001ef460180ac: cmp %ecx,%r11d > > > > > > > > > > 0x000001ef460180af: jae 0x000001ef46018104 > > > > > > > > > > 0x000001ef460180b1: vaddss > 0x10(%r9,%r11,4),%xmm1,%xmm1 > > > > > > > > > > 0x000001ef460180b8: vmovss > %xmm1,0x10(%r8,%r11,4) ;*fastore {reexecute=0 > > > > > rethrow=0 return_oop=0} > > > > > > > > > > ; - > > > > > > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 61 > > > > > (line 41) > > > > > > > > > > ; - > > > > > com.qfs.vector.impl.AVector::plus at 17 (line 204) > > > > > > > > > > 0x000001ef460180bf: inc %r11d > ;*iinc {reexecute=0 > > 
> > > rethrow=0 return_oop=0} > > > > > > > > > > ; - > > > > > > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 62 > > > > > (line 40) > > > > > > > > > > ; - > > > > > com.qfs.vector.impl.AVector::plus at 17 (line 204) > > > > > > > > > > 0x000001ef460180c2: cmp %r10d,%r11d > > > > > > > > > > 0x000001ef460180c5: jl 0x000001ef460180a0 > ;*goto {reexecute=0 > > > > > rethrow=0 return_oop=0} > > > > > > > > > > ; - > > > > > > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 65 > > > > > (line 40) > > > > > > > > > > ; - > > > > > com.qfs.vector.impl.AVector::plus at 17 (line 204) > > > > > > > > > > > > > > > > > > > > This causes a significant performance drop, > compared to a run where we > > > > > explicitly disable the inlining and observe > automatically vectorized code > > > > > again ( > > > > > > -XX:CompileCommand=dontinline,com/qfs/vector/binding/impl/ArrayFloatToArrayFloatVectorBinding.plus > > > > > ). > > > > > > > > > > > > > > > How do you guys explain that behavior of the JIT > compiler? Is this a known > > > > > and tracked issue, could it be fixed in the JVM? > Can we do something in the > > > > > java code to prevent this from happening? > > > > > > > > > > > > > > > Best regards, > > > > > > > > > > Nicolas Heutte > > > > > > > > > > > > > > > From sviswanathan at openjdk.java.net Fri Feb 19 18:13:41 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Fri, 19 Feb 2021 18:13:41 GMT Subject: Integrated: 8261542: X86 slice and unslice intrinsics for 256-bit byte/short vectors In-Reply-To: References: Message-ID: On Thu, 11 Feb 2021 02:37:35 GMT, Sandhya Viswanathan wrote: > The slice and unslice intrinsics for 256-bit byte/short vectors can be implemented for x86 platforms supporting AVX2 using a sequence of instructions. 
>
> JBS: https://bugs.openjdk.java.net/browse/JDK-8261542
>
> The PerfSliceOrigin.java JMH test attached to the JBS shows the following performance on an AVX2 platform.
>
> Before:
> Benchmark                                 (size)   Mode  Cnt      Score     Error   Units
> PerfSliceOrigin.vectorSliceOrigin           1024  thrpt    5     18.887 ±   1.128  ops/ms
> PerfSliceOrigin.vectorSliceUnsliceOrigin    1024  thrpt    5      9.374 ±   0.370  ops/ms
>
> After:
> Benchmark                                 (size)   Mode  Cnt      Score     Error   Units
> PerfSliceOrigin.vectorSliceOrigin           1024  thrpt    5  13861.420 ±  19.071  ops/ms
> PerfSliceOrigin.vectorSliceUnsliceOrigin    1024  thrpt    5   7895.199 ± 142.580  ops/ms

This pull request has now been integrated.

Changeset: c53acc2a
Author: Sandhya Viswanathan
URL: https://git.openjdk.java.net/jdk/commit/c53acc2a
Stats: 120 lines in 7 files changed: 100 ins; 5 del; 15 mod

8261542: X86 slice and unslice intrinsics for 256-bit byte/short vectors

Reviewed-by: kvn, neliasso

-------------

PR: https://git.openjdk.java.net/jdk/pull/2520

From kvn at openjdk.java.net Fri Feb 19 18:55:39 2021
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Fri, 19 Feb 2021 18:55:39 GMT
Subject: RFR: 8261225: TieredStopAtLevel should have no effect if TieredCompilation is disabled
In-Reply-To:
References:
Message-ID: <9BZwISNLE8FT3qrIO7F65ADtYBaNnN8dKp7PtFbE9w8=.7225d428-3156-4ae1-aa20-5b9f899a9d11@github.com>

On Fri, 19 Feb 2021 15:41:23 GMT, Igor Veresov wrote:

> Ignore the TieredStopAtLevel flag if TieredCompilation is off, for compatibility with the old compilation policy. Also did some polishing of things that came up in the process.

Good.

-------------

Marked as reviewed by kvn (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/2647 From kvn at openjdk.java.net Fri Feb 19 19:00:40 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 19 Feb 2021 19:00:40 GMT Subject: RFR: 8262011: [JVMCI] allow printing to tty from unattached libgraal thread In-Reply-To: References: Message-ID: On Fri, 19 Feb 2021 10:10:17 GMT, Doug Simon wrote: > Currently, `HotSpotJVMCIRuntime.writeDebugOutput` does nothing if the current thread is not attached to HotSpot (i.e., `Thread::current_or_null() == NULL`). This means crucial debug info can be lost. For reference, an unattached libgraal thread is a thread started from within libgraal that has not yet attached itself to the VM (e.g., before [this line](https://github.com/oracle/graal/blob/e4b9ab931940e1946f96f2015b937ba100384573/compiler/src/org.graalvm.compiler.core/src/org/graalvm/compiler/core/GraalServiceThread.java#L42)) or has already detached itself (e.g., after [this line](https://github.com/oracle/graal/blob/e4b9ab931940e1946f96f2015b937ba100384573/compiler/src/org.graalvm.compiler.core/src/org/graalvm/compiler/core/GraalServiceThread.java#L46)). > > The reason for the current behavior is that `HotSpotJVMCIRuntime.writeDebugOutput` passes a Java byte array to C++ code and the C++ code calls back into Java to decode the byte array into a native buffer. These call backs require the current thread to be attached to the VM. > > This PR moves the Java-to-native-buffer decoding into Java and thus avoids the requirement for the current thread to be attached to the VM. 
> > Tested in libgraal by patching Graal as follows: > diff --git a/compiler/src/org.graalvm.compiler.core/src/org/graalvm/compiler/core/GraalServiceThread.java b/compiler/src/org.graalvm.compiler.core/src/org/graalvm/compiler/core/GraalServiceThread.java > index 36064767c95..352395dd59b 100644 > --- a/compiler/src/org.graalvm.compiler.core/src/org/graalvm/compiler/core/GraalServiceThread.java > +++ b/compiler/src/org.graalvm.compiler.core/src/org/graalvm/compiler/core/GraalServiceThread.java > @@ -43,7 +43,14 @@ public class GraalServiceThread extends Thread { > try { > runnable.run(); > } finally { > + String debug = System.getenv("GraalServiceThread.debug"); > afterRun(); > + if ("true".equals(debug)) { > + throw new InternalError("THROWN AFTER DETACHING"); > + } > } > } > > Running without the changes in this PR: >> env GraalServiceThread.debug=true java -jar dacapo.jar avrora > ===== DaCapo 9.12 avrora starting ===== > ===== DaCapo 9.12 avrora PASSED in 4270 msec ===== > > Running with the changes in this PR: >> env GraalServiceThread.debug=true java -jar dacapo.jar avrora > ===== DaCapo 9.12 avrora starting ===== > Exception in thread "LibGraalHotSpotGraalManagement-init" java.lang.InternalError: THROWN AFTER DETACHING > at org.graalvm.compiler.core.GraalServiceThread.run(GraalServiceThread.java:52) > at com.oracle.svm.core.thread.JavaThreads.threadStartRoutine(JavaThreads.java:519) > at com.oracle.svm.core.posix.thread.PosixJavaThreads.pthreadStartRoutine(PosixJavaThreads.java:192) > ===== DaCapo 9.12 avrora PASSED in 4688 msec ===== Seems fine. ------------- Marked as reviewed by kvn (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/2640

From iveresov at openjdk.java.net Fri Feb 19 19:47:41 2021
From: iveresov at openjdk.java.net (Igor Veresov)
Date: Fri, 19 Feb 2021 19:47:41 GMT
Subject: Integrated: 8261225: TieredStopAtLevel should have no effect if TieredCompilation is disabled
In-Reply-To:
References:
Message-ID:

On Fri, 19 Feb 2021 15:41:23 GMT, Igor Veresov wrote:

> Ignore the TieredStopAtLevel flag if TieredCompilation is off, for compatibility with the old compilation policy. Also did some polishing of things that came up in the process.

This pull request has now been integrated.

Changeset: 977a21ad
Author: Igor Veresov
URL: https://git.openjdk.java.net/jdk/commit/977a21ad
Stats: 137 lines in 3 files changed: 101 ins; 20 del; 16 mod

8261225: TieredStopAtLevel should have no effect if TieredCompilation is disabled

Reviewed-by: kvn

-------------

PR: https://git.openjdk.java.net/jdk/pull/2647

From vladimir.kozlov at oracle.com Fri Feb 19 20:06:21 2021
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 19 Feb 2021 12:06:21 -0800
Subject: [External] : Re: SuperWord loop optimization lost after method inlining
In-Reply-To:
References: <87dcb670-622f-6b05-7c06-289f9b4f3634@oracle.com> <08ce2121-bbb4-3ea5-8f05-efeb33df7b74@oracle.com> <09a53249-00de-91c1-d88b-6c816ce6dd14@oracle.com>
Message-ID:

I need another file, C:\Users\NicolasHeutte\AppData\Local\Temp\\hs_c10212_pid15016.log, created by the second C2 compiler thread. It should have data for the standalone compilation of the ArrayFloatToArrayFloatVectorBinding::plus() method. To see what is going on, I have to compare these data.

Thanks,
Vladimir K

On 2/19/21 8:54 AM, Nicolas Heutte wrote:
> Hello Vladimir,
>
> I've added the requested log to the shared folder
> (https://drive.google.com/drive/folders/1UczOggtTYp6TZ0QnBiwMxwdTBl3zuvqF?usp=sharing).
> I've also tried disabling the strip mining optimization as you suggested, but there was no significant performance change.
>
> Best regards,
> Nicolas Heutte
>
> On Wed, Feb 17, 2021 at 9:05 PM Vladimir Kozlov wrote:
>
> Unfortunately it is still not the file I am looking for.
>
> First, remove -XX:+PrintAssembly flag from command line. I have already files with assembler code.
>
> Second, I see link to the file I am looking for:
>
> If you still have it, please send it. If application stopped before normal exit that file is not merged into
> hotspot_pid.log file.
>
> If you don't have it, do another run with -XX:CICompilerCount=1 to use only one C2 compiler thread with Tiered
> off. It will simplify the ordering of the log.
>
> You can also do another experiment without collecting the log. Run the app with the following flags to disable
> the loop strip mining optimization: -XX:-UseCountedLoopSafepoints -XX:LoopStripMiningIter=0
>
> Thanks,
> Vladimir K
>
> On 2/17/21 2:34 AM, Nicolas Heutte wrote:
> > Hi Vladimir,
> >
> > I have rerun the test with the appropriate options; the obtained logs are in this folder:
> > https://drive.google.com/drive/folders/1UczOggtTYp6TZ0QnBiwMxwdTBl3zuvqF?usp=sharing
> >
> > Best regards,
> > Nicolas Heutte
> >
> > On Tue, Feb 16, 2021 at 11:35 PM Vladimir Kozlov wrote:
> >
> >     Hi Nicolas,
> >
> >     The file you shared has only assembler code. Yes, it shows that when ArrayFloatToArrayFloatVectorBinding::plus() is
> >     inlined into AVector::plus() it is not vectorized.
> >
> >     But I asked for another file (hotspot_pid.log), which is generated when you run the app with the
> >     -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation flags. It should start with:
> >
> >     Java HotSpot(TM) 64-Bit Server VM
> >     11.0.9+7-LTS
> >
> >     Thanks,
> >     Vladimir K
> >
> >     On 2/15/21 5:19 AM, Nicolas Heutte wrote:
> >      > Hi Vladimir,
> >      >
> >      > I've tried disabling tiered compilation, as you requested. It seems that the inlining was performed slightly
> >      > differently, but the issue remains. As you can see in this excerpt, the main loop isn't properly vectorized:
> >      >
> >      >    0x00000254b0d4bf54: cmp    %r11d,%r8d
> >      >    0x00000254b0d4bf57: jae    0x00000254b0d4c19e
> >      >    0x00000254b0d4bf5d: vmovss 0x10(%rcx,%r8,4),%xmm9  ;*faload {reexecute=0 rethrow=0 return_oop=0}
> >      >                               ; - com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 54 (line 41)
> >      >                               ; - com.qfs.vector.impl.AVector::plus at 17 (line 204)
> >      >                               ; - com.qfs.agg.impl.SumVectorAggregationBinding::plus at 2 (line 103)
> >      >                               ; - com.qfs.agg.impl.SumVectorAggregationBinding::safeVectorAggregate at 70 (line 66)
> >      >                               ; - com.qfs.agg.impl.AVectorAggregationBinding::safeAggregate at 27 (line 118)
> >      >
> >      >    0x00000254b0d4bf64: cmp    %ebx,%r8d
> >      >    0x00000254b0d4bf67: jae    0x00000254b0d4c1ec
> >      >    0x00000254b0d4bf6d: vaddss 0x10(%rdi,%r8,4),%xmm9,%xmm9
> >      >    0x00000254b0d4bf74: vmovss %xmm9,0x10(%rcx,%r8,4)  ;*fastore {reexecute=0 rethrow=0 return_oop=0}
> >      >                               ; - com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 61 (line 41)
> >      >                               ; - com.qfs.vector.impl.AVector::plus at 17 (line 204)
> >      >                               ; - com.qfs.agg.impl.SumVectorAggregationBinding::plus at 2 (line 103)
> >      >                               ; - com.qfs.agg.impl.SumVectorAggregationBinding::safeVectorAggregate at 70 (line 66)
> >      >                               ; - com.qfs.agg.impl.AVectorAggregationBinding::safeAggregate at 27 (line 118)
> >      >
> >      >    0x00000254b0d4bf7b: inc    %r8d               ;*iinc {reexecute=0 rethrow=0 return_oop=0}
> >      >                               ; - com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 62 (line 40)
> >      >                               ; - com.qfs.vector.impl.AVector::plus at 17 (line 204)
> >      >                               ; - com.qfs.agg.impl.SumVectorAggregationBinding::plus at 2 (line 103)
> >      >                               ; - com.qfs.agg.impl.SumVectorAggregationBinding::safeVectorAggregate at 70 (line 66)
> >      >                               ; - com.qfs.agg.impl.AVectorAggregationBinding::safeAggregate at 27 (line 118)
> >      >
> >      >    0x00000254b0d4bf7e: cmp    %r9d,%r8d
> >      >    0x00000254b0d4bf81: jl     0x00000254b0d4bf54  ;*goto {reexecute=0 rethrow=0 return_oop=0}
> >      >                               ; - com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 65 (line 40)
> >      >                               ; - com.qfs.vector.impl.AVector::plus at 17 (line 204)
> >      >                               ; - com.qfs.agg.impl.SumVectorAggregationBinding::plus at 2 (line 103)
> >      >                               ; - com.qfs.agg.impl.SumVectorAggregationBinding::safeVectorAggregate at 70 (line 66)
> >      >                               ; - com.qfs.agg.impl.AVectorAggregationBinding::safeAggregate at 27 (line 118)
> >      >
> >      > Here is the link to the full log, should you want to take a look at it:
> https://drive.google.com/file/d/1KQU7wI8NjeElFv6RrQmUsUPRMnAefzhb/view?usp=sharing > > > > ?> > > > >? ? ? > > > > ? > > > ?>> > >? ? ? > > >? ? ? > Best regards, > >? ? ? > Nicolas Heutte > >? ? ? > > >? ? ? > On Thu, Feb 11, 2021 at 7:05 PM Vladimir Kozlov > > >? ? ? >>> wrote: > >? ? ? > > >? ? ? >? ? ?Changing wide mailing list to JIT compiler only. > >? ? ? > > >? ? ? >? ? ?This deoptimization is normal in Tiered Compilation - it switched from profiling code (level='3') > generated by C1 > >? ? ? >? ? ?compiler to new code generated by C2 (level='4') which does loop optimizations. > >? ? ? > > >? ? ? >? ? ?Thank you for posting inlining information: > >? ? ? > > >? ? ? >? ? ? ? ? ?@ 17? ?com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus (69 bytes) inline (hot) > >? ? ? >? ? ? ? ? ? ? \-> TypeProfile (14054/14054 counts) = > com/qfs/vector/binding/impl/ArrayFloatToArrayFloatVectorBinding > >? ? ? > > >? ? ? >? ? ?I thought before that may be call site is not hot but it is not the case. > >? ? ? > > >? ? ? >? ? ?You can do an other experiment to collect log with disabled Tiered Compilation (only C2 is used): > >? ? ?-XX:-TieredCompilation > >? ? ? >? ? ?Also print assembler code (as you did before) for final compilation to see if loop is still not > vectorized. > >? ? ? > > >? ? ? >? ? ?Is it possible to post log file (on GitHub?) for me to look? > >? ? ? > > >? ? ? >? ? ?Thanks, > >? ? ? >? ? ?Vladimir K > >? ? ? > > >? ? ? >? ? ?On 2/11/21 6:28 AM, Nicolas Heutte wrote: > >? ? ? >? ? ? > Hi?Vladimir, > >? ? ? >? ? ? > > >? ? ? >? ? ? > Thank you for your help. > >? ? ? >? ? ? > > >? ? ? >? ? ? > I'm currently running Java 11.0.9, and I did not use any VM flag of note. > >? ? ? >? ? ? > > >? ? ? >? ? ? > I checked the content of the compilation log, and it seems that > >? ? ?ArrayFloatToArrayFloatVectorBinding::plus() was > >? ? ? >? ? ? > deoptimized in order to allow AVector::plus() to be compiled: > >? ? ? >? ? ? > > >? ? ? >? ? ? 
> > >? ? ? >? ? ? > >? ? ?bytes='23' > >? ? ? >? ? ? > count='916' iicount='916' level='3' stamp='7394.056' comment='tiered' hot_count='896'/> > >? ? ? >? ? ? > > >? ? ? >? ? ? > compiler='c1' > >? ? ?level='3'> > >? ? ? >? ? ? > >? ? ?iicount='910'/> > >? ? ? >? ? ? > > >? ? ? >? ? ? > > >? ? ? >? ? ? > The last compilation entry for AVector::plus() is: > >? ? ? >? ? ? > > >? ? ? >? ? ? > > >? ? ? >? ? ? > >? ? ? >? ? ?address='0x00000296d6af3110' > >? ? ? >? ? ? > relocation_offset='376' insts_offset='432' stub_offset='1040' scopes_data_offset='1152' > >? ? ?scopes_pcs_offset='1592' > >? ? ? >? ? ? > dependencies_offset='1880' nul_chk_table_offset='1896' oops_offset='1064' metadata_offset='1072' > >? ? ? >? ? ? > method='com.qfs.vector.impl.AVector plus (Lcom/qfs/vector/IVector;)V' bytes='23' count='172425' > >? ? ?iicount='172425' > >? ? ? >? ? ? > stamp='7394.199'/> > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? @ 1 ? com.qfs.vector.array.impl.ArrayFloatVector::getBindingId (4 > bytes) > >? ? ?inline > >? ? ? >? ? ?(hot) > >? ? ? >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?\-> TypeProfile (14552/14552 counts) = > >? ? ?com/qfs/vector/array/impl/ArrayFloatVector > >? ? ? >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? @ 7 ? com.qfs.vector.array.impl.ArrayFloatVector::getBindingId (4 > bytes) > >? ? ?inline > >? ? ? >? ? ?(hot) > >? ? ? >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?\-> TypeProfile (14150/14150 counts) = > >? ? ?com/qfs/vector/array/impl/ArrayFloatVector > >? ? ? >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? @ 10 ? com.qfs.vector.binding.impl.VectorBindings::getBinding (9 bytes) > >? ? ?inline (hot) > >? ? ? >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? @ 5 > >? ? ? >? ? ?com.qfs.vector.binding.impl.VectorBindings$VectorBindingsProvider::getBinding (22 > >? ? ? >? ? ? > bytes) ? inline (hot) > >? ? ? >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? @ 3 > >? ? ? >? ? ?com.qfs.vector.binding.impl.VectorBindings$VectorBindingsProvider::hasBinding > >? ? ? >? ? 
? > (34 bytes) ? inline (hot) > >? ? ? >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? @ 17 > >? ? ?com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus (69 > >? ? ? >? ? ?bytes) > >? ? ? >? ? ? > inline (hot) > >? ? ? >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?\-> TypeProfile (14054/14054 counts) = > >? ? ? >? ? ? > com/qfs/vector/binding/impl/ArrayFloatToArrayFloatVectorBinding > >? ? ? >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? @ 12 ? com.qfs.vector.array.impl.ArrayFloatVector::size (6 bytes) > >? ? ?inline (hot) > >? ? ? >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? @ 22 ? com.qfs.vector.impl.AVector::checkIndex (37 bytes) ? inline > (hot) > >? ? ? >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? @ 6 ? com.qfs.vector.array.impl.ArrayFloatVector::size (6 bytes) > >? ? ?inline (hot) > >? ? ? >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? @ 27 ? com.qfs.vector.array.impl.ArrayFloatVector::getUnderlying > (5 bytes) > >? ? ? >? ? ?accessor > >? ? ? >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? @ 34 ? com.qfs.vector.array.impl.ArrayFloatVector::getUnderlying > (5 bytes) > >? ? ? >? ? ?accessor > >? ? ? >? ? ? > > >? ? ? >? ? ? > > >? ? ? >? ? ? > Unfortunately, I do not have access to a debug VM build, so I cannot run the second test you recommend. > >? ? ? >? ? ? > > >? ? ? >? ? ? > Best regards, > >? ? ? >? ? ? > Nicolas Heutte > >? ? ? >? ? ? > > >? ? ? >? ? ? > On Wed, Feb 10, 2021 at 7:36 PM Vladimir Kozlov > >? ? ?> >> > >? ? ? >? ? ? > > > >? ? ?>>>> wrote: > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ?Hi, Nicolas > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ?Looks like, when inlined, the loop from ArrayFloatToArrayFloatVectorBinding::plus() was not > optimized > >? ? ?at all: > >? ? ? >? ? ?it is not > >? ? ? >? ? ? >? ? ?unrolled and has range checks. Such loops are not vectorized (you need unrolling and no checks). > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ?What Java version you are running? What HotSpot VM flags you are using when running application? > >? ? ? >? ? ? > > >? ? ? >? ? 
? >? ? ?Run application with -XX:+LogCompilation and look on compilation data in hotspot_pid.log > file for > >? ? ?caller > >? ? ? >? ? ? >? ? ?AVector::plus(). > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ?VM also has several flags to trace loop optimizations but they are only available in debug VM > build. > >? ? ?If you > >? ? ? >? ? ?have access > >? ? ? >? ? ? >? ? ?to such build run with -XX:+PrintCompilation -XX:+TraceLoopOpts flags. > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ?Thanks, > >? ? ? >? ? ? >? ? ?Vladimir K > >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ?On 2/10/21 9:24 AM, Nicolas Heutte wrote: > >? ? ? >? ? ? >? ? ? > Hi all, > >? ? ? >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? > I am encountering a performance issue caused by the interaction between > >? ? ? >? ? ? >? ? ? > method inlining and automatic vectorization. > >? ? ? >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? > Our application aggregates arrays intensively using a method named > >? ? ? >? ? ? >? ? ? > ArrayFloatToArrayFloatVectorBinding.plus() with the following code: > >? ? ? >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? >? ? ? for (int i = 0; i < srcLen; ++i) { > >? ? ? >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? >? ? ? ? ? ? ? dstArray[i] += srcArray[i]; > >? ? ? >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? >? ? ? } > >? ? ? >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? > When we microbenchmark this method we observe fast performance close to the > >? ? ? >? ? ? >? ? ? > practical memory bandwidth and when we print the assembly code we observe > >? ? ? >? ? ? >? ? ? > loop unrolling and automatic vectorization with SIMD instructions. > >? ? ? >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? >? ? 0x000001ef4600abf0: vmovdqu 0x10(%r14,%r13,4),%ymm0 > >? ? ? >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? >? ? 0x000001ef4600abf7: vaddps 0x10(%rcx,%r13,4),%ymm0,%ymm0 > >? ? ? >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? >? ? 0x000001ef4600abfe: vmovdqu %ymm0,0x10(%r14,%r13,4) > >? ? ? >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? >? ? 
0x000001ef4600ac05: movslq %r13d,%r11 > >? ? ? >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? >? ? 0x000001ef4600ac08: vmovdqu 0x30(%r14,%r11,4),%ymm0 > >? ? ? >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? >? ? 0x000001ef4600ac0f: vaddps 0x30(%rcx,%r11,4),%ymm0,%ymm0 > >? ? ? >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? >? ? 0x000001ef4600ac16: vmovdqu %ymm0,0x30(%r14,%r11,4) > >? ? ? >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? >? ? 0x000001ef4600ac1d: vmovdqu 0x50(%r14,%r11,4),%ymm0 > >? ? ? >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? >? ? 0x000001ef4600ac24: vaddps 0x50(%rcx,%r11,4),%ymm0,%ymm0 > >? ? ? >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? >? ? 0x000001ef4600ac2b: vmovdqu %ymm0,0x50(%r14,%r11,4) > >? ? ? >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? >? ? 0x000001ef4600ac32: vmovdqu 0x70(%r14,%r11,4),%ymm0 > >? ? ? >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? >? ? 0x000001ef4600ac39: vaddps 0x70(%rcx,%r11,4),%ymm0,%ymm0 > >? ? ? >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? >? ? 0x000001ef4600ac40: vmovdqu %ymm0,0x70(%r14,%r11,4) > >? ? ? >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? >? ? 0x000001ef4600ac47: vmovdqu 0x90(%r14,%r11,4),%ymm0 > >? ? ? >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? >? ? 0x000001ef4600ac51: vaddps 0x90(%rcx,%r11,4),%ymm0,%ymm0 > >? ? ? >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? >? ? 0x000001ef4600ac5b: vmovdqu %ymm0,0x90(%r14,%r11,4) > >? ? ? >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? >? ? 0x000001ef4600ac65: vmovdqu 0xb0(%r14,%r11,4),%ymm0 > >? ? ? >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? >? ? 0x000001ef4600ac6f: vaddps 0xb0(%rcx,%r11,4),%ymm0,%ymm0 > >? ? ? >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? >? ? 0x000001ef4600ac79: vmovdqu %ymm0,0xb0(%r14,%r11,4) > >? ? ? >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? >? ? 0x000001ef4600ac83: vmovdqu 0xd0(%r14,%r11,4),%ymm0 > >? ? ? >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? >? ? 0x000001ef4600ac8d: vaddps 0xd0(%rcx,%r11,4),%ymm0,%ymm0 > >? ? ? >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? >? ? 0x000001ef4600ac97: vmovdqu %ymm0,0xd0(%r14,%r11,4) > >? ? ? >? ? ? >? ? ? > > >? ? ? >? ? ? 
>? ? ? >? ? 0x000001ef4600aca1: vmovdqu 0xf0(%r14,%r11,4),%ymm0 > >? ? ? >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? >? ? 0x000001ef4600acab: vaddps 0xf0(%rcx,%r11,4),%ymm0,%ymm0 > >? ? ? >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? >? ? 0x000001ef4600acb5: vmovdqu %ymm0,0xf0(%r14,%r11,4)? ;*fastore > >? ? ? >? ? ? >? ? ? > {reexecute=0 rethrow=0 return_oop=0} > >? ? ? >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ; - > >? ? ? >? ? ? >? ? ? > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 61 > >? ? ? >? ? ? >? ? ? > (line 41) > >? ? ? >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? >? ? 0x000001ef4600acbf: add? ? $0x40,%r13d? ? ? ? ;*iinc {reexecute=0 > >? ? ? >? ? ? >? ? ? > rethrow=0 return_oop=0} > >? ? ? >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ; - > >? ? ? >? ? ? >? ? ? > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 62 > >? ? ? >? ? ? >? ? ? > (line 40) > >? ? ? >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? >? ? 0x000001ef4600acc3: cmp? ? %eax,%r13d > >? ? ? >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? >? ? 0x000001ef4600acc6: jl? ? ?0x000001ef4600abf0? ;*goto {reexecute=0 > >? ? ? >? ? ? >? ? ? > rethrow=0 return_oop=0} > >? ? ? >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ; - > >? ? ? >? ? ? >? ? ? > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 65 > >? ? ? >? ? ? >? ? ? > (line 40) > >? ? ? >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? > In the real application, this method is actually inlined in a higher level > >? ? ? >? ? ? >? ? ? > method named AVector.plus(). Unfortunately, the inlined version of the > >? ? ? >? ? ? >? ? ? > aggregation code is not vectorized anymore: > >? ? ? >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? >? ? 0x000001ef460180a0: cmp? ? %ebx,%r11d > >? ? ? >? ? ? >? ? 
? > > >? ? ? >? ? ? >? ? ? >? ? 0x000001ef460180a3: jae? ? 0x000001ef460180e6 > >? ? ? >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? >? ? 0x000001ef460180a5: vmovss 0x10(%r8,%r11,4),%xmm1? ;*faload {reexecute=0 > >? ? ? >? ? ? >? ? ? > rethrow=0 return_oop=0} > >? ? ? >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ; - > >? ? ? >? ? ? >? ? ? > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 54 > >? ? ? >? ? ? >? ? ? > (line 41) > >? ? ? >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ; - > >? ? ? >? ? ? >? ? ? > com.qfs.vector.impl.AVector::plus at 17 (line 204) > >? ? ? >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? >? ? 0x000001ef460180ac: cmp? ? %ecx,%r11d > >? ? ? >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? >? ? 0x000001ef460180af: jae? ? 0x000001ef46018104 > >? ? ? >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? >? ? 0x000001ef460180b1: vaddss 0x10(%r9,%r11,4),%xmm1,%xmm1 > >? ? ? >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? >? ? 0x000001ef460180b8: vmovss %xmm1,0x10(%r8,%r11,4)? ;*fastore {reexecute=0 > >? ? ? >? ? ? >? ? ? > rethrow=0 return_oop=0} > >? ? ? >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ; - > >? ? ? >? ? ? >? ? ? > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 61 > >? ? ? >? ? ? >? ? ? > (line 41) > >? ? ? >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ; - > >? ? ? >? ? ? >? ? ? > com.qfs.vector.impl.AVector::plus at 17 (line 204) > >? ? ? >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? >? ? 0x000001ef460180bf: inc? ? %r11d? ? ? ? ? ? ? ;*iinc {reexecute=0 > >? ? ? >? ? ? >? ? ? > rethrow=0 return_oop=0} > >? ? ? >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ; - > >? ? ? >? ? ? >? ? ? > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 62 > >? ? ? >? ? ? >? ? ? > (line 40) > >? ? ? >? ? ? >? ? ? 
> > >? ? ? >? ? ? >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ; - > >? ? ? >? ? ? >? ? ? > com.qfs.vector.impl.AVector::plus at 17 (line 204) > >? ? ? >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? >? ? 0x000001ef460180c2: cmp? ? %r10d,%r11d > >? ? ? >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? >? ? 0x000001ef460180c5: jl? ? ?0x000001ef460180a0? ;*goto {reexecute=0 > >? ? ? >? ? ? >? ? ? > rethrow=0 return_oop=0} > >? ? ? >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ; - > >? ? ? >? ? ? >? ? ? > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 65 > >? ? ? >? ? ? >? ? ? > (line 40) > >? ? ? >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ; - > >? ? ? >? ? ? >? ? ? > com.qfs.vector.impl.AVector::plus at 17 (line 204) > >? ? ? >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? > This causes a significant performance drop, compared to a run where we > >? ? ? >? ? ? >? ? ? > explicitly disable the inlining and observe automatically vectorized code > >? ? ? >? ? ? >? ? ? > again ( > >? ? ? >? ? ? >? ? ? > > -XX:CompileCommand=dontinline,com/qfs/vector/binding/impl/ArrayFloatToArrayFloatVectorBinding.plus > >? ? ? >? ? ? >? ? ? > ). > >? ? ? >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? > How do you guys explain that behavior of the JIT compiler? Is this a known > >? ? ? >? ? ? >? ? ? > and tracked issue, could it be fixed in the JVM? Can we do something in the > >? ? ? >? ? ? >? ? ? > java code to prevent this from happening? > >? ? ? >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? > Best regards, > >? ? ? >? ? ? >? ? ? > > >? ? ? >? ? ? >? ? ? > Nicolas Heutte > >? ? ? >? ? ? >? ? ? > > >? ? ? >? ? ? > > >? ? ? 
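A minimal standalone sketch of the pattern discussed in this thread, which may help when comparing the standalone and the inlined compilation of the loop. The class name `InlinedPlusSketch` and the caller `callerPlus` are hypothetical stand-ins, since the real `AVector.plus()` is not shown in the thread. Only the loop body is taken from the original post. As noted elsewhere in the thread, a small test did not reproduce the problem, so this is only a starting point for experimenting with the flags mentioned above, not a reproducer.

```java
import java.util.Arrays;

public class InlinedPlusSketch {

    // Same loop shape as the one quoted from
    // ArrayFloatToArrayFloatVectorBinding.plus() in the original post.
    static void plus(float[] dstArray, float[] srcArray, int srcLen) {
        for (int i = 0; i < srcLen; ++i) {
            dstArray[i] += srcArray[i];
        }
    }

    // Hypothetical caller standing in for AVector.plus(): it is small,
    // so the JIT is likely to inline plus() into it once it is hot.
    static void callerPlus(float[] dst, float[] src) {
        plus(dst, src, Math.min(dst.length, src.length));
    }

    public static void main(String[] args) {
        float[] dst = new float[1024];
        float[] src = new float[1024];
        Arrays.fill(src, 1.0f);

        // Call often enough for the method to be compiled by C2.
        for (int iter = 0; iter < 100_000; iter++) {
            callerPlus(dst, src);
        }
        // Use the result so the work cannot be discarded.
        System.out.println(dst[0]);
    }
}
```

One way to use it, assuming a JDK 11 `java` launcher: run it once with `-XX:-TieredCompilation -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation`, then again with `-XX:CompileCommand=dontinline,InlinedPlusSketch.plus` added, and compare the two compilation logs.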
From vladimir.kozlov at oracle.com Sat Feb 20 01:38:39 2021
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 19 Feb 2021 17:38:39 -0800
Subject: [External] : Re: SuperWord loop optimization lost after method inlining
In-Reply-To:
References: <87dcb670-622f-6b05-7c06-289f9b4f3634@oracle.com> <08ce2121-bbb4-3ea5-8f05-efeb33df7b74@oracle.com> <09a53249-00de-91c1-d88b-6c816ce6dd14@oracle.com>
Message-ID:

BTW, I filed a bug to collect information:

https://bugs.openjdk.java.net/browse/JDK-8262067

This is a very weird case which I can't reproduce with a small test. It reminds me of one case (Loop did not transform into Counted loop) which was fixed in JDK 11.0.3:

https://bugs.openjdk.java.net/browse/JDK-8211451

Thanks,
Vladimir K

On 2/19/21 12:06 PM, Vladimir Kozlov wrote:
> I need another file, C:\Users\NicolasHeutte\AppData\Local\Temp\\hs_c10212_pid15016.log, created from the second C2 compiler
> thread. It should have data for the standalone ArrayFloatToArrayFloatVectorBinding::plus() method compilation. To see what
> is going on I have to compare these data.
>
> Thanks,
> Vladimir K
>
> On 2/19/21 8:54 AM, Nicolas Heutte wrote:
>> Hello Vladimir,
>>
>> I've added the requested log to the shared folder
>> (https://drive.google.com/drive/folders/1UczOggtTYp6TZ0QnBiwMxwdTBl3zuvqF?usp=sharing).
>> I've also tried disabling the strip mining optimization as you suggested, but there was no significant performance
>> change.
>>
>> Best regards,
>> Nicolas Heutte
>>
>> On Wed, Feb 17, 2021 at 9:05 PM Vladimir Kozlov wrote:
>>
>>     Unfortunately it is still not the file I am looking for.
>>
>>     First, remove the -XX:+PrintAssembly flag from the command line. I already have files with assembler code.
>>
>>     Second, I see a link to the file I am looking for:
>>
>>     If you still have it, please send it. If the application stopped before normal exit, that file is not merged into
>>     the hotspot_pid.log file.
>>
If you don't have it - do an other run with -XX:CICompilerCount=1 to use only one C2 compiler thread with Tiered >> ??? off. It >> ??? will simplify ordering of log. >> >> ??? You can also do an other experiment without collecting log. Run app with next flags to disable loop strip minning >> ??? optimization:? -XX:-UseCountedLoopSafepoints -XX:LoopStripMiningIter=0 >> >> ??? Thanks, >> ??? Vladimir K >> >> ??? On 2/17/21 2:34 AM, Nicolas Heutte wrote: >> ???? > Hi Vladimir, >> ???? > >> ???? > I have rerun the test with the appropriate options, the obtained logs are in this folder: >> ???? > https://drive.google.com/drive/folders/1UczOggtTYp6TZ0QnBiwMxwdTBl3zuvqF?usp=sharing >> >> >> >> >> ???? > >> >> > >> >> > >> >> ???? > >> ???? > Best regards, >> ???? > Nicolas Heutte >> ???? > >> ???? > On Tue, Feb 16, 2021 at 11:35 PM Vladimir Kozlov >> ??? >> wrote: >> ???? > >> ???? >? ? ?Hi Nicolas, >> ???? > >> ???? >? ? ?The file you shared has only assembler code. Yes, it shows that when >> ??? ArrayFloatToArrayFloatVectorBinding::plus() is >> ???? >? ? ?inlined into AVector::plus() it is not vectorized. >> ???? > >> ???? >? ? ?But I asked for an other file (hotspot_pid.log) which is generated when you run app with >> ???? >? ? ?-XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation flags. It should start with: >> ???? > >> ???? >? ? ? >> ???? >? ? ? >> ???? >? ? ? >> ???? >? ? ? >> ???? >? ? ?Java HotSpot(TM) 64-Bit Server VM >> ???? >? ? ? >> ???? >? ? ? >> ???? >? ? ?11.0.9+7-LTS >> ???? >? ? ? >> ???? > >> ???? >? ? ?Thanks, >> ???? >? ? ?Vladimir K >> ???? > >> ???? >? ? ?On 2/15/21 5:19 AM, Nicolas Heutte wrote: >> ???? >? ? ? > Hi Vladimir, >> ???? >? ? ? > >> ???? >? ? ? > I've tried disabling tiered compilation, as you requested. It seems that the inlining was performed >> slightly >> ???? >? ? ? > differently, but the issue remains. As you can see in this excerpt, the main loop isn't properly >> vectorized: >> ???? >? ? ? > >> ???? >? ? ? >? ? 
0x00000254b0d4bf54: cmp ? ?%r11d,%r8d >> ???? >? ? ? >? ? 0x00000254b0d4bf57: jae ? ?0x00000254b0d4c19e >> ???? >? ? ? >? ? 0x00000254b0d4bf5d: vmovss 0x10(%rcx,%r8,4),%xmm9 ?;*faload {reexecute=0 rethrow=0 return_oop=0} >> ???? >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ; - >> ???? >? ? ? > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 54 (line 41) >> ???? >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ; - com.qfs.vector.impl.AVector::plus at 17 (line 204) >> ???? >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ; - com.qfs.agg.impl.SumVectorAggregationBinding::plus at 2 >> ??? (line 103) >> ???? >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ; - >> ???? >? ? ? > com.qfs.agg.impl.SumVectorAggregationBinding::safeVectorAggregate at 70 (line 66) >> ???? >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ; - >> ??? com.qfs.agg.impl.AVectorAggregationBinding::safeAggregate at 27 >> ???? >? ? ?(line 118) >> ???? >? ? ? > >> ???? >? ? ? >? ? 0x00000254b0d4bf64: cmp ? ?%ebx,%r8d >> ???? >? ? ? >? ? 0x00000254b0d4bf67: jae ? ?0x00000254b0d4c1ec >> ???? >? ? ? >? ? 0x00000254b0d4bf6d: vaddss 0x10(%rdi,%r8,4),%xmm9,%xmm9 >> ???? >? ? ? >? ? 0x00000254b0d4bf74: vmovss %xmm9,0x10(%rcx,%r8,4) ?;*fastore {reexecute=0 rethrow=0 return_oop=0} >> ???? >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ; - >> ???? >? ? ? > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 61 (line 41) >> ???? >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ; - com.qfs.vector.impl.AVector::plus at 17 (line 204) >> ???? >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ; - com.qfs.agg.impl.SumVectorAggregationBinding::plus at 2 >> ??? (line 103) >> ???? >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ; - >> ???? >? ? ? > com.qfs.agg.impl.SumVectorAggregationBinding::safeVectorAggregate at 70 (line 66) >> ???? >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? 
? ; - >> ??? com.qfs.agg.impl.AVectorAggregationBinding::safeAggregate at 27 >> ???? >? ? ?(line 118) >> ???? >? ? ? > >> ???? >? ? ? >? ? 0x00000254b0d4bf7b: inc ? ?%r8d ? ? ? ? ? ? ? ;*iinc {reexecute=0 rethrow=0 return_oop=0} >> ???? >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ; - >> ???? >? ? ? > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 62 (line 40) >> ???? >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ; - com.qfs.vector.impl.AVector::plus at 17 (line 204) >> ???? >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ; - com.qfs.agg.impl.SumVectorAggregationBinding::plus at 2 >> ??? (line 103) >> ???? >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ; - >> ???? >? ? ? > com.qfs.agg.impl.SumVectorAggregationBinding::safeVectorAggregate at 70 (line 66) >> ???? >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ; - >> ??? com.qfs.agg.impl.AVectorAggregationBinding::safeAggregate at 27 >> ???? >? ? ?(line 118) >> ???? >? ? ? > >> ???? >? ? ? >? ? 0x00000254b0d4bf7e: cmp ? ?%r9d,%r8d >> ???? >? ? ? >? ? 0x00000254b0d4bf81: jl ? ? 0x00000254b0d4bf54 ?;*goto {reexecute=0 rethrow=0 return_oop=0} >> ???? >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ; - >> ???? >? ? ? > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 65 (line 40) >> ???? >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ; - com.qfs.vector.impl.AVector::plus at 17 (line 204) >> ???? >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ; - com.qfs.agg.impl.SumVectorAggregationBinding::plus at 2 >> ??? (line 103) >> ???? >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ; - >> ???? >? ? ? > com.qfs.agg.impl.SumVectorAggregationBinding::safeVectorAggregate at 70 (line 66) >> ???? >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ; - >> ??? com.qfs.agg.impl.AVectorAggregationBinding::safeAggregate at 27 >> ???? >? ? ?(line 118) >> ???? >? ? ? > >> ???? >? ? ? 
> >> ???? >? ? ? > >> ???? >? ? ? > Here is the link to the full log, should you want to take a look at it: >> ???? >? ? ? > https://drive.google.com/file/d/1KQU7wI8NjeElFv6RrQmUsUPRMnAefzhb/view?usp=sharing >> >> >> >> ???? > >> ?> > >> >> ???? > >> ???? >? ? ? > >> ???? > >> ?> >> >> ???? > >> ?> >> >> >> ???? >? ? ? > >> ???? >? ? ? > Best regards, >> ???? >? ? ? > Nicolas Heutte >> ???? >? ? ? > >> ???? >? ? ? > On Thu, Feb 11, 2021 at 7:05 PM Vladimir Kozlov > ??? > >> ???? >? ? ? > ??? >>> wrote: >> ???? >? ? ? > >> ???? >? ? ? >? ? ?Changing wide mailing list to JIT compiler only. >> ???? >? ? ? > >> ???? >? ? ? >? ? ?This deoptimization is normal in Tiered Compilation - it switched from profiling code (level='3') >> ??? generated by C1 >> ???? >? ? ? >? ? ?compiler to new code generated by C2 (level='4') which does loop optimizations. >> ???? >? ? ? > >> ???? >? ? ? >? ? ?Thank you for posting inlining information: >> ???? >? ? ? > >> ???? >? ? ? >? ? ? ? ? ?@ 17? ?com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus (69 bytes) inline >> (hot) >> ???? >? ? ? >? ? ? ? ? ? ? \-> TypeProfile (14054/14054 counts) = >> ??? com/qfs/vector/binding/impl/ArrayFloatToArrayFloatVectorBinding >> ???? >? ? ? > >> ???? >? ? ? >? ? ?I thought before that may be call site is not hot but it is not the case. >> ???? >? ? ? > >> ???? >? ? ? >? ? ?You can do an other experiment to collect log with disabled Tiered Compilation (only C2 is used): >> ???? >? ? ?-XX:-TieredCompilation >> ???? >? ? ? >? ? ?Also print assembler code (as you did before) for final compilation to see if loop is still not >> ??? vectorized. >> ???? >? ? ? > >> ???? >? ? ? >? ? ?Is it possible to post log file (on GitHub?) for me to look? >> ???? >? ? ? > >> ???? >? ? ? >? ? ?Thanks, >> ???? >? ? ? >? ? ?Vladimir K >> ???? >? ? ? > >> ???? >? ? ? >? ? ?On 2/11/21 6:28 AM, Nicolas Heutte wrote: >> ???? >? ? ? >? ? ? > Hi?Vladimir, >> ???? >? ? ? >? ? ? > >> ???? >? ? ? >? ? ? 
> Thank you for your help. >> ???? >? ? ? >? ? ? > >> ???? >? ? ? >? ? ? > I'm currently running Java 11.0.9, and I did not use any VM flag of note. >> ???? >? ? ? >? ? ? > >> ???? >? ? ? >? ? ? > I checked the content of the compilation log, and it seems that >> ???? >? ? ?ArrayFloatToArrayFloatVectorBinding::plus() was >> ???? >? ? ? >? ? ? > deoptimized in order to allow AVector::plus() to be compiled: >> ???? >? ? ? >? ? ? > >> ???? >? ? ? >? ? ? > >> ???? >? ? ? >? ? ? > > ???? >? ? ?bytes='23' >> ???? >? ? ? >? ? ? > count='916' iicount='916' level='3' stamp='7394.056' comment='tiered' hot_count='896'/> >> ???? >? ? ? >? ? ? > >> ???? >? ? ? >? ? ? > > ??? compiler='c1' >> ???? >? ? ?level='3'> >> ???? >? ? ? >? ? ? > > ???? >? ? ?iicount='910'/> >> ???? >? ? ? >? ? ? > >> ???? >? ? ? >? ? ? > >> ???? >? ? ? >? ? ? > The last compilation entry for AVector::plus() is: >> ???? >? ? ? >? ? ? > >> ???? >? ? ? >? ? ? > >> ???? >? ? ? >? ? ? > > ???? >? ? ? >? ? ?address='0x00000296d6af3110' >> ???? >? ? ? >? ? ? > relocation_offset='376' insts_offset='432' stub_offset='1040' scopes_data_offset='1152' >> ???? >? ? ?scopes_pcs_offset='1592' >> ???? >? ? ? >? ? ? > dependencies_offset='1880' nul_chk_table_offset='1896' oops_offset='1064' metadata_offset='1072' >> ???? >? ? ? >? ? ? > method='com.qfs.vector.impl.AVector plus (Lcom/qfs/vector/IVector;)V' bytes='23' count='172425' >> ???? >? ? ?iicount='172425' >> ???? >? ? ? >? ? ? > stamp='7394.199'/> >> ???? >? ? ? >? ? ? > >> ???? >? ? ? >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? @ 1 ? com.qfs.vector.array.impl.ArrayFloatVector::getBindingId (4 >> ??? bytes) >> ???? >? ? ?inline >> ???? >? ? ? >? ? ?(hot) >> ???? >? ? ? >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?\-> TypeProfile (14552/14552 counts) = >> ???? >? ? ?com/qfs/vector/array/impl/ArrayFloatVector >> ???? >? ? ? >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? @ 7 ? com.qfs.vector.array.impl.ArrayFloatVector::getBindingId (4 >> ??? bytes) >> ???? >? ? ?inline >> ???? >? 
? ? >? ? ?(hot) >> ???? >? ? ? >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?\-> TypeProfile (14150/14150 counts) = >> ???? >? ? ?com/qfs/vector/array/impl/ArrayFloatVector >> ???? >? ? ? >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? @ 10 ? com.qfs.vector.binding.impl.VectorBindings::getBinding (9 >> bytes) >> ???? >? ? ?inline (hot) >> ???? >? ? ? >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? @ 5 >> ???? >? ? ? >? ? ?com.qfs.vector.binding.impl.VectorBindings$VectorBindingsProvider::getBinding (22 >> ???? >? ? ? >? ? ? > bytes) ? inline (hot) >> ???? >? ? ? >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? @ 3 >> ???? >? ? ? >? ? ?com.qfs.vector.binding.impl.VectorBindings$VectorBindingsProvider::hasBinding >> ???? >? ? ? >? ? ? > (34 bytes) ? inline (hot) >> ???? >? ? ? >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? @ 17 >> ???? >? ? ?com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus (69 >> ???? >? ? ? >? ? ?bytes) >> ???? >? ? ? >? ? ? > inline (hot) >> ???? >? ? ? >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?\-> TypeProfile (14054/14054 counts) = >> ???? >? ? ? >? ? ? > com/qfs/vector/binding/impl/ArrayFloatToArrayFloatVectorBinding >> ???? >? ? ? >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? @ 12 ? com.qfs.vector.array.impl.ArrayFloatVector::size (6 bytes) >> ???? >? ? ?inline (hot) >> ???? >? ? ? >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? @ 22 ? com.qfs.vector.impl.AVector::checkIndex (37 bytes) ? inline >> ??? (hot) >> ???? >? ? ? >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? @ 6 ? com.qfs.vector.array.impl.ArrayFloatVector::size (6 bytes) >> ???? >? ? ?inline (hot) >> ???? >? ? ? >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? @ 27 ? com.qfs.vector.array.impl.ArrayFloatVector::getUnderlying >> ??? (5 bytes) >> ???? >? ? ? >? ? ?accessor >> ???? >? ? ? >? ? ? >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? @ 34 ? com.qfs.vector.array.impl.ArrayFloatVector::getUnderlying >> ??? (5 bytes) >> ???? >? ? ? >? ? ?accessor >> ???? >? ? ? >? ? ? > >> ???? >? ? ? >? ? ? > >> ???? >? ? ? >? ? ? 
> Unfortunately, I do not have access to a debug VM build, so I cannot run the second test you >> recommend. >> ???? >? ? ? >? ? ? > >> ???? >? ? ? >? ? ? > Best regards, >> ???? >? ? ? >? ? ? > Nicolas Heutte >> ???? >? ? ? >? ? ? > >> ???? >? ? ? >? ? ? > On Wed, Feb 10, 2021 at 7:36 PM Vladimir Kozlov > ??? >> ???? >? ? ?> > ??? >> >> ???? >? ? ? >? ? ? >> ??? > > ??? >> ???? >? ? ?>>>> wrote: >> ???? >? ? ? >? ? ? > >> ???? >? ? ? >? ? ? >? ? ?Hi, Nicolas >> ???? >? ? ? >? ? ? > >> ???? >? ? ? >? ? ? >? ? ?Looks like, when inlined, the loop from ArrayFloatToArrayFloatVectorBinding::plus() was not >> ??? optimized >> ???? >? ? ?at all: >> ???? >? ? ? >? ? ?it is not >> ???? >? ? ? >? ? ? >? ? ?unrolled and has range checks. Such loops are not vectorized (you need unrolling and no checks). >> ???? >? ? ? >? ? ? > >> ???? >? ? ? >? ? ? >? ? ?What Java version you are running? What HotSpot VM flags you are using when running application? >> ???? >? ? ? >? ? ? > >> ???? >? ? ? >? ? ? >? ? ?Run application with -XX:+LogCompilation and look on compilation data in hotspot_pid.log >> ??? file for >> ???? >? ? ?caller >> ???? >? ? ? >? ? ? >? ? ?AVector::plus(). >> ???? >? ? ? >? ? ? > >> ???? >? ? ? >? ? ? >? ? ?VM also has several flags to trace loop optimizations but they are only available in debug VM >> ??? build. >> ???? >? ? ?If you >> ???? >? ? ? >? ? ?have access >> ???? >? ? ? >? ? ? >? ? ?to such build run with -XX:+PrintCompilation -XX:+TraceLoopOpts flags. >> ???? >? ? ? >? ? ? > >> ???? >? ? ? >? ? ? >? ? ?Thanks, >> ???? >? ? ? >? ? ? >? ? ?Vladimir K >> ???? >? ? ? >? ? ? > >> ???? >? ? ? >? ? ? >? ? ?On 2/10/21 9:24 AM, Nicolas Heutte wrote: >> ???? >? ? ? >? ? ? >? ? ? > Hi all, >> ???? >? ? ? >? ? ? >? ? ? > >> ???? >? ? ? >? ? ? >? ? ? > I am encountering a performance issue caused by the interaction between >> ???? >? ? ? >? ? ? >? ? ? > method inlining and automatic vectorization. >> ???? >? ? ? >? ? ? >? ? ? > >> ???? >? ? ? >? ? ? >? ? ? 
>>>>>>> Our application aggregates arrays intensively using a method named
>>>>>>> ArrayFloatToArrayFloatVectorBinding.plus() with the following code:
>>>>>>>
>>>>>>>     for (int i = 0; i < srcLen; ++i) {
>>>>>>>         dstArray[i] += srcArray[i];
>>>>>>>     }
>>>>>>>
>>>>>>> When we microbenchmark this method we observe fast performance close to the
>>>>>>> practical memory bandwidth and when we print the assembly code we observe
>>>>>>> loop unrolling and automatic vectorization with SIMD instructions.
>>>>>>>
>>>>>>>   0x000001ef4600abf0: vmovdqu 0x10(%r14,%r13,4),%ymm0
>>>>>>>   0x000001ef4600abf7: vaddps 0x10(%rcx,%r13,4),%ymm0,%ymm0
>>>>>>>   0x000001ef4600abfe: vmovdqu %ymm0,0x10(%r14,%r13,4)
>>>>>>>   0x000001ef4600ac05: movslq %r13d,%r11
>>>>>>>   0x000001ef4600ac08: vmovdqu 0x30(%r14,%r11,4),%ymm0
>>>>>>>   0x000001ef4600ac0f: vaddps 0x30(%rcx,%r11,4),%ymm0,%ymm0
>>>>>>>   0x000001ef4600ac16: vmovdqu %ymm0,0x30(%r14,%r11,4)
>>>>>>>   0x000001ef4600ac1d: vmovdqu 0x50(%r14,%r11,4),%ymm0
>>>>>>>   0x000001ef4600ac24: vaddps 0x50(%rcx,%r11,4),%ymm0,%ymm0
>>>>>>>   0x000001ef4600ac2b: vmovdqu %ymm0,0x50(%r14,%r11,4)
>>>>>>>   0x000001ef4600ac32: vmovdqu 0x70(%r14,%r11,4),%ymm0
>>>>>>>   0x000001ef4600ac39: vaddps 0x70(%rcx,%r11,4),%ymm0,%ymm0
>>>>>>>   0x000001ef4600ac40: vmovdqu %ymm0,0x70(%r14,%r11,4)
>>>>>>>   0x000001ef4600ac47: vmovdqu 0x90(%r14,%r11,4),%ymm0
>>>>>>>   0x000001ef4600ac51: vaddps 0x90(%rcx,%r11,4),%ymm0,%ymm0
>>>>>>>   0x000001ef4600ac5b: vmovdqu %ymm0,0x90(%r14,%r11,4)
>>>>>>>   0x000001ef4600ac65: vmovdqu 0xb0(%r14,%r11,4),%ymm0
>>>>>>>   0x000001ef4600ac6f: vaddps 0xb0(%rcx,%r11,4),%ymm0,%ymm0
>>>>>>>   0x000001ef4600ac79: vmovdqu %ymm0,0xb0(%r14,%r11,4)
>>>>>>>   0x000001ef4600ac83: vmovdqu 0xd0(%r14,%r11,4),%ymm0
>>>>>>>   0x000001ef4600ac8d: vaddps 0xd0(%rcx,%r11,4),%ymm0,%ymm0
>>>>>>>   0x000001ef4600ac97: vmovdqu %ymm0,0xd0(%r14,%r11,4)
>>>>>>>   0x000001ef4600aca1: vmovdqu 0xf0(%r14,%r11,4),%ymm0
>>>>>>>   0x000001ef4600acab: vaddps 0xf0(%rcx,%r11,4),%ymm0,%ymm0
>>>>>>>   0x000001ef4600acb5: vmovdqu %ymm0,0xf0(%r14,%r11,4)  ;*fastore {reexecute=0 rethrow=0 return_oop=0}
>>>>>>>                       ; - com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus@61 (line 41)
>>>>>>>   0x000001ef4600acbf: add    $0x40,%r13d        ;*iinc {reexecute=0 rethrow=0 return_oop=0}
>>>>>>>                       ; - com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus@62 (line 40)
>>>>>>>   0x000001ef4600acc3: cmp    %eax,%r13d
>>>>>>>   0x000001ef4600acc6: jl     0x000001ef4600abf0  ;*goto {reexecute=0 rethrow=0 return_oop=0}
>>>>>>>                       ; - com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus@65 (line 40)
>>>>>>>
>>>>>>> In the real application, this method is actually inlined in a higher level
>>>>>>> method named AVector.plus(). Unfortunately, the inlined version of the
>>>>>>> aggregation code is not vectorized anymore:
>>>>>>>
>>>>>>>   0x000001ef460180a0: cmp    %ebx,%r11d
>>>>>>>   0x000001ef460180a3: jae    0x000001ef460180e6
>>>>>>>   0x000001ef460180a5: vmovss 0x10(%r8,%r11,4),%xmm1  ;*faload {reexecute=0 rethrow=0 return_oop=0}
>>>>>>>                       ; - com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus@54 (line 41)
>>>>>>>                       ; - com.qfs.vector.impl.AVector::plus@17 (line 204)
>>>>>>>   0x000001ef460180ac: cmp    %ecx,%r11d
>>>>>>>   0x000001ef460180af: jae    0x000001ef46018104
>>>>>>>   0x000001ef460180b1: vaddss 0x10(%r9,%r11,4),%xmm1,%xmm1
>>>>>>>   0x000001ef460180b8: vmovss %xmm1,0x10(%r8,%r11,4)  ;*fastore {reexecute=0 rethrow=0 return_oop=0}
>>>>>>>                       ; - com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus@61 (line 41)
>>>>>>>                       ; - com.qfs.vector.impl.AVector::plus@17 (line 204)
>>>>>>>   0x000001ef460180bf: inc    %r11d              ;*iinc {reexecute=0 rethrow=0 return_oop=0}
>>>>>>>                       ; - com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus@62 (line 40)
>>>>>>>                       ; - com.qfs.vector.impl.AVector::plus@17 (line 204)
>>>>>>>   0x000001ef460180c2: cmp    %r10d,%r11d
>>>>>>>   0x000001ef460180c5: jl     0x000001ef460180a0  ;*goto {reexecute=0 rethrow=0 return_oop=0}
>>>>>>>                       ; - com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus@65 (line 40)
>>>>>>>                       ; - com.qfs.vector.impl.AVector::plus@17 (line 204)
>>>>>>>
>>>>>>> This causes a significant performance drop, compared to a run where we
>>>>>>> explicitly disable the inlining and observe automatically vectorized code
>>>>>>> again (
>>>>>>> -XX:CompileCommand=dontinline,com/qfs/vector/binding/impl/ArrayFloatToArrayFloatVectorBinding.plus
>>>>>>> ).
>>>>>>>
>>>>>>> How do you guys explain that behavior of the JIT compiler? Is this a known
>>>>>>> and tracked issue, could it be fixed in the JVM? Can we do something in the
>>>>>>> java code to prevent this from happening?
>>>>>>>
>>>>>>> Best regards,
>>>>>>>
>>>>>>> Nicolas Heutte
>>>>>>>
From cjashfor at linux.ibm.com  Sat Feb 20 02:31:37 2021
From: cjashfor at linux.ibm.com (Corey Ashford)
Date: Fri, 19 Feb 2021 18:31:37 -0800
Subject: Question about RegMask::is_aligned_sets()
Message-ID: <2269ddde-8084-3d97-a8f9-1996a2937622@linux.ibm.com>

Hello all,

First the question: I'd like to understand the concept of "aligned sets" in RegMask. I believe I understand the RegMask idea overall, but I don't understand the idea of alignment of sets (actually the concept of sets in this context is also fuzzy). I've looked at the code that implements is_aligned_sets, and I just can't yet seem to grok what requirement it is trying to verify. I read RegMask.hpp's comments on the method prototype, and it didn't help me much, I'm afraid. If someone could give a paragraph or two of explanation, I'd really appreciate it.

Any additional insights into porting the Vector API to other arches would also be appreciated. For example, maybe we've started the port at the wrong place.

Thanks for your consideration.

More background:

We have started working on adding support to the PPC64-LE hotspot code for the Vector API. In order to support Vector Masks, it seems we need to change our current support for fixed-length, 128-bit vectors to something that can be as short as two booleans. To do that we have changed the function min_vector_size in hotspot/cpu/ppc.ad to return 2 when the type is T_BOOLEAN, otherwise it still returns 16.

My first task was to add support for vector masks, and so I added a new instruct to cpu/ppc/ppc.ad to match VectorLoadMask, which then necessitated adding some instructs for LoadVector and StoreVector of the appropriate lengths.
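One plausible reading of the "aligned sets" requirement asked about above, offered as an assumption and not as a description of the HotSpot source: the slots in a mask must come in complete groups of `size` adjacent slots, and each group must start at a slot index that is a multiple of `size`. The sketch below models that reading on a single 64-bit word; the class name, the method name and the single-word mask are hypothetical simplifications.

```java
public class AlignedSetsDemo {

    // Simplified model (assumption): the whole mask fits in one long, and
    // 'size' is a power of two between 1 and 16.
    // Returns true when every aligned group of 'size' slots is either
    // completely inside the mask or completely outside it.
    static boolean isAlignedSets(long mask, int size) {
        if (size == 1) {
            return true; // single slots are always "aligned"
        }
        long group = (1L << size) - 1; // 'size' low bits set
        for (int base = 0; base < Long.SIZE; base += size) {
            long bits = (mask >>> base) & group;
            if (bits != 0 && bits != group) {
                return false; // partial or straddling group
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isAlignedSets(0b1111_0000L, 4)); // slots 4..7: one full aligned group
        System.out.println(isAlignedSets(0b0011_1100L, 4)); // slots 2..5: straddles two groups
        System.out.println(isAlignedSets(0b0011_1100L, 2)); // same slots form two aligned pairs
        System.out.println(isAlignedSets(0b0110L, 2));      // slots 1..2: misaligned pair
    }
}
```

Under this reading, a live range that needs a 128-bit vector register would fail the check whenever its mask still contains a slot group shorter than the full vector, which is one way a newly added short-vector operand could trip an assert of this kind.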
I have a test case that loads a vector mask for a vector of shorts:

import jdk.incubator.vector.ShortVector;
import jdk.incubator.vector.VectorSpecies;
import jdk.incubator.vector.VectorMask;
import java.util.Random;

class TestVectorMaskShort {
    private static final VectorSpecies<Short> SPECIES = ShortVector.SPECIES_128;

    public static VectorMask<Short> test(boolean[] bary) {
        VectorMask<Short> vmask = VectorMask.fromArray(SPECIES, bary, 0);
        return vmask;
    }

    public static void main(String args[]) {
        Random ran = new Random(100);
        int counter = 0;
        boolean[] bary = new boolean[8];

        for (int i = 0; i < 20_000; i++) {
            for (int j = 0; j < bary.length; j++) {
                bary[j] = ran.nextBoolean();
            }
            VectorMask<Short> vmask = test(bary);
            if (vmask.allTrue()) {
                counter++;
            }
        }
        System.out.printf("counter = %d\n", counter);
    }
}

When I run this test case, I get a runtime error:

# Internal Error (/home/cjashfor/git-trees/jdk/src/hotspot/share/opto/chaitin.cpp:951), pid=1341588, tid=1341601
# assert(lrgmask.is_aligned_sets(RegMask::SlotsPerVecX)) failed: vector should be aligned

- Corey

Corey Ashford
Software Engineer
IBM Systems, LTC OpenJDK team
IBM

From github.com+10482586+therealeliu at openjdk.java.net  Sat Feb 20 06:25:39 2021
From: github.com+10482586+therealeliu at openjdk.java.net (Eric Liu)
Date: Sat, 20 Feb 2021 06:25:39 GMT
Subject: RFR: 8256438: AArch64: Implement match rules with ROR shift register value [v3]
In-Reply-To:
References:
Message-ID:

On Thu, 18 Feb 2021 02:00:04 GMT, Eric Liu wrote:

>> OK. For what it's worth, I doubt that this will be suitable for backporting to 8u or 11u.
>
>> OK. For what it's worth, I doubt that this will be suitable for backporting to 8u or 11u.
>
> Thanks for your review, I'd like to backport this.
>
> BTW, Is there anyone could review those trivial shared code?

@TobiHartmann Could you help to review those shared code?
------------- PR: https://git.openjdk.java.net/jdk/pull/1858 From dongbo at openjdk.java.net Sat Feb 20 06:26:13 2021 From: dongbo at openjdk.java.net (Dong Bo) Date: Sat, 20 Feb 2021 06:26:13 GMT Subject: RFR: 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width [v8] In-Reply-To: References: Message-ID: <5Y24E2lvmpeh6Ke9LT-S74vUT2bf1-wE8AfRPyunycs=.8740560a-d8b1-4f25-a0c6-ad0117aa6aff@github.com> > In vectorAPI, when right-shifting a vector with a shift equals to the element width, the shift is transformed to zero, > see `src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorOperators.java`: > /** Produce {@code a>>>(n&(ESIZE*8-1))}. Integral only. */ > public static final /*bitwise*/ Binary LSHR = binary("LSHR", ">>>", VectorSupport.VECTOR_OP_URSHIFT, VO_SHIFT); > > The aarch64 assembler generates wrong or illegal instructions in this case, e.g. for the JAVA code below on aarch64, > assembler call `__ ushr(dst, __ T8B, src, 0)`, the instruction generated is not `ushr dst.8B, src.8B, 0`, but `ushr dst.4H, src.4H, 16` instead. > According to local tests, JVM gives wrong results for byte/short and crashes with SIGILL for integer/long. > ByteVector vba = ByteVector.fromArray(byte64SPECIES, bytesA, 8 * i); > vbb.lanewise(VectorOperators.ASHR, 8).intoArray(arrBytes, 8 * i); > > The legal right shift amount should be in the range 1 to the element width in bits on aarch64: > https://developer.arm.com/documentation/dui0801/f/A64-SIMD-Vector-Instructions/USHR--vector-?lang=en > > This fix handles zero shift separately. If the shift is zero, it generates `orr` for right shift, `addv` for right shift and accumulate. > Verified with linux-aarch64-server-fastdebug, tier1. Also created a jtreg to reproduce the issue and for regression tests. 
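A minimal scalar sketch of the shift-count masking quoted from the `LSHR` Javadoc above, assuming `ESIZE` is the element size in bytes. The class and method names are illustrative only and are not part of the Vector API. It shows why a shift count equal to the element width reaches the backend as zero.

```java
public class ShiftMaskDemo {

    // Effective shift count following the quoted formula n & (ESIZE*8 - 1),
    // where elementSizeBytes plays the role of ESIZE (assumption: bytes).
    static int effectiveShift(int n, int elementSizeBytes) {
        return n & (elementSizeBytes * 8 - 1);
    }

    public static void main(String[] args) {
        // A shift by the full element width is masked down to 0.
        System.out.println(effectiveShift(8, Byte.BYTES));     // prints 0
        System.out.println(effectiveShift(16, Short.BYTES));   // prints 0
        System.out.println(effectiveShift(32, Integer.BYTES)); // prints 0
        System.out.println(effectiveShift(64, Long.BYTES));    // prints 0
        // Smaller counts pass through unchanged.
        System.out.println(effectiveShift(3, Byte.BYTES));     // prints 3
    }
}
```

Since the quoted Arm documentation gives 1 to the element width as the legal range for a right-shift immediate, a masked count of 0 is exactly the case the fix has to route to a different instruction.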
Dong Bo has updated the pull request incrementally with two additional commits since the last revision: - fix trailing whitespace - split ssra/usra tests ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2472/files - new: https://git.openjdk.java.net/jdk/pull/2472/files/1aba5629..ba8dc5ac Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2472&range=07 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2472&range=06-07 Stats: 469 lines in 3 files changed: 352 ins; 112 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/2472.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2472/head:pull/2472 PR: https://git.openjdk.java.net/jdk/pull/2472 From dongbo at openjdk.java.net Sat Feb 20 06:29:40 2021 From: dongbo at openjdk.java.net (Dong Bo) Date: Sat, 20 Feb 2021 06:29:40 GMT Subject: RFR: 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width [v7] In-Reply-To: <1zpEXArPUG-5mAnHqc7E-YBJ_whPIr0KWpSrgqV2mnQ=.d72ed309-938a-4d6a-9bbc-0a8f065c8411@github.com> References: <1zpEXArPUG-5mAnHqc7E-YBJ_whPIr0KWpSrgqV2mnQ=.d72ed309-938a-4d6a-9bbc-0a8f065c8411@github.com> Message-ID: On Fri, 19 Feb 2021 14:42:15 GMT, Andrew Haley wrote: >> Dong Bo has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. > > src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp line 554: > >> 552: >> 553: WRAP(usra) WRAP(ssra) >> 554: #undef WRAP > > Are ssra and usra tested by anything? I don't seem them accessed in the test case. Updated. The `ssra/usra` are accessed by tests in `TestVectorShiftImmAndAccumulate.java`. Manually injected error by changing `addv` to `subv` if shifting right and accumulating with 0, the tests failed as expected. 
The `vba.add(vbb.lanewise(SHIFT, Imm))` pattern in `TestVectorShiftImmAndAccumulate.java` are actually the same with the original code in `TestVectorShiftImm.java`. As of now, I have no idea why `ssra/usra` are not accessed by the previous test code. The `vba.add(vbb.lanewise(SHIFT, Imm))` pattern should match `ssra/usra` anyway. I think we need a separate investigation. ------------- PR: https://git.openjdk.java.net/jdk/pull/2472 From xliu at openjdk.java.net Sat Feb 20 08:30:00 2021 From: xliu at openjdk.java.net (Xin Liu) Date: Sat, 20 Feb 2021 08:30:00 GMT Subject: RFR: 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set [v5] In-Reply-To: References: Message-ID: > Add a flag _suppress_cr to outputStream. outstream objects won't emit any CR if it's set. > Correct TypeInstPtr::dump2 to make sure it only emits klass name once. > Remove the comment because Klass::oop_print_on() has emitted the address of oop. > > Before: > 689 ConP === 0 [[ 821 ]] Oop:java/lang/Stringjava.lang.String > {0x000000010159d7c8} - klass: public final synchronized 'java/lang/String' > - string: "a" > :Constant:exact * > > After: > 689 ConP === 0 [[ 821 ]] Oop:java.lang.String {0x000000010159d7c8} - klass: public final synchronized 'java/lang/String' - string: "a":Constant:exact * Xin Liu has updated the pull request incrementally with one additional commit since the last revision: 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set move tr_delete in StringUtils. 
-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/2178/files
  - new: https://git.openjdk.java.net/jdk/pull/2178/files/cfd51fb3..edbd13bd

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2178&range=04
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2178&range=03-04

  Stats: 119 lines in 7 files changed: 89 ins; 19 del; 11 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2178.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2178/head:pull/2178

PR: https://git.openjdk.java.net/jdk/pull/2178

From xliu at openjdk.java.net  Sat Feb 20 08:35:38 2021
From: xliu at openjdk.java.net (Xin Liu)
Date: Sat, 20 Feb 2021 08:35:38 GMT
Subject: RFR: 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set [v4]
In-Reply-To:
References:
Message-ID:

On Wed, 17 Feb 2021 19:23:14 GMT, Evgeny Astigeevich wrote:

>> Another option:
>> class filterStringStream: public stringStream {
>> private:
>>   char ch;
>> public:
>>   filterStringStream(char ch_to_filter, size_t initial_bufsize = 256) : stringStream(initial_bufsize), ch(ch_to_filter) {}
>>
>>   virtual void write(const char* c, size_t len) override {
>>     const char* e = c + len;
>>     while (c != e) {
>>       size_t i = 0;
>>       while ((c+i) != e && c[i] != ch ) {
>>         ++i;
>>       }
>>       stringStream::write(c, i);
>>       c += i;
>>       while (c != e && *c == ch) {
>>         ++c;
>>       }
>>     }
>>   }
>> };
>>
>> Your code will be:
>> filterStringStream ss('\n');
>> ss.print(" ");
>> const_oop->print_oop(&ss);
>> st->print_raw(ss.base(), ss.size());
>
>> `tr_delete` is expensive. Also deleting something in a stream does not fit into a concept of streams.
>> I see that the content of `ss` is traversed many times.
>> What about this code:
>>
>> ```
>> for (const char *str = ss.base(); *str; ) {
>>   size_t i = 0;
>>   while (str[i] && str[i] != '\n' ) {
>>     ++i;
>>   }
>>   st->print_raw(str, i);
>>   str += i;
>>   while (*str == '\n') {
>>     ++str;
>>   }
>> }
>> ```
>
> You can put this code in a function like `print_filtering_ch(char, const stringStream&, outputStream*)`

hi, @eastig,
Thank you for reviewing this code. You are right, I shouldn't modify the contents of a stringStream. I treated it as a buffer instead of a stream.

I took your advice and moved the `tr_delete` logic to StringUtils, which is a toolkit class. Could you take a look again?

-------------

PR: https://git.openjdk.java.net/jdk/pull/2178

From xliu at openjdk.java.net  Sun Feb 21 02:07:59 2021
From: xliu at openjdk.java.net (Xin Liu)
Date: Sun, 21 Feb 2021 02:07:59 GMT
Subject: RFR: 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set [v6]
In-Reply-To:
References:
Message-ID:

> Add a flag _suppress_cr to outputStream. outstream objects won't emit any CR if it's set.
> Correct TypeInstPtr::dump2 to make sure it only emits klass name once.
> Remove the comment because Klass::oop_print_on() has emitted the address of oop.
>
> Before:
> 689 ConP === 0 [[ 821 ]] Oop:java/lang/Stringjava.lang.String
> {0x000000010159d7c8} - klass: public final synchronized 'java/lang/String'
> - string: "a"
> :Constant:exact *
>
> After:
> 689 ConP === 0 [[ 821 ]] Oop:java.lang.String {0x000000010159d7c8} - klass: public final synchronized 'java/lang/String' - string: "a":Constant:exact *

Xin Liu has updated the pull request incrementally with one additional commit since the last revision:

  8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set

  fix build failures on Windows. StringUtils::tr_delete returns size_of.
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2178/files - new: https://git.openjdk.java.net/jdk/pull/2178/files/edbd13bd..077f9b60 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2178&range=05 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2178&range=04-05 Stats: 10 lines in 3 files changed: 0 ins; 0 del; 10 mod Patch: https://git.openjdk.java.net/jdk/pull/2178.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2178/head:pull/2178 PR: https://git.openjdk.java.net/jdk/pull/2178 From jiefu at openjdk.java.net Sun Feb 21 23:24:54 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Sun, 21 Feb 2021 23:24:54 GMT Subject: RFR: 8262096: Vector API fails to work without C2 Message-ID: Hi all, Vector API won't work without C2. The reason is that VectorSupport_GetMaxLaneCount [1] always returns -1 if C2 is not present. But it should work even there is no JIT compiler since it's Java-level's api. So let's fix it. Thanks. Best regards, Jie [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/prims/vectorSupport.cpp#L364 ------------- Commit messages: - 8262096: Vector API fails to work without C2 Changes: https://git.openjdk.java.net/jdk/pull/2667/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2667&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8262096 Stats: 3 lines in 1 file changed: 0 ins; 2 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/2667.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2667/head:pull/2667 PR: https://git.openjdk.java.net/jdk/pull/2667 From jiefu at openjdk.java.net Sun Feb 21 23:29:54 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Sun, 21 Feb 2021 23:29:54 GMT Subject: RFR: 8262097: Improve CompilerConfig ergonomics to fix a VM crash after JDK-8261229 Message-ID: Hi all, This bug was found while I was verifying the fix for JDK-8262096. It was exposed after JDK-8261229. Testing: - tier1~3 on Linux/x64, no regression Thanks. 
Best regards, Jie ------------- Commit messages: - 8262097: Improve CompilerConfig ergonomics to fix a VM crash after JDK-8261229 Changes: https://git.openjdk.java.net/jdk/pull/2668/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2668&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8262097 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/2668.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2668/head:pull/2668 PR: https://git.openjdk.java.net/jdk/pull/2668 From jiefu at openjdk.java.net Mon Feb 22 03:04:03 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Mon, 22 Feb 2021 03:04:03 GMT Subject: RFR: 8262096: Vector API fails to work without C2 [v2] In-Reply-To: References: Message-ID: > Hi all, > > Vector API won't work without C2. > The reason is that VectorSupport_GetMaxLaneCount [1] always returns -1 if C2 is not present. > But it should work even there is no JIT compiler since it's Java-level's api. > So let's fix it. > > Thanks. 
> Best regards, > Jie > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/prims/vectorSupport.cpp#L364 Jie Fu has updated the pull request incrementally with one additional commit since the last revision: Fix zero and minimal build failure ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2667/files - new: https://git.openjdk.java.net/jdk/pull/2667/files/0e0e03ab..059aa729 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2667&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2667&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/2667.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2667/head:pull/2667 PR: https://git.openjdk.java.net/jdk/pull/2667 From xliu at openjdk.java.net Mon Feb 22 05:13:43 2021 From: xliu at openjdk.java.net (Xin Liu) Date: Mon, 22 Feb 2021 05:13:43 GMT Subject: RFR: 8261731: shallow copy the internal buffer of a scalar-replaced java.lang.String object [v2] In-Reply-To: <5vIeI9zFvxAcG-dUY-8iQwXlXDyWJ3VFMUqKamIL4o4=.746707e1-9d41-4fc0-9162-1ac7ca5589f3@github.com> References: <3JNae6rXuxc_Q6YoALCH8Ku510Zne5ftqf1z8OCGkHQ=.2ebf5cd0-27e2-44e3-adf7-065179cc9ffd@github.com> <5vIeI9zFvxAcG-dUY-8iQwXlXDyWJ3VFMUqKamIL4o4=.746707e1-9d41-4fc0-9162-1ac7ca5589f3@github.com> Message-ID: On Wed, 17 Feb 2021 17:42:48 GMT, Volker Simonis wrote: >> Xin Liu has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 36 commits: >> >> - Merge branch 'master' into optimize_substring >> - fix regression for x86-32 >> >> if LP64 is off, the offset of AddP must be I instead of L. >> x86 also doesn't emit encodeP/storeN. it use storeP instead. >> - add a statistical counter for OptimizeTempArray. 
>> >> -XX:+PrintOptoStatistics shows it >> - [SIM-JVM-450] support deoptimization v2 >> >> because the src oop of scobj may be another scobj, deoptimization sort >> all objects in topological order. >> >> separate creation of dst oop and reassignment of it. >> - add a unit test for deoptimization >> - [SIM-JVM-450] support deoptimization part2 >> >> if OptimizeTempArray eliminates an AllocateArrayNode, scalar replacement will >> create a nested SafePointScalarObjectNode for the field value:byte[] of j.l.String. >> we use the nested sobj and an ObjectValue an envelope. it consists of 3 fields: >> 1. src 2. src_positio 3. length. >> >> deoptimizaton recognizes this ad-hoc ObjectValue and re-allocate an arrayOop >> for the String object. >> - enable OptimizeTempArray by default >> - Merge branch 'master' into optimize_substring >> - Revert "8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set" >> >> This reverts commit a49e34688d7d7c9d3c0d9c824d33f359613c2fc1. >> - Revert "add a new bucket afterea_late_inlines" >> >> afterea_late_inlines bucket is not useful. revert it and its relevant changes >> - ... and 26 more: https://git.openjdk.java.net/jdk/compare/849f4c0f...21693ddd > > src/hotspot/share/opto/macro.cpp line 1317: > >> 1315: // >> 1316: // >> 1317: // EncodeP: delele because we don't need storeN > > "delete" not "delele" got it. thanks. I will update it. 
------------- PR: https://git.openjdk.java.net/jdk/pull/2570 From roland at openjdk.java.net Mon Feb 22 09:08:43 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Mon, 22 Feb 2021 09:08:43 GMT Subject: RFR: 8256438: AArch64: Implement match rules with ROR shift register value [v3] In-Reply-To: References: Message-ID: On Mon, 1 Feb 2021 11:20:20 GMT, Eric Liu wrote: >> This patch transforms '(x >>> rshift) + (x << lshift)' into >> 'RotateRight(x, rshift)' during GVN phase when both the shift exponents >> are constants and their sum equals to the number of bits for the type >> of shift base. >> >> This patch implements some new match rules for AArch64 instructions >> which can take ROR as the optional shift. Such instructions are 'and', >> 'or', 'eor', 'eon', 'bic' and 'orn'. >> >> ror w11, w2, #5 >> eor w0, w1, w11 >> >> With this patch, above code could be optimized to below: >> >> eor w0, w1, w2, ror #5 >> >> Finally, the patch refactors TestRotate.java[1][2]. >> >> Tested jtreg TestRotate.java, hotspot::hotspot_all_no_apps, >> jdk::jdk_core, langtools::tier1. >> >> [1] https://bugs.openjdk.java.net/browse/JDK-8252776 >> [2] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-September/039911.html > > Eric Liu has updated the pull request incrementally with one additional commit since the last revision: > > Add benchmark test > > Change-Id: I63ca51d06070a07e5c20daf4b42d2c8d7237a1da Changes requested by roland (Reviewer). src/hotspot/share/opto/addnode.cpp line 349: > 347: } > 348: } > 349: Even though existing code in that method seems to assume node's inputs can't be NULL, it's a good practice to protect against unexpected NULLs as that can happen when sub-graphs die during IGVN. So in(1)->in(1), in(1)->in(2), in(2)->in(2) need to be tested for NULL. That logic and the one for AddLNode are almost identical. So it would be good to have it in a shared method. 
I've been adding helper methods to make that possible but not all of that code is in yet. Could you file a bug to revisit this issue later and assign it to me? ------------- PR: https://git.openjdk.java.net/jdk/pull/1858 From xliu at openjdk.java.net Mon Feb 22 10:19:54 2021 From: xliu at openjdk.java.net (Xin Liu) Date: Mon, 22 Feb 2021 10:19:54 GMT Subject: RFR: 8261731: shallow copy the internal buffer of a scalar-replaced java.lang.String object In-Reply-To: References: Message-ID: On Wed, 17 Feb 2021 17:46:54 GMT, Volker Simonis wrote: > What about GC? What happens if the original String isn't reachable any more? Do you put a reference to the byte array into the corresponding oop maps to make sure it can't be garbage collected? hi, Volker, I don't know OopMap very much. IMHO, it should be discovered by GC because I put the byte array oop to `jvm->locals` I do that for all safepoints nodes. // 1. src oop sfpt->add_req(ac->in(ArrayCopyNode::Src)); https://github.com/openjdk/jdk/pull/2570/files#diff-2faebd05d08f9115f8d9ef771644cf05087a6986c2f9013d7163c6aa720169c3R995 ------------- PR: https://git.openjdk.java.net/jdk/pull/2570 From github.com+10482586+therealeliu at openjdk.java.net Mon Feb 22 10:33:42 2021 From: github.com+10482586+therealeliu at openjdk.java.net (Eric Liu) Date: Mon, 22 Feb 2021 10:33:42 GMT Subject: RFR: 8256438: AArch64: Implement match rules with ROR shift register value [v3] In-Reply-To: References: Message-ID: <36Vj9yOC0pYOLdCa9ggd9Xg0WxY-O2psZMb86qwgqRI=.c158e853-6184-4790-af14-dc9dba2278cb@github.com> On Mon, 22 Feb 2021 09:05:53 GMT, Roland Westrelin wrote: >> Eric Liu has updated the pull request incrementally with one additional commit since the last revision: >> >> Add benchmark test >> >> Change-Id: I63ca51d06070a07e5c20daf4b42d2c8d7237a1da > > src/hotspot/share/opto/addnode.cpp line 349: > >> 347: } >> 348: } >> 349: > > Even though existing code in that method seems to assume node's inputs can't be NULL, it's a good practice to 
protect against unexpected NULLs as that can happen when sub-graphs die during IGVN. So in(1)->in(1), in(1)->in(2), in(2)->in(2) need to be tested for NULL. > > That logic and the one for AddLNode are almost identical. So it would be good to have it in a shared method. I've been adding helper methods to make that possible but not all of that code is in yet. Could you file a bug to revisit this issue later and assign it to me? Hi Roland, Thanks for your feedback. > Even though existing code in that method seems to assume node's inputs can't be NULL, it's a good practice to protect against unexpected NULLs as that can happen when sub-graphs die during IGVN. So in(1)->in(1), in(1)->in(2), in(2)->in(2) need to be tested for NULL. Agree > That logic and the one for AddLNode are almost identical. So it would be good to have it in a shared method. I've been adding helper methods to make that possible but not all of that code is in yet. As this match rule is trivial enough, how about withdrawing these shared code in this PR for integrating backend first if your helper methods coming soon? > Could you file a bug to revisit this issue later and assign it to me? Okay, it's on my queue now:P -- Eric ------------- PR: https://git.openjdk.java.net/jdk/pull/1858 From eirbjo at gmail.com Mon Feb 22 11:26:13 2021 From: eirbjo at gmail.com (=?UTF-8?B?RWlyaWsgQmrDuHJzbsO4cw==?=) Date: Mon, 22 Feb 2021 12:26:13 +0100 Subject: ThreadLocal lookup elimination Message-ID: Hello, ThreadLocals are commonly used to transfer state across method boundaries in situations where passing them as method parameters is impractical. Examples include crossing APIs you don't own, or instrumenting code you don't own. 
Consider the following pseudo-instrumented code (where the original code calls a getter inside a loop):

    public class Student {
        private int age;

        public int maxAge(Student[] students) {
            // Instrumented code:
            ExpensiveObject expensive = new ExpensiveObject();
            expensive.recordSomething();
            threadLocal.set(expensive);

            // Original code:
            int max = 0;
            for (Student student : students) {
                max = Math.max(max, student.getAge());
            }
            return max;
        }

        public int getAge() {
            // Instrumented code:
            ExpensiveObject exp = threadLocal.get();
            exp.recordSomething();

            // Original code:
            return age;
        }

        // Instrumented field:
        private static ThreadLocal<ExpensiveObject> threadLocal = new ThreadLocal<>();
    }

The ThreadLocal is used here to avoid constructing ExpensiveObject instances in each invocation of getAge.

However, once a compiler worth its salt sees this code, it immediately wants to inline the getAge method:

    // Instrumented code:
    ExpensiveObject expensive = new ExpensiveObject();
    expensive.recordSomething();
    threadLocal.set(expensive);

    for (Student student : students) {
        // Instrumented code:
        ExpensiveObject exp = threadLocal.get();
        exp.recordSomething();

        // Original code:
        max = Math.max(max, student.age);
    }

At this point, we see that the last value written to threadLocal is 'expensive', so any following 'threadLocal.get()' can be replaced by 'expensive'. So we could do the following instead:

    for (Student student : students) {
        // Instrumented code:
        expensive.recordSomething();

        // Original code:
        max = Math.max(max, student.age);
    }

More generally, a compiler could record the first lookup of a ThreadLocal in a scope and substitute any following lookup with the first read (until the next write).

I'm pretty sure this would be immensely useful for my current use case (which instruments methods to count invocations), but perhaps it is also a useful optimization in a more general sense? Examples that come to mind are enterprise apps where transaction and security contexts are passed around using ThreadLocals.
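A minimal sketch of what that substitution amounts to when done by hand, assuming nothing else writes to the ThreadLocal between the set and the end of the loop. Recorder, CURRENT, maxWithLookup and maxHoisted are hypothetical names used only for illustration (Recorder stands in for ExpensiveObject); this shows the before/after shape at source level, not how a JIT compiler would implement it:

```java
class ThreadLocalHoistSketch {
    /** Hypothetical stand-in for ExpensiveObject: counts recordSomething() calls. */
    static final class Recorder {
        int calls;
        void recordSomething() { calls++; }
    }

    /** Hypothetical instrumented field. */
    static final ThreadLocal<Recorder> CURRENT = new ThreadLocal<>();

    /** Shape after inlining: one ThreadLocal lookup per loop iteration. */
    static int maxWithLookup(Recorder r, int[] ages) {
        CURRENT.set(r);
        int max = 0;
        for (int age : ages) {
            CURRENT.get().recordSomething();
            max = Math.max(max, age);
        }
        return max;
    }

    /** Shape after the proposed substitution: the lookup is done once and reused. */
    static int maxHoisted(Recorder r, int[] ages) {
        CURRENT.set(r);
        Recorder cached = CURRENT.get(); // valid only while no write intervenes
        int max = 0;
        for (int age : ages) {
            cached.recordSomething();
            max = Math.max(max, age);
        }
        return max;
    }

    public static void main(String[] args) {
        int[] ages = {19, 23, 21};
        Recorder a = new Recorder();
        Recorder b = new Recorder();
        System.out.println(maxWithLookup(a, ages) + " with " + a.calls + " records");
        System.out.println(maxHoisted(b, ages) + " with " + b.calls + " records");
    }
}
```

Both variants record once per element and return the same maximum; the hoisted one performs a single ThreadLocal lookup instead of one per iteration.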
Has this type of optimization been discussed before? Is it even possible to implement, or did I miss some dragons hiding in the details? What would the estimated work for an implementation look like? Are we looking at bachelor's thesis? Master's thesis? PhD? Would love to hear some thoughts on this idea. Cheers, Eirik. From nhe at activeviam.com Mon Feb 22 13:12:10 2021 From: nhe at activeviam.com (Nicolas Heutte) Date: Mon, 22 Feb 2021 14:12:10 +0100 Subject: [External] : Re: SuperWord loop optimization lost after method inlining In-Reply-To: References: <87dcb670-622f-6b05-7c06-289f9b4f3634@oracle.com> <08ce2121-bbb4-3ea5-8f05-efeb33df7b74@oracle.com> <09a53249-00de-91c1-d88b-6c816ce6dd14@oracle.com> Message-ID: Hello Vladimir, Thanks for filing the bug. It's indeed a very surprising one. I've also added the missing file you needed to the shared folder. Best regards, Nicolas Heutte On Sat, Feb 20, 2021 at 2:38 AM Vladimir Kozlov wrote: > BTW, I filed bug to collect information: > > https://bugs.openjdk.java.net/browse/JDK-8262067 > > This is very weird case which I can't reproduce with small test. It > reminds me one case (Loop did not transform into > Counted loop) which was fixed in JDK 11.0.3: > > https://bugs.openjdk.java.net/browse/JDK-8211451 > > Thanks, > Vladimir K > > On 2/19/21 12:06 PM, Vladimir Kozlov wrote: > > I need an other file > C:\Users\NicolasHeutte\AppData\Local\Temp\\hs_c10212_pid15016.log created > from second C2 compiler > > thread. It should have data for standalone > ArrayFloatToArrayFloatVectorBinding::plus() method compilation. To see what > > is going on I have to compare these data. 
> >
> > Thanks,
> > Vladimir K
> >
> > On 2/19/21 8:54 AM, Nicolas Heutte wrote:
> >> Hello Vladimir,
> >>
> >> I've added the requested log to the shared folder
> >> (https://drive.google.com/drive/folders/1UczOggtTYp6TZ0QnBiwMxwdTBl3zuvqF?usp=sharing).
> >> I've also tried disabling the strip mining optimization as you suggested, but there was no significant
> >> performance change.
> >>
> >> Best regards,
> >> Nicolas Heutte
> >>
> >> On Wed, Feb 17, 2021 at 9:05 PM Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
> >>
> >> Unfortunately it is still not the file I am looking for.
> >>
> >> First, remove the -XX:+PrintAssembly flag from the command line. I already have files with assembler code.
> >>
> >> Second, I see a link to the file I am looking for:
> >> filename='C:\Users\NicolasHeutte\AppData\Local\Temp\\hs_c16812_pid15016.log'/>
> >>
> >> If you still have it, please send it. If the application stopped before normal exit, that file is not merged
> >> into the hotspot_pid.log file.
> >>
> >> If you don't have it, do another run with -XX:CICompilerCount=1 to use only one C2 compiler thread with
> >> Tiered off. It will simplify the ordering of the log.
> >>
> >> You can also do another experiment without collecting a log.
> >> Run the app with the following flags to disable the loop strip mining
> >> optimization: -XX:-UseCountedLoopSafepoints -XX:LoopStripMiningIter=0
> >>
> >> Thanks,
> >> Vladimir K
> >>
> >> On 2/17/21 2:34 AM, Nicolas Heutte wrote:
> >> > Hi Vladimir,
> >> >
> >> > I have rerun the test with the appropriate options; the logs obtained are in this folder:
> >> > https://drive.google.com/drive/folders/1UczOggtTYp6TZ0QnBiwMxwdTBl3zuvqF?usp=sharing
> >> >
> >> > Best regards,
> >> > Nicolas Heutte
> >> >
> >> > On Tue, Feb 16, 2021 at 11:35 PM Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
> >> >
> >> > Hi Nicolas,
> >> >
> >> > The file you shared has only assembler code. Yes, it shows that when
> >> > ArrayFloatToArrayFloatVectorBinding::plus() is inlined into AVector::plus() it is not vectorized.
> >> >
> >> > But I asked for another file (hotspot_pid.log), which is generated when you run the app with the
> >> > -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation flags.
It > should start with: > >> > > >> > > >> > time_ms='1613514688748'> > >> > > >> > > >> > Java HotSpot(TM) 64-Bit Server VM > >> > > >> > > >> > 11.0.9+7-LTS > >> > > >> > > >> > Thanks, > >> > Vladimir K > >> > > >> > On 2/15/21 5:19 AM, Nicolas Heutte wrote: > >> > > Hi Vladimir, > >> > > > >> > > I've tried disabling tiered compilation, as you > requested. It seems that the inlining was performed > >> slightly > >> > > differently, but the issue remains. As you can see in > this excerpt, the main loop isn't properly > >> vectorized: > >> > > > >> > > 0x00000254b0d4bf54: cmp %r11d,%r8d > >> > > 0x00000254b0d4bf57: jae 0x00000254b0d4c19e > >> > > 0x00000254b0d4bf5d: vmovss 0x10(%rcx,%r8,4),%xmm9 > ;*faload {reexecute=0 rethrow=0 return_oop=0} > >> > > ; - > >> > > > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 54 > (line 41) > >> > > ; - > com.qfs.vector.impl.AVector::plus at 17 (line 204) > >> > > ; - > com.qfs.agg.impl.SumVectorAggregationBinding::plus at 2 > >> (line 103) > >> > > ; - > >> > > > com.qfs.agg.impl.SumVectorAggregationBinding::safeVectorAggregate at 70 > (line 66) > >> > > ; - > >> com.qfs.agg.impl.AVectorAggregationBinding::safeAggregate at 27 > >> > (line 118) > >> > > > >> > > 0x00000254b0d4bf64: cmp %ebx,%r8d > >> > > 0x00000254b0d4bf67: jae 0x00000254b0d4c1ec > >> > > 0x00000254b0d4bf6d: vaddss 0x10(%rdi,%r8,4),%xmm9,%xmm9 > >> > > 0x00000254b0d4bf74: vmovss %xmm9,0x10(%rcx,%r8,4) > ;*fastore {reexecute=0 rethrow=0 return_oop=0} > >> > > ; - > >> > > > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 61 > (line 41) > >> > > ; - > com.qfs.vector.impl.AVector::plus at 17 (line 204) > >> > > ; - > com.qfs.agg.impl.SumVectorAggregationBinding::plus at 2 > >> (line 103) > >> > > ; - > >> > > > com.qfs.agg.impl.SumVectorAggregationBinding::safeVectorAggregate at 70 > (line 66) > >> > > ; - > >> com.qfs.agg.impl.AVectorAggregationBinding::safeAggregate at 27 > >> > (line 118) > >> > > > >> > > 
0x00000254b0d4bf7b: inc %r8d ;*iinc > {reexecute=0 rethrow=0 return_oop=0} > >> > > ; - > >> > > > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 62 > (line 40) > >> > > ; - > com.qfs.vector.impl.AVector::plus at 17 (line 204) > >> > > ; - > com.qfs.agg.impl.SumVectorAggregationBinding::plus at 2 > >> (line 103) > >> > > ; - > >> > > > com.qfs.agg.impl.SumVectorAggregationBinding::safeVectorAggregate at 70 > (line 66) > >> > > ; - > >> com.qfs.agg.impl.AVectorAggregationBinding::safeAggregate at 27 > >> > (line 118) > >> > > > >> > > 0x00000254b0d4bf7e: cmp %r9d,%r8d > >> > > 0x00000254b0d4bf81: jl 0x00000254b0d4bf54 ;*goto > {reexecute=0 rethrow=0 return_oop=0} > >> > > ; - > >> > > > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 65 > (line 40) > >> > > ; - > com.qfs.vector.impl.AVector::plus at 17 (line 204) > >> > > ; - > com.qfs.agg.impl.SumVectorAggregationBinding::plus at 2 > >> (line 103) > >> > > ; - > >> > > > com.qfs.agg.impl.SumVectorAggregationBinding::safeVectorAggregate at 70 > (line 66) > >> > > ; - > >> com.qfs.agg.impl.AVectorAggregationBinding::safeAggregate at 27 > >> > (line 118) > >> > > > >> > > > >> > > > >> > > Here is the link to the full log, should you want to take > a look at it: > >> > > > https://drive.google.com/file/d/1KQU7wI8NjeElFv6RrQmUsUPRMnAefzhb/view?usp=sharing > >> > >> < > https://urldefense.com/v3/__https://drive.google.com/file/d/1KQU7wI8NjeElFv6RrQmUsUPRMnAefzhb/view?usp=sharing__;!!GqivPVa7Brio!IeK3TGC8Qf5eu0KH3LEkt26TEAWCirLIkTuJ2iAAmTkfBK4_Vnnr6gkOuOydizGvKpVqaQ$> > > >> > >> > > >> < > https://urldefense.com/v3/__https://drive.google.com/file/d/1KQU7wI8NjeElFv6RrQmUsUPRMnAefzhb/view?usp=sharing__;!!GqivPVa7Brio!PBl1ZdyC5xtmVS0QG3dxZxEen0D1LP-mBM0KnvmRVbQXpL_VPOQ9OD-pVGBqNvuMpg6inQ$ > >> < > 
> >> > > Best regards,
> >> > > Nicolas Heutte
> >> > >
> >> > > On Thu, Feb 11, 2021 at 7:05 PM Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
> >> > >
> >> > > Changing wide mailing list to JIT compiler only.
> >> > >
> >> > > This deoptimization is normal in Tiered Compilation - it switched from profiling code (level='3')
> >> > > generated by C1 compiler to new code generated by C2 (level='4') which does loop optimizations.
> >> > > > >> > > Thank you for posting inlining information: > >> > > > >> > > @ 17 > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus (69 > bytes) inline > >> (hot) > >> > > \-> TypeProfile (14054/14054 counts) = > >> com/qfs/vector/binding/impl/ArrayFloatToArrayFloatVectorBinding > >> > > > >> > > I thought before that may be call site is not hot but > it is not the case. > >> > > > >> > > You can do an other experiment to collect log with > disabled Tiered Compilation (only C2 is used): > >> > -XX:-TieredCompilation > >> > > Also print assembler code (as you did before) for > final compilation to see if loop is still not > >> vectorized. > >> > > > >> > > Is it possible to post log file (on GitHub?) for me > to look? > >> > > > >> > > Thanks, > >> > > Vladimir K > >> > > > >> > > On 2/11/21 6:28 AM, Nicolas Heutte wrote: > >> > > > Hi Vladimir, > >> > > > > >> > > > Thank you for your help. > >> > > > > >> > > > I'm currently running Java 11.0.9, and I did not > use any VM flag of note. 
> >> > > > > >> > > > I checked the content of the compilation log, and > it seems that > >> > ArrayFloatToArrayFloatVectorBinding::plus() was > >> > > > deoptimized in order to allow AVector::plus() to > be compiled: > >> > > > > >> > > > > >> > > > method='com.qfs.vector.impl.AVector plus > >> (Lcom/qfs/vector/IVector;)V' > >> > bytes='23' > >> > > > count='916' iicount='916' level='3' > stamp='7394.056' comment='tiered' hot_count='896'/> > >> > > > > >> > > > pc='0x00000296d0785b94' compile_id='17257' > >> compiler='c1' > >> > level='3'> > >> > > > method='com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding plus > >> > > > > (Lcom/qfs/vector/IVector;Lcom/qfs/vector/IVector;)V' bytes='69' count='909' > backedge_count='155602' > >> > iicount='910'/> > >> > > > > >> > > > > >> > > > The last compilation entry for AVector::plus() is: > >> > > > > >> > > > > >> > > > level='4' entry='0x00000296d6af32c0' size='1960' > >> > > address='0x00000296d6af3110' > >> > > > relocation_offset='376' insts_offset='432' > stub_offset='1040' scopes_data_offset='1152' > >> > scopes_pcs_offset='1592' > >> > > > dependencies_offset='1880' > nul_chk_table_offset='1896' oops_offset='1064' metadata_offset='1072' > >> > > > method='com.qfs.vector.impl.AVector plus > (Lcom/qfs/vector/IVector;)V' bytes='23' count='172425' > >> > iicount='172425' > >> > > > stamp='7394.199'/> > >> > > > compile_id='17280' compiler='c1' level='2' stamp='7394.199'/> > >> > > > @ 1 > com.qfs.vector.array.impl.ArrayFloatVector::getBindingId (4 > >> bytes) > >> > inline > >> > > (hot) > >> > > > \-> TypeProfile > (14552/14552 counts) = > >> > com/qfs/vector/array/impl/ArrayFloatVector > >> > > > @ 7 > com.qfs.vector.array.impl.ArrayFloatVector::getBindingId (4 > >> bytes) > >> > inline > >> > > (hot) > >> > > > \-> TypeProfile > (14150/14150 counts) = > >> > com/qfs/vector/array/impl/ArrayFloatVector > >> > > > @ 10 > com.qfs.vector.binding.impl.VectorBindings::getBinding (9 > >> bytes) > >> > inline 
(hot) > >> > > > @ 5 > >> > > > com.qfs.vector.binding.impl.VectorBindings$VectorBindingsProvider::getBinding > (22 > >> > > > bytes) inline (hot) > >> > > > @ 3 > >> > > > com.qfs.vector.binding.impl.VectorBindings$VectorBindingsProvider::hasBinding > >> > > > (34 bytes) inline (hot) > >> > > > @ 17 > >> > > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus (69 > >> > > bytes) > >> > > > inline (hot) > >> > > > \-> TypeProfile > (14054/14054 counts) = > >> > > > > com/qfs/vector/binding/impl/ArrayFloatToArrayFloatVectorBinding > >> > > > @ 12 > com.qfs.vector.array.impl.ArrayFloatVector::size (6 bytes) > >> > inline (hot) > >> > > > @ 22 > com.qfs.vector.impl.AVector::checkIndex (37 bytes) inline > >> (hot) > >> > > > @ 6 > com.qfs.vector.array.impl.ArrayFloatVector::size (6 bytes) > >> > inline (hot) > >> > > > @ 27 > com.qfs.vector.array.impl.ArrayFloatVector::getUnderlying > >> (5 bytes) > >> > > accessor > >> > > > @ 34 > com.qfs.vector.array.impl.ArrayFloatVector::getUnderlying > >> (5 bytes) > >> > > accessor > >> > > > > >> > > > > >> > > > Unfortunately, I do not have access to a debug VM > build, so I cannot run the second test you > >> recommend. > >> > > > > >> > > > Best regards, > >> > > > Nicolas Heutte > >> > > > > >> > > > On Wed, Feb 10, 2021 at 7:36 PM Vladimir Kozlov < > vladimir.kozlov at oracle.com > >> > >> > vladimir.kozlov at oracle.com>> >> vladimir.kozlov at oracle.com >> > >> > > vladimir.kozlov at oracle.com> > >> vladimir.kozlov at oracle.com>> >> > >> > vladimir.kozlov at oracle.com>>>>> wrote: > >> > > > > >> > > > Hi, Nicolas > >> > > > > >> > > > Looks like, when inlined, the loop from > ArrayFloatToArrayFloatVectorBinding::plus() was not > >> optimized > >> > at all: > >> > > it is not > >> > > > unrolled and has range checks. Such loops are > not vectorized (you need unrolling and no checks). > >> > > > > >> > > > What Java version you are running? 
What > HotSpot VM flags you are using when running application? > >> > > > > >> > > > Run application with -XX:+LogCompilation and > look on compilation data in hotspot_pid.log > >> file for > >> > caller > >> > > > AVector::plus(). > >> > > > > >> > > > VM also has several flags to trace loop > optimizations but they are only available in debug VM > >> build. > >> > If you > >> > > have access > >> > > > to such build run with -XX:+PrintCompilation > -XX:+TraceLoopOpts flags. > >> > > > > >> > > > Thanks, > >> > > > Vladimir K > >> > > > > >> > > > On 2/10/21 9:24 AM, Nicolas Heutte wrote: > >> > > > > Hi all, > >> > > > > > >> > > > > I am encountering a performance issue > caused by the interaction between > >> > > > > method inlining and automatic vectorization. > >> > > > > > >> > > > > Our application aggregates arrays > intensively using a method named > >> > > > > ArrayFloatToArrayFloatVectorBinding.plus() > with the following code: > >> > > > > > >> > > > > for (int i = 0; i < srcLen; ++i) { > >> > > > > > >> > > > > dstArray[i] += srcArray[i]; > >> > > > > > >> > > > > } > >> > > > > > >> > > > > When we microbenchmark this method we > observe fast performance close to the > >> > > > > practical memory bandwidth and when we > print the assembly code we observe > >> > > > > loop unrolling and automatic vectorization > with SIMD instructions. 
> >> > > > > > >> > > > > 0x000001ef4600abf0: vmovdqu > 0x10(%r14,%r13,4),%ymm0 > >> > > > > > >> > > > > 0x000001ef4600abf7: vaddps > 0x10(%rcx,%r13,4),%ymm0,%ymm0 > >> > > > > > >> > > > > 0x000001ef4600abfe: vmovdqu > %ymm0,0x10(%r14,%r13,4) > >> > > > > > >> > > > > 0x000001ef4600ac05: movslq %r13d,%r11 > >> > > > > > >> > > > > 0x000001ef4600ac08: vmovdqu > 0x30(%r14,%r11,4),%ymm0 > >> > > > > > >> > > > > 0x000001ef4600ac0f: vaddps > 0x30(%rcx,%r11,4),%ymm0,%ymm0 > >> > > > > > >> > > > > 0x000001ef4600ac16: vmovdqu > %ymm0,0x30(%r14,%r11,4) > >> > > > > > >> > > > > 0x000001ef4600ac1d: vmovdqu > 0x50(%r14,%r11,4),%ymm0 > >> > > > > > >> > > > > 0x000001ef4600ac24: vaddps > 0x50(%rcx,%r11,4),%ymm0,%ymm0 > >> > > > > > >> > > > > 0x000001ef4600ac2b: vmovdqu > %ymm0,0x50(%r14,%r11,4) > >> > > > > > >> > > > > 0x000001ef4600ac32: vmovdqu > 0x70(%r14,%r11,4),%ymm0 > >> > > > > > >> > > > > 0x000001ef4600ac39: vaddps > 0x70(%rcx,%r11,4),%ymm0,%ymm0 > >> > > > > > >> > > > > 0x000001ef4600ac40: vmovdqu > %ymm0,0x70(%r14,%r11,4) > >> > > > > > >> > > > > 0x000001ef4600ac47: vmovdqu > 0x90(%r14,%r11,4),%ymm0 > >> > > > > > >> > > > > 0x000001ef4600ac51: vaddps > 0x90(%rcx,%r11,4),%ymm0,%ymm0 > >> > > > > > >> > > > > 0x000001ef4600ac5b: vmovdqu > %ymm0,0x90(%r14,%r11,4) > >> > > > > > >> > > > > 0x000001ef4600ac65: vmovdqu > 0xb0(%r14,%r11,4),%ymm0 > >> > > > > > >> > > > > 0x000001ef4600ac6f: vaddps > 0xb0(%rcx,%r11,4),%ymm0,%ymm0 > >> > > > > > >> > > > > 0x000001ef4600ac79: vmovdqu > %ymm0,0xb0(%r14,%r11,4) > >> > > > > > >> > > > > 0x000001ef4600ac83: vmovdqu > 0xd0(%r14,%r11,4),%ymm0 > >> > > > > > >> > > > > 0x000001ef4600ac8d: vaddps > 0xd0(%rcx,%r11,4),%ymm0,%ymm0 > >> > > > > > >> > > > > 0x000001ef4600ac97: vmovdqu > %ymm0,0xd0(%r14,%r11,4) > >> > > > > > >> > > > > 0x000001ef4600aca1: vmovdqu > 0xf0(%r14,%r11,4),%ymm0 > >> > > > > > >> > > > > 0x000001ef4600acab: vaddps > 0xf0(%rcx,%r11,4),%ymm0,%ymm0 > >> > > > > > >> > > > > 0x000001ef4600acb5: vmovdqu > 
%ymm0,0xf0(%r14,%r11,4) ;*fastore > >> > > > > {reexecute=0 rethrow=0 return_oop=0} > >> > > > > > >> > > > > > ; - > >> > > > > > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 61 > >> > > > > (line 41) > >> > > > > > >> > > > > 0x000001ef4600acbf: add $0x40,%r13d > ;*iinc {reexecute=0 > >> > > > > rethrow=0 return_oop=0} > >> > > > > > >> > > > > > ; - > >> > > > > > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 62 > >> > > > > (line 40) > >> > > > > > >> > > > > 0x000001ef4600acc3: cmp %eax,%r13d > >> > > > > > >> > > > > 0x000001ef4600acc6: jl > 0x000001ef4600abf0 ;*goto {reexecute=0 > >> > > > > rethrow=0 return_oop=0} > >> > > > > > >> > > > > > ; - > >> > > > > > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 65 > >> > > > > (line 40) > >> > > > > > >> > > > > > >> > > > > > >> > > > > In the real application, this method is > actually inlined in a higher level > >> > > > > method named AVector.plus(). 
Unfortunately, > the inlined version of the > >> > > > > aggregation code is not vectorized anymore: > >> > > > > > >> > > > > > >> > > > > > >> > > > > 0x000001ef460180a0: cmp %ebx,%r11d > >> > > > > > >> > > > > 0x000001ef460180a3: jae > 0x000001ef460180e6 > >> > > > > > >> > > > > 0x000001ef460180a5: vmovss > 0x10(%r8,%r11,4),%xmm1 ;*faload {reexecute=0 > >> > > > > rethrow=0 return_oop=0} > >> > > > > > >> > > > > > ; - > >> > > > > > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 54 > >> > > > > (line 41) > >> > > > > > >> > > > > > ; - > >> > > > > com.qfs.vector.impl.AVector::plus at 17 (line > 204) > >> > > > > > >> > > > > 0x000001ef460180ac: cmp %ecx,%r11d > >> > > > > > >> > > > > 0x000001ef460180af: jae > 0x000001ef46018104 > >> > > > > > >> > > > > 0x000001ef460180b1: vaddss > 0x10(%r9,%r11,4),%xmm1,%xmm1 > >> > > > > > >> > > > > 0x000001ef460180b8: vmovss > %xmm1,0x10(%r8,%r11,4) ;*fastore {reexecute=0 > >> > > > > rethrow=0 return_oop=0} > >> > > > > > >> > > > > > ; - > >> > > > > > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 61 > >> > > > > (line 41) > >> > > > > > >> > > > > > ; - > >> > > > > com.qfs.vector.impl.AVector::plus at 17 (line > 204) > >> > > > > > >> > > > > 0x000001ef460180bf: inc %r11d > ;*iinc {reexecute=0 > >> > > > > rethrow=0 return_oop=0} > >> > > > > > >> > > > > > ; - > >> > > > > > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 62 > >> > > > > (line 40) > >> > > > > > >> > > > > > ; - > >> > > > > com.qfs.vector.impl.AVector::plus at 17 (line > 204) > >> > > > > > >> > > > > 0x000001ef460180c2: cmp %r10d,%r11d > >> > > > > > >> > > > > 0x000001ef460180c5: jl > 0x000001ef460180a0 ;*goto {reexecute=0 > >> > > > > rethrow=0 return_oop=0} > >> > > > > > >> > > > > > ; - > >> > > > > > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 65 > >> > > > > (line 40) > >> > > > > > >> > > > > > ; - > >> > > > > 
com.qfs.vector.impl.AVector::plus at 17 (line > 204) > >> > > > > > >> > > > > > >> > > > > > >> > > > > This causes a significant performance drop, > compared to a run where we > >> > > > > explicitly disable the inlining and observe > automatically vectorized code > >> > > > > again ( > >> > > > > > >> > -XX:CompileCommand=dontinline,com/qfs/vector/binding/impl/ArrayFloatToArrayFloatVectorBinding.plus > >> > > > > ). > >> > > > > > >> > > > > > >> > > > > How do you guys explain that behavior of > the JIT compiler? Is this a known > >> > > > > and tracked issue, could it be fixed in the > JVM? Can we do something in the > >> > > > > java code to prevent this from happening? > >> > > > > > >> > > > > > >> > > > > Best regards, > >> > > > > > >> > > > > Nicolas Heutte > >> > > > > > >> > > > > >> > > > >> > > >> > From iveresov at openjdk.java.net Mon Feb 22 18:03:39 2021 From: iveresov at openjdk.java.net (Igor Veresov) Date: Mon, 22 Feb 2021 18:03:39 GMT Subject: RFR: 8262097: Improve CompilerConfig ergonomics to fix a VM crash after JDK-8261229 In-Reply-To: References: Message-ID: On Sun, 21 Feb 2021 23:25:26 GMT, Jie Fu wrote: > Hi all, > > This bug was found while I was verifying the fix for JDK-8262096. > It was exposed after JDK-8261229. > > Testing: > - tier1~3 on Linux/x64, no regression > > Thanks. > Best regards, > Jie I've indirectly fixed this assert that you noticed in JDK-8261225. But I agree, we should apply the client ergonomic settings as well. Looks good. ------------- Marked as reviewed by iveresov (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2668 From kvn at openjdk.java.net Mon Feb 22 19:53:45 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 22 Feb 2021 19:53:45 GMT Subject: RFR: 8262097: Improve CompilerConfig ergonomics to fix a VM crash after JDK-8261229 In-Reply-To: References: Message-ID: On Sun, 21 Feb 2021 23:25:26 GMT, Jie Fu wrote: > Hi all, > > This bug was found while I was verifying the fix for JDK-8262096. 
> It was exposed after JDK-8261229. > > Testing: > - tier1~3 on Linux/x64, no regression > > Thanks. > Best regards, > Jie Marked as reviewed by kvn (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/2668 From jiefu at openjdk.java.net Mon Feb 22 23:46:38 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Mon, 22 Feb 2021 23:46:38 GMT Subject: RFR: 8262097: Improve CompilerConfig ergonomics to fix a VM crash after JDK-8261229 In-Reply-To: References: Message-ID: <7zjJuLQzx9cH0bZJ-mqSx8t46AEsped-fjWroLfSB74=.ef0ad3c9-d1de-4fa2-b13d-f3bc4a5fdd50@github.com> On Mon, 22 Feb 2021 18:00:57 GMT, Igor Veresov wrote: >> Hi all, >> >> This bug was found while I was verifying the fix for JDK-8262096. >> It was exposed after JDK-8261229. >> >> Testing: >> - tier1~3 on Linux/x64, no regression >> >> Thanks. >> Best regards, >> Jie > > I've indirectly fixed this assert that you noticed in JDK-8261225. But I agree, we should apply the client ergonomic settings as well. Looks good. Thanks @veresov and @vnkozlov for your review. ------------- PR: https://git.openjdk.java.net/jdk/pull/2668 From jiefu at openjdk.java.net Mon Feb 22 23:46:39 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Mon, 22 Feb 2021 23:46:39 GMT Subject: Integrated: 8262097: Improve CompilerConfig ergonomics to fix a VM crash after JDK-8261229 In-Reply-To: References: Message-ID: On Sun, 21 Feb 2021 23:25:26 GMT, Jie Fu wrote: > Hi all, > > This bug was found while I was verifying the fix for JDK-8262096. > It was exposed after JDK-8261229. > > Testing: > - tier1~3 on Linux/x64, no regression > > Thanks. > Best regards, > Jie This pull request has now been integrated. 
Changeset: f2bde05e Author: Jie Fu URL: https://git.openjdk.java.net/jdk/commit/f2bde05e Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod 8262097: Improve CompilerConfig ergonomics to fix a VM crash after JDK-8261229 Reviewed-by: iveresov, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/2668 From whuang at openjdk.java.net Tue Feb 23 07:44:58 2021 From: whuang at openjdk.java.net (Wang Huang) Date: Tue, 23 Feb 2021 07:44:58 GMT Subject: RFR: 8261137: Optimization of Box nodes in uncommon_trap [v5] In-Reply-To: <8Riu9VCQLM7_vDp5DOMtLZK3yMLQzAkwlIKo4ab0F7Q=.662dbffe-c320-47ea-bc67-508e2c382b12@github.com> References: <8Riu9VCQLM7_vDp5DOMtLZK3yMLQzAkwlIKo4ab0F7Q=.662dbffe-c320-47ea-bc67-508e2c382b12@github.com> Message-ID: <39-V5pqzlgXaCnL3KTIBXvHwrt35rn4WuuknDv8dcuU=.178fe7a3-4b2b-415c-aaa5-ad63598daeb3@github.com> > JDK-8075052 has removed useless autobox. However, in some cases, the box is still saved. For instance: > @Benchmark > public void testMethod(Blackhole bh) { > int sum = 0; > for (int i = 0; i < data.length; i++) { > Integer ii = Integer.valueOf(data[i]); > if (i < data.length) { > sum += ii.intValue(); > } > } > bh.consume(sum); > } > Although the variable ii is only used at ii.intValue(), it cannot be eliminated as a result of being used by a hidden uncommon_trap. > The uncommon_trap is generated by the optimized "if", because its condition is always true. > > We can postpone box in uncommon_trap in this situation. We treat box as a scalarized object by adding a SafePointScalarObjectNode in the uncommon_trap node, > and deleting the use of box: > > There is no additional fail/error(s) of jtreg after this patch. 
Wang Huang has updated the pull request incrementally with one additional commit since the last revision: add debuginfo optimization ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2401/files - new: https://git.openjdk.java.net/jdk/pull/2401/files/e80e4959..84290aeb Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2401&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2401&range=03-04 Stats: 65 lines in 6 files changed: 15 ins; 6 del; 44 mod Patch: https://git.openjdk.java.net/jdk/pull/2401.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2401/head:pull/2401 PR: https://git.openjdk.java.net/jdk/pull/2401 From roland at openjdk.java.net Tue Feb 23 08:05:39 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Tue, 23 Feb 2021 08:05:39 GMT Subject: RFR: 8256438: AArch64: Implement match rules with ROR shift register value [v3] In-Reply-To: <36Vj9yOC0pYOLdCa9ggd9Xg0WxY-O2psZMb86qwgqRI=.c158e853-6184-4790-af14-dc9dba2278cb@github.com> References: <36Vj9yOC0pYOLdCa9ggd9Xg0WxY-O2psZMb86qwgqRI=.c158e853-6184-4790-af14-dc9dba2278cb@github.com> Message-ID: On Mon, 22 Feb 2021 10:30:36 GMT, Eric Liu wrote: > As this match rule is trivial enough, how about withdrawing these shared code in this PR for integrating backend first if your helper methods coming soon? https://github.com/openjdk/jdk/pull/2045 is the one I'm referring to. No idea when it's going to get in so I would suggest to move forward with the current change and revisit it later. ------------- PR: https://git.openjdk.java.net/jdk/pull/1858 From roland at openjdk.java.net Tue Feb 23 08:09:38 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Tue, 23 Feb 2021 08:09:38 GMT Subject: RFR: 8260637: Shenandoah: assert(_base == Tuple) failure during C2 compilation In-Reply-To: References: Message-ID: On Thu, 4 Feb 2021 08:34:04 GMT, Roland Westrelin wrote: > Another shenandoah bug with a fix in shared code. 
> > LRBRightAfterMemBar.test2() has 2 allocations that are non escaping > but non scalarizable. As a result, the null check for a3.f is > optimized out but the CastPP is left in the graph. That CastPP becomes > control dependent on the o2 == null check which is later hoisted out > of the loop. The CastPP is then right after the membar of the barrier = 0x42 > volatile access but with an out of loop control. Because the node is > considered pinned by loopopts, it is assigned the membar as > control. The input of the CastPP is a shenandoah barrier that's > sandwiched between the membar and the CastPP and so expanded right > after the membar (that is between the membar and its control > projection). That causes the crash. I don't think cast nodes need to > be pinned so I propose that as a fix. Anyone for this (simple) change to shared c2 code? ------------- PR: https://git.openjdk.java.net/jdk/pull/2400 From roland at openjdk.java.net Tue Feb 23 08:09:50 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Tue, 23 Feb 2021 08:09:50 GMT Subject: RFR: 8261308: C2: assert(inner->is_valid_counted_loop(T_INT) && inner->is_strip_mined()) failed: OuterStripMinedLoop should have been removed Message-ID: The inner counted loop of the test case starts at 1 and stops at 1 so runs for one iteration. A counted loop is created for it. The iv Phi is found to be the constant 1 and its type is set by: l->phi()->as_Phi()->set_type(l->phi()->Value(&_igvn)); in PhaseIdealLoop::is_counted_loop() but it's not replaced by the constant 1 yet so the counted loop's shape is preserved. IdealLoopTree::do_one_iteration_loop() runs but doesn't optimize the loop because the trip count is not set to 1. The loop contains a range check and range check elimination is applied. That causes the loop exit test to be adjusted with a MinI(..) expression. When IGVN runs next, the phi is replaced with 1 but because the exit test was changed, IGVN can't prove it always fails. 
So the loop is not removed which causes the assert failure as loop opts progress. The fix I propose is for IdealLoopTree::do_one_iteration_loop() to remove the 1 iteration loop. The reason it doesn't happen is that IdealLoopTree::compute_trip_count() doesn't set the trip count because it finds a zero trip count: limit - init = 1 - 1 = 0. All loops, once entered execute at least once. So I think, it's safe to set the trip count to 1 in those cases. ------------- Commit messages: - whitespaces - test - fix Changes: https://git.openjdk.java.net/jdk/pull/2529/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2529&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261308 Stats: 66 lines in 2 files changed: 65 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/2529.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2529/head:pull/2529 PR: https://git.openjdk.java.net/jdk/pull/2529 From github.com+10482586+therealeliu at openjdk.java.net Tue Feb 23 08:24:58 2021 From: github.com+10482586+therealeliu at openjdk.java.net (Eric Liu) Date: Tue, 23 Feb 2021 08:24:58 GMT Subject: RFR: 8256438: AArch64: Implement match rules with ROR shift register value [v4] In-Reply-To: References: Message-ID: > This patch transforms '(x >>> rshift) + (x << lshift)' into > 'RotateRight(x, rshift)' during GVN phase when both the shift exponents > are constants and their sum equals to the number of bits for the type > of shift base. > > This patch implements some new match rules for AArch64 instructions > which can take ROR as the optional shift. Such instructions are 'and', > 'or', 'eor', 'eon', 'bic' and 'orn'. > > ror w11, w2, #5 > eor w0, w1, w11 > > With this patch, above code could be optimized to below: > > eor w0, w1, w2, ror #5 > > Finally, the patch refactors TestRotate.java[1][2]. > > Tested jtreg TestRotate.java, hotspot::hotspot_all_no_apps, > jdk::jdk_core, langtools::tier1. 
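The identity behind the transformation described above can be checked in plain Java. The following is a minimal sketch, not code from the patch: the class name `RotateIdiom` and the method name `viaShifts` are hypothetical, and the check uses only `Integer.rotateRight` and `Long.rotateRight` from the standard library. It assumes constant shift exponents in the range 1 to width - 1 whose sum equals the bit width, as the description requires.

```java
// Sketch only: RotateIdiom and viaShifts are hypothetical names, not from the patch.
final class RotateIdiom {
    // The int shape from the description: (x >>> rshift) + (x << lshift),
    // with rshift + lshift == 32. Intended for 1 <= rshift <= 31.
    static int viaShifts(int x, int rshift) {
        int lshift = Integer.SIZE - rshift;
        // The two shifted halves have no bits in common, so '+' behaves like '|'.
        return (x >>> rshift) + (x << lshift);
    }

    // The same shape for long, with the exponents summing to 64.
    static long viaShifts(long x, int rshift) {
        int lshift = Long.SIZE - rshift;
        return (x >>> rshift) + (x << lshift);
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 0x12345678, Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int x : samples) {
            for (int r = 1; r < Integer.SIZE; r++) {
                if (viaShifts(x, r) != Integer.rotateRight(x, r)) {
                    throw new AssertionError("int mismatch: x=" + x + " r=" + r);
                }
            }
            // Build a 64-bit sample from the 32-bit one.
            long y = ((long) x << 32) ^ x;
            for (int r = 1; r < Long.SIZE; r++) {
                if (viaShifts(y, r) != Long.rotateRight(y, r)) {
                    throw new AssertionError("long mismatch: y=" + y + " r=" + r);
                }
            }
        }
        System.out.println("shift-and-add matches rotateRight for all tested inputs");
    }
}
```

A shift of 0 is deliberately excluded: Java masks the complementary shift of 32 (or 64) down to 0, so the two halves would overlap and the sum would no longer be a rotation.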
> > [1] https://bugs.openjdk.java.net/browse/JDK-8252776 > [2] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-September/039911.html Eric Liu has updated the pull request incrementally with one additional commit since the last revision: Add null check Change-Id: I18dda4a01154bce72fd4025685fa0721263092ce ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1858/files - new: https://git.openjdk.java.net/jdk/pull/1858/files/492f4ca4..93577236 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1858&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1858&range=02-03 Stats: 22 lines in 1 file changed: 4 ins; 0 del; 18 mod Patch: https://git.openjdk.java.net/jdk/pull/1858.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1858/head:pull/1858 PR: https://git.openjdk.java.net/jdk/pull/1858 From github.com+10482586+therealeliu at openjdk.java.net Tue Feb 23 08:37:39 2021 From: github.com+10482586+therealeliu at openjdk.java.net (Eric Liu) Date: Tue, 23 Feb 2021 08:37:39 GMT Subject: RFR: 8256438: AArch64: Implement match rules with ROR shift register value [v3] In-Reply-To: References: <36Vj9yOC0pYOLdCa9ggd9Xg0WxY-O2psZMb86qwgqRI=.c158e853-6184-4790-af14-dc9dba2278cb@github.com> Message-ID: On Tue, 23 Feb 2021 08:03:20 GMT, Roland Westrelin wrote: >> Hi Roland, >> >> Thanks for your feedback. >> >>> Even though existing code in that method seems to assume node's inputs can't be NULL, it's a good practice to protect against unexpected NULLs as that can happen when sub-graphs die during IGVN. So in(1)->in(1), in(1)->in(2), in(2)->in(2) need to be tested for NULL. >> >> Agree >> >>> That logic and the one for AddLNode are almost identical. So it would be good to have it in a shared method. I've been adding helper methods to make that possible but not all of that code is in yet. 
>> >> As this match rule is trivial enough, how about withdrawing these shared code in this PR for integrating backend first if your helper methods coming soon? >> >>> Could you file a bug to revisit this issue later and assign it to me? >> >> Okay, it's on my queue now:P >> >> >> -- Eric > >> As this match rule is trivial enough, how about withdrawing these shared code in this PR for integrating backend first if your helper methods coming soon? > > https://github.com/openjdk/jdk/pull/2045 is the one I'm referring to. No idea when it's going to get in so I would suggest to move forward with the current change and revisit it later. Got it. The patch has been updated. ------------- PR: https://git.openjdk.java.net/jdk/pull/1858 From roland at openjdk.java.net Tue Feb 23 08:48:40 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Tue, 23 Feb 2021 08:48:40 GMT Subject: RFR: 8256438: AArch64: Implement match rules with ROR shift register value [v4] In-Reply-To: References: Message-ID: On Tue, 23 Feb 2021 08:24:58 GMT, Eric Liu wrote: >> This patch transforms '(x >>> rshift) + (x << lshift)' into >> 'RotateRight(x, rshift)' during GVN phase when both the shift exponents >> are constants and their sum equals to the number of bits for the type >> of shift base. >> >> This patch implements some new match rules for AArch64 instructions >> which can take ROR as the optional shift. Such instructions are 'and', >> 'or', 'eor', 'eon', 'bic' and 'orn'. >> >> ror w11, w2, #5 >> eor w0, w1, w11 >> >> With this patch, above code could be optimized to below: >> >> eor w0, w1, w2, ror #5 >> >> Finally, the patch refactors TestRotate.java[1][2]. >> >> Tested jtreg TestRotate.java, hotspot::hotspot_all_no_apps, >> jdk::jdk_core, langtools::tier1. 
>> >> [1] https://bugs.openjdk.java.net/browse/JDK-8252776 >> [2] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-September/039911.html > > Eric Liu has updated the pull request incrementally with one additional commit since the last revision: > > Add null check > > Change-Id: I18dda4a01154bce72fd4025685fa0721263092ce Shared code looks good to me. ------------- Marked as reviewed by roland (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1858 From github.com+10482586+therealeliu at openjdk.java.net Tue Feb 23 09:12:40 2021 From: github.com+10482586+therealeliu at openjdk.java.net (Eric Liu) Date: Tue, 23 Feb 2021 09:12:40 GMT Subject: RFR: 8256438: AArch64: Implement match rules with ROR shift register value [v4] In-Reply-To: References: Message-ID: On Tue, 23 Feb 2021 08:46:10 GMT, Roland Westrelin wrote: >> Eric Liu has updated the pull request incrementally with one additional commit since the last revision: >> >> Add null check >> >> Change-Id: I18dda4a01154bce72fd4025685fa0721263092ce > > Shared code looks good to me. Thanks for your review. I will integrate it after some tests were finished. ------------- PR: https://git.openjdk.java.net/jdk/pull/1858 From dongbo at openjdk.java.net Tue Feb 23 10:21:02 2021 From: dongbo at openjdk.java.net (Dong Bo) Date: Tue, 23 Feb 2021 10:21:02 GMT Subject: RFR: 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width [v9] In-Reply-To: References: Message-ID: > In vectorAPI, when right-shifting a vector with a shift equals to the element width, the shift is transformed to zero, > see `src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorOperators.java`: > /** Produce {@code a>>>(n&(ESIZE*8-1))}. Integral only. */ > public static final /*bitwise*/ Binary LSHR = binary("LSHR", ">>>", VectorSupport.VECTOR_OP_URSHIFT, VO_SHIFT); > > The aarch64 assembler generates wrong or illegal instructions in this case, e.g. 
for the JAVA code below on aarch64, > assembler call `__ ushr(dst, __ T8B, src, 0)`, the instruction generated is not `ushr dst.8B, src.8B, 0`, but `ushr dst.4H, src.4H, 16` instead. > According to local tests, JVM gives wrong results for byte/short and crashes with SIGILL for integer/long. > ByteVector vba = ByteVector.fromArray(byte64SPECIES, bytesA, 8 * i); > vbb.lanewise(VectorOperators.ASHR, 8).intoArray(arrBytes, 8 * i); > > The legal right shift amount should be in the range 1 to the element width in bits on aarch64: > https://developer.arm.com/documentation/dui0801/f/A64-SIMD-Vector-Instructions/USHR--vector-?lang=en > > This fix handles zero shift separately. If the shift is zero, it generates `orr` for right shift, `addv` for right shift and accumulate. > Verified with linux-aarch64-server-fastdebug, tier1. Also created a jtreg to reproduce the issue and for regression tests. Dong Bo has updated the pull request incrementally with two additional commits since the last revision: - whitespace - Rebase tests so that sshr/ushr/sshl/ssra/usra are accessed in one jtreg case, assembly print verified. 
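The masking rule quoted from `VectorOperators` (`a>>>(n&(ESIZE*8-1))`) is what turns a shift by the full element width into a shift by zero. The following is a minimal sketch of that arithmetic in plain Java. It does not use the Vector API, and the class name `ShiftMask` and method name `effectiveShift` are hypothetical. It also shows the analogous, well-defined behaviour of scalar `long` shifts, which use only the low six bits of the shift count.

```java
// Sketch only: ShiftMask and effectiveShift are hypothetical names for illustration.
final class ShiftMask {
    // Effective shift count under the quoted rule n & (ESIZE*8 - 1),
    // where elementBits is the lane width in bits (8, 16, 32 or 64).
    static int effectiveShift(int n, int elementBits) {
        return n & (elementBits - 1);
    }

    public static void main(String[] args) {
        int[] widths = {Byte.SIZE, Short.SIZE, Integer.SIZE, Long.SIZE};
        for (int w : widths) {
            // A shift equal to the element width is masked down to zero.
            System.out.println("width " + w + ": shift " + w
                    + " -> effective shift " + effectiveShift(w, w));
        }

        // Scalar Java long shifts mask the count with 63, so a shift by 64
        // leaves the value unchanged.
        long v = 0x8000000000000001L;
        if ((v >>> 64) != v) {
            throw new AssertionError("long shift by 64 should be a no-op");
        }
    }
}
```

This is why a backend that only accepts right-shift immediates from 1 to the element width needs a separate path for a zero shift, which is the case the fix handles.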
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2472/files - new: https://git.openjdk.java.net/jdk/pull/2472/files/ba8dc5ac..e2dc7b83 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2472&range=08 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2472&range=07-08 Stats: 903 lines in 2 files changed: 320 ins; 387 del; 196 mod Patch: https://git.openjdk.java.net/jdk/pull/2472.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2472/head:pull/2472 PR: https://git.openjdk.java.net/jdk/pull/2472 From dongbo at openjdk.java.net Tue Feb 23 10:31:45 2021 From: dongbo at openjdk.java.net (Dong Bo) Date: Tue, 23 Feb 2021 10:31:45 GMT Subject: RFR: 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width [v9] In-Reply-To: <8kMxMFAYtb0B-yUVEt-HLfhji3Gj-gog8OHvWW_tKfw=.f7c9422b-3574-4c31-9489-7286ee98332f@github.com> References: <8kMxMFAYtb0B-yUVEt-HLfhji3Gj-gog8OHvWW_tKfw=.f7c9422b-3574-4c31-9489-7286ee98332f@github.com> Message-ID: On Tue, 9 Feb 2021 07:50:45 GMT, Ningsheng Jian wrote: >> Dong Bo has updated the pull request incrementally with two additional commits since the last revision: >> >> - whitespace >> - Rebase tests so that sshr/ushr/sshl/ssra/usra are accessed in one jtreg case, assembly print verified. > > Thanks for the fix. Hi, @theRealAph I've rebased the tests so that sshr/ushr/sshl/ssra/usra are accessed in one jtreg `test/hotspot/jtreg/compiler/vectorapi/TestVectorShiftImm.java`. Local tests by manually injected error shows all instructions are covered by the jtreg case. Suggestions? 
------------- PR: https://git.openjdk.java.net/jdk/pull/2472 From jvernee at openjdk.java.net Tue Feb 23 12:24:39 2021 From: jvernee at openjdk.java.net (Jorn Vernee) Date: Tue, 23 Feb 2021 12:24:39 GMT Subject: RFR: 8259937: guarantee(loc != NULL) failed: missing saved register with native invoker [v2] In-Reply-To: References: Message-ID: On Fri, 12 Feb 2021 10:41:18 GMT, Andrew Dinn wrote: >> Seems reasonable. > > The code here looks ok. I'm slightly concerned about the consequences of adding a new stack frame visible to stack walking code. Does this have the potential to break serviceability code that reports and/or analyzes stack frames (whether that's code in OpenJDK or 3rd party code)? @adinn I'm not aware of any such use-cases (whether in the JDK or elsewhere). They would only be affected if they were using Panama native calls, which were introduced pretty recently, and are also still in incubator state. Inside the JDK the only place where this code is currently being used is in the jdk/java/foreign test suite, as well as in the internal implementation of the Panama linker. If that test suite still passes I'm happy to call it safe. ------------- PR: https://git.openjdk.java.net/jdk/pull/2528 From jvernee at openjdk.java.net Tue Feb 23 13:02:42 2021 From: jvernee at openjdk.java.net (Jorn Vernee) Date: Tue, 23 Feb 2021 13:02:42 GMT Subject: RFR: 8259937: guarantee(loc != NULL) failed: missing saved register with native invoker [v2] In-Reply-To: References: Message-ID: <6NJDW9Yzgj-U0tRiPS1dscN7i3E27kdKuUDdREMA8lo=.81cd0ead-af88-4af3-bcf3-6025cff8f855@github.com> On Fri, 12 Feb 2021 09:37:02 GMT, Roland Westrelin wrote: >> We spotted this issue with Shenandoah and I managed to write a simple >> test case that reproduces it reliably with Shenandoah but the issue is >> independent of the GC. >> >> The loop in the test case calls a native invoker with an oop live in >> rbp. rbp is saved in the native invoker stub's frame.
A safepoint is >> triggered from the safepoint check in the native invoker. The stack >> walking code sees that rbp contains an oop but can't find where that >> oop is stored. That's because stack walking updates the caller's frame >> with the location of rbp in the callee on calls to >> frame::sender(). But the current code sets the last java frame to be >> the compiled frame where rbp is live. So there's no call to >> frame::sender() to update the location rbp. The fix I propose is that >> the frame of the native invoker be visible by stack walking. On a >> safepoint, stack walking starts from the native invoker thread, then >> calls frame::sender() to move to the compiled frame. That causes rbp >> to be properly recorded with its location in the native invoker frame. >> >> Same problem affects both x86 and aarch64. I've tested this patch with: >> >> make run-test TEST="java/foreign" TEST_VM_OPTS="-Xcomp" JTREG="TIMEOUT_FACTOR=10" >> >> on both platforms. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > broken build LGTM, but I've left some suggestions in line. Thanks for cleaning up the frame layout code. Having a fixed frame is much better than repeatedly modifying the stack pointer. I've also done some downstream stress testing with jextract, and everything works as expected. src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp line 3514: > 3512: > 3513: OopMapSet* oop_maps() const { > 3514: return _oop_maps; The `saved_rbp_address_offset` field was added to JavaFrameAnchor just to serve this particular use case. It should be cleaned up now that it is no longer used. If you don't want to take care of that, we can file a followup. There is code in `frame_*.cpp`, `thread_*.hpp`, and `javaFrameAnchor_*.hpp` that deals with this field (not on all platforms/archs though). See https://github.com/openjdk/jdk/pull/634/files & https://github.com/openjdk/jdk/pull/1711/files for the original diffs. 
test/hotspot/jtreg/gc/shenandoah/compiler/TestLinkToNativeRBP.java line 50: > 48: public class TestLinkToNativeRBP { > 49: final static CLinker abi = CLinker.getInstance(); > 50: static final LibraryLookup lookup = LibraryLookup.ofDefault(); The default library can be unreliable (but we've kept it in lieu of something better, which is still in the pipeline). We've had problems with it in the past since it acts differently in different environments, so we try to avoid it in tests. It would be more robust to add a small test library that defines a dummy function and then depend on that instead. If you've not done this before; you can just add a `lib.c` file to the same directory as the main test file and the build system will compile it for you. You can then load it with `LibraryLookup.ofLibrary("")` in the test. See the [test/jdk/java/foreign/stackwalk](https://github.com/openjdk/jdk/tree/master/test/jdk/java/foreign/stackwalk) directory for an example (one noteworthy thing is that the function has to be explicitly exported in order to work on Windows. See the example). ------------- Marked as reviewed by jvernee (Committer). PR: https://git.openjdk.java.net/jdk/pull/2528 From github.com+42899633+eastig at openjdk.java.net Tue Feb 23 16:01:40 2021 From: github.com+42899633+eastig at openjdk.java.net (Evgeny Astigeevich) Date: Tue, 23 Feb 2021 16:01:40 GMT Subject: RFR: 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set [v6] In-Reply-To: References: Message-ID: On Sun, 21 Feb 2021 02:07:59 GMT, Xin Liu wrote: >> Add a flag _suppress_cr to outputStream. outstream objects won't emit any CR if it's set. >> Correct TypeInstPtr::dump2 to make sure it only emits klass name once. >> Remove the comment because Klass::oop_print_on() has emitted the address of oop. 
>> >> Before: >> 689 ConP === 0 [[ 821 ]] Oop:java/lang/Stringjava.lang.String >> {0x000000010159d7c8} - klass: public final synchronized 'java/lang/String' >> - string: "a" >> :Constant:exact * >> >> After: >> 689 ConP === 0 [[ 821 ]] Oop:java.lang.String {0x000000010159d7c8} - klass: public final synchronized 'java/lang/String' - string: "a":Constant:exact * > > Xin Liu has updated the pull request incrementally with one additional commit since the last revision: > > 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set > > fix build failures on Windows. StringUtils::tr_delete returns size_of. Changes requested by eastig at github.com (no known OpenJDK username). src/hotspot/share/opto/type.cpp line 4056: > 4054: StringUtils::tr_delete(buf, "\n"); > 4055: st->print_raw(buf); > 4056: os::free(buf); There is no need to use `os::strdup` because `as_string` creates a copy. I've looked in stringUtils.cpp and found `replace_no_expand`. The code can be rewritten: char *buf = ss.as_string(); StringUtils::replace_no_expand(buf, "\n", ""); st->print_raw(buf); With this code, `tr_delete` is redundant. ------------- PR: https://git.openjdk.java.net/jdk/pull/2178 From chagedorn at openjdk.java.net Tue Feb 23 16:14:45 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Tue, 23 Feb 2021 16:14:45 GMT Subject: RFR: 8260637: Shenandoah: assert(_base == Tuple) failure during C2 compilation In-Reply-To: References: Message-ID: On Thu, 4 Feb 2021 08:34:04 GMT, Roland Westrelin wrote: > Another shenandoah bug with a fix in shared code. > > LRBRightAfterMemBar.test2() has 2 allocations that are non escaping > but non scalarizable. As a result, the null check for a3.f is > optimized out but the CastPP is left in the graph. That CastPP becomes > control dependent on the o2 == null check which is later hoisted out > of the loop. The CastPP is then right after the membar of the barrier = 0x42 > volatile access but with an out of loop control. 
Because the node is > considered pinned by loopopts, it is assigned the membar as > control. The input of the CastPP is a shenandoah barrier that's > sandwiched between the membar and the CastPP and so expanded right > after the membar (that is between the membar and its control > projection). That causes the crash. I don't think cast nodes need to > be pinned so I propose that as a fix. Looks reasonable to me! ------------- Marked as reviewed by chagedorn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2400 From kvn at openjdk.java.net Tue Feb 23 16:32:39 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 23 Feb 2021 16:32:39 GMT Subject: RFR: 8260637: Shenandoah: assert(_base == Tuple) failure during C2 compilation In-Reply-To: References: Message-ID: On Thu, 4 Feb 2021 08:34:04 GMT, Roland Westrelin wrote: > Another shenandoah bug with a fix in shared code. > > LRBRightAfterMemBar.test2() has 2 allocations that are non escaping > but non scalarizable. As a result, the null check for a3.f is > optimized out but the CastPP is left in the graph. That CastPP becomes > control dependent on the o2 == null check which is later hoisted out > of the loop. The CastPP is then right after the membar of the barrier = 0x42 > volatile access but with an out of loop control. Because the node is > considered pinned by loopopts, it is assigned the membar as > control. The input of the CastPP is a shenandoah barrier that's > sandwiched between the membar and the CastPP and so expanded right > after the membar (that is between the membar and its control > projection). That causes the crash. I don't think cast nodes need to > be pinned so I propose that as a fix. Ok ------------- Marked as reviewed by kvn (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/2400 From roland at openjdk.java.net Tue Feb 23 16:37:40 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Tue, 23 Feb 2021 16:37:40 GMT Subject: RFR: 8260637: Shenandoah: assert(_base == Tuple) failure during C2 compilation In-Reply-To: References: Message-ID: On Tue, 23 Feb 2021 16:29:59 GMT, Vladimir Kozlov wrote: >> Another shenandoah bug with a fix in shared code. >> >> LRBRightAfterMemBar.test2() has 2 allocations that are non escaping >> but non scalarizable. As a result, the null check for a3.f is >> optimized out but the CastPP is left in the graph. That CastPP becomes >> control dependent on the o2 == null check which is later hoisted out >> of the loop. The CastPP is then right after the membar of the barrier = 0x42 >> volatile access but with an out of loop control. Because the node is >> considered pinned by loopopts, it is assigned the membar as >> control. The input of the CastPP is a shenandoah barrier that's >> sandwiched between the membar and the CastPP and so expanded right >> after the membar (that is between the membar and its control >> projection). That causes the crash. I don't think cast nodes need to >> be pinned so I propose that as a fix. > > Ok @vnkozlov @chhagedorn thanks for the reviews ------------- PR: https://git.openjdk.java.net/jdk/pull/2400 From roland at openjdk.java.net Tue Feb 23 16:37:41 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Tue, 23 Feb 2021 16:37:41 GMT Subject: Integrated: 8260637: Shenandoah: assert(_base == Tuple) failure during C2 compilation In-Reply-To: References: Message-ID: On Thu, 4 Feb 2021 08:34:04 GMT, Roland Westrelin wrote: > Another shenandoah bug with a fix in shared code. > > LRBRightAfterMemBar.test2() has 2 allocations that are non escaping > but non scalarizable. As a result, the null check for a3.f is > optimized out but the CastPP is left in the graph. 
That CastPP becomes > control dependent on the o2 == null check which is later hoisted out > of the loop. The CastPP is then right after the membar of the barrier = 0x42 > volatile access but with an out of loop control. Because the node is > considered pinned by loopopts, it is assigned the membar as > control. The input of the CastPP is a shenandoah barrier that's > sandwiched between the membar and the CastPP and so expanded right > after the membar (that is between the membar and its control > projection). That causes the crash. I don't think cast nodes need to > be pinned so I propose that as a fix. This pull request has now been integrated. Changeset: 8a2f5890 Author: Roland Westrelin URL: https://git.openjdk.java.net/jdk/commit/8a2f5890 Stats: 33 lines in 2 files changed: 27 ins; 0 del; 6 mod 8260637: Shenandoah: assert(_base == Tuple) failure during C2 compilation Reviewed-by: chagedorn, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/2400 From chagedorn at openjdk.java.net Tue Feb 23 16:46:44 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Tue, 23 Feb 2021 16:46:44 GMT Subject: RFR: 8261308: C2: assert(inner->is_valid_counted_loop(T_INT) && inner->is_strip_mined()) failed: OuterStripMinedLoop should have been removed In-Reply-To: References: Message-ID: <_4YFKcN-AhcsKl3a4PPzD4Op466zqUlOkUQjWd1epC8=.6ebb4db2-1e7b-46ed-9a87-60f3b9c35e50@github.com> On Thu, 11 Feb 2021 16:11:50 GMT, Roland Westrelin wrote: > The inner counted loop of the test case starts at 1 and stops at 1 so > runs for one iteration. A counted loop is created for it. The iv Phi > is found to be the constant 1 and its type is set by: > > l->phi()->as_Phi()->set_type(l->phi()->Value(&_igvn)); > > in PhaseIdealLoop::is_counted_loop() but it's not replaced by the > constant 1 yet so the counted loop's shape is preserved. > > IdealLoopTree::do_one_iteration_loop() runs but doesn't optimize the > loop because the trip count is not set to 1. 
The loop contains a range > check and range check elimination is applied. That causes the loop > exit test to be adjusted with a MinI(..) expression. When IGVN runs > next, the phi is replaced with 1 but because the exit test was > changed, IGVN can't prove it always fails. So the loop is not removed > which causes the assert failure as loop opts progress. > > The fix I propose is for IdealLoopTree::do_one_iteration_loop() to > remove the 1 iteration loop. The reason it doesn't happen is that > IdealLoopTree::compute_trip_count() doesn't set the trip count because > it finds a zero trip count: limit - init = 1 - 1 = 0. All loops, once > entered execute at least once. So I think, it's safe to set the trip > count to 1 in those cases. That's reasonable to do. Looks good to me! ------------- Marked as reviewed by chagedorn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2529 From never at openjdk.java.net Tue Feb 23 17:04:47 2021 From: never at openjdk.java.net (Tom Rodriguez) Date: Tue, 23 Feb 2021 17:04:47 GMT Subject: RFR: 8262011: [JVMCI] allow printing to tty from unattached libgraal thread In-Reply-To: References: Message-ID: On Fri, 19 Feb 2021 10:10:17 GMT, Doug Simon wrote: > Currently, `HotSpotJVMCIRuntime.writeDebugOutput` does nothing if the current thread is not attached to HotSpot (i.e., `Thread::current_or_null() == NULL`). This means crucial debug info can be lost. For reference, an unattached libgraal thread is a thread started from within libgraal that has not yet attached itself to the VM (e.g., before [this line](https://github.com/oracle/graal/blob/e4b9ab931940e1946f96f2015b937ba100384573/compiler/src/org.graalvm.compiler.core/src/org/graalvm/compiler/core/GraalServiceThread.java#L42)) or has already detached itself (e.g., after [this line](https://github.com/oracle/graal/blob/e4b9ab931940e1946f96f2015b937ba100384573/compiler/src/org.graalvm.compiler.core/src/org/graalvm/compiler/core/GraalServiceThread.java#L46)). 
> > The reason for the current behavior is that `HotSpotJVMCIRuntime.writeDebugOutput` passes a Java byte array to C++ code and the C++ code calls back into Java to decode the byte array into a native buffer. These call backs require the current thread to be attached to the VM. > > This PR moves the Java-to-native-buffer decoding into Java and thus avoids the requirement for the current thread to be attached to the VM. > > Tested in libgraal by patching Graal as follows: > diff --git a/compiler/src/org.graalvm.compiler.core/src/org/graalvm/compiler/core/GraalServiceThread.java b/compiler/src/org.graalvm.compiler.core/src/org/graalvm/compiler/core/GraalServiceThread.java > index 36064767c95..352395dd59b 100644 > --- a/compiler/src/org.graalvm.compiler.core/src/org/graalvm/compiler/core/GraalServiceThread.java > +++ b/compiler/src/org.graalvm.compiler.core/src/org/graalvm/compiler/core/GraalServiceThread.java > @@ -43,7 +43,14 @@ public class GraalServiceThread extends Thread { > try { > runnable.run(); > } finally { > + String debug = System.getenv("GraalServiceThread.debug"); > afterRun(); > + if ("true".equals(debug)) { > + throw new InternalError("THROWN AFTER DETACHING"); > + } > } > } > > Running without the changes in this PR: >> env GraalServiceThread.debug=true java -jar dacapo.jar avrora > ===== DaCapo 9.12 avrora starting ===== > ===== DaCapo 9.12 avrora PASSED in 4270 msec ===== > > Running with the changes in this PR: >> env GraalServiceThread.debug=true java -jar dacapo.jar avrora > ===== DaCapo 9.12 avrora starting ===== > Exception in thread "LibGraalHotSpotGraalManagement-init" java.lang.InternalError: THROWN AFTER DETACHING > at org.graalvm.compiler.core.GraalServiceThread.run(GraalServiceThread.java:52) > at com.oracle.svm.core.thread.JavaThreads.threadStartRoutine(JavaThreads.java:519) > at com.oracle.svm.core.posix.thread.PosixJavaThreads.pthreadStartRoutine(PosixJavaThreads.java:192) > ===== DaCapo 9.12 avrora PASSED in 4688 msec ===== Looks 
good. ------------- Marked as reviewed by never (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2640 From dnsimon at openjdk.java.net Tue Feb 23 17:04:47 2021 From: dnsimon at openjdk.java.net (Doug Simon) Date: Tue, 23 Feb 2021 17:04:47 GMT Subject: Integrated: 8262011: [JVMCI] allow printing to tty from unattached libgraal thread In-Reply-To: References: Message-ID: <5HnNCDmwGXnINtnvJUoPX4tPInEecrCwzEietDkKrl0=.7dbce5a9-779c-4712-b739-47a65b69f3c8@github.com> On Fri, 19 Feb 2021 10:10:17 GMT, Doug Simon wrote: > Currently, `HotSpotJVMCIRuntime.writeDebugOutput` does nothing if the current thread is not attached to HotSpot (i.e., `Thread::current_or_null() == NULL`). This means crucial debug info can be lost. For reference, an unattached libgraal thread is a thread started from within libgraal that has not yet attached itself to the VM (e.g., before [this line](https://github.com/oracle/graal/blob/e4b9ab931940e1946f96f2015b937ba100384573/compiler/src/org.graalvm.compiler.core/src/org/graalvm/compiler/core/GraalServiceThread.java#L42)) or has already detached itself (e.g., after [this line](https://github.com/oracle/graal/blob/e4b9ab931940e1946f96f2015b937ba100384573/compiler/src/org.graalvm.compiler.core/src/org/graalvm/compiler/core/GraalServiceThread.java#L46)). > > The reason for the current behavior is that `HotSpotJVMCIRuntime.writeDebugOutput` passes a Java byte array to C++ code and the C++ code calls back into Java to decode the byte array into a native buffer. These call backs require the current thread to be attached to the VM. > > This PR moves the Java-to-native-buffer decoding into Java and thus avoids the requirement for the current thread to be attached to the VM. 
> > Tested in libgraal by patching Graal as follows: > diff --git a/compiler/src/org.graalvm.compiler.core/src/org/graalvm/compiler/core/GraalServiceThread.java b/compiler/src/org.graalvm.compiler.core/src/org/graalvm/compiler/core/GraalServiceThread.java > index 36064767c95..352395dd59b 100644 > --- a/compiler/src/org.graalvm.compiler.core/src/org/graalvm/compiler/core/GraalServiceThread.java > +++ b/compiler/src/org.graalvm.compiler.core/src/org/graalvm/compiler/core/GraalServiceThread.java > @@ -43,7 +43,14 @@ public class GraalServiceThread extends Thread { > try { > runnable.run(); > } finally { > + String debug = System.getenv("GraalServiceThread.debug"); > afterRun(); > + if ("true".equals(debug)) { > + throw new InternalError("THROWN AFTER DETACHING"); > + } > } > } > > Running without the changes in this PR: >> env GraalServiceThread.debug=true java -jar dacapo.jar avrora > ===== DaCapo 9.12 avrora starting ===== > ===== DaCapo 9.12 avrora PASSED in 4270 msec ===== > > Running with the changes in this PR: >> env GraalServiceThread.debug=true java -jar dacapo.jar avrora > ===== DaCapo 9.12 avrora starting ===== > Exception in thread "LibGraalHotSpotGraalManagement-init" java.lang.InternalError: THROWN AFTER DETACHING > at org.graalvm.compiler.core.GraalServiceThread.run(GraalServiceThread.java:52) > at com.oracle.svm.core.thread.JavaThreads.threadStartRoutine(JavaThreads.java:519) > at com.oracle.svm.core.posix.thread.PosixJavaThreads.pthreadStartRoutine(PosixJavaThreads.java:192) > ===== DaCapo 9.12 avrora PASSED in 4688 msec ===== This pull request has now been integrated. 
Changeset: d2b9c227
Author: Doug Simon
URL: https://git.openjdk.java.net/jdk/commit/d2b9c227
Stats: 322 lines in 8 files changed: 49 ins; 246 del; 27 mod

8262011: [JVMCI] allow printing to tty from unattached libgraal thread

Reviewed-by: kvn, never

-------------

PR: https://git.openjdk.java.net/jdk/pull/2640

From aph at openjdk.java.net Tue Feb 23 17:16:41 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Tue, 23 Feb 2021 17:16:41 GMT
Subject: RFR: 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width [v9]
In-Reply-To:
References: <8kMxMFAYtb0B-yUVEt-HLfhji3Gj-gog8OHvWW_tKfw=.f7c9422b-3574-4c31-9489-7286ee98332f@github.com>
Message-ID:

On Tue, 23 Feb 2021 10:29:05 GMT, Dong Bo wrote:

> Hi, @theRealAph
>
> I've rebased the tests so that sshr/ushr/sshl/ssra/usra are accessed in one jtreg `test/hotspot/jtreg/compiler/vectorapi/TestVectorShiftImm.java`.
> Local tests by manually injected error shows all instructions are covered by the jtreg case. Suggestions?

I'm not seeing `sra` used anywhere.

The problem I see with the tests is that the methods are large. This causes C2 to do a lot of spilling. Also, because the resulting code is intertwined and complex, it's very hard to debug.
It would be far better to do something like this: void long_shift_add(long arrLongs[][], LongVector vba, LongVector vbb, int i) { vba.add(vbb.lanewise(VectorOperators.LSHR, 37)).intoArray(arrLongs[op], 2 * i); vba.add(vbb.lanewise(VectorOperators.LSHR, 64)).intoArray(arrLongs[op + 1], 2 * i); vba.add(vbb.lanewise(VectorOperators.LSHR, 99)).intoArray(arrLongs[op + 2], 2 * i); vba.add(vbb.lanewise(VectorOperators.LSHR, 128)).intoArray(arrLongs[op + 3], 2 * i); vba.add(vbb.lanewise(VectorOperators.LSHR, 157)).intoArray(arrLongs[op + 4], 2 * i); vba.add(vbb.lanewise(VectorOperators.LSHR, 192)).intoArray(arrLongs[op + 5], 2 * i); } ------------- PR: https://git.openjdk.java.net/jdk/pull/2472 From kvn at openjdk.java.net Tue Feb 23 17:19:44 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 23 Feb 2021 17:19:44 GMT Subject: RFR: 8261137: Optimization of Box nodes in uncommon_trap [v5] In-Reply-To: <39-V5pqzlgXaCnL3KTIBXvHwrt35rn4WuuknDv8dcuU=.178fe7a3-4b2b-415c-aaa5-ad63598daeb3@github.com> References: <8Riu9VCQLM7_vDp5DOMtLZK3yMLQzAkwlIKo4ab0F7Q=.662dbffe-c320-47ea-bc67-508e2c382b12@github.com> <39-V5pqzlgXaCnL3KTIBXvHwrt35rn4WuuknDv8dcuU=.178fe7a3-4b2b-415c-aaa5-ad63598daeb3@github.com> Message-ID: On Tue, 23 Feb 2021 07:44:58 GMT, Wang Huang wrote: >> JDK-8075052 has removed useless autobox. However, in some cases, the box is still saved. For instance: >> @Benchmark >> public void testMethod(Blackhole bh) { >> int sum = 0; >> for (int i = 0; i < data.length; i++) { >> Integer ii = Integer.valueOf(data[i]); >> if (i < data.length) { >> sum += ii.intValue(); >> } >> } >> bh.consume(sum); >> } >> Although the variable ii is only used at ii.intValue(), it cannot be eliminated as a result of being used by a hidden uncommon_trap. >> The uncommon_trap is generated by the optimized "if", because its condition is always true. >> >> We can postpone box in uncommon_trap in this situation. 
>> We treat box as a scalarized object by adding a SafePointScalarObjectNode in the uncommon_trap node,
>> and deleting the use of box:
>>
>> There is no additional fail/error(s) of jtreg after this patch.
>>
>> I adjust my codes and add a new benchmark
>>
>> public class MyBenchmark {
>>
>>     static int[] data = new int[10000];
>>
>>     static {
>>         for(int i = 0; i < data.length; ++i) {
>>             data[i] = i * 1337 % 7331;
>>         }
>>     }
>>
>>     @Benchmark
>>     public void testMethod(Blackhole bh) {
>>         int sum = 0;
>>         for (int i = 0; i < data.length; i++) {
>>             Integer ii = Integer.valueOf(data[i]);
>>             black();
>>             if (i < 100000) {
>>                 sum += ii.intValue();
>>             }
>>         }
>>         bh.consume(sum);
>>     }
>>
>>     public void black(){}
>> }
>>
>>
>> aarch64:
>> base line:
>> Benchmark                    Mode  Samples   Score  Score error  Units
>> o.s.MyBenchmark.testMethod   avgt       30  88.513        1.111  us/op
>>
>> opt:
>> Benchmark                    Mode  Samples   Score  Score error  Units
>> o.s.MyBenchmark.testMethod   avgt       30  52.776        0.096  us/op
>>
>> x86:
>> base line:
>> Benchmark                    Mode  Samples   Score  Score error  Units
>> o.s.MyBenchmark.testMethod   avgt       30  81.066        3.156  us/op
>>
>> opt:
>> Benchmark                    Mode  Samples   Score  Score error  Units
>> o.s.MyBenchmark.testMethod   avgt       30  55.596        0.775  us/op
>
> Wang Huang has updated the pull request incrementally with one additional commit since the last revision:
>
>   add debuginfo optimization

Please add a test case which verifies that Box is scalarized, by forking a process and checking the output of a run with the `-XX:+PrintEliminateAllocations` flag. You also need a test which triggers deoptimization and executes the code for Box object reallocation/initialization or load from cache. A test should also verify that box object identity matches after deoptimization.
------------- PR: https://git.openjdk.java.net/jdk/pull/2401 From kvn at openjdk.java.net Tue Feb 23 19:18:42 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 23 Feb 2021 19:18:42 GMT Subject: RFR: 8261308: C2: assert(inner->is_valid_counted_loop(T_INT) && inner->is_strip_mined()) failed: OuterStripMinedLoop should have been removed In-Reply-To: References: Message-ID: On Thu, 11 Feb 2021 16:11:50 GMT, Roland Westrelin wrote: > The inner counted loop of the test case starts at 1 and stops at 1 so > runs for one iteration. A counted loop is created for it. The iv Phi > is found to be the constant 1 and its type is set by: > > l->phi()->as_Phi()->set_type(l->phi()->Value(&_igvn)); > > in PhaseIdealLoop::is_counted_loop() but it's not replaced by the > constant 1 yet so the counted loop's shape is preserved. > > IdealLoopTree::do_one_iteration_loop() runs but doesn't optimize the > loop because the trip count is not set to 1. The loop contains a range > check and range check elimination is applied. That causes the loop > exit test to be adjusted with a MinI(..) expression. When IGVN runs > next, the phi is replaced with 1 but because the exit test was > changed, IGVN can't prove it always fails. So the loop is not removed > which causes the assert failure as loop opts progress. > > The fix I propose is for IdealLoopTree::do_one_iteration_loop() to > remove the 1 iteration loop. The reason it doesn't happen is that > IdealLoopTree::compute_trip_count() doesn't set the trip count because > it finds a zero trip count: limit - init = 1 - 1 = 0. All loops, once > entered execute at least once. So I think, it's safe to set the trip > count to 1 in those cases. I think we need a test which verifies one_iteration loop optimization with different variations of init, limit and stride values to make sure `trip_count` formula is correct. src/hotspot/share/opto/loopTransform.cpp line 126: > 124: jlong limit_con = (stride_con > 0) ? 
limit_type->_hi : limit_type->_lo; > 125: int stride_m = stride_con - (stride_con > 0 ? 1 : -1); > 126: jlong trip_count = (limit_con - init_con + stride_m)/stride_con; Does it mean that before we execute do_one_iteration_loop() optimization for loops which do 2 trips? This seems wrong. Can you check? I agree that loop body executed at least once since it is exit check. But it means we have to generally adjust `trip_count` with `+1` and not do what you suggested. ------------- PR: https://git.openjdk.java.net/jdk/pull/2529 From xliu at openjdk.java.net Tue Feb 23 19:32:42 2021 From: xliu at openjdk.java.net (Xin Liu) Date: Tue, 23 Feb 2021 19:32:42 GMT Subject: RFR: 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set [v6] In-Reply-To: References: Message-ID: <-6o8O3AATyO7_tSm4zM4ztzGvjcQgCvFbk0NE7J9yJQ=.a452f241-9103-4408-b853-4befdb110358@github.com> On Tue, 23 Feb 2021 15:59:03 GMT, Evgeny Astigeevich wrote: >> Xin Liu has updated the pull request incrementally with one additional commit since the last revision: >> >> 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set >> >> fix build failures on Windows. StringUtils::tr_delete returns size_of. > > src/hotspot/share/opto/type.cpp line 4056: > >> 4054: StringUtils::tr_delete(buf, "\n"); >> 4055: st->print_raw(buf); >> 4056: os::free(buf); > > There is no need to use `os::strdup` because `as_string` creates a copy. > I've looked in stringUtils.cpp and found `replace_no_expand`. The code can be rewritten: > char *buf = ss.as_string(); > StringUtils::replace_no_expand(buf, "\n", ""); > st->print_raw(buf); > With this code, `tr_delete` is redundant. oh, thanks for the head-up. I'm happy to remove os::strdup and os::free pair. it seems that replace_no_expand is cumbersome to do what tr_delete does. let me see how it works. 
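For readers following the `tr_delete` versus `replace_no_expand` exchange: replacing a substring with the empty string never needs a larger buffer, so it can be done in place. The following is a minimal Java sketch of that idea only. It is an analogue, not HotSpot's `StringUtils::replace_no_expand` or `tr_delete` (their implementations are not shown in this thread), and the class and method names are hypothetical.

```java
public class InPlaceDeleteSketch {
    // Hypothetical analogue of deleting a substring in place: copies every
    // char that is not part of a match toward the front of buf and returns
    // the new logical length. The array itself is never grown.
    static int deleteAll(char[] buf, int len, String target) {
        if (target.isEmpty()) {
            return len;
        }
        int out = 0;
        int in = 0;
        while (in < len) {
            if (matchesAt(buf, len, in, target)) {
                in += target.length();   // skip the match
            } else {
                buf[out++] = buf[in++];  // keep this char
            }
        }
        return out;
    }

    // True if target occurs in buf[0..len) starting at index pos.
    static boolean matchesAt(char[] buf, int len, int pos, String target) {
        if (pos + target.length() > len) {
            return false;
        }
        for (int i = 0; i < target.length(); i++) {
            if (buf[pos + i] != target.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        char[] buf = "klass: String\n - string: \"a\"\n".toCharArray();
        int n = deleteAll(buf, buf.length, "\n");
        System.out.println(new String(buf, 0, n));
    }
}
```

With a one-character target such as `"\n"`, the substring version degenerates to the same single sweep a character-deleting routine would perform.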
------------- PR: https://git.openjdk.java.net/jdk/pull/2178 From github.com+42899633+eastig at openjdk.java.net Tue Feb 23 21:11:39 2021 From: github.com+42899633+eastig at openjdk.java.net (Evgeny Astigeevich) Date: Tue, 23 Feb 2021 21:11:39 GMT Subject: RFR: 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set [v6] In-Reply-To: <-6o8O3AATyO7_tSm4zM4ztzGvjcQgCvFbk0NE7J9yJQ=.a452f241-9103-4408-b853-4befdb110358@github.com> References: <-6o8O3AATyO7_tSm4zM4ztzGvjcQgCvFbk0NE7J9yJQ=.a452f241-9103-4408-b853-4befdb110358@github.com> Message-ID: <_bbShtIislmHUrEyims35ijjL_jtvIdI3BgwSR-ZdD0=.d88d8516-d838-4c1b-9432-578836a32a40@github.com> On Tue, 23 Feb 2021 19:30:00 GMT, Xin Liu wrote: > oh, thanks for the head-up. I'm happy to remove os::strdup and os::free pair. > it seems that replace_no_expand is cumbersome to do what tr_delete does. let me see how it works. I don't see why it is cumbersome. IMHO, it is logically consistent: replace substring with an empty string without expanding the buffer. The main value is the amount of written code. ------------- PR: https://git.openjdk.java.net/jdk/pull/2178 From zgu at openjdk.java.net Tue Feb 23 21:58:52 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Tue, 23 Feb 2021 21:58:52 GMT Subject: RFR: 8262259: Remove unused variable in MethodLiveness::BasicBlock::compute_gen_kill_single Message-ID: Please review this trivial patch that removes unused/dead local variable. 
-------------

Commit messages:
 - Merge branch 'master' into JDK-8262259-rm-unused-localNum
 - JDK-8262259

Changes: https://git.openjdk.java.net/jdk/pull/2698/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2698&range=00
Issue: https://bugs.openjdk.java.net/browse/JDK-8262259
Stats: 4 lines in 1 file changed: 0 ins; 2 del; 2 mod
Patch: https://git.openjdk.java.net/jdk/pull/2698.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/2698/head:pull/2698

PR: https://git.openjdk.java.net/jdk/pull/2698

From xliu at openjdk.java.net Tue Feb 23 23:12:56 2021
From: xliu at openjdk.java.net (Xin Liu)
Date: Tue, 23 Feb 2021 23:12:56 GMT
Subject: RFR: 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set [v7]
In-Reply-To:
References:
Message-ID:

> Add a flag _suppress_cr to outputStream. outstream objects won't emit any CR if it's set.
> Correct TypeInstPtr::dump2 to make sure it only emits klass name once.
> Remove the comment because Klass::oop_print_on() has emitted the address of oop.
>
> Before:
> 689 ConP === 0 [[ 821 ]] Oop:java/lang/Stringjava.lang.String
> {0x000000010159d7c8} - klass: public final synchronized 'java/lang/String'
> - string: "a"
> :Constant:exact *
>
> After:
> 689 ConP === 0 [[ 821 ]] Oop:java.lang.String {0x000000010159d7c8} - klass: public final synchronized 'java/lang/String' - string: "a":Constant:exact *

Xin Liu has updated the pull request incrementally with one additional commit since the last revision:

  8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set

  use the existing api StringUtils::replace_no_expand to achieve the same replace.
  don't need to invoke os::strdup because stringStream::as_string() has duplicated
  the internal buffer.
-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/2178/files
  - new: https://git.openjdk.java.net/jdk/pull/2178/files/077f9b60..6df63fe1

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2178&range=06
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2178&range=05-06

Stats: 83 lines in 5 files changed: 0 ins; 75 del; 8 mod
Patch: https://git.openjdk.java.net/jdk/pull/2178.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/2178/head:pull/2178

PR: https://git.openjdk.java.net/jdk/pull/2178

From xliu at openjdk.java.net Tue Feb 23 23:18:41 2021
From: xliu at openjdk.java.net (Xin Liu)
Date: Tue, 23 Feb 2021 23:18:41 GMT
Subject: RFR: 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set [v6]
In-Reply-To: <_bbShtIislmHUrEyims35ijjL_jtvIdI3BgwSR-ZdD0=.d88d8516-d838-4c1b-9432-578836a32a40@github.com>
References: <-6o8O3AATyO7_tSm4zM4ztzGvjcQgCvFbk0NE7J9yJQ=.a452f241-9103-4408-b853-4befdb110358@github.com> <_bbShtIislmHUrEyims35ijjL_jtvIdI3BgwSR-ZdD0=.d88d8516-d838-4c1b-9432-578836a32a40@github.com>
Message-ID:

On Tue, 23 Feb 2021 21:09:15 GMT, Evgeny Astigeevich wrote:

>> oh, thanks for the head-up. I'm happy to remove os::strdup and os::free pair.
>> it seems that replace_no_expand is cumbersome to do what tr_delete does. let me see how it works.
>
>> oh, thanks for the head-up. I'm happy to remove os::strdup and os::free pair.
>> it seems that replace_no_expand is cumbersome to do what tr_delete does. let me see how it works.
>
> I don't see why it is cumbersome. IMHO, it is logically consistent: replace substring with an empty string without expanding the buffer. The main value is the amount of written code.

Oh, by "cumbersome" I just meant that sweeping chars felt easier than sweeping substrings in my case. But I have verified that `replace_no_expand(buf, "\n", "")` has the same effect. I took your advice: less code means less chance to make a mistake. Updated this PR.
I also verified that it has the same results for `-XX:+Verbose -XX:+PrintIdeal`. ------------- PR: https://git.openjdk.java.net/jdk/pull/2178 From stuefe at openjdk.java.net Wed Feb 24 04:59:48 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Wed, 24 Feb 2021 04:59:48 GMT Subject: RFR: 8262259: Remove unused variable in MethodLiveness::BasicBlock::compute_gen_kill_single In-Reply-To: References: Message-ID: On Tue, 23 Feb 2021 21:52:31 GMT, Zhengyu Gu wrote: > Please review this trivial patch that removes unused/dead local variable. Seems fine and trivial. ..Thomas ------------- Marked as reviewed by stuefe (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2698 From thartmann at openjdk.java.net Wed Feb 24 06:46:39 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Wed, 24 Feb 2021 06:46:39 GMT Subject: RFR: 8262259: Remove unused variable in MethodLiveness::BasicBlock::compute_gen_kill_single In-Reply-To: References: Message-ID: On Tue, 23 Feb 2021 21:52:31 GMT, Zhengyu Gu wrote: > Please review this trivial patch that removes unused/dead local variable. Looks good. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2698 From thartmann at openjdk.java.net Wed Feb 24 06:56:41 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Wed, 24 Feb 2021 06:56:41 GMT Subject: RFR: 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set [v7] In-Reply-To: References: Message-ID: On Tue, 23 Feb 2021 23:12:56 GMT, Xin Liu wrote: >> Add a flag _suppress_cr to outputStream. outstream objects won't emit any CR if it's set. >> Correct TypeInstPtr::dump2 to make sure it only emits klass name once. >> Remove the comment because Klass::oop_print_on() has emitted the address of oop. 
>> >> Before: >> 689 ConP === 0 [[ 821 ]] Oop:java/lang/Stringjava.lang.String >> {0x000000010159d7c8} - klass: public final synchronized 'java/lang/String' >> - string: "a" >> :Constant:exact * >> >> After: >> 689 ConP === 0 [[ 821 ]] Oop:java.lang.String {0x000000010159d7c8} - klass: public final synchronized 'java/lang/String' - string: "a":Constant:exact * > > Xin Liu has updated the pull request incrementally with one additional commit since the last revision: > > 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set > > use the existing api StringUtils::replace_no_expand to archive the same replace. > don't need to invoke os::strdup because stringStream::as_string() has duplicated > the internal buffer. Changes requested by thartmann (Reviewer). src/hotspot/share/opto/type.cpp line 4052: > 4050: > 4051: { > 4052: ResourceMark rm; Shouldn't the `ResourceMark` go to before `stringStream ss` which is a `ResourceObj` as well? Also, please add a small comment explaining that this code suppresses the new line emitted by `print_oop`. src/hotspot/share/opto/type.cpp line 4040: > 4038: // Dump oop Type > 4039: #ifndef PRODUCT > 4040: void TypeInstPtr::dump2( Dict &d, uint depth, outputStream* st ) const { While you are at it, please also remove the excess whitespace after `(` and before `)`. ------------- PR: https://git.openjdk.java.net/jdk/pull/2178 From dongbo at openjdk.java.net Wed Feb 24 07:27:03 2021 From: dongbo at openjdk.java.net (Dong Bo) Date: Wed, 24 Feb 2021 07:27:03 GMT Subject: RFR: 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width [v10] In-Reply-To: References: Message-ID: > In vectorAPI, when right-shifting a vector with a shift equals to the element width, the shift is transformed to zero, > see `src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorOperators.java`: > /** Produce {@code a>>>(n&(ESIZE*8-1))}. Integral only. 
*/ > public static final /*bitwise*/ Binary LSHR = binary("LSHR", ">>>", VectorSupport.VECTOR_OP_URSHIFT, VO_SHIFT); > > The aarch64 assembler generates wrong or illegal instructions in this case, e.g. for the JAVA code below on aarch64, > assembler call `__ ushr(dst, __ T8B, src, 0)`, the instruction generated is not `ushr dst.8B, src.8B, 0`, but `ushr dst.4H, src.4H, 16` instead. > According to local tests, JVM gives wrong results for byte/short and crashes with SIGILL for integer/long. > ByteVector vba = ByteVector.fromArray(byte64SPECIES, bytesA, 8 * i); > vbb.lanewise(VectorOperators.ASHR, 8).intoArray(arrBytes, 8 * i); > > The legal right shift amount should be in the range 1 to the element width in bits on aarch64: > https://developer.arm.com/documentation/dui0801/f/A64-SIMD-Vector-Instructions/USHR--vector-?lang=en > > This fix handles zero shift separately. If the shift is zero, it generates `orr` for right shift, `addv` for right shift and accumulate. > Verified with linux-aarch64-server-fastdebug, tier1. Also created a jtreg to reproduce the issue and for regression tests. 
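To make the masking rule in the description concrete, here is a minimal scalar sketch in plain Java. It is not the Vector API and not code from the PR; it only assumes the shift count is reduced as `n & (ESIZE*8 - 1)`, as the quoted `LSHR` javadoc states. The class and method names are hypothetical.

```java
public class ShiftMaskSketch {
    // Hypothetical helper: reduces a shift count the way the LSHR javadoc
    // describes, n & (ESIZE*8 - 1), where laneBits stands for ESIZE*8.
    static int effectiveShift(int n, int laneBits) {
        return n & (laneBits - 1);
    }

    public static void main(String[] args) {
        // Lane widths in bits for byte, short, int and long lanes.
        int[] laneWidths = {8, 16, 32, 64};
        for (int bits : laneWidths) {
            // A shift count equal to the lane width is reduced to zero.
            System.out.println(bits + "-bit lane, shift " + bits
                    + " -> " + effectiveShift(bits, bits));
        }
        // Scalar Java applies the same reduction to long shifts:
        // 5L >>> 64 shifts by 64 & 63, which is 0, so the value is unchanged.
        System.out.println((5L >>> 64) == 5L);
    }
}
```

Under this reduction a count of 99 on a 64-bit lane becomes 35 and 157 becomes 29, while any multiple of the lane width becomes 0, which is the immediate the assembler has to handle specially.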
Dong Bo has updated the pull request incrementally with one additional commit since the last revision: update tests as suggestions ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2472/files - new: https://git.openjdk.java.net/jdk/pull/2472/files/e2dc7b83..9290f27e Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2472&range=09 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2472&range=08-09 Stats: 465 lines in 1 file changed: 159 ins; 187 del; 119 mod Patch: https://git.openjdk.java.net/jdk/pull/2472.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2472/head:pull/2472 PR: https://git.openjdk.java.net/jdk/pull/2472 From thartmann at openjdk.java.net Wed Feb 24 07:31:47 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Wed, 24 Feb 2021 07:31:47 GMT Subject: RFR: 8261137: Optimization of Box nodes in uncommon_trap [v5] In-Reply-To: <39-V5pqzlgXaCnL3KTIBXvHwrt35rn4WuuknDv8dcuU=.178fe7a3-4b2b-415c-aaa5-ad63598daeb3@github.com> References: <8Riu9VCQLM7_vDp5DOMtLZK3yMLQzAkwlIKo4ab0F7Q=.662dbffe-c320-47ea-bc67-508e2c382b12@github.com> <39-V5pqzlgXaCnL3KTIBXvHwrt35rn4WuuknDv8dcuU=.178fe7a3-4b2b-415c-aaa5-ad63598daeb3@github.com> Message-ID: On Tue, 23 Feb 2021 07:44:58 GMT, Wang Huang wrote: >> JDK-8075052 has removed useless autobox. However, in some cases, the box is still saved. For instance: >> @Benchmark >> public void testMethod(Blackhole bh) { >> int sum = 0; >> for (int i = 0; i < data.length; i++) { >> Integer ii = Integer.valueOf(data[i]); >> if (i < data.length) { >> sum += ii.intValue(); >> } >> } >> bh.consume(sum); >> } >> Although the variable ii is only used at ii.intValue(), it cannot be eliminated as a result of being used by a hidden uncommon_trap. >> The uncommon_trap is generated by the optimized "if", because its condition is always true. >> >> We can postpone box in uncommon_trap in this situation. 
We treat box as a scalarized object by adding a SafePointScalarObjectNode in the uncommon_trap node, >> and deleting the use of box: >> >> There is no additional fail/error(s) of jtreg after this patch. >> >> I adjust my codes and add a new benchmark >> >> public class MyBenchmark { >> >> static int[] data = new int[10000]; >> >> static { >> for(int i = 0; i < data.length; ++i) { >> data[i] = i * 1337 % 7331; >> } >> } >> >> @Benchmark >> public void testMethod(Blackhole bh) { >> int sum = 0; >> for (int i = 0; i < data.length; i++) { >> Integer ii = Integer.valueOf(data[i]); >> black(); >> if (i < 100000) { >> sum += ii.intValue(); >> } >> } >> bh.consume(sum); >> } >> >> public void black(){} >> } >> >> >> aarch64: >> base line? >> Benchmark Mode Samples Score Score error Units >> o.s.MyBenchmark.testMethod avgt 30 88.513 1.111 us/op >> >> opt? >> Benchmark Mode Samples Score Score error Units >> o.s.MyBenchmark.testMethod avgt 30 52.776 0.096 us/op >> >> x86: >> base line? >> Benchmark Mode Samples Score Score error Units >> o.s.MyBenchmark.testMethod avgt 30 81.066 3.156 us/op >> >> opt: >> Benchmark Mode Samples Score Score error Units >> o.s.MyBenchmark.testMethod avgt 30 55.596 0.775 us/op > > Wang Huang has updated the pull request incrementally with one additional commit since the last revision: > > add debuginfo optimization I agree with Vladimir that tests are needed. I've added some comments to the code. src/hotspot/share/opto/callGenerator.cpp line 561: > 559: if (resproj != nullptr && call->is_CallStaticJava() && > 560: call->as_CallStaticJava()->is_boxing_method()) { > 561: Unique_Node_List debuginfo_node_list; Maybe rename this to `safepoints`. src/hotspot/share/opto/callGenerator.cpp line 569: > 567: for (uint i = 0; i < dbg_start; i++) { > 568: if (sfpt->in(i) == resproj) { > 569: return; I think this code can be replaced by: if (!sfpt->is_Call() || !sfpt->as_Call()->has_non_debug_use(n)) { safepoints.push(sfpt); } else { ... 
} src/hotspot/share/opto/callGenerator.cpp line 587: > 585: ciInstanceKlass* klass = call->as_CallStaticJava()->method()->holder(); > 586: int n_fields = klass->nof_nonstatic_fields(); > 587: assert(n_fields == 1, "the klass must be an auto-boxing klass"); This code can be put in `ifdef ASSERT` and `n_fields` below can be replaced by 1. src/hotspot/share/opto/callGenerator.cpp line 656: > 654: } > 655: > 656: replace_box_to_scalar(call, callprojs.resproj); Should this be guarded by `C->eliminate_boxing()`? src/hotspot/share/opto/callnode.hpp line 503: > 501: // It is relative to the last (youngest) jvms->_scloff. > 502: uint _n_fields; // Number of non-static fields of the scalarized object. > 503: bool _is_auto_box; // is the scalarized object is auto box. Typo in comment. Should be something like `// True if the scalarized object is an auto box` src/hotspot/share/opto/callGenerator.cpp line 583: > 581: while (debuginfo_node_list.size() > 0) { > 582: ProjNode* res = resproj->as_Proj(); > 583: Node* debuginfo_node = debuginfo_node_list.pop(); `debuginfo_node` -> `safepoint` src/hotspot/share/opto/callGenerator.cpp line 596: > 594: first_ind, n_fields, true); > 595: sobj->init_req(0, kit.root()); > 596: debuginfo_node->add_req(call->in(res->_con)); I don't understand why you are selecting the input based on the result projection field `res->_con`? ------------- Changes requested by thartmann (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/2401 From dongbo at openjdk.java.net Wed Feb 24 07:33:40 2021 From: dongbo at openjdk.java.net (Dong Bo) Date: Wed, 24 Feb 2021 07:33:40 GMT Subject: RFR: 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width [v10] In-Reply-To: References: <8kMxMFAYtb0B-yUVEt-HLfhji3Gj-gog8OHvWW_tKfw=.f7c9422b-3574-4c31-9489-7286ee98332f@github.com> Message-ID: On Tue, 23 Feb 2021 17:13:35 GMT, Andrew Haley wrote: > > Local tests by manually injected error shows all instructions are covered by the jtreg case. Suggestions? > > I'm not seeing `sra` used anywhere. > > The problem I see with the tests is that the methods are large. This causes C2 to do a lot of spilling. Also, because the resuling code is intertwined and complex, it's very hard to debug. > > It would be far better to do something like this: > > ``` > void long_shift_add(long arrLongs[][], LongVector vba, LongVector vbb, int i) { > vba.add(vbb.lanewise(VectorOperators.LSHR, 37)).intoArray(arrLongs[op], 2 * i); > vba.add(vbb.lanewise(VectorOperators.LSHR, 64)).intoArray(arrLongs[op + 1], 2 * i); > vba.add(vbb.lanewise(VectorOperators.LSHR, 99)).intoArray(arrLongs[op + 2], 2 * i); > vba.add(vbb.lanewise(VectorOperators.LSHR, 128)).intoArray(arrLongs[op + 3], 2 * i); > vba.add(vbb.lanewise(VectorOperators.LSHR, 157)).intoArray(arrLongs[op + 4], 2 * i); > vba.add(vbb.lanewise(VectorOperators.LSHR, 192)).intoArray(arrLongs[op + 5], 2 * i); > } > ``` Weird, I took a look at the the assembly, `ssra` did accessed by the tests on our server: $ ./build/linux-aarch64-server-fastdebug/images/jdk/bin/java --add-modules jdk.incubator.vector -XX:CompileThreshold=1000 -XX:CompileCommand=print,compiler/vectorapi/TestVectorShiftImm.shift_* test/hotspot/jtreg/compiler/vectorapi/TestVectorShiftImm.java 64 &> assembly_vlen64.txt $ ./build/linux-aarch64-server-fastdebug/images/jdk/bin/java --add-modules jdk.incubator.vector 
-XX:CompileThreshold=1000 -XX:CompileCommand=print,compiler/vectorapi/TestVectorShiftImm.shift_* test/hotspot/jtreg/compiler/vectorapi/TestVectorShiftImm.java 128 &> assembly_vlen128.txt $ cat assembly_vlen*.txt | grep "ssra" 02c0 ssra V18, V17, #37 # vector (2D) 02c8 ssra V19, V17, #0 # vector (2D) 02d0 ssra V20, V17, #35 # vector (2D) 0308 ssra V18, V17, #29 # vector (2D) 0644 ssra V18, V17, #37 # vector (2D) 064c ssra V19, V17, #0 # vector (2D) 0654 ssra V20, V17, #35 # vector (2D) 0674 ssra V18, V17, #29 # vector (2D) 0798 ssra V18, V17, #37 # vector (2D) 07a0 ssra V19, V17, #0 # vector (2D) 07a8 ssra V20, V17, #35 # vector (2D) 07e0 ssra V18, V17, #29 # vector (2D) 0x0000ffff83f7e500: ssra v18.2d, v17.2d, #37 ;*aload_0 {reexecute=0 rethrow=0 return_oop=0} 0x0000ffff83f7e510: ssra v20.2d, v17.2d, #35 ;*iand {reexecute=0 rethrow=0 return_oop=0} 0x0000ffff83f7e548: ssra v18.2d, v17.2d, #29 ;*if_icmpne {reexecute=0 rethrow=0 return_oop=0} 0x0000ffff83f7e884: ssra v18.2d, v17.2d, #37 0x0000ffff83f7e894: ssra v20.2d, v17.2d, #35 0x0000ffff83f7e8b4: ssra v18.2d, v17.2d, #29 0x0000ffff83f7e9d8: ssra v18.2d, v17.2d, #37 ;*invokestatic broadcastInt {reexecute=0 rethrow=0 return_oop=0} 0x0000ffff83f7e9e8: ssra v20.2d, v17.2d, #35 ;*invokestatic broadcastInt {reexecute=0 rethrow=0 return_oop=0} 0x0000ffff83f7ea20: ssra v18.2d, v17.2d, #29 ;*checkcast {reexecute=0 rethrow=0 return_oop=0} 284 ssra V18, V17, #9 # vector (4S) 28c ssra V19, V17, #0 # vector (4S) 294 ssra V20, V17, #15 # vector (4S) 0x0000ffff83f822c4: ssra v18.4s, v17.4s, #9 ;*invokedynamic {reexecute=0 rethrow=0 return_oop=0} 0x0000ffff83f822d4: ssra v20.4s, v17.4s, #15 284 ssra V18, V17, #1 # vector (8H) 28c ssra V19, V17, #8 # vector (8H) ... 
Also injected error to `sshr+add` by: --- a/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp +++ b/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp @@ -545,7 +545,7 @@ public: #define WRAP(INSN) \ void INSN(FloatRegister Vd, SIMD_Arrangement T, FloatRegister Vn, int shift) { \ if (shift == 0) { \ - Assembler::addv(Vd, T, Vd, Vn); \ + Assembler::subv(Vd, T, Vd, Vn); \ } else { \ Assembler::INSN(Vd, T, Vn, shift); \ } \ The `shift+add` tests failed as expected: $ ./build/linux-aarch64-server-fastdebug/images/jdk/bin/java --add-modules jdk.incubator.vector -XX:CompileThreshold=1000 -XX:-TieredCompilation test/hotspot/jtreg/compiler/vectorapi/TestVectorShiftImm.java 64 WARNING: Using incubator modules: jdk.incubator.vector warning: using incubating module(s): jdk.incubator.vector 1 warning Exception in thread "main" java.lang.RuntimeException: Test Failed, failed tests: type SHORT index 19, operation ASHR_AND_ACCUMULATE, vector length 64. type SHORT index 21, operation ASHR_AND_ACCUMULATE, vector length 64. type SHORT index 23, operation ASHR_AND_ACCUMULATE, vector length 64. type SHORT index 25, operation LSHR_AND_ACCUMULATE, vector length 64. type SHORT index 27, operation LSHR_AND_ACCUMULATE, vector length 64. type SHORT index 29, operation LSHR_AND_ACCUMULATE, vector length 64. type INTEGER index 19, operation ASHR_AND_ACCUMULATE, vector length 64. type INTEGER index 21, operation ASHR_AND_ACCUMULATE, vector length 64. type INTEGER index 23, operation ASHR_AND_ACCUMULATE, vector length 64. type INTEGER index 25, operation LSHR_AND_ACCUMULATE, vector length 64. type INTEGER index 27, operation LSHR_AND_ACCUMULATE, vector length 64. type INTEGER index 29, operation LSHR_AND_ACCUMULATE, vector length 64. ... 
$ ./build/linux-aarch64-server-fastdebug/images/jdk/bin/java --add-modules jdk.incubator.vector -XX:CompileThreshold=1000 -XX:-TieredCompilation test/hotspot/jtreg/compiler/vectorapi/TestVectorShiftImm.java 128
WARNING: Using incubator modules: jdk.incubator.vector
warning: using incubating module(s): jdk.incubator.vector
1 warning
Exception in thread "main" java.lang.RuntimeException: Test Failed, failed tests:
type LONG index 49, operation ASHR_AND_ACCUMULATE, vector length 128.
type LONG index 51, operation ASHR_AND_ACCUMULATE, vector length 128.
type LONG index 53, operation ASHR_AND_ACCUMULATE, vector length 128.
type LONG index 55, operation LSHR_AND_ACCUMULATE, vector length 128.
type LONG index 57, operation LSHR_AND_ACCUMULATE, vector length 128.
type LONG index 59, operation LSHR_AND_ACCUMULATE, vector length 128.
type SHORT index 49, operation ASHR_AND_ACCUMULATE, vector length 128.
type SHORT index 51, operation ASHR_AND_ACCUMULATE, vector length 128.
type SHORT index 53, operation ASHR_AND_ACCUMULATE, vector length 128.
...

Anyway, I extracted the operations you suggested into `shift_op_*` methods. I performed the error-injection experiments with the new tests on Kunpeng916 and re-checked the assembly output; the results look good.
The test command I used to run the newest tests are: $ ./build/linux-aarch64-server-fastdebug/images/jdk/bin/java --add-modules jdk.incubator.vector -XX:-TieredCompilation -XX:CompileThreshold=1000 -Dvlen=64 -XX:CompileCommand=print,compiler/vectorapi/TestVectorShiftImm.shift_* test/hotspot/jtreg/compiler/vectorapi/TestVectorShiftImm.java &> assembly_vlen64.txt $ ./build/linux-aarch64-server-fastdebug/images/jdk/bin/java --add-modules jdk.incubator.vector -XX:-TieredCompilation -XX:CompileThreshold=1000 -Dvlen=128 -XX:CompileCommand=print,compiler/vectorapi/TestVectorShiftImm.shift_* test/hotspot/jtreg/compiler/vectorapi/TestVectorShiftImm.java &> assembly_vlen128.txt $ cat assembly_vlen64.txt | grep ssra; cat assembly_vlen128.txt | grep ssra ------------- PR: https://git.openjdk.java.net/jdk/pull/2472 From alanb at openjdk.java.net Wed Feb 24 08:54:42 2021 From: alanb at openjdk.java.net (Alan Bateman) Date: Wed, 24 Feb 2021 08:54:42 GMT Subject: RFR: 8261880: Change nested classes in java.base to static nested classes where possible [v2] In-Reply-To: References: Message-ID: On Wed, 17 Feb 2021 16:38:03 GMT, ?????? ??????? wrote: >> Non-static classes hold a link to their parent classes, which in many cases can be avoided. > > ?????? ??????? has updated the pull request incrementally with one additional commit since the last revision: > > 8261880: Remove static from declarations of Holder nested classes src/java.base/windows/classes/sun/nio/ch/PipeImpl.java line 67: > 65: private final SinkChannel sink; > 66: > 67: private static class Initializer This one is okay to do. src/java.base/share/classes/jdk/internal/module/ServicesCatalog.java line 51: > 49: * Represents a service provider in the services catalog. > 50: */ > 51: public static final class ServiceProvider { This one is okay to do. 
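As a small illustration of the motivation quoted above (non-static nested classes are tied to an instance of their enclosing class), the following Java sketch compares the constructor signatures that reflection reports for an inner class and a static nested class. The classes are hypothetical and not taken from `java.base`; the sketch only inspects constructor shapes and makes no claim about memory footprint.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Modifier;

public class NestedClassSketch {
    // Inner (non-static) class: each instance is created against an
    // enclosing NestedClassSketch instance.
    class Inner {
        Inner() {}
    }

    // Static nested class: no enclosing instance is needed.
    static class Nested {
        Nested() {}
    }

    // Number of parameters of the first declared constructor of c.
    static int ctorParams(Class<?> c) {
        Constructor<?>[] ctors = c.getDeclaredConstructors();
        return ctors[0].getParameterCount();
    }

    public static void main(String[] args) {
        // The compiler passes the enclosing instance to Inner's constructor
        // as an implicit first parameter, so reflection reports it.
        System.out.println("Inner ctor params:  " + ctorParams(Inner.class));
        System.out.println("Nested ctor params: " + ctorParams(Nested.class));
        System.out.println("Nested is static:   "
                + Modifier.isStatic(Nested.class.getModifiers()));
    }
}
```

Marking a nested class `static` is only possible when its body never refers to the enclosing instance, which is presumably why the review goes through the candidates one by one.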
-------------

PR: https://git.openjdk.java.net/jdk/pull/2589

From xliu at openjdk.java.net Wed Feb 24 09:50:04 2021
From: xliu at openjdk.java.net (Xin Liu)
Date: Wed, 24 Feb 2021 09:50:04 GMT
Subject: RFR: 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set [v8]
In-Reply-To:
References:
Message-ID:

> Add a flag _suppress_cr to outputStream. outstream objects won't emit any CR if it's set.
> Correct TypeInstPtr::dump2 to make sure it only emits klass name once.
> Remove the comment because Klass::oop_print_on() has emitted the address of oop.
>
> Before:
> 689 ConP === 0 [[ 821 ]] Oop:java/lang/Stringjava.lang.String
> {0x000000010159d7c8} - klass: public final synchronized 'java/lang/String'
> - string: "a"
> :Constant:exact *
>
> After:
> 689 ConP === 0 [[ 821 ]] Oop:java.lang.String {0x000000010159d7c8} - klass: public final synchronized 'java/lang/String' - string: "a":Constant:exact *

Xin Liu has updated the pull request incrementally with one additional commit since the last revision:

  8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set

  add comments and hoist ResourceMark

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/2178/files
  - new: https://git.openjdk.java.net/jdk/pull/2178/files/6df63fe1..aeff9ecc

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2178&range=07
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2178&range=06-07

  Stats: 10 lines in 1 file changed: 1 ins; 1 del; 8 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2178.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2178/head:pull/2178

PR: https://git.openjdk.java.net/jdk/pull/2178

From xliu at openjdk.java.net Wed Feb 24 09:50:06 2021
From: xliu at openjdk.java.net (Xin Liu)
Date: Wed, 24 Feb 2021 09:50:06 GMT
Subject: RFR: 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set [v7]
In-Reply-To:
References:
Message-ID: <8TpdqWzAgwVycyBRzDWioTkDyoQ56DjsSmlaSf6Aqu0=.dd797e69-fed2-42ba-b2e6-22de2e83312d@github.com>

On Wed, 24 Feb 2021 06:50:51 GMT, Tobias Hartmann wrote:

>> Xin Liu has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set
>>
>>   use the existing api StringUtils::replace_no_expand to achieve the same replace.
>>   don't need to invoke os::strdup because stringStream::as_string() has duplicated
>>   the internal buffer.
>
> src/hotspot/share/opto/type.cpp line 4052:
>
>> 4050:
>> 4051:   {
>> 4052:     ResourceMark rm;
>
> Shouldn't the `ResourceMark` go to before `stringStream ss` which is a `ResourceObj` as well? Also, please add a small comment explaining that this code suppresses the new line emitted by `print_oop`.

Hi, @TobiHartmann

Thank you for reviewing this PR.
stringStream allocates its dynamic buffer using `NEW_C_HEAP_ARRAY`. IMHO, it's okay without a ResourceMark.

Unlike `stringStream::grow`, stringStream::as_string(false) does use `NEW_RESOURCE_ARRAY`, which allocates an array on the current thread's resource_area. That's why I put the ResourceMark in a nested scope. Actually, the current code still works even without that ResourceMark. That's because `Type::dump_on()` has declared a rm.

Let me hoist the ResourceMark as you said. That makes the code straightforward locally, and I shouldn't assume its context.

> src/hotspot/share/opto/type.cpp line 4040:
>
>> 4038: // Dump oop Type
>> 4039: #ifndef PRODUCT
>> 4040: void TypeInstPtr::dump2( Dict &d, uint depth, outputStream* st ) const {
>
> While you are at it, please also remove the excess whitespace after `(` and before `)`.

done.
-------------

PR: https://git.openjdk.java.net/jdk/pull/2178

From github.com+10482586+therealeliu at openjdk.java.net Wed Feb 24 09:52:43 2021
From: github.com+10482586+therealeliu at openjdk.java.net (Eric Liu)
Date: Wed, 24 Feb 2021 09:52:43 GMT
Subject: Integrated: 8256438: AArch64: Implement match rules with ROR shift register value
In-Reply-To:
References:
Message-ID: <4-4Q7-eJ3buKNVIYzCDip8NbTuAizZrw4RiB625P63E=.dd8a578b-a802-42c2-afb2-f13ac47a7ee3@github.com>

On Mon, 21 Dec 2020 10:10:10 GMT, Eric Liu wrote:

> This patch transforms '(x >>> rshift) + (x << lshift)' into
> 'RotateRight(x, rshift)' during GVN phase when both the shift exponents
> are constants and their sum equals to the number of bits for the type
> of shift base.
>
> This patch implements some new match rules for AArch64 instructions
> which can take ROR as the optional shift. Such instructions are 'and',
> 'or', 'eor', 'eon', 'bic' and 'orn'.
>
>     ror w11, w2, #5
>     eor w0, w1, w11
>
> With this patch, above code could be optimized to below:
>
>     eor w0, w1, w2, ror #5
>
> Finally, the patch refactors TestRotate.java[1][2].
>
> Tested jtreg TestRotate.java, hotspot::hotspot_all_no_apps,
> jdk::jdk_core, langtools::tier1.
>
> [1] https://bugs.openjdk.java.net/browse/JDK-8252776
> [2] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-September/039911.html

This pull request has now been integrated.
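
The identity behind the GVN transformation quoted above can be checked at the Java level. The following is only a sketch of that identity for 32-bit `int`, not the C2 code; `RotateIdentity`, `rotateRightByShifts` and `matchesRotateRight` are hypothetical names. It assumes the two shift amounts add up to 32, as the description requires.

```java
final class RotateIdentity {
    // Rotate-right written as two shifts whose amounts sum to 32.
    // For 0 < r < 32 the two shifted values occupy disjoint bit ranges,
    // so adding them cannot carry and '+' behaves like '|'.
    static int rotateRightByShifts(int x, int r) {
        return (x >>> r) + (x << (32 - r));
    }

    // True if the shift form agrees with Integer.rotateRight for every
    // rotation distance from 1 to 31.
    static boolean matchesRotateRight(int x) {
        for (int r = 1; r < 32; r++) {
            if (rotateRightByShifts(x, r) != Integer.rotateRight(x, r)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int x = 0x12345678;
        System.out.println(Integer.toHexString(rotateRightByShifts(x, 5)));
        System.out.println(matchesRotateRight(x));
    }
}
```

Once the shift pair is recognized as a single rotate, the AArch64 match rules from the description can fold that rotate into the shifted-register operand of the logical instruction.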
Changeset: 382e38dd
Author:    Eric Liu
Committer: Ningsheng Jian
URL:       https://git.openjdk.java.net/jdk/commit/382e38dd
Stats:     1067 lines in 5 files changed: 742 ins; 24 del; 301 mod

8256438: AArch64: Implement match rules with ROR shift register value

Reviewed-by: aph, roland

-------------

PR: https://git.openjdk.java.net/jdk/pull/1858

From thartmann at openjdk.java.net Wed Feb 24 10:08:41 2021
From: thartmann at openjdk.java.net (Tobias Hartmann)
Date: Wed, 24 Feb 2021 10:08:41 GMT
Subject: RFR: 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set [v8]
In-Reply-To:
References:
Message-ID: <3aukTZwD74tFD4GptC7mbqAnVCghVEK0PJJiz88opXI=.59055500-e950-4744-84c5-a4fd3695a858@github.com>

On Wed, 24 Feb 2021 09:50:04 GMT, Xin Liu wrote:

>> Add a flag _suppress_cr to outputStream. outstream objects won't emit any CR if it's set.
>> Correct TypeInstPtr::dump2 to make sure it only emits klass name once.
>> Remove the comment because Klass::oop_print_on() has emitted the address of oop.
>>
>> Before:
>> 689 ConP === 0 [[ 821 ]] Oop:java/lang/Stringjava.lang.String
>> {0x000000010159d7c8} - klass: public final synchronized 'java/lang/String'
>> - string: "a"
>> :Constant:exact *
>>
>> After:
>> 689 ConP === 0 [[ 821 ]] Oop:java.lang.String {0x000000010159d7c8} - klass: public final synchronized 'java/lang/String' - string: "a":Constant:exact *
>
> Xin Liu has updated the pull request incrementally with one additional commit since the last revision:
>
>   8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set
>
>   add comments and hoist ResourceMark

That looks good to me.

-------------

Marked as reviewed by thartmann (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/2178

From aph at redhat.com Wed Feb 24 10:55:04 2021
From: aph at redhat.com (Andrew Haley)
Date: Wed, 24 Feb 2021 10:55:04 +0000
Subject: RFR: 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width [v10]
In-Reply-To:
References: <8kMxMFAYtb0B-yUVEt-HLfhji3Gj-gog8OHvWW_tKfw=.f7c9422b-3574-4c31-9489-7286ee98332f@github.com>
Message-ID: <3ff503da-bae1-45e3-fc29-06ff8c2bd8ff@redhat.com>

On 24/02/2021 07:33, Dong Bo wrote:
> Weird, I took a look at the assembly, `ssra` was accessed by the tests on our server:

I don't doubt it, but the test code is so very complex that it can
fall foul of heuristics given slightly changed circumstances. That's
why good test cases are as simple as possible, and allow no room for
variations because they do only one thing. Precise targeting should be
the goal of HotSpot back-end test cases.

-- 
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd.
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From github.com+42899633+eastig at openjdk.java.net Wed Feb 24 12:13:43 2021
From: github.com+42899633+eastig at openjdk.java.net (Evgeny Astigeevich)
Date: Wed, 24 Feb 2021 12:13:43 GMT
Subject: RFR: 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set [v8]
In-Reply-To:
References:
Message-ID: <9QwSVEA-wKGBPSwUX0MSDAXGLLjCaSmLjtEyKshvdGI=.e9c410e8-5495-4ceb-affc-4590ebcd84d4@github.com>

On Wed, 24 Feb 2021 09:50:04 GMT, Xin Liu wrote:

>> Add a flag _suppress_cr to outputStream. outstream objects won't emit any CR if it's set.
>> Correct TypeInstPtr::dump2 to make sure it only emits klass name once.
>> Remove the comment because Klass::oop_print_on() has emitted the address of oop.
>>
>> Before:
>> 689 ConP === 0 [[ 821 ]] Oop:java/lang/Stringjava.lang.String
>> {0x000000010159d7c8} - klass: public final synchronized 'java/lang/String'
>> - string: "a"
>> :Constant:exact *
>>
>> After:
>> 689 ConP === 0 [[ 821 ]] Oop:java.lang.String {0x000000010159d7c8} - klass: public final synchronized 'java/lang/String' - string: "a":Constant:exact *
>
> Xin Liu has updated the pull request incrementally with one additional commit since the last revision:
>
>   8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set
>
>   add comments and hoist ResourceMark

Changes requested by eastig at github.com (no known OpenJDK username).

src/hotspot/share/opto/type.cpp line 4053:

> 4051:     const_oop()->print_oop(&ss);
> 4052:     // suppress new-lines('\n') in ss emitted by const_oop->print_oop()
> 4053:     // so each node is one-liner for -XX:+Verbose && -XX:+PrintIdeal

What about rewriting the comment in a clearer way:

    // 'const_oop->print_oop()' emits new-lines('\n') into ss.
    // For -XX:+Verbose && -XX:+PrintIdeal, new-lines('\n') must be removed from
    // the ss created string to have a node per line.

test/hotspot/gtest/utilities/test_ostream.cpp line 66:

> 64:
> 65: static size_t count_char(const stringStream* ss, char ch) {
> 66:   return count_char(ss->as_string(), ss->size(), ch);

Am I correct that `std::count` is not allowed?
No need to use `as_string`:
`return count_char(ss->base(), ss->size(), ch);`
Or, as `stringStream` is always zero-terminated:
`return count_char(ss->base(), ch);`

test/hotspot/gtest/utilities/test_ostream.cpp line 72:

> 70:   ResourceMark rm;
> 71:   size_t whitespaces = count_char(ss, ' ');
> 72:   char* s2 = ss->as_string(false);

No need for `false` because `false` is the default value of `as_string`.
If you want to be explicit here, I recommend:
`char* s2 = ss->as_string(/* c_heap= */ false);`

test/hotspot/gtest/utilities/test_ostream.cpp line 63:

> 61:   }
> 62:   return cnt;
> 63: }

As the function is only used for zero-terminated strings, maybe it makes sense to use this property:

    static size_t count_char(const char* s, char ch) {
      size_t cnt = 0;
      while (*s != '\0') {
        if (*s++ == ch) {
          ++cnt;
        }
      }
      return cnt;
    }

test/hotspot/gtest/utilities/test_ostream.cpp line 69:

> 67: }
> 68:
> 69: static void test_stringStream_tr_delete(stringStream* ss) {

I think this is a unit test for `StringUtils::replace_no_expand`. It checks that the function can be used to remove substrings. There is no dependency on `stringStream`. Any string can be used.
Could you please move the test to `test_stringUtils.cpp`?

-------------

PR: https://git.openjdk.java.net/jdk/pull/2178

From thartmann at openjdk.java.net Wed Feb 24 12:51:46 2021
From: thartmann at openjdk.java.net (Tobias Hartmann)
Date: Wed, 24 Feb 2021 12:51:46 GMT
Subject: RFR: 8262299: C2 compilation fails with "modified node was not processed by IGVN.transform_old()"
Message-ID:

We hit an assert because a dead `MergeMemNode` was not removed. Make sure it's added to the IGVN worklist to give it a chance to be removed.

Thanks,
Tobias

-------------

Commit messages:
 - 8262299: C2 compilation fails with "modified node was not processed by IGVN.transform_old()"

Changes: https://git.openjdk.java.net/jdk/pull/2705/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2705&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8262299
  Stats: 10 lines in 1 file changed: 6 ins; 4 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2705.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2705/head:pull/2705

PR: https://git.openjdk.java.net/jdk/pull/2705

From roland at openjdk.java.net Wed Feb 24 12:56:39 2021
From: roland at openjdk.java.net (Roland Westrelin)
Date: Wed, 24 Feb 2021 12:56:39 GMT
Subject: RFR: 8262299: C2 compilation fails with "modified node was not processed by IGVN.transform_old()"
In-Reply-To:
References:
Message-ID:

On Wed, 24 Feb 2021 12:44:34 GMT, Tobias Hartmann wrote:

> We hit an assert because a dead `MergeMemNode` was not removed. Make sure it's added to the IGVN worklist to give it a chance to be removed.
>
> Thanks,
> Tobias

Looks good to me

-------------

Marked as reviewed by roland (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/2705

From zgu at openjdk.java.net Wed Feb 24 12:57:39 2021
From: zgu at openjdk.java.net (Zhengyu Gu)
Date: Wed, 24 Feb 2021 12:57:39 GMT
Subject: RFR: 8262259: Remove unused variable in MethodLiveness::BasicBlock::compute_gen_kill_single
In-Reply-To:
References:
Message-ID:

On Wed, 24 Feb 2021 04:56:39 GMT, Thomas Stuefe wrote:

>> Please review this trivial patch that removes unused/dead local variable.
>
> Seems fine and trivial.
>
> ..Thomas

Thanks, @tstuefe @TobiHartmann

-------------

PR: https://git.openjdk.java.net/jdk/pull/2698

From zgu at openjdk.java.net Wed Feb 24 12:57:41 2021
From: zgu at openjdk.java.net (Zhengyu Gu)
Date: Wed, 24 Feb 2021 12:57:41 GMT
Subject: Integrated: 8262259: Remove unused variable in MethodLiveness::BasicBlock::compute_gen_kill_single
In-Reply-To:
References:
Message-ID: <3r0uMCJEB8ZBnPX0FZLfu2M9RR1WlAyLmbkAHEcmijQ=.13c83fda-31a3-479f-afd2-7a38659ee4e5@github.com>

On Tue, 23 Feb 2021 21:52:31 GMT, Zhengyu Gu wrote:

> Please review this trivial patch that removes unused/dead local variable.

This pull request has now been integrated.

Changeset: 8c07063d
Author:    Zhengyu Gu
URL:       https://git.openjdk.java.net/jdk/commit/8c07063d
Stats:     4 lines in 1 file changed: 0 ins; 2 del; 2 mod

8262259: Remove unused variable in MethodLiveness::BasicBlock::compute_gen_kill_single

Reviewed-by: stuefe, thartmann

-------------

PR: https://git.openjdk.java.net/jdk/pull/2698

From chagedorn at openjdk.java.net Wed Feb 24 13:35:45 2021
From: chagedorn at openjdk.java.net (Christian Hagedorn)
Date: Wed, 24 Feb 2021 13:35:45 GMT
Subject: RFR: 8262299: C2 compilation fails with "modified node was not processed by IGVN.transform_old()"
In-Reply-To:
References:
Message-ID:

On Wed, 24 Feb 2021 12:44:34 GMT, Tobias Hartmann wrote:

> We hit an assert because a dead `MergeMemNode` was not removed. Make sure it's added to the IGVN worklist to give it a chance to be removed.
>
> Thanks,
> Tobias

Looks good.

-------------

Marked as reviewed by chagedorn (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/2705

From roland at openjdk.java.net Wed Feb 24 14:59:46 2021
From: roland at openjdk.java.net (Roland Westrelin)
Date: Wed, 24 Feb 2021 14:59:46 GMT
Subject: RFR: 8261308: C2: assert(inner->is_valid_counted_loop(T_INT) && inner->is_strip_mined()) failed: OuterStripMinedLoop should have been removed
In-Reply-To:
References:
Message-ID:

On Tue, 23 Feb 2021 19:13:22 GMT, Vladimir Kozlov wrote:

>> The inner counted loop of the test case starts at 1 and stops at 1 so
>> runs for one iteration. A counted loop is created for it. The iv Phi
>> is found to be the constant 1 and its type is set by:
>>
>> l->phi()->as_Phi()->set_type(l->phi()->Value(&_igvn));
>>
>> in PhaseIdealLoop::is_counted_loop() but it's not replaced by the
>> constant 1 yet so the counted loop's shape is preserved.
>>
>> IdealLoopTree::do_one_iteration_loop() runs but doesn't optimize the
>> loop because the trip count is not set to 1. The loop contains a range
>> check and range check elimination is applied. That causes the loop
>> exit test to be adjusted with a MinI(..) expression. When IGVN runs
>> next, the phi is replaced with 1 but because the exit test was
>> changed, IGVN can't prove it always fails. So the loop is not removed
>> which causes the assert failure as loop opts progress.
>>
>> The fix I propose is for IdealLoopTree::do_one_iteration_loop() to
>> remove the 1 iteration loop. The reason it doesn't happen is that
>> IdealLoopTree::compute_trip_count() doesn't set the trip count because
>> it finds a zero trip count: limit - init = 1 - 1 = 0. All loops, once
>> entered execute at least once. So I think, it's safe to set the trip
>> count to 1 in those cases.
>
> src/hotspot/share/opto/loopTransform.cpp line 126:
>
>> 124:     jlong limit_con = (stride_con > 0) ? limit_type->_hi : limit_type->_lo;
>> 125:     int stride_m = stride_con - (stride_con > 0 ? 1 : -1);
>> 126:     jlong trip_count = (limit_con - init_con + stride_m)/stride_con;
>
> Does it mean that before we execute do_one_iteration_loop() optimization for loops which do 2 trips?
> This seems wrong. Can you check?
> I agree that loop body executed at least once since it is exit check. But it means we have to generally adjust `trip_count` with `+1` and not do what you suggested.

Thanks for the comments.
A CountedLoopNode/CountedLoopEndNode is this in pseudo code:

    int i = init;
    do {
      i += stride;
    } while (i < limit);

for stride > 0
We always enter the loop body so execute it at least once.
If limit > init, it's executed: (limit - init) / stride times if (limit - init) is a multiple of stride.
if (limit - init) is not a multiple of stride then it's executed (limit - init) / stride + 1.
That matches the formula in the source code AFAICT.
In what case do you think we compute 1 for the trip count when it's actually 2?
How would you compute the trip count?

-------------

PR: https://git.openjdk.java.net/jdk/pull/2529

From thartmann at openjdk.java.net Wed Feb 24 15:25:39 2021
From: thartmann at openjdk.java.net (Tobias Hartmann)
Date: Wed, 24 Feb 2021 15:25:39 GMT
Subject: RFR: 8262299: C2 compilation fails with "modified node was not processed by IGVN.transform_old()"
In-Reply-To:
References:
Message-ID:

On Wed, 24 Feb 2021 13:33:19 GMT, Christian Hagedorn wrote:

>> We hit an assert because a dead `MergeMemNode` was not removed. Make sure it's added to the IGVN worklist to give it a chance to be removed.
>>
>> Thanks,
>> Tobias
>
> Looks good.

Roland, Christian, thanks for the reviews!

-------------

PR: https://git.openjdk.java.net/jdk/pull/2705

From roland at openjdk.java.net Wed Feb 24 15:26:54 2021
From: roland at openjdk.java.net (Roland Westrelin)
Date: Wed, 24 Feb 2021 15:26:54 GMT
Subject: RFR: 8261914: IfNode::fold_compares_helper faces non-canonicalized bool when running JRuby JSON workload
Message-ID:

The assert fires because the current IfNode's condition was not
canonicalized when it's being folded with a dominating
IfNode. idealize_test() should have taken care of that because it
executed before fold_compares() but it didn't because:

    Node* new_b = phase->transform( new BoolNode(b->in(1), bt.negate()) );
    if( !new_b->is_Bool() ) return NULL;

caused it to bail out. new_b is a constant. This happens because of
the order in which nodes are processed by IGVN. The If's current Bool
would also constant fold but it's in the IGVN worklist and hasn't been
processed yet.

The fix I propose is to keep Aleksey's defensive fix but to check that
the Bool input is indeed about to be transformed by IGVN and that that
would cause the IfNode to be reprocessed.

I tried to write a test case but didn't succeed. The 2 If nodes come
from a tableswitch that's transformed into a series of If based on
profile data. I couldn't reproduce the profile data with a simple test
case.

-------------

Commit messages:
 - fix

Changes: https://git.openjdk.java.net/jdk/pull/2707/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2707&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8261914
  Stats: 9 lines in 1 file changed: 3 ins; 4 del; 2 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2707.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2707/head:pull/2707

PR: https://git.openjdk.java.net/jdk/pull/2707

From roland at openjdk.java.net Wed Feb 24 16:06:05 2021
From: roland at openjdk.java.net (Roland Westrelin)
Date: Wed, 24 Feb 2021 16:06:05 GMT
Subject: RFR: 8259937: guarantee(loc != NULL) failed: missing saved register with native invoker [v3]
In-Reply-To:
References:
Message-ID:

> We spotted this issue with Shenandoah and I managed to write a simple
> test case that reproduces it reliably with Shenandoah but the issue is
> independent of the GC.
>
> The loop in the test case calls a native invoker with an oop live in
> rbp. rbp is saved in the native invoker stub's frame. A safepoint is
> triggered from the safepoint check in the native invoker. The stack
> walking code sees that rbp contains an oop but can't find where that
> oop is stored. That's because stack walking updates the caller's frame
> with the location of rbp in the callee on calls to
> frame::sender(). But the current code sets the last java frame to be
> the compiled frame where rbp is live. So there's no call to
> frame::sender() to update the location rbp. The fix I propose is that
> the frame of the native invoker be visible by stack walking. On a
> safepoint, stack walking starts from the native invoker thread, then
> calls frame::sender() to move to the compiled frame. That causes rbp
> to be properly recorded with its location in the native invoker frame.
>
> Same problem affects both x86 and aarch64. I've tested this patch with:
>
> make run-test TEST="java/foreign" TEST_VM_OPTS="-Xcomp" JTREG="TIMEOUT_FACTOR=10"
>
> on both platforms.

Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision:

 - improved test
 - cleanup
 - Merge branch 'master' into JDK-8259937
 - test & debug
 - broken build
 - whitespaces
 - fix & test

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/2528/files
  - new: https://git.openjdk.java.net/jdk/pull/2528/files/5b9dfff7..cef05b6f

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2528&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2528&range=01-02

  Stats: 18656 lines in 546 files changed: 11988 ins; 3826 del; 2842 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2528.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2528/head:pull/2528

PR: https://git.openjdk.java.net/jdk/pull/2528

From roland at openjdk.java.net Wed Feb 24 16:13:00 2021
From: roland at openjdk.java.net (Roland Westrelin)
Date: Wed, 24 Feb 2021 16:13:00 GMT
Subject: RFR: 8259937: guarantee(loc != NULL) failed: missing saved register with native invoker [v2]
In-Reply-To: <6NJDW9Yzgj-U0tRiPS1dscN7i3E27kdKuUDdREMA8lo=.81cd0ead-af88-4af3-bcf3-6025cff8f855@github.com>
References: <6NJDW9Yzgj-U0tRiPS1dscN7i3E27kdKuUDdREMA8lo=.81cd0ead-af88-4af3-bcf3-6025cff8f855@github.com>
Message-ID:

On Tue, 23 Feb 2021 12:59:44 GMT, Jorn Vernee wrote:

>> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   broken build
>
> LGTM, but I've left some suggestions in line.
>
> Thanks for cleaning up the frame layout code. Having a fixed frame is much better than repeatedly modifying the stack pointer.
>
> I've also done some downstream stress testing with jextract, and everything works as expected.

@JornVernee thanks for the review and the comments.
The new commits should address them (cb9dd24 was pushed by accident and reverted).

-------------

PR: https://git.openjdk.java.net/jdk/pull/2528

From roland at openjdk.java.net Wed Feb 24 16:12:59 2021
From: roland at openjdk.java.net (Roland Westrelin)
Date: Wed, 24 Feb 2021 16:12:59 GMT
Subject: RFR: 8259937: guarantee(loc != NULL) failed: missing saved register with native invoker [v4]
In-Reply-To:
References:
Message-ID: <5M1DPgw-V-j9acOvHLewPoTaxF-a6-9HMIpIzHjqavU=.55831da6-f888-4dc0-beee-19869c5eb638@github.com>

> We spotted this issue with Shenandoah and I managed to write a simple
> test case that reproduces it reliably with Shenandoah but the issue is
> independent of the GC.
>
> The loop in the test case calls a native invoker with an oop live in
> rbp. rbp is saved in the native invoker stub's frame. A safepoint is
> triggered from the safepoint check in the native invoker. The stack
> walking code sees that rbp contains an oop but can't find where that
> oop is stored. That's because stack walking updates the caller's frame
> with the location of rbp in the callee on calls to
> frame::sender(). But the current code sets the last java frame to be
> the compiled frame where rbp is live. So there's no call to
> frame::sender() to update the location rbp. The fix I propose is that
> the frame of the native invoker be visible by stack walking. On a
> safepoint, stack walking starts from the native invoker thread, then
> calls frame::sender() to move to the compiled frame. That causes rbp
> to be properly recorded with its location in the native invoker frame.
>
> Same problem affects both x86 and aarch64. I've tested this patch with:
>
> make run-test TEST="java/foreign" TEST_VM_OPTS="-Xcomp" JTREG="TIMEOUT_FACTOR=10"
>
> on both platforms.

Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision:

  Revert "test & debug"

  This reverts commit cb9dd24c9fcccc6997e9fca874e2860f966b9576.

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/2528/files
  - new: https://git.openjdk.java.net/jdk/pull/2528/files/cef05b6f..9f80616f

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2528&range=03
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2528&range=02-03

  Stats: 72 lines in 1 file changed: 0 ins; 72 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2528.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2528/head:pull/2528

PR: https://git.openjdk.java.net/jdk/pull/2528

From kvn at openjdk.java.net Wed Feb 24 18:13:38 2021
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Wed, 24 Feb 2021 18:13:38 GMT
Subject: RFR: 8261914: IfNode::fold_compares_helper faces non-canonicalized bool when running JRuby JSON workload
In-Reply-To:
References:
Message-ID:

On Wed, 24 Feb 2021 15:22:26 GMT, Roland Westrelin wrote:

> The assert fires because the current IfNode's condition was not
> canonicalized when it's being folded with a dominating
> IfNode. idealize_test() should have taken care of that because it
> executed before fold_compares() but it didn't because:
>
>     Node* new_b = phase->transform( new BoolNode(b->in(1), bt.negate()) );
>     if( !new_b->is_Bool() ) return NULL;
>
> caused it to bail out. new_b is a constant. This happens because of
> the order in which nodes are processed by IGVN. The If's current Bool
> would also constant fold but it's in the IGVN worklist and hasn't been
> processed yet.
>
> The fix I propose is to keep Aleksey's defensive fix but to check that
> the Bool input is indeed about to be transformed by IGVN and that that
> would cause the IfNode to be reprocessed.
>
> I tried to write a test case but didn't succeed. The 2 If nodes come
> from a tableswitch that's transformed into a series of If based on
> profile data. I couldn't reproduce the profile data with a simple test
> case.

okay

-------------

Marked as reviewed by kvn (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/2707

From kvn at openjdk.java.net Wed Feb 24 19:35:47 2021
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Wed, 24 Feb 2021 19:35:47 GMT
Subject: RFR: 8261308: C2: assert(inner->is_valid_counted_loop(T_INT) && inner->is_strip_mined()) failed: OuterStripMinedLoop should have been removed
In-Reply-To:
References:
Message-ID: <3QstFQo1-rnl899oN0deMIsmsV1g1DVZyf0_C5iijc4=.416e32d0-9e04-4e2f-b9c6-ea337efbdf39@github.com>

On Wed, 24 Feb 2021 14:57:15 GMT, Roland Westrelin wrote:

>> src/hotspot/share/opto/loopTransform.cpp line 126:
>>
>>> 124:     jlong limit_con = (stride_con > 0) ? limit_type->_hi : limit_type->_lo;
>>> 125:     int stride_m = stride_con - (stride_con > 0 ? 1 : -1);
>>> 126:     jlong trip_count = (limit_con - init_con + stride_m)/stride_con;
>>
>> Does it mean that before we execute do_one_iteration_loop() optimization for loops which do 2 trips?
>> This seems wrong. Can you check?
>> I agree that loop body executed at least once since it is exit check. But it means we have to generally adjust `trip_count` with `+1` and not do what you suggested.
>
> Thanks for the comments.
> A CountedLoopNode/CountedLoopEndNode is this in pseudo code:
> int i = init;
> do {
>   i += stride;
> } while (i < limit);
> for stride > 0
> We always enter the loop body so execute it at least once.
> If limit > init, it's executed: (limit - init) / stride times if (limit - init) is a multiple of stride.
> if (limit - init) is not a multiple of stride then it's executed (limit - init) / stride + 1.
> That matches the formula in the source code AFAICT.
> In what case do you think we compute 1 for the trip count when it's actually 2?
> How would you compute the trip count?

I looked at the history, and before your 8256655 changes we did not create a counted loop for such a case:
https://github.com/openjdk/jdk/blob/f504f419d3b377f0ccfd458026a2b57a9704bdff/src/hotspot/share/opto/loopnode.cpp#L1376

That is why we did not hit this issue before.
After 8256655 we allow counted loops with `init >= limit`.

Got it.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2529

From kvn at openjdk.java.net Wed Feb 24 19:45:40 2021
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Wed, 24 Feb 2021 19:45:40 GMT
Subject: RFR: 8261308: C2: assert(inner->is_valid_counted_loop(T_INT) && inner->is_strip_mined()) failed: OuterStripMinedLoop should have been removed
In-Reply-To:
References:
Message-ID:

On Thu, 11 Feb 2021 16:11:50 GMT, Roland Westrelin wrote:

> The inner counted loop of the test case starts at 1 and stops at 1 so
> runs for one iteration. A counted loop is created for it. The iv Phi
> is found to be the constant 1 and its type is set by:
>
> l->phi()->as_Phi()->set_type(l->phi()->Value(&_igvn));
>
> in PhaseIdealLoop::is_counted_loop() but it's not replaced by the
> constant 1 yet so the counted loop's shape is preserved.
>
> IdealLoopTree::do_one_iteration_loop() runs but doesn't optimize the
> loop because the trip count is not set to 1. The loop contains a range
> check and range check elimination is applied. That causes the loop
> exit test to be adjusted with a MinI(..) expression. When IGVN runs
> next, the phi is replaced with 1 but because the exit test was
> changed, IGVN can't prove it always fails. So the loop is not removed
> which causes the assert failure as loop opts progress.
>
> The fix I propose is for IdealLoopTree::do_one_iteration_loop() to
> remove the 1 iteration loop. The reason it doesn't happen is that
> IdealLoopTree::compute_trip_count() doesn't set the trip count because
> it finds a zero trip count: limit - init = 1 - 1 = 0. All loops, once
> entered execute at least once. So I think, it's safe to set the trip
> count to 1 in those cases.

src/hotspot/share/opto/loopTransform.cpp line 127:

> 125:     int stride_m = stride_con - (stride_con > 0 ? 1 : -1);
> 126:     jlong trip_count = (limit_con - init_con + stride_m)/stride_con;
> 127:     trip_count = MAX2(trip_count, (jlong)1);

Add a comment here explaining this case (one trip when init >= limit).

BTW, this optimization seems to work only for INT iv loops and not LONG. Do you plan to implement it for LONG?

-------------

PR: https://git.openjdk.java.net/jdk/pull/2529

From richard.reingruber at sap.com Wed Feb 24 20:03:41 2021
From: richard.reingruber at sap.com (Reingruber, Richard)
Date: Wed, 24 Feb 2021 20:03:41 +0000
Subject: RFC: 8262295: C2: Out-of-Bounds Array Load from Clone Source
Message-ID:

Hi,

I've been working on a fix for

JDK-8262295: C2: Out-of-Bounds Array Load from Clone Source
https://bugs.openjdk.java.net/browse/JDK-8262295

Now I'm not sure if the fix I found (compile time range check) is a good one.

I have created a draft PR with the fix not yet ready for proper review:

https://github.com/openjdk/jdk/pull/2708

Please let me know what you think and if this is the right approach.

My first attempt was to keep the original control for the cloned node in
LoadNode::can_see_arraycopy_value() but this seemed to confuse loop
optimizations. At least the test
compiler/escapeAnalysis/TestMissingAntiDependency.java failed with that change.

Another potential fix could be adding a runtime range check.

What do you think?

Thanks, Richard.
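
For the trip-count discussion in the 8261308 review thread above, the following is a minimal Java sketch of the arithmetic, offered only as an illustration and not as the HotSpot implementation. `TripCountSketch`, `tripCount` and `simulate` are hypothetical names. `tripCount` transcribes the three quoted loopTransform.cpp lines including the proposed `MAX2` clamp; `simulate` runs the do/while loop from the pseudo code for the stride > 0 case and counts iterations.

```java
final class TripCountSketch {
    // Transcription of the quoted arithmetic: round (limit - init) / stride
    // up for positive strides, then clamp to at least one trip, because a
    // do/while loop executes its body once even when init >= limit.
    static long tripCount(long init, long limit, int stride) {
        int strideM = stride - (stride > 0 ? 1 : -1);
        long trips = (limit - init + strideM) / stride;
        return Math.max(trips, 1L);
    }

    // Reference: execute the pseudo-code loop (stride > 0 only) and count
    // how many times the body runs.
    static long simulate(long init, long limit, int stride) {
        long i = init;
        long n = 0;
        do {
            i += stride;
            n++;
        } while (i < limit);
        return n;
    }

    public static void main(String[] args) {
        long[][] cases = { {1, 1, 1}, {0, 10, 3}, {0, 9, 3}, {5, 1, 2} };
        for (long[] c : cases) {
            int stride = (int) c[2];
            System.out.println("init=" + c[0] + " limit=" + c[1] + " stride=" + stride
                    + " formula=" + tripCount(c[0], c[1], stride)
                    + " simulated=" + simulate(c[0], c[1], stride));
        }
    }
}
```

The `init=1, limit=1` case is the one from the bug report: without the clamp the formula yields 0, while the loop body still runs once.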
From jvernee at openjdk.java.net Wed Feb 24 20:59:40 2021 From: jvernee at openjdk.java.net (Jorn Vernee) Date: Wed, 24 Feb 2021 20:59:40 GMT Subject: RFR: 8259937: guarantee(loc != NULL) failed: missing saved register with native invoker [v4] In-Reply-To: <5M1DPgw-V-j9acOvHLewPoTaxF-a6-9HMIpIzHjqavU=.55831da6-f888-4dc0-beee-19869c5eb638@github.com> References: <5M1DPgw-V-j9acOvHLewPoTaxF-a6-9HMIpIzHjqavU=.55831da6-f888-4dc0-beee-19869c5eb638@github.com> Message-ID: On Wed, 24 Feb 2021 16:12:59 GMT, Roland Westrelin wrote: >> We spotted this issue with Shenandoah and I managed to write a simple >> test case that reproduces it reliably with Shenandoah but the issue is >> independent of the GC. >> >> The loop in the test case calls a native invoker with an oop live in >> rbp. rbp is saved in the native invoker stub's frame. A safepoint is >> triggered from the safepoint check in the native invoker. The stack >> walking code sees that rbp contains an oop but can't find where that >> oop is stored. That's because stack walking updates the caller's frame >> with the location of rbp in the callee on calls to >> frame::sender(). But the current code sets the last java frame to be >> the compiled frame where rbp is live. So there's no call to >> frame::sender() to update the location rbp. The fix I propose is that >> the frame of the native invoker be visible by stack walking. On a >> safepoint, stack walking starts from the native invoker thread, then >> calls frame::sender() to move to the compiled frame. That causes rbp >> to be properly recorded with its location in the native invoker frame. >> >> Same problem affects both x86 and aarch64. I've tested this patch with: >> >> make run-test TEST="java/foreign" TEST_VM_OPTS="-Xcomp" JTREG="TIMEOUT_FACTOR=10" >> >> on both platforms. 
> > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > Revert "test & debug" > > This reverts commit cb9dd24c9fcccc6997e9fca874e2860f966b9576. Thanks for addressing the comments! Looks good. ------------- Marked as reviewed by jvernee (Committer). PR: https://git.openjdk.java.net/jdk/pull/2528 From github.com+16932759+shqking at openjdk.java.net Thu Feb 25 01:24:56 2021 From: github.com+16932759+shqking at openjdk.java.net (Hao Sun) Date: Thu, 25 Feb 2021 01:24:56 GMT Subject: RFR: 8257137: Revise smov and umov in aarch64 assembler [v3] In-Reply-To: References: Message-ID: <42b0Skv5uZiAb7kKG3MutpokVFBM1Zf9vzM2A1no7MI=.9d345d82-519c-4a6a-8511-b8978e9d9fe1@github.com> > 1. Both smov and umov lack of checking the register type validity. > Register type must be 'B', 'H' or 'S' for smov [1]. > Register type can NOT be 'Q' for umov [2]. > Such checks are added. > > 2. smov and umov have different explanations on 'Q' field, i.e. bit 30 > of the insturction, but current assembler implementation mixed it up. > For umov, 'Q' field can only be set when register type 'D' is given > [2]. However, this field of smov must be set for register type 'S' > [1], that is, 'Q' field can be optional for register type 'B' or 'H'. > > Current implementation only took the umov scenario into account. As a > result, runtime error ILL_ILLOPN would occur if 'smov(Register, > FloatRegister, S, index)' is used. > > We put them into two separate functions and make 'Q' field always set > for smov. That means 'SMOVX' (64-bit register variant) is generated > for all cases since it's compatible with our current usages of 'SMOVW'. > Existing usages of smov have been double checked and this patch does > not affect them. > > 3. Smoke tests are also added. > > [1]. https://developer.arm.com/docs/ddi0602/f/simd-and-floating-point-instructions-alphabetic-order/smov-signed-move-vector-element-to-general-purpose-register > [2]. 
https://developer.arm.com/docs/ddi0602/f/simd-and-floating-point-instructions-alphabetic-order/umov-unsigned-move-vector-element-to-general-purpose-register > > > Note that Jtreg tier1 and jdk::tier3 have been tested and all tests passed without new failures. Hao Sun has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: - Merge branch 'master' into smov Temporally keep file test/hotspot/gtest/aarch64/asmtest.out.h same with branch mastger. - Update the copyright notice to 2021 Update the copyright notice to 2021. Change-Id: I7fe485e768ccac45a4861cd3c061aedc13fef579 CustomizedGitHooks: yes - 8257137: Revise smov and umov in aarch64 assembler. 1. Both smov and umov lack of checking the register type validity. Register type must be 'B', 'H' or 'S' for smov [1]. Register type can NOT be 'Q' for umov [2]. Such checks are added. 2. smov and umov have different explanations on 'Q' field, i.e. bit 30 of the insturction, but current assembler implementation mixed it up. For umov, 'Q' field can only be set when register type 'D' is given [2]. However, this field of smov must be set for register type 'S' [1], that is, 'Q' field can be optional for register type 'B' or 'H'. Current implementation only took the umov scenario into account. As a result, runtime error ILL_ILLOPN would occur if 'smov(Register, FloatRegister, S, index)' is used. We put them into two separate functions and make 'Q' field always set for smov. That means 'SMOVX' (64-bit register variant) is generated for all cases since it's compatible with our current usages of 'SMOVW'. Existing usages of smov have been double checked and this patch does not affect them. 3. Smoke tests are also added. [1]. https://developer.arm.com/docs/ddi0602/f/simd-and-floating-point-instructions-alphabetic-order/smov-signed-move-vector-element-to-general-purpose-register [2]. 
https://developer.arm.com/docs/ddi0602/f/simd-and-floating-point-instructions-alphabetic-order/umov-unsigned-move-vector-element-to-general-purpose-register ------------- Changes: https://git.openjdk.java.net/jdk/pull/1586/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1586&range=02 Stats: 29 lines in 3 files changed: 18 ins; 0 del; 11 mod Patch: https://git.openjdk.java.net/jdk/pull/1586.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1586/head:pull/1586 PR: https://git.openjdk.java.net/jdk/pull/1586 From github.com+16932759+shqking at openjdk.java.net Thu Feb 25 01:27:52 2021 From: github.com+16932759+shqking at openjdk.java.net (Hao Sun) Date: Thu, 25 Feb 2021 01:27:52 GMT Subject: RFR: 8257137: Revise smov and umov in aarch64 assembler [v4] In-Reply-To: References: Message-ID: > 1. Both smov and umov lack of checking the register type validity. > Register type must be 'B', 'H' or 'S' for smov [1]. > Register type can NOT be 'Q' for umov [2]. > Such checks are added. > > 2. smov and umov have different explanations on 'Q' field, i.e. bit 30 > of the insturction, but current assembler implementation mixed it up. > For umov, 'Q' field can only be set when register type 'D' is given > [2]. However, this field of smov must be set for register type 'S' > [1], that is, 'Q' field can be optional for register type 'B' or 'H'. > > Current implementation only took the umov scenario into account. As a > result, runtime error ILL_ILLOPN would occur if 'smov(Register, > FloatRegister, S, index)' is used. > > We put them into two separate functions and make 'Q' field always set > for smov. That means 'SMOVX' (64-bit register variant) is generated > for all cases since it's compatible with our current usages of 'SMOVW'. > Existing usages of smov have been double checked and this patch does > not affect them. > > 3. Smoke tests are also added. > > [1]. 
https://developer.arm.com/docs/ddi0602/f/simd-and-floating-point-instructions-alphabetic-order/smov-signed-move-vector-element-to-general-purpose-register > [2]. https://developer.arm.com/docs/ddi0602/f/simd-and-floating-point-instructions-alphabetic-order/umov-unsigned-move-vector-element-to-general-purpose-register > > > Note that Jtreg tier1 and jdk::tier3 have been tested and all tests passed without new failures. Hao Sun has updated the pull request incrementally with one additional commit since the last revision: Code style: add spaces between operands add spaces between operands ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1586/files - new: https://git.openjdk.java.net/jdk/pull/1586/files/ba69f500..b624a6b4 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1586&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1586&range=02-03 Stats: 75 lines in 2 files changed: 2 ins; 0 del; 73 mod Patch: https://git.openjdk.java.net/jdk/pull/1586.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1586/head:pull/1586 PR: https://git.openjdk.java.net/jdk/pull/1586 From dongbo at openjdk.java.net Thu Feb 25 01:47:40 2021 From: dongbo at openjdk.java.net (Dong Bo) Date: Thu, 25 Feb 2021 01:47:40 GMT Subject: RFR: 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width [v10] In-Reply-To: References: <8kMxMFAYtb0B-yUVEt-HLfhji3Gj-gog8OHvWW_tKfw=.f7c9422b-3574-4c31-9489-7286ee98332f@github.com> Message-ID: <9rtmEMrsPaA73FDA-KB7H0S0CRdBePGwnI5FcDY-OLI=.425249e2-b590-4a16-b9b8-8d7b5ecd2800@github.com> On Wed, 24 Feb 2021 07:31:14 GMT, Dong Bo wrote: >>> Hi, @theRealAph >>> >>> I've rebased the tests so that sshr/ushr/sshl/ssra/usra are accessed in one jtreg `test/hotspot/jtreg/compiler/vectorapi/TestVectorShiftImm.java`. >>> Local tests by manually injected error shows all instructions are covered by the jtreg case. Suggestions? 
>> >> I'm not seeing ```sra``` used anywhere. >> >> The problem I see with the tests is that the methods are large. This causes C2 to do a lot of spilling. Also, because the resuling code is intertwined and complex, it's very hard to debug. >> >> It would be far better to do something like this: >> void long_shift_add(long arrLongs[][], LongVector vba, LongVector vbb, int i) { >> vba.add(vbb.lanewise(VectorOperators.LSHR, 37)).intoArray(arrLongs[op], 2 * i); >> vba.add(vbb.lanewise(VectorOperators.LSHR, 64)).intoArray(arrLongs[op + 1], 2 * i); >> vba.add(vbb.lanewise(VectorOperators.LSHR, 99)).intoArray(arrLongs[op + 2], 2 * i); >> vba.add(vbb.lanewise(VectorOperators.LSHR, 128)).intoArray(arrLongs[op + 3], 2 * i); >> vba.add(vbb.lanewise(VectorOperators.LSHR, 157)).intoArray(arrLongs[op + 4], 2 * i); >> vba.add(vbb.lanewise(VectorOperators.LSHR, 192)).intoArray(arrLongs[op + 5], 2 * i); >> } > >> > Local tests by manually injected error shows all instructions are covered by the jtreg case. Suggestions? >> >> I'm not seeing `sra` used anywhere. >> >> The problem I see with the tests is that the methods are large. This causes C2 to do a lot of spilling. Also, because the resuling code is intertwined and complex, it's very hard to debug. 
>> >> It would be far better to do something like this: >> >> ``` >> void long_shift_add(long arrLongs[][], LongVector vba, LongVector vbb, int i) { >> vba.add(vbb.lanewise(VectorOperators.LSHR, 37)).intoArray(arrLongs[op], 2 * i); >> vba.add(vbb.lanewise(VectorOperators.LSHR, 64)).intoArray(arrLongs[op + 1], 2 * i); >> vba.add(vbb.lanewise(VectorOperators.LSHR, 99)).intoArray(arrLongs[op + 2], 2 * i); >> vba.add(vbb.lanewise(VectorOperators.LSHR, 128)).intoArray(arrLongs[op + 3], 2 * i); >> vba.add(vbb.lanewise(VectorOperators.LSHR, 157)).intoArray(arrLongs[op + 4], 2 * i); >> vba.add(vbb.lanewise(VectorOperators.LSHR, 192)).intoArray(arrLongs[op + 5], 2 * i); >> } >> ``` > > > Weird, I took a look at the the assembly, `ssra` did accessed by the tests on our server: > $ ./build/linux-aarch64-server-fastdebug/images/jdk/bin/java --add-modules jdk.incubator.vector -XX:CompileThreshold=1000 -XX:CompileCommand=print,compiler/vectorapi/TestVectorShiftImm.shift_* test/hotspot/jtreg/compiler/vectorapi/TestVectorShiftImm.java 64 &> assembly_vlen64.txt > $ ./build/linux-aarch64-server-fastdebug/images/jdk/bin/java --add-modules jdk.incubator.vector -XX:CompileThreshold=1000 -XX:CompileCommand=print,compiler/vectorapi/TestVectorShiftImm.shift_* test/hotspot/jtreg/compiler/vectorapi/TestVectorShiftImm.java 128 &> assembly_vlen128.txt > $ cat assembly_vlen*.txt | grep "ssra" > 02c0 ssra V18, V17, #37 # vector (2D) > 02c8 ssra V19, V17, #0 # vector (2D) > 02d0 ssra V20, V17, #35 # vector (2D) > 0308 ssra V18, V17, #29 # vector (2D) > 0644 ssra V18, V17, #37 # vector (2D) > 064c ssra V19, V17, #0 # vector (2D) > 0654 ssra V20, V17, #35 # vector (2D) > 0674 ssra V18, V17, #29 # vector (2D) > 0798 ssra V18, V17, #37 # vector (2D) > 07a0 ssra V19, V17, #0 # vector (2D) > 07a8 ssra V20, V17, #35 # vector (2D) > 07e0 ssra V18, V17, #29 # vector (2D) > 0x0000ffff83f7e500: ssra v18.2d, v17.2d, #37 ;*aload_0 {reexecute=0 rethrow=0 return_oop=0} > 0x0000ffff83f7e510: ssra v20.2d, 
v17.2d, #35 ;*iand {reexecute=0 rethrow=0 return_oop=0} > 0x0000ffff83f7e548: ssra v18.2d, v17.2d, #29 ;*if_icmpne {reexecute=0 rethrow=0 return_oop=0} > 0x0000ffff83f7e884: ssra v18.2d, v17.2d, #37 > 0x0000ffff83f7e894: ssra v20.2d, v17.2d, #35 > 0x0000ffff83f7e8b4: ssra v18.2d, v17.2d, #29 > 0x0000ffff83f7e9d8: ssra v18.2d, v17.2d, #37 ;*invokestatic broadcastInt {reexecute=0 rethrow=0 return_oop=0} > 0x0000ffff83f7e9e8: ssra v20.2d, v17.2d, #35 ;*invokestatic broadcastInt {reexecute=0 rethrow=0 return_oop=0} > 0x0000ffff83f7ea20: ssra v18.2d, v17.2d, #29 ;*checkcast {reexecute=0 rethrow=0 return_oop=0} > 284 ssra V18, V17, #9 # vector (4S) > 28c ssra V19, V17, #0 # vector (4S) > 294 ssra V20, V17, #15 # vector (4S) > 0x0000ffff83f822c4: ssra v18.4s, v17.4s, #9 ;*invokedynamic {reexecute=0 rethrow=0 return_oop=0} > 0x0000ffff83f822d4: ssra v20.4s, v17.4s, #15 > 284 ssra V18, V17, #1 # vector (8H) > 28c ssra V19, V17, #8 # vector (8H) > ... > > Also injected error to `sshr+add` by: > --- a/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp > +++ b/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp > @@ -545,7 +545,7 @@ public: > #define WRAP(INSN) \ > void INSN(FloatRegister Vd, SIMD_Arrangement T, FloatRegister Vn, int shift) { \ > if (shift == 0) { \ > - Assembler::addv(Vd, T, Vd, Vn); \ > + Assembler::subv(Vd, T, Vd, Vn); \ > } else { \ > Assembler::INSN(Vd, T, Vn, shift); \ > } \ > The `shift+add` tests failed as expected: > $ ./build/linux-aarch64-server-fastdebug/images/jdk/bin/java --add-modules jdk.incubator.vector -XX:CompileThreshold=1000 -XX:-TieredCompilation test/hotspot/jtreg/compiler/vectorapi/TestVectorShiftImm.java 64 > WARNING: Using incubator modules: jdk.incubator.vector > warning: using incubating module(s): jdk.incubator.vector > 1 warning > Exception in thread "main" java.lang.RuntimeException: Test Failed, failed tests: > type SHORT index 19, operation ASHR_AND_ACCUMULATE, vector length 64. 
> type SHORT index 21, operation ASHR_AND_ACCUMULATE, vector length 64. > type SHORT index 23, operation ASHR_AND_ACCUMULATE, vector length 64. > type SHORT index 25, operation LSHR_AND_ACCUMULATE, vector length 64. > type SHORT index 27, operation LSHR_AND_ACCUMULATE, vector length 64. > type SHORT index 29, operation LSHR_AND_ACCUMULATE, vector length 64. > type INTEGER index 19, operation ASHR_AND_ACCUMULATE, vector length 64. > type INTEGER index 21, operation ASHR_AND_ACCUMULATE, vector length 64. > type INTEGER index 23, operation ASHR_AND_ACCUMULATE, vector length 64. > type INTEGER index 25, operation LSHR_AND_ACCUMULATE, vector length 64. > type INTEGER index 27, operation LSHR_AND_ACCUMULATE, vector length 64. > type INTEGER index 29, operation LSHR_AND_ACCUMULATE, vector length 64. > ... > $ ./build/linux-aarch64-server-fastdebug/images/jdk/bin/java --add-modules jdk.incubator.vector -XX:CompileThreshold=1000 -XX:-TieredCompilation test/hotspot/jtreg/compiler/vectorapi/TestVectorShiftImm.java 128 > WARNING: Using incubator modules: jdk.incubator.vector > warning: using incubating module(s): jdk.incubator.vector > 1 warning > Exception in thread "main" java.lang.RuntimeException: Test Failed, failed tests: > type LONG index 49, operation ASHR_AND_ACCUMULATE, vector length 128. > type LONG index 51, operation ASHR_AND_ACCUMULATE, vector length 128. > type LONG index 53, operation ASHR_AND_ACCUMULATE, vector length 128. > type LONG index 55, operation LSHR_AND_ACCUMULATE, vector length 128. > type LONG index 57, operation LSHR_AND_ACCUMULATE, vector length 128. > type LONG index 59, operation LSHR_AND_ACCUMULATE, vector length 128. > type SHORT index 49, operation ASHR_AND_ACCUMULATE, vector length 128. > type SHORT index 51, operation ASHR_AND_ACCUMULATE, vector length 128. > type SHORT index 53, operation ASHR_AND_ACCUMULATE, vector length 128. > ... > > Anyway, I extracted operations you suggested into `shift_op_*` methods. 
> Performed the error-injected experiments with the new tests on Kunpeng916 and re-checked the assembly output, results looks good. > > The test command I used to run the newest tests are: > $ ./build/linux-aarch64-server-fastdebug/images/jdk/bin/java --add-modules jdk.incubator.vector -XX:-TieredCompilation -XX:CompileThreshold=1000 -Dvlen=64 -XX:CompileCommand=print,compiler/vectorapi/TestVectorShiftImm.shift_* test/hotspot/jtreg/compiler/vectorapi/TestVectorShiftImm.java &> assembly_vlen64.txt > $ ./build/linux-aarch64-server-fastdebug/images/jdk/bin/java --add-modules jdk.incubator.vector -XX:-TieredCompilation -XX:CompileThreshold=1000 -Dvlen=128 -XX:CompileCommand=print,compiler/vectorapi/TestVectorShiftImm.shift_* test/hotspot/jtreg/compiler/vectorapi/TestVectorShiftImm.java &> assembly_vlen128.txt > $ cat assembly_vlen64.txt | grep ssra; cat assembly_vlen128.txt | grep ssra > _Mailing list message from [Andrew Haley](mailto:aph at redhat.com) on [hotspot-compiler-dev](mailto:hotspot-compiler-dev at openjdk.java.net):_ > > On 24/02/2021 07:33, Dong Bo wrote: > > > Weird, I took a look at the the assembly, `ssra` did accessed by the tests on our server: > > I don't doubt it, but the test code is so very complex that it can > fall foul of heuristics given slightly changed circumstances. That's > why good test cases are as simple as possible, and allow no room for > variations because they do only one thing. Precise targeting should > be the goal of HotSpot back-end test cases. > Understood, thanks. :-) Does the newest version address the considerations? 
I extracted the `shift`/`shift+add` operations into separate methods, mostly as suggested in previous comments, something like: static int shift_op_long_ASHR_and_ADD(LongVector vba, LongVector vbb, long arrLongs[][], int end, int ind) { vba.add(vbb.lanewise(VectorOperators.ASHR, 37)).intoArray(arrLongs[end++], ind); vba.add(vbb.lanewise(VectorOperators.ASHR, 64)).intoArray(arrLongs[end++], ind); vba.add(vbb.lanewise(VectorOperators.ASHR, 99)).intoArray(arrLongs[end++], ind); vba.add(vbb.lanewise(VectorOperators.ASHR, 128)).intoArray(arrLongs[end++], ind); vba.add(vbb.lanewise(VectorOperators.ASHR, 157)).intoArray(arrLongs[end++], ind); vba.add(vbb.lanewise(VectorOperators.ASHR, 192)).intoArray(arrLongs[end++], ind); return end; } ------------- PR: https://git.openjdk.java.net/jdk/pull/2472 From jiefu at openjdk.java.net Thu Feb 25 03:59:38 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Thu, 25 Feb 2021 03:59:38 GMT Subject: RFR: 8262096: Vector API fails to work without C2 In-Reply-To: References: Message-ID: <3OcNdppJRXH5uv4sL0o6QTdb2-tKoY28mUQ-JHiYWsc=.105484b2-628e-40ce-850f-e7b0696ab2a3@github.com> On Sun, 21 Feb 2021 23:19:52 GMT, Jie Fu wrote: > Hi all, > > Vector API won't work without C2. > The reason is that VectorSupport_GetMaxLaneCount [1] always returns -1 if C2 is not present. > But it should work even there is no JIT compiler since it's Java-level's api. > So let's fix it. > > Thanks. > Best regards, > Jie > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/prims/vectorSupport.cpp#L364 After more thinking about this issue, I've found more bugs which need to be fixed. So close this PR and re-do it. Thanks. 
------------- PR: https://git.openjdk.java.net/jdk/pull/2667 From jiefu at openjdk.java.net Thu Feb 25 03:59:39 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Thu, 25 Feb 2021 03:59:39 GMT Subject: Withdrawn: 8262096: Vector API fails to work without C2 In-Reply-To: References: Message-ID: <0LsLDuQkWpLdda9B6V7Pl-tM0g8qpfMtSl5DVB282pE=.b0eed0bd-f4c2-41e0-9587-035af80044d7@github.com> On Sun, 21 Feb 2021 23:19:52 GMT, Jie Fu wrote: > Hi all, > > Vector API won't work without C2. > The reason is that VectorSupport_GetMaxLaneCount [1] always returns -1 if C2 is not present. > But it should work even there is no JIT compiler since it's Java-level's api. > So let's fix it. > > Thanks. > Best regards, > Jie > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/prims/vectorSupport.cpp#L364 This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.java.net/jdk/pull/2667 From shade at openjdk.java.net Thu Feb 25 07:09:39 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 25 Feb 2021 07:09:39 GMT Subject: RFR: 8261914: IfNode::fold_compares_helper faces non-canonicalized bool when running JRuby JSON workload In-Reply-To: References: Message-ID: <5qtq9Zb2tHryna_1rSVbuqWe2pjcQ1Eex50oFH2ug4E=.de32d1a5-44dd-42bc-88c9-e09044899c65@github.com> On Wed, 24 Feb 2021 15:22:26 GMT, Roland Westrelin wrote: > The assert fires because the current IfNode's condition was not > canonicalized when it's being folded with a dominating > IfNode. idealize_test() should have taken care of that because it > executed before fold_compares() but it didn't because: > > Node* new_b = phase->transform( new BoolNode(b->in(1), bt.negate()) ); > if( !new_b->is_Bool() ) return NULL; > > caused it to bail out. new_b is a constant. This happens because of > the order in which nodes are processed by IGVN. The If's current Bool > would also constant fold but it's in the IGVN worklist and hasn't been > processed yet. 
> > The fix I propose is to keep Aleksey's defensive fix but to check that > the Bool input is indeed about to be transformed by IGVN and that that > would cause the IfNode to be reprocessed. > > I tried to write a test case but didn't succeed. The 2 If nodes come > from a tableswitch that's transformed into a series of If based on > profile data. I couldn't reproduce the profile data with a simple test > case. Marked as reviewed by shade (Reviewer). src/hotspot/share/opto/ifnode.cpp line 984: > 982: return false; > 983: } > 984: assert(this_bool->_test.is_less() && !fail->_con, "incorrect test"); This should lead with "this test was canonicalized" comment? Missed during the move, I think. ------------- PR: https://git.openjdk.java.net/jdk/pull/2707 From shade at openjdk.java.net Thu Feb 25 07:09:40 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 25 Feb 2021 07:09:40 GMT Subject: RFR: 8261914: IfNode::fold_compares_helper faces non-canonicalized bool when running JRuby JSON workload In-Reply-To: <5qtq9Zb2tHryna_1rSVbuqWe2pjcQ1Eex50oFH2ug4E=.de32d1a5-44dd-42bc-88c9-e09044899c65@github.com> References: <5qtq9Zb2tHryna_1rSVbuqWe2pjcQ1Eex50oFH2ug4E=.de32d1a5-44dd-42bc-88c9-e09044899c65@github.com> Message-ID: On Thu, 25 Feb 2021 07:02:45 GMT, Aleksey Shipilev wrote: >> The assert fires because the current IfNode's condition was not >> canonicalized when it's being folded with a dominating >> IfNode. idealize_test() should have taken care of that because it >> executed before fold_compares() but it didn't because: >> >> Node* new_b = phase->transform( new BoolNode(b->in(1), bt.negate()) ); >> if( !new_b->is_Bool() ) return NULL; >> >> caused it to bail out. new_b is a constant. This happens because of >> the order in which nodes are processed by IGVN. The If's current Bool >> would also constant fold but it's in the IGVN worklist and hasn't been >> processed yet. 
>> >> The fix I propose is to keep Aleksey's defensive fix but to check that >> the Bool input is indeed about to be transformed by IGVN and that that >> would cause the IfNode to be reprocessed. >> >> I tried to write a test case but didn't succeed. The 2 If nodes come >> from a tableswitch that's transformed into a series of If based on >> profile data. I couldn't reproduce the profile data with a simple test >> case. > > src/hotspot/share/opto/ifnode.cpp line 984: > >> 982: return false; >> 983: } >> 984: assert(this_bool->_test.is_less() && !fail->_con, "incorrect test"); > > This should lead with "this test was canonicalized" comment? Missed during the move, I think. I also find it a bit weird to even have the assert on this path, as we tested all cases in the if-chain before, and the only path to this assert is through `lt` and `le` -- which is `is_less`? Maybe I am missing something, though. ------------- PR: https://git.openjdk.java.net/jdk/pull/2707 From xliu at openjdk.java.net Thu Feb 25 08:43:41 2021 From: xliu at openjdk.java.net (Xin Liu) Date: Thu, 25 Feb 2021 08:43:41 GMT Subject: RFR: 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set [v8] In-Reply-To: <9QwSVEA-wKGBPSwUX0MSDAXGLLjCaSmLjtEyKshvdGI=.e9c410e8-5495-4ceb-affc-4590ebcd84d4@github.com> References: <9QwSVEA-wKGBPSwUX0MSDAXGLLjCaSmLjtEyKshvdGI=.e9c410e8-5495-4ceb-affc-4590ebcd84d4@github.com> Message-ID: <-2ATme5P-5tNlwomRglS6fjxrvD5tK-bKqMx9N11eZg=.e5e0d183-6872-4972-bb0d-7542b3c0b057@github.com> On Wed, 24 Feb 2021 11:10:46 GMT, Evgeny Astigeevich wrote: >> Xin Liu has updated the pull request incrementally with one additional commit since the last revision: >> >> 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set >> >> add comments and hoist ResourceMark > > test/hotspot/gtest/utilities/test_ostream.cpp line 66: > >> 64: >> 65: static size_t count_char(const stringStream* ss, char ch) { >> 66: return count_char(ss->as_string(), ss->size(), ch); > > Am 
I correct `std::count` is not allowed? > No need to use `as_string`: `return count_char(ss->base(), ss->size(), ch);` > Or as `stringStream` is always zero-terminated: `return count_char(ss->base(), ch);` I don't think STL is allowed. Makes sense. ss->as_string() is not necessary. I don't like the idea that we assume ss is always zero-terminated like a C-string. There is a member variable _written in class stringStream. Technically speaking, the implementation can avoid writing '\0' at the end. That's why I would like to use the len argument. For me, `count_char(ss->base(), ss->size(), ch)` is more reliable because it depends on interfaces instead of implementation. An interface is supposed to be more stable than an implementation. ------------- PR: https://git.openjdk.java.net/jdk/pull/2178 From xliu at openjdk.java.net Thu Feb 25 08:48:40 2021 From: xliu at openjdk.java.net (Xin Liu) Date: Thu, 25 Feb 2021 08:48:40 GMT Subject: RFR: 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set [v8] In-Reply-To: <9QwSVEA-wKGBPSwUX0MSDAXGLLjCaSmLjtEyKshvdGI=.e9c410e8-5495-4ceb-affc-4590ebcd84d4@github.com> References: <9QwSVEA-wKGBPSwUX0MSDAXGLLjCaSmLjtEyKshvdGI=.e9c410e8-5495-4ceb-affc-4590ebcd84d4@github.com> Message-ID: On Wed, 24 Feb 2021 12:09:23 GMT, Evgeny Astigeevich wrote: >> Xin Liu has updated the pull request incrementally with one additional commit since the last revision: >> >> 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set >> >> add comments and hoist ResourceMark > > test/hotspot/gtest/utilities/test_ostream.cpp line 69: > >> 67: } >> 68: >> 69: static void test_stringStream_tr_delete(stringStream* ss) { > > I think this is a unit test for `StringUtils::replace_no_expand`. It checks that the function can be used to remove substrings. There is no dependency on `stringStream`. Any string can be used. > Could you please move the test to `test_stringUtils.cpp`? Yes, let me move it to test_stringUtils.cpp.
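For the `count_char` helper discussed above, the length-based form can be sketched as follows. This is a minimal stand-alone illustration, not the code from the PR: `count_char_sketch` is a hypothetical name, and it takes a raw buffer plus an explicit length so that it never relies on a trailing '\0'.

```cpp
#include <cstddef>

// Hypothetical helper (not from the PR): counts how often ch occurs in the
// first len bytes of buf. The explicit length means the buffer does not have
// to be zero-terminated, and embedded '\0' bytes are scanned like any other.
size_t count_char_sketch(const char* buf, size_t len, char ch) {
  size_t cnt = 0;
  for (size_t i = 0; i < len; ++i) {
    if (buf[i] == ch) {
      cnt++;
    }
  }
  return cnt;
}
```

A caller holding a stream-like object would pass its buffer pointer and its reported size, which is the shape of the `count_char(ss->base(), ss->size(), ch)` call quoted in the review.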
------------- PR: https://git.openjdk.java.net/jdk/pull/2178 From xliu at openjdk.java.net Thu Feb 25 08:51:39 2021 From: xliu at openjdk.java.net (Xin Liu) Date: Thu, 25 Feb 2021 08:51:39 GMT Subject: RFR: 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set [v8] In-Reply-To: References: <9QwSVEA-wKGBPSwUX0MSDAXGLLjCaSmLjtEyKshvdGI=.e9c410e8-5495-4ceb-affc-4590ebcd84d4@github.com> Message-ID: On Thu, 25 Feb 2021 08:46:08 GMT, Xin Liu wrote: >> test/hotspot/gtest/utilities/test_ostream.cpp line 69: >> >>> 67: } >>> 68: >>> 69: static void test_stringStream_tr_delete(stringStream* ss) { >> >> I think this is a unit test for `StringUtils::replace_no_expand`. It checks that the function can be used to remove substrings. There is no dependency on `stringStream`. Any string can be used. >> Could you please move the test to `test_stringUtils.cpp`? > > yes, let me move it to test_stringUtils.cpp. I would like to keep stringStream because I think it's good idea to test the similar scenario. it's also handy to do memory management. ------------- PR: https://git.openjdk.java.net/jdk/pull/2178 From xliu at openjdk.java.net Thu Feb 25 08:56:14 2021 From: xliu at openjdk.java.net (Xin Liu) Date: Thu, 25 Feb 2021 08:56:14 GMT Subject: RFR: 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set [v9] In-Reply-To: References: Message-ID: > Add a flag _suppress_cr to outputStream. outstream objects won't emit any CR if it's set. > Correct TypeInstPtr::dump2 to make sure it only emits klass name once. > Remove the comment because Klass::oop_print_on() has emitted the address of oop. 
> > Before: > 689 ConP === 0 [[ 821 ]] Oop:java/lang/Stringjava.lang.String > {0x000000010159d7c8} - klass: public final synchronized 'java/lang/String' > - string: "a" > :Constant:exact * > > After: > 689 ConP === 0 [[ 821 ]] Oop:java.lang.String {0x000000010159d7c8} - klass: public final synchronized 'java/lang/String' - string: "a":Constant:exact * Xin Liu has updated the pull request incrementally with one additional commit since the last revision: 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set update comments based on the review feedbacks. move the unittest to test_stringUtil.cpp. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2178/files - new: https://git.openjdk.java.net/jdk/pull/2178/files/aeff9ecc..2f7ccdb0 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2178&range=08 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2178&range=07-08 Stats: 76 lines in 3 files changed: 41 ins; 30 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/2178.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2178/head:pull/2178 PR: https://git.openjdk.java.net/jdk/pull/2178 From thartmann at openjdk.java.net Thu Feb 25 08:56:46 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 25 Feb 2021 08:56:46 GMT Subject: Integrated: 8262299: C2 compilation fails with "modified node was not processed by IGVN.transform_old()" In-Reply-To: References: Message-ID: On Wed, 24 Feb 2021 12:44:34 GMT, Tobias Hartmann wrote: > We hit an assert because a dead `MergeMemNode` was not removed. Make sure it's added to the IGVN worklist to give it a chance to be removed. > > Thanks, > Tobias This pull request has now been integrated. 
Changeset: a83e802b Author: Tobias Hartmann URL: https://git.openjdk.java.net/jdk/commit/a83e802b Stats: 10 lines in 1 file changed: 6 ins; 4 del; 0 mod 8262299: C2 compilation fails with "modified node was not processed by IGVN.transform_old()" Reviewed-by: roland, chagedorn ------------- PR: https://git.openjdk.java.net/jdk/pull/2705 From jiefu at openjdk.java.net Thu Feb 25 09:36:02 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Thu, 25 Feb 2021 09:36:02 GMT Subject: RFR: 8262096: Vector API fails to work due to VectorShape initialization exception Message-ID: <1IGTrGJrdBrsekizG1jLeCKLMkHiGxqKEyzLlQJkZa4=.decd6165-9d52-4e8e-973c-8036295cecff@github.com> Hi all, Vector API fails to work when: - case 1: MaxVectorSize is set to <=8, or - case 2: C2 is disabled The reason is that {max/preferred} VectorShape initialization fails in both cases. And the root cause is that VectorSupport_GetMaxLaneCount [1] returns unreasonable values (0 for case 1 and -1 for case 2). Vector API should not depend on C2 to run. It should work even if there is no JIT compiler since it's a Java-level API. So let's fix it. Testing: - jdk/incubator/vector with -XX:MaxVectorSize=default/8 on Linux/x64 Thanks.
Best regards, Jie [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/prims/vectorSupport.cpp#L364 ------------- Commit messages: - 8262096: Vector API fails to work due to VectorShape initialization exception Changes: https://git.openjdk.java.net/jdk/pull/2722/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2722&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8262096 Stats: 66 lines in 3 files changed: 57 ins; 6 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/2722.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2722/head:pull/2722 PR: https://git.openjdk.java.net/jdk/pull/2722 From roland at openjdk.java.net Thu Feb 25 09:52:40 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Thu, 25 Feb 2021 09:52:40 GMT Subject: RFR: 8261308: C2: assert(inner->is_valid_counted_loop(T_INT) && inner->is_strip_mined()) failed: OuterStripMinedLoop should have been removed In-Reply-To: References: Message-ID: On Wed, 24 Feb 2021 19:43:14 GMT, Vladimir Kozlov wrote: > Add comment here explaining this case (one trip when init >= limit). Ok. Do you still think we need extra tests? > BTW, this optimization seems to work only for INT iv loops and not LONG. Do you plan to implement for LONG? I thought about it but it's not straightforward because the current code uses long to avoid overflow. I would also like to rework the parallel iv code so it works with LONG loops. I don't have time right now, though. Note also that because of the iv phi Value(), I would expect some of the single iteration loops to be optimized even without a working IdealLoopTree::compute_trip_count(). That would be true for int and long loops.
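The one-trip case mentioned above (init >= limit for a loop that has already been entered) can be illustrated with a small sketch. This is not HotSpot's `IdealLoopTree::compute_trip_count()`; the class `TripCountSketch` and the method `tripCount` are hypothetical names, and the sketch assumes an up-counting loop with a positive stride whose exit test `i < limit` runs after the body. The subtraction is widened to long, in the spirit of the remark that the current code uses long to avoid overflow.

```java
public class TripCountSketch {
    /**
     * Hypothetical helper: number of iterations of
     *   i = init; do { body; i += stride; } while (i < limit);
     * assuming stride > 0. Not the HotSpot implementation.
     */
    static long tripCount(int init, int limit, int stride) {
        // Widen to long so that limit - init cannot overflow an int.
        long span = (long) limit - (long) init;
        if (span <= 0) {
            // The loop has been entered, so the body runs once even
            // though limit - init is zero or negative.
            return 1;
        }
        // Round up: a partial last step still costs one iteration.
        return (span + stride - 1) / stride;
    }

    public static void main(String[] args) {
        // The case from the bug report: starts at 1, stops at 1.
        System.out.println(tripCount(1, 1, 1));
        System.out.println(tripCount(0, 10, 3));
        System.out.println(tripCount(Integer.MIN_VALUE, Integer.MAX_VALUE, 1));
    }
}
```

With init = 1 and limit = 1 the difference is 0, yet the sketch reports one trip, which is the adjustment proposed for the zero-trip-count case in this thread.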
------------- PR: https://git.openjdk.java.net/jdk/pull/2529 From roland at openjdk.java.net Thu Feb 25 09:53:39 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Thu, 25 Feb 2021 09:53:39 GMT Subject: RFR: 8261914: IfNode::fold_compares_helper faces non-canonicalized bool when running JRuby JSON workload In-Reply-To: References: <5qtq9Zb2tHryna_1rSVbuqWe2pjcQ1Eex50oFH2ug4E=.de32d1a5-44dd-42bc-88c9-e09044899c65@github.com> Message-ID: On Thu, 25 Feb 2021 07:06:44 GMT, Aleksey Shipilev wrote: > I also find it a bit weird to even have the assert on this path, as we tested all cases in the if-chain before, and the only path to this assert is through `lt` and `le` -- which is `is_less`? Maybe I am missing something, though. I kept it because of the !fail->_con part of the assert. I kept the whole assert and moved it around because I'm lazy, I guess. ------------- PR: https://git.openjdk.java.net/jdk/pull/2707 From adinn at openjdk.java.net Thu Feb 25 11:04:07 2021 From: adinn at openjdk.java.net (Andrew Dinn) Date: Thu, 25 Feb 2021 11:04:07 GMT Subject: RFR: 8259937: guarantee(loc != NULL) failed: missing saved register with native invoker [v4] In-Reply-To: References: <5M1DPgw-V-j9acOvHLewPoTaxF-a6-9HMIpIzHjqavU=.55831da6-f888-4dc0-beee-19869c5eb638@github.com> Message-ID: On Wed, 24 Feb 2021 20:57:10 GMT, Jorn Vernee wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> Revert "test & debug" >> >> This reverts commit cb9dd24c9fcccc6997e9fca874e2860f966b9576. > > Thanks for addressing the comments! Looks good. @JornVernee I'm not clear that your response addresses my point. I'm concerned that a thread stack dump reported by serviceability code may contain an extra frame for the stub call. This could occur while the Java thread is still in native and it could also include the case where the native call re-enters into Java i.e.
the extra frame could appear at the top of the stack dump or interleaved between Java method frames. I don't see how that problem is mitigated by your suggestion that this only relates to Panama API use. Code which consumes any such stack dump (including 3rd party code) that might be affected by the presence of this extra frame will not care (or even be aware) that the native callout is a Panama call. Anyway, since no one from the serviceability team has noted this as a potential problem I'm ok to see the patch proceed. ------------- PR: https://git.openjdk.java.net/jdk/pull/2528 From github.com+42899633+eastig at openjdk.java.net Thu Feb 25 11:04:07 2021 From: github.com+42899633+eastig at openjdk.java.net (Evgeny Astigeevich) Date: Thu, 25 Feb 2021 11:04:07 GMT Subject: RFR: 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set [v9] In-Reply-To: References: Message-ID: On Thu, 25 Feb 2021 08:56:14 GMT, Xin Liu wrote: >> Add a flag _suppress_cr to outputStream. outstream objects won't emit any CR if it's set. >> Correct TypeInstPtr::dump2 to make sure it only emits klass name once. >> Remove the comment because Klass::oop_print_on() has emitted the address of oop. >> >> Before: >> 689 ConP === 0 [[ 821 ]] Oop:java/lang/Stringjava.lang.String >> {0x000000010159d7c8} - klass: public final synchronized 'java/lang/String' >> - string: "a" >> :Constant:exact * >> >> After: >> 689 ConP === 0 [[ 821 ]] Oop:java.lang.String {0x000000010159d7c8} - klass: public final synchronized 'java/lang/String' - string: "a":Constant:exact * > > Xin Liu has updated the pull request incrementally with one additional commit since the last revision: > > 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set > > update comments based on the review feedbacks. > move the unittest to test_stringUtil.cpp. Marked as reviewed by eastig at github.com (no known OpenJDK username).
------------- PR: https://git.openjdk.java.net/jdk/pull/2178 From github.com+10835776+stsypanov at openjdk.java.net Thu Feb 25 11:07:47 2021 From: github.com+10835776+stsypanov at openjdk.java.net (Сергей Цыпанов) Date: Thu, 25 Feb 2021 11:07:47 GMT Subject: RFR: 8261880: Change nested classes in java.base to static nested classes where possible [v2] In-Reply-To: References: Message-ID: On Wed, 24 Feb 2021 08:50:36 GMT, Alan Bateman wrote: >> Сергей Цыпанов has updated the pull request incrementally with one additional commit since the last revision: >> >> 8261880: Remove static from declarations of Holder nested classes > > src/java.base/windows/classes/sun/nio/ch/PipeImpl.java line 67: > >> 65: private final SinkChannel sink; >> 66: >> 67: private static class Initializer > > This one is okay to do. Thanks for the review! Could you review the rest of the code and approve this PR, if it's fine? ------------- PR: https://git.openjdk.java.net/jdk/pull/2589 From rwestrel at redhat.com Thu Feb 25 11:55:04 2021 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 25 Feb 2021 12:55:04 +0100 Subject: RFC: 8262295: C2: Out-of-Bounds Array Load from Clone Source In-Reply-To: References: Message-ID: <87o8g8weef.fsf@redhat.com> Hi Richard, > I've been working on a fix for > > JDK-8262295: C2: Out-of-Bounds Array Load from Clone Source > https://bugs.openjdk.java.net/browse/JDK-8262295 The bug is not visible. > Now I'm not sure if the fix I found (compile time range check) is a good one. > > I have created a draft PR with the fix not yet ready for proper review: > > https://github.com/openjdk/jdk/pull/2708 > > Please let me know what you think and if this is the right approach. > > My first attempt was to keep the original control for the cloned node in > LoadNode::can_see_arraycopy_value() but this seemed to confuse loop > optimizations.
At least the test compiler/escapeAnalysis/TestMissingAntiDependency.java > failed with that change. Keeping the original control would seem like the best fix. What error do you get with it? Maybe keeping the cloned load's memory unchanged helps? Roland. From shade at openjdk.java.net Thu Feb 25 14:52:39 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 25 Feb 2021 14:52:39 GMT Subject: RFR: 8261914: IfNode::fold_compares_helper faces non-canonicalized bool when running JRuby JSON workload In-Reply-To: References: <5qtq9Zb2tHryna_1rSVbuqWe2pjcQ1Eex50oFH2ug4E=.de32d1a5-44dd-42bc-88c9-e09044899c65@github.com> Message-ID: On Thu, 25 Feb 2021 09:50:27 GMT, Roland Westrelin wrote: >> I also find it a bit weird to even have the assert on this path, as we tested all cases in the if-chain before, and the only path to this assert is through `lt` and `le` -- which is `is_less`? Maybe I am missing something, though. > >> I also find it a bit weird to even have the assert on this path, as we tested all cases in the if-chain before, and the only path to this assert is through `lt` and `le` -- which is `is_less`? Maybe I am missing something, though. > > I kept it because of the !fail->_con part of the assert. I kept the whole assert and move it around because I'm lazy, I guess. That's fine. Keep the comment, though? 
------------- PR: https://git.openjdk.java.net/jdk/pull/2707 From psandoz at openjdk.java.net Thu Feb 25 17:25:40 2021 From: psandoz at openjdk.java.net (Paul Sandoz) Date: Thu, 25 Feb 2021 17:25:40 GMT Subject: RFR: 8262096: Vector API fails to work due to VectorShape initialization exception In-Reply-To: <1IGTrGJrdBrsekizG1jLeCKLMkHiGxqKEyzLlQJkZa4=.decd6165-9d52-4e8e-973c-8036295cecff@github.com> References: <1IGTrGJrdBrsekizG1jLeCKLMkHiGxqKEyzLlQJkZa4=.decd6165-9d52-4e8e-973c-8036295cecff@github.com> Message-ID: <8uVq41r6szj6ZIS823AKYsvm1ja29A-YY_3fo7RCyGQ=.e790ca0e-99ca-48e1-b1e1-fee731bdc2c8@github.com> On Thu, 25 Feb 2021 09:31:01 GMT, Jie Fu wrote: > Hi all, > > Vector API fails to work when: > - case 1: MaxVectorSize is set to <=8, or > - case 2: C2 is disabled > > The reason is that {max/preferred} VectorShape initialization fails in both cases. > And the root cause is that VectorSupport_GetMaxLaneCount [1] returns unreasonable values (0 for case 1 and -1 for case 2). > > Vector API should not depend on C2 to run. > It should work even there is no JIT compiler since it's a Java-level api. > So let's fix it. > > Testing: > - jdk/incubator/vector with -XX:MaxVectorSize=default/8 on Linux/x64 > > Thanks. > Best regards, > Jie > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/prims/vectorSupport.cpp#L364 src/hotspot/share/prims/vectorSupport.cpp line 368: > 366: if (java_lang_Class::is_primitive(mirror)) { > 367: BasicType bt = java_lang_Class::primitive_type(mirror); > 368: int min_lane_count = 64 / type2aelembytes(bt); I am uncertain of the units here. Is the numerator in bits and the denominator in bytes? 
------------- PR: https://git.openjdk.java.net/jdk/pull/2722 From xliu at openjdk.java.net Thu Feb 25 17:50:54 2021 From: xliu at openjdk.java.net (Xin Liu) Date: Thu, 25 Feb 2021 17:50:54 GMT Subject: RFR: 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set [v9] In-Reply-To: References: Message-ID: On Thu, 25 Feb 2021 10:40:07 GMT, Evgeny Astigeevich wrote: >> Xin Liu has updated the pull request incrementally with one additional commit since the last revision: >> >> 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set >> >> update comments based on the review feedbacks. >> move the unittest to test_stringUtil.cpp. > > Marked as reviewed by eastig at github.com (no known OpenJDK username). @eastig Thank you for reviewing it. @TobiHartmann Could you take a look at it again? I made a little change after you approve it. If everything looks fine, could you sponsor it? Thanks! ------------- PR: https://git.openjdk.java.net/jdk/pull/2178 From kvn at openjdk.java.net Thu Feb 25 18:11:53 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 25 Feb 2021 18:11:53 GMT Subject: RFR: 8261308: C2: assert(inner->is_valid_counted_loop(T_INT) && inner->is_strip_mined()) failed: OuterStripMinedLoop should have been removed In-Reply-To: References: Message-ID: On Thu, 11 Feb 2021 16:11:50 GMT, Roland Westrelin wrote: > The inner counted loop of the test case starts at 1 and stops at 1 so > runs for one iteration. A counted loop is created for it. The iv Phi > is found to be the constant 1 and its type is set by: > > l->phi()->as_Phi()->set_type(l->phi()->Value(&_igvn)); > > in PhaseIdealLoop::is_counted_loop() but it's not replaced by the > constant 1 yet so the counted loop's shape is preserved. > > IdealLoopTree::do_one_iteration_loop() runs but doesn't optimize the > loop because the trip count is not set to 1. The loop contains a range > check and range check elimination is applied. 
That causes the loop > exit test to be adjusted with a MinI(..) expression. When IGVN runs > next, the phi is replaced with 1 but because the exit test was > changed, IGVN can't prove it always fails. So the loop is not removed > which causes the assert failure as loop opts progress. > > The fix I propose is for IdealLoopTree::do_one_iteration_loop() to > remove the 1 iteration loop. The reason it doesn't happen is that > IdealLoopTree::compute_trip_count() doesn't set the trip count because > it finds a zero trip count: limit - init = 1 - 1 = 0. All loops, once > entered execute at least once. So I think, it's safe to set the trip > count to 1 in those cases. Good. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2529 From kvn at openjdk.java.net Thu Feb 25 18:11:53 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 25 Feb 2021 18:11:53 GMT Subject: RFR: 8261308: C2: assert(inner->is_valid_counted_loop(T_INT) && inner->is_strip_mined()) failed: OuterStripMinedLoop should have been removed In-Reply-To: References: Message-ID: On Thu, 25 Feb 2021 09:49:31 GMT, Roland Westrelin wrote: > > Add comment here explaining this case (one trip when init >= limit). > > Ok. Do you still think we need extra tests? Yes, would be nice to have tests for 0,1,2 iterations and all 3 types of loops: `for() {}`, `while() {}`, `do {} while()` to verify that they are converted to straight code. It could be done as separate RFE. > > > BTW, this optimization seems only works for INT iv loops and not LONG. Do you plan to implement for LONG? > > I thought about it but it's not straightforward because the current code uses long to avoid overflow. I would also like to rework the parallel iv code so it works with LONG loops. I don't have time right now, thought. > > Note also, that because of the iv phi Value(), I would expect some of the single iteration loops to be optimized even without a working IdealLoopTree::compute_trip_count(). 
That would be true for in and long loops. Got it. Thank you. ------------- PR: https://git.openjdk.java.net/jdk/pull/2529 From xliu at openjdk.java.net Thu Feb 25 20:19:41 2021 From: xliu at openjdk.java.net (Xin Liu) Date: Thu, 25 Feb 2021 20:19:41 GMT Subject: RFR: 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set [v8] In-Reply-To: <9QwSVEA-wKGBPSwUX0MSDAXGLLjCaSmLjtEyKshvdGI=.e9c410e8-5495-4ceb-affc-4590ebcd84d4@github.com> References: <9QwSVEA-wKGBPSwUX0MSDAXGLLjCaSmLjtEyKshvdGI=.e9c410e8-5495-4ceb-affc-4590ebcd84d4@github.com> Message-ID: <7u7ZuQQ8gEBsAFXs8ZR6JNQWvHmyzzkGPfN0gez9ZmI=.79e2b654-c552-4fbf-9f2a-e35ae70a2e6b@github.com> On Wed, 24 Feb 2021 11:08:58 GMT, Evgeny Astigeevich wrote: >> Xin Liu has updated the pull request incrementally with one additional commit since the last revision: >> >> 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set >> >> add comments and hoist ResourceMark > > src/hotspot/share/opto/type.cpp line 4053: > >> 4051: const_oop()->print_oop(&ss); >> 4052: // suppress new-lines('\n') in ss emitted by const_oop->print_oop() >> 4053: // so each node is one-liner for -XX:+Verbose && -XX:+PrintIdeal > > What about rewriting the comment in clearer way: > // 'const_oop->print_oop()' emits new-lines('\n') into ss. > // For -XX:+Verbose && -XX:+PrintIdeal, new-lines('\n') must be removed from > // the ss created string to have a node per line. update it. 
------------- PR: https://git.openjdk.java.net/jdk/pull/2178 From jiefu at openjdk.java.net Thu Feb 25 23:52:06 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Thu, 25 Feb 2021 23:52:06 GMT Subject: RFR: 8262096: Vector API fails to work due to VectorShape initialization exception [v2] In-Reply-To: <8uVq41r6szj6ZIS823AKYsvm1ja29A-YY_3fo7RCyGQ=.e790ca0e-99ca-48e1-b1e1-fee731bdc2c8@github.com> References: <1IGTrGJrdBrsekizG1jLeCKLMkHiGxqKEyzLlQJkZa4=.decd6165-9d52-4e8e-973c-8036295cecff@github.com> <8uVq41r6szj6ZIS823AKYsvm1ja29A-YY_3fo7RCyGQ=.e790ca0e-99ca-48e1-b1e1-fee731bdc2c8@github.com> Message-ID: On Thu, 25 Feb 2021 17:23:11 GMT, Paul Sandoz wrote: >> Jie Fu has updated the pull request incrementally with one additional commit since the last revision: >> >> The numerator should be 8 (byte) > > src/hotspot/share/prims/vectorSupport.cpp line 368: > >> 366: if (java_lang_Class::is_primitive(mirror)) { >> 367: BasicType bt = java_lang_Class::primitive_type(mirror); >> 368: int min_lane_count = 64 / type2aelembytes(bt); > > I am uncertain of the units here. Is the numerator in bits and the denominator in bytes? Thanks Paul for your review. Oops, the numerator should be 8 (bytes). Updated. Testing of jdk/incubator/vector (MaxVector=default/8/4 or without C2) is still fine. Thanks. 
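The unit question above can be made concrete with a small sketch. It assumes, as the reply states, that the numerator is meant to be a minimum vector size of 8 bytes; the class `MinLaneCountSketch`, the constant `MIN_VECTOR_BYTES` and both methods are hypothetical names used for illustration only, not the code in vectorSupport.cpp.

```java
public class MinLaneCountSketch {
    // Assumption taken from the reply above: the minimum vector is 8 bytes (64 bits).
    static final int MIN_VECTOR_BYTES = 8;

    /** Bytes divided by bytes: lanes of the given element size in an 8-byte vector. */
    static int minLaneCount(int elemBytes) {
        return MIN_VECTOR_BYTES / elemBytes;
    }

    /** The questioned form: 64 (a bit count) divided by an element size in bytes. */
    static int mixedUnitsLaneCount(int elemBytes) {
        return 64 / elemBytes;
    }

    public static void main(String[] args) {
        int[] sizes = {Byte.BYTES, Short.BYTES, Integer.BYTES, Long.BYTES};
        String[] names = {"byte", "short", "int", "long"};
        for (int k = 0; k < sizes.length; k++) {
            // Show how far apart the two formulas are for each element type.
            System.out.println(names[k]
                + ": bytes/bytes = " + minLaneCount(sizes[k])
                + ", bits/bytes = " + mixedUnitsLaneCount(sizes[k]));
        }
    }
}
```

Dividing 64 by a byte count overstates the lane count by a factor of eight: for long elements it yields 8 lanes, which would be a 512-bit vector rather than a 64-bit one.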
------------- PR: https://git.openjdk.java.net/jdk/pull/2722 From jiefu at openjdk.java.net Thu Feb 25 23:52:04 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Thu, 25 Feb 2021 23:52:04 GMT Subject: RFR: 8262096: Vector API fails to work due to VectorShape initialization exception [v2] In-Reply-To: <1IGTrGJrdBrsekizG1jLeCKLMkHiGxqKEyzLlQJkZa4=.decd6165-9d52-4e8e-973c-8036295cecff@github.com> References: <1IGTrGJrdBrsekizG1jLeCKLMkHiGxqKEyzLlQJkZa4=.decd6165-9d52-4e8e-973c-8036295cecff@github.com> Message-ID: > Hi all, > > Vector API fails to work when: > - case 1: MaxVectorSize is set to <=8, or > - case 2: C2 is disabled > > The reason is that {max/preferred} VectorShape initialization fails in both cases. > And the root cause is that VectorSupport_GetMaxLaneCount [1] returns unreasonable values (0 for case 1 and -1 for case 2). > > Vector API should not depend on C2 to run. > It should work even there is no JIT compiler since it's a Java-level api. > So let's fix it. > > Testing: > - jdk/incubator/vector with -XX:MaxVectorSize=default/8 on Linux/x64 > > Thanks. 
> Best regards, > Jie > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/prims/vectorSupport.cpp#L364 Jie Fu has updated the pull request incrementally with one additional commit since the last revision: The numerator should be 8 (byte) ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2722/files - new: https://git.openjdk.java.net/jdk/pull/2722/files/09b49f25..724b36d4 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2722&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2722&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/2722.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2722/head:pull/2722 PR: https://git.openjdk.java.net/jdk/pull/2722 From psandoz at openjdk.java.net Fri Feb 26 00:33:41 2021 From: psandoz at openjdk.java.net (Paul Sandoz) Date: Fri, 26 Feb 2021 00:33:41 GMT Subject: RFR: 8262096: Vector API fails to work due to VectorShape initialization exception [v2] In-Reply-To: References: <1IGTrGJrdBrsekizG1jLeCKLMkHiGxqKEyzLlQJkZa4=.decd6165-9d52-4e8e-973c-8036295cecff@github.com> <8uVq41r6szj6ZIS823AKYsvm1ja29A-YY_3fo7RCyGQ=.e790ca0e-99ca-48e1-b1e1-fee731bdc2c8@github.com> Message-ID: On Thu, 25 Feb 2021 23:48:37 GMT, Jie Fu wrote: >> src/hotspot/share/prims/vectorSupport.cpp line 368: >> >>> 366: if (java_lang_Class::is_primitive(mirror)) { >>> 367: BasicType bt = java_lang_Class::primitive_type(mirror); >>> 368: int min_lane_count = 64 / type2aelembytes(bt); >> >> I am uncertain of the units here. Is the numerator in bits and the denominator in bytes? > > Thanks Paul for your review. > > Oops, the numerator should be 8 (bytes). > > Updated. > Testing of jdk/incubator/vector (MaxVector=default/8/4 or without C2) is still fine. > Thanks. Thanks, was the test `VectorShapeInitTest` passing prior to the fix of the numerator? 
Perhaps we should be testing more directly on `VectorShape.S_Max_BIT.vectorBitSize()` and `VectorShape.preferredShape` ? Also, perhaps we can run with C2 disabled, with `-XX:TieredStopAtLevel=3` ? ------------- PR: https://git.openjdk.java.net/jdk/pull/2722 From jiefu at openjdk.java.net Fri Feb 26 02:19:01 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Fri, 26 Feb 2021 02:19:01 GMT Subject: RFR: 8262096: Vector API fails to work due to VectorShape initialization exception [v3] In-Reply-To: References: <1IGTrGJrdBrsekizG1jLeCKLMkHiGxqKEyzLlQJkZa4=.decd6165-9d52-4e8e-973c-8036295cecff@github.com> <8uVq41r6szj6ZIS823AKYsvm1ja29A-YY_3fo7RCyGQ=.e790ca0e-99ca-48e1-b1e1-fee731bdc2c8@github.com> Message-ID: On Fri, 26 Feb 2021 00:30:45 GMT, Paul Sandoz wrote: > Thanks, was the test `VectorShapeInitTest` passing prior to the fix of the numerator? Yes. It also passed with min_lane_count = 64 / type2aelembytes(bt). > Perhaps we should be testing more directly on `VectorShape.S_Max_BIT.vectorBitSize()` and `VectorShape.preferredShape` ? Ok. But in that case, I'd like to just re-use jdk/incubator/vector/PreferredSpeciesTest.java. > Also, perhaps we can run with C2 disabled, with `-XX:TieredStopAtLevel=3` ? Fine. Although this bug wouldn't be triggered with -XX:TieredStopAtLevel=x, it can help test with C1. The patch has been updated. Any comments? Thanks.
------------- PR: https://git.openjdk.java.net/jdk/pull/2722 From jiefu at openjdk.java.net Fri Feb 26 02:19:01 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Fri, 26 Feb 2021 02:19:01 GMT Subject: RFR: 8262096: Vector API fails to work due to VectorShape initialization exception [v3] In-Reply-To: <1IGTrGJrdBrsekizG1jLeCKLMkHiGxqKEyzLlQJkZa4=.decd6165-9d52-4e8e-973c-8036295cecff@github.com> References: <1IGTrGJrdBrsekizG1jLeCKLMkHiGxqKEyzLlQJkZa4=.decd6165-9d52-4e8e-973c-8036295cecff@github.com> Message-ID: > Hi all, > > Vector API fails to work when: > - case 1: MaxVectorSize is set to <=8, or > - case 2: C2 is disabled > > The reason is that {max/preferred} VectorShape initialization fails in both cases. > And the root cause is that VectorSupport_GetMaxLaneCount [1] returns unreasonable values (0 for case 1 and -1 for case 2). > > Vector API should not depend on C2 to run. > It should work even there is no JIT compiler since it's a Java-level api. > So let's fix it. > > Testing: > - jdk/incubator/vector with -XX:MaxVectorSize=default/8 on Linux/x64 > > Thanks. 
> Best regards, > Jie > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/prims/vectorSupport.cpp#L364 Jie Fu has updated the pull request incrementally with one additional commit since the last revision: Update the jtreg test ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2722/files - new: https://git.openjdk.java.net/jdk/pull/2722/files/724b36d4..aa475b0a Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2722&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2722&range=01-02 Stats: 59 lines in 2 files changed: 10 ins; 48 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/2722.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2722/head:pull/2722 PR: https://git.openjdk.java.net/jdk/pull/2722 From psandoz at openjdk.java.net Fri Feb 26 02:31:39 2021 From: psandoz at openjdk.java.net (Paul Sandoz) Date: Fri, 26 Feb 2021 02:31:39 GMT Subject: RFR: 8262096: Vector API fails to work due to VectorShape initialization exception [v3] In-Reply-To: References: <1IGTrGJrdBrsekizG1jLeCKLMkHiGxqKEyzLlQJkZa4=.decd6165-9d52-4e8e-973c-8036295cecff@github.com> <8uVq41r6szj6ZIS823AKYsvm1ja29A-YY_3fo7RCyGQ=.e790ca0e-99ca-48e1-b1e1-fee731bdc2c8@github.com> Message-ID: On Fri, 26 Feb 2021 02:16:02 GMT, Jie Fu wrote: >> Thanks, was the test `VectorShapeInitTest` passing prior to the fix of the numerator? >> Perhaps we should be testing more directly on `VectorShape.S_Max_BIT.vectorBitSize()` and `VectorShape.preferredShape` ? >> Also, perhaps we can run with C2 disabled, with `-XX:TieredStopAtLevel=3` ? > >> Thanks, was the test `VectorShapeInitTest` passing prior to the fix of the numerator? > Yes. It also passed with min_lane_count = 64 / type2aelembytes(bt). > >> Perhaps we should be testing more directly on `VectorShape.S_Max_BIT.vectorBitSize()` and `VectorShape.preferredShape` ? > Ok. > But it that case, I'd like to just re-use jdk/incubator/vector/PreferredSpeciesTest.java. 
> >> Also, perhaps we can run with C2 disabled, with `-XX:TieredStopAtLevel=3` ? > Fine. > Although this bug wouldn't be triggered with -XX:TieredStopAtLevel=x, it can help test with C1. > > Patch had been updated. > Any comments? > Thanks. Reusing PreferredSpeciesTest is a good idea. I now realize that C2 needs to be compiled out to trigger some other cases. In that case I think we can remove the execution with `-XX:TieredStopAtLevel=x`. The test will get executed with various HotSpot configurations by the test infrastructure, eventually... Looks good. ------------- PR: https://git.openjdk.java.net/jdk/pull/2722 From jiefu at openjdk.java.net Fri Feb 26 02:38:00 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Fri, 26 Feb 2021 02:38:00 GMT Subject: RFR: 8262096: Vector API fails to work due to VectorShape initialization exception [v4] In-Reply-To: References: <1IGTrGJrdBrsekizG1jLeCKLMkHiGxqKEyzLlQJkZa4=.decd6165-9d52-4e8e-973c-8036295cecff@github.com> <8uVq41r6szj6ZIS823AKYsvm1ja29A-YY_3fo7RCyGQ=.e790ca0e-99ca-48e1-b1e1-fee731bdc2c8@github.com> Message-ID: <6aOBlWYEmWHRYJ2YehyBI_AoE5_bA3Hejgdl2yIRt84=.fbffb4ec-3949-4993-8468-c39f5cde3f09@github.com> On Fri, 26 Feb 2021 02:28:55 GMT, Paul Sandoz wrote: > In that case I think we can remove the execution with `-XX:TieredStopAtLevel=x`. Fixed. Thanks. 
------------- PR: https://git.openjdk.java.net/jdk/pull/2722 From jiefu at openjdk.java.net Fri Feb 26 02:38:00 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Fri, 26 Feb 2021 02:38:00 GMT Subject: RFR: 8262096: Vector API fails to work due to VectorShape initialization exception [v4] In-Reply-To: <1IGTrGJrdBrsekizG1jLeCKLMkHiGxqKEyzLlQJkZa4=.decd6165-9d52-4e8e-973c-8036295cecff@github.com> References: <1IGTrGJrdBrsekizG1jLeCKLMkHiGxqKEyzLlQJkZa4=.decd6165-9d52-4e8e-973c-8036295cecff@github.com> Message-ID: > Hi all, > > Vector API fails to work when: > - case 1: MaxVectorSize is set to <=8, or > - case 2: C2 is disabled > > The reason is that {max/preferred} VectorShape initialization fails in both cases. > And the root cause is that VectorSupport_GetMaxLaneCount [1] returns unreasonable values (0 for case 1 and -1 for case 2). > > Vector API should not depend on C2 to run. > It should work even there is no JIT compiler since it's a Java-level api. > So let's fix it. > > Testing: > - jdk/incubator/vector with -XX:MaxVectorSize=default/8 on Linux/x64 > > Thanks. 
> Best regards, > Jie > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/prims/vectorSupport.cpp#L364 Jie Fu has updated the pull request incrementally with one additional commit since the last revision: Remove -XX:TieredStopAtLevel=3 ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2722/files - new: https://git.openjdk.java.net/jdk/pull/2722/files/aa475b0a..bbe6150c Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2722&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2722&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/2722.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2722/head:pull/2722 PR: https://git.openjdk.java.net/jdk/pull/2722 From psandoz at openjdk.java.net Fri Feb 26 03:19:41 2021 From: psandoz at openjdk.java.net (Paul Sandoz) Date: Fri, 26 Feb 2021 03:19:41 GMT Subject: RFR: 8262096: Vector API fails to work due to VectorShape initialization exception [v4] In-Reply-To: References: <1IGTrGJrdBrsekizG1jLeCKLMkHiGxqKEyzLlQJkZa4=.decd6165-9d52-4e8e-973c-8036295cecff@github.com> Message-ID: On Fri, 26 Feb 2021 02:38:00 GMT, Jie Fu wrote: >> Hi all, >> >> Vector API fails to work when: >> - case 1: MaxVectorSize is set to <=8, or >> - case 2: C2 is disabled >> >> The reason is that {max/preferred} VectorShape initialization fails in both cases. >> And the root cause is that VectorSupport_GetMaxLaneCount [1] returns unreasonable values (0 for case 1 and -1 for case 2). >> >> Vector API should not depend on C2 to run. >> It should work even there is no JIT compiler since it's a Java-level api. >> So let's fix it. >> >> Testing: >> - jdk/incubator/vector with -XX:MaxVectorSize=default/8 on Linux/x64 >> >> Thanks. 
>> Best regards, >> Jie >> >> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/prims/vectorSupport.cpp#L364 > > Jie Fu has updated the pull request incrementally with one additional commit since the last revision: > > Remove -XX:TieredStopAtLevel=3 Marked as reviewed by psandoz (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/2722 From dongbo at openjdk.java.net Fri Feb 26 06:10:01 2021 From: dongbo at openjdk.java.net (Dong Bo) Date: Fri, 26 Feb 2021 06:10:01 GMT Subject: RFR: 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width [v11] In-Reply-To: References: Message-ID: > In vectorAPI, when right-shifting a vector with a shift equals to the element width, the shift is transformed to zero, > see `src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorOperators.java`: > /** Produce {@code a>>>(n&(ESIZE*8-1))}. Integral only. */ > public static final /*bitwise*/ Binary LSHR = binary("LSHR", ">>>", VectorSupport.VECTOR_OP_URSHIFT, VO_SHIFT); > > The aarch64 assembler generates wrong or illegal instructions in this case, e.g. for the JAVA code below on aarch64, > assembler call `__ ushr(dst, __ T8B, src, 0)`, the instruction generated is not `ushr dst.8B, src.8B, 0`, but `ushr dst.4H, src.4H, 16` instead. > According to local tests, JVM gives wrong results for byte/short and crashes with SIGILL for integer/long. > ByteVector vba = ByteVector.fromArray(byte64SPECIES, bytesA, 8 * i); > vbb.lanewise(VectorOperators.ASHR, 8).intoArray(arrBytes, 8 * i); > > The legal right shift amount should be in the range 1 to the element width in bits on aarch64: > https://developer.arm.com/documentation/dui0801/f/A64-SIMD-Vector-Instructions/USHR--vector-?lang=en > > This fix handles zero shift separately. If the shift is zero, it generates `orr` for right shift, `addv` for right shift and accumulate. > Verified with linux-aarch64-server-fastdebug, tier1. 
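The masking rule quoted from VectorOperators.java, `a>>>(n&(ESIZE*8-1))`, is what turns a shift by the full element width into a shift by zero. A minimal sketch of that arithmetic follows; `ShiftMaskSketch` and `effectiveShift` are hypothetical names, and the code models only the shift-count masking, not the Vector API or the AArch64 assembler.

```java
public class ShiftMaskSketch {
    /**
     * Hypothetical helper: the shift count that is actually applied to a lane
     * of elemBytes bytes, following n & (ESIZE*8 - 1) from the quoted Javadoc.
     */
    static int effectiveShift(int n, int elemBytes) {
        return n & (elemBytes * 8 - 1);
    }

    public static void main(String[] args) {
        // A shift equal to the element width is masked down to zero.
        System.out.println("byte  >>> 8  -> " + effectiveShift(8, Byte.BYTES));
        System.out.println("short >>> 16 -> " + effectiveShift(16, Short.BYTES));
        System.out.println("int   >>> 32 -> " + effectiveShift(32, Integer.BYTES));
        System.out.println("long  >>> 64 -> " + effectiveShift(64, Long.BYTES));
        // Larger counts wrap around the same way.
        System.out.println("long  >>> 99 -> " + effectiveShift(99, Long.BYTES));
        // Java's scalar shift operators mask the count too: only the low
        // six bits are used for long, so this leaves the value unchanged.
        System.out.println((1L >>> 64) == 1L);
    }
}
```

Since the masked count can be 0 while the AArch64 right-shift immediate must lie between 1 and the element width, the zero case needs separate handling, which is what the fix described above provides.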
Also created a jtreg to reproduce the issue and for regression tests. Dong Bo has updated the pull request incrementally with one additional commit since the last revision: refactor tests ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2472/files - new: https://git.openjdk.java.net/jdk/pull/2472/files/9290f27e..24d6e9f8 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2472&range=10 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2472&range=09-10 Stats: 422 lines in 1 file changed: 16 ins; 222 del; 184 mod Patch: https://git.openjdk.java.net/jdk/pull/2472.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2472/head:pull/2472 PR: https://git.openjdk.java.net/jdk/pull/2472 From dongbo at openjdk.java.net Fri Feb 26 06:13:39 2021 From: dongbo at openjdk.java.net (Dong Bo) Date: Fri, 26 Feb 2021 06:13:39 GMT Subject: RFR: 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width [v11] In-Reply-To: <9rtmEMrsPaA73FDA-KB7H0S0CRdBePGwnI5FcDY-OLI=.425249e2-b590-4a16-b9b8-8d7b5ecd2800@github.com> References: <8kMxMFAYtb0B-yUVEt-HLfhji3Gj-gog8OHvWW_tKfw=.f7c9422b-3574-4c31-9489-7286ee98332f@github.com> <9rtmEMrsPaA73FDA-KB7H0S0CRdBePGwnI5FcDY-OLI=.425249e2-b590-4a16-b9b8-8d7b5ecd2800@github.com> Message-ID: On Thu, 25 Feb 2021 01:44:27 GMT, Dong Bo wrote: >>> > Local tests by manually injected error shows all instructions are covered by the jtreg case. Suggestions? >>> >>> I'm not seeing `sra` used anywhere. >>> >>> The problem I see with the tests is that the methods are large. This causes C2 to do a lot of spilling. Also, because the resuling code is intertwined and complex, it's very hard to debug. 
>>> >>> It would be far better to do something like this: >>> >>> ``` >>> void long_shift_add(long arrLongs[][], LongVector vba, LongVector vbb, int i) { >>> vba.add(vbb.lanewise(VectorOperators.LSHR, 37)).intoArray(arrLongs[op], 2 * i); >>> vba.add(vbb.lanewise(VectorOperators.LSHR, 64)).intoArray(arrLongs[op + 1], 2 * i); >>> vba.add(vbb.lanewise(VectorOperators.LSHR, 99)).intoArray(arrLongs[op + 2], 2 * i); >>> vba.add(vbb.lanewise(VectorOperators.LSHR, 128)).intoArray(arrLongs[op + 3], 2 * i); >>> vba.add(vbb.lanewise(VectorOperators.LSHR, 157)).intoArray(arrLongs[op + 4], 2 * i); >>> vba.add(vbb.lanewise(VectorOperators.LSHR, 192)).intoArray(arrLongs[op + 5], 2 * i); >>> } >>> ``` >> >> >> Weird, I took a look at the the assembly, `ssra` did accessed by the tests on our server: >> $ ./build/linux-aarch64-server-fastdebug/images/jdk/bin/java --add-modules jdk.incubator.vector -XX:CompileThreshold=1000 -XX:CompileCommand=print,compiler/vectorapi/TestVectorShiftImm.shift_* test/hotspot/jtreg/compiler/vectorapi/TestVectorShiftImm.java 64 &> assembly_vlen64.txt >> $ ./build/linux-aarch64-server-fastdebug/images/jdk/bin/java --add-modules jdk.incubator.vector -XX:CompileThreshold=1000 -XX:CompileCommand=print,compiler/vectorapi/TestVectorShiftImm.shift_* test/hotspot/jtreg/compiler/vectorapi/TestVectorShiftImm.java 128 &> assembly_vlen128.txt >> $ cat assembly_vlen*.txt | grep "ssra" >> 02c0 ssra V18, V17, #37 # vector (2D) >> 02c8 ssra V19, V17, #0 # vector (2D) >> 02d0 ssra V20, V17, #35 # vector (2D) >> 0308 ssra V18, V17, #29 # vector (2D) >> 0644 ssra V18, V17, #37 # vector (2D) >> 064c ssra V19, V17, #0 # vector (2D) >> 0654 ssra V20, V17, #35 # vector (2D) >> 0674 ssra V18, V17, #29 # vector (2D) >> 0798 ssra V18, V17, #37 # vector (2D) >> 07a0 ssra V19, V17, #0 # vector (2D) >> 07a8 ssra V20, V17, #35 # vector (2D) >> 07e0 ssra V18, V17, #29 # vector (2D) >> 0x0000ffff83f7e500: ssra v18.2d, v17.2d, #37 ;*aload_0 {reexecute=0 rethrow=0 return_oop=0} >> 
0x0000ffff83f7e510: ssra v20.2d, v17.2d, #35 ;*iand {reexecute=0 rethrow=0 return_oop=0} >> 0x0000ffff83f7e548: ssra v18.2d, v17.2d, #29 ;*if_icmpne {reexecute=0 rethrow=0 return_oop=0} >> 0x0000ffff83f7e884: ssra v18.2d, v17.2d, #37 >> 0x0000ffff83f7e894: ssra v20.2d, v17.2d, #35 >> 0x0000ffff83f7e8b4: ssra v18.2d, v17.2d, #29 >> 0x0000ffff83f7e9d8: ssra v18.2d, v17.2d, #37 ;*invokestatic broadcastInt {reexecute=0 rethrow=0 return_oop=0} >> 0x0000ffff83f7e9e8: ssra v20.2d, v17.2d, #35 ;*invokestatic broadcastInt {reexecute=0 rethrow=0 return_oop=0} >> 0x0000ffff83f7ea20: ssra v18.2d, v17.2d, #29 ;*checkcast {reexecute=0 rethrow=0 return_oop=0} >> 284 ssra V18, V17, #9 # vector (4S) >> 28c ssra V19, V17, #0 # vector (4S) >> 294 ssra V20, V17, #15 # vector (4S) >> 0x0000ffff83f822c4: ssra v18.4s, v17.4s, #9 ;*invokedynamic {reexecute=0 rethrow=0 return_oop=0} >> 0x0000ffff83f822d4: ssra v20.4s, v17.4s, #15 >> 284 ssra V18, V17, #1 # vector (8H) >> 28c ssra V19, V17, #8 # vector (8H) >> ... 
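As a cross-check of the immediates in the listing above: the Vector API documents the lane shift count as `n & (ESIZE*8-1)`, so for 64-bit lanes the constants used in the long tests (37, 64, 99, 128, 157, 192) reduce to 37, 0, 35, 0, 29, 0. That lines up with the `#37`, `#0`, `#35` and `#29` operands printed for the 2D arrangement. The plain-Java sketch below only reproduces that arithmetic; the class and method names are illustrative and are not part of the patch or of `TestVectorShiftImm.java`.

```java
// Illustrative only: mirrors the documented masking n & (ESIZE*8-1).
// ShiftMaskSketch and effectiveShift are hypothetical names, not JDK APIs.
class ShiftMaskSketch {
    // Effective shift count for a lane of the given width in bits (8, 16, 32 or 64).
    static int effectiveShift(int requested, int laneBits) {
        return requested & (laneBits - 1);
    }

    public static void main(String[] args) {
        // Shift constants taken from the long-lane test methods quoted in this thread.
        int[] requested = {37, 64, 99, 128, 157, 192};
        for (int n : requested) {
            System.out.println(n + " -> " + effectiveShift(n, Long.SIZE));
        }
    }
}
```

A count equal to the lane width (8 for bytes, 64 for longs) masks to zero, which is the case the AArch64 assembler could not encode as a right-shift immediate.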
>> >> Also injected error to `sshr+add` by: >> --- a/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp >> +++ b/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp >> @@ -545,7 +545,7 @@ public: >> #define WRAP(INSN) \ >> void INSN(FloatRegister Vd, SIMD_Arrangement T, FloatRegister Vn, int shift) { \ >> if (shift == 0) { \ >> - Assembler::addv(Vd, T, Vd, Vn); \ >> + Assembler::subv(Vd, T, Vd, Vn); \ >> } else { \ >> Assembler::INSN(Vd, T, Vn, shift); \ >> } \ >> The `shift+add` tests failed as expected: >> $ ./build/linux-aarch64-server-fastdebug/images/jdk/bin/java --add-modules jdk.incubator.vector -XX:CompileThreshold=1000 -XX:-TieredCompilation test/hotspot/jtreg/compiler/vectorapi/TestVectorShiftImm.java 64 >> WARNING: Using incubator modules: jdk.incubator.vector >> warning: using incubating module(s): jdk.incubator.vector >> 1 warning >> Exception in thread "main" java.lang.RuntimeException: Test Failed, failed tests: >> type SHORT index 19, operation ASHR_AND_ACCUMULATE, vector length 64. >> type SHORT index 21, operation ASHR_AND_ACCUMULATE, vector length 64. >> type SHORT index 23, operation ASHR_AND_ACCUMULATE, vector length 64. >> type SHORT index 25, operation LSHR_AND_ACCUMULATE, vector length 64. >> type SHORT index 27, operation LSHR_AND_ACCUMULATE, vector length 64. >> type SHORT index 29, operation LSHR_AND_ACCUMULATE, vector length 64. >> type INTEGER index 19, operation ASHR_AND_ACCUMULATE, vector length 64. >> type INTEGER index 21, operation ASHR_AND_ACCUMULATE, vector length 64. >> type INTEGER index 23, operation ASHR_AND_ACCUMULATE, vector length 64. >> type INTEGER index 25, operation LSHR_AND_ACCUMULATE, vector length 64. >> type INTEGER index 27, operation LSHR_AND_ACCUMULATE, vector length 64. >> type INTEGER index 29, operation LSHR_AND_ACCUMULATE, vector length 64. >> ... 
>> $ ./build/linux-aarch64-server-fastdebug/images/jdk/bin/java --add-modules jdk.incubator.vector -XX:CompileThreshold=1000 -XX:-TieredCompilation test/hotspot/jtreg/compiler/vectorapi/TestVectorShiftImm.java 128
>> WARNING: Using incubator modules: jdk.incubator.vector
>> warning: using incubating module(s): jdk.incubator.vector
>> 1 warning
>> Exception in thread "main" java.lang.RuntimeException: Test Failed, failed tests:
>> type LONG index 49, operation ASHR_AND_ACCUMULATE, vector length 128.
>> type LONG index 51, operation ASHR_AND_ACCUMULATE, vector length 128.
>> type LONG index 53, operation ASHR_AND_ACCUMULATE, vector length 128.
>> type LONG index 55, operation LSHR_AND_ACCUMULATE, vector length 128.
>> type LONG index 57, operation LSHR_AND_ACCUMULATE, vector length 128.
>> type LONG index 59, operation LSHR_AND_ACCUMULATE, vector length 128.
>> type SHORT index 49, operation ASHR_AND_ACCUMULATE, vector length 128.
>> type SHORT index 51, operation ASHR_AND_ACCUMULATE, vector length 128.
>> type SHORT index 53, operation ASHR_AND_ACCUMULATE, vector length 128.
>> ...
>>
>> Anyway, I extracted the operations you suggested into `shift_op_*` methods.
>> I performed the error-injection experiments with the new tests on Kunpeng916 and re-checked the assembly output; the results look good.
>> >> The test command I used to run the newest tests are: >> $ ./build/linux-aarch64-server-fastdebug/images/jdk/bin/java --add-modules jdk.incubator.vector -XX:-TieredCompilation -XX:CompileThreshold=1000 -Dvlen=64 -XX:CompileCommand=print,compiler/vectorapi/TestVectorShiftImm.shift_* test/hotspot/jtreg/compiler/vectorapi/TestVectorShiftImm.java &> assembly_vlen64.txt >> $ ./build/linux-aarch64-server-fastdebug/images/jdk/bin/java --add-modules jdk.incubator.vector -XX:-TieredCompilation -XX:CompileThreshold=1000 -Dvlen=128 -XX:CompileCommand=print,compiler/vectorapi/TestVectorShiftImm.shift_* test/hotspot/jtreg/compiler/vectorapi/TestVectorShiftImm.java &> assembly_vlen128.txt >> $ cat assembly_vlen64.txt | grep ssra; cat assembly_vlen128.txt | grep ssra > >> _Mailing list message from [Andrew Haley](mailto:aph at redhat.com) on [hotspot-compiler-dev](mailto:hotspot-compiler-dev at openjdk.java.net):_ >> >> On 24/02/2021 07:33, Dong Bo wrote: >> >> > Weird, I took a look at the the assembly, `ssra` did accessed by the tests on our server: >> >> I don't doubt it, but the test code is so very complex that it can >> fall foul of heuristics given slightly changed circumstances. That's >> why good test cases are as simple as possible, and allow no room for >> variations because they do only one thing. Precise targeting should >> be the goal of HotSpot back-end test cases. >> > > Understood, thanks. :-) > Does the newest version address the concern? 
> I extracted the `shift`/`shift+add` operations into separate methods, mostly as suggested in previous comments, something like:
>     static int shift_op_long_ASHR_and_ADD(LongVector vba, LongVector vbb, long arrLongs[][], int end, int ind) {
>         vba.add(vbb.lanewise(VectorOperators.ASHR, 37)).intoArray(arrLongs[end++], ind);
>         vba.add(vbb.lanewise(VectorOperators.ASHR, 64)).intoArray(arrLongs[end++], ind);
>         vba.add(vbb.lanewise(VectorOperators.ASHR, 99)).intoArray(arrLongs[end++], ind);
>         vba.add(vbb.lanewise(VectorOperators.ASHR, 128)).intoArray(arrLongs[end++], ind);
>         vba.add(vbb.lanewise(VectorOperators.ASHR, 157)).intoArray(arrLongs[end++], ind);
>         vba.add(vbb.lanewise(VectorOperators.ASHR, 192)).intoArray(arrLongs[end++], ind);
>         return end;
>     }
>
> _Mailing list message from [Andrew Haley](mailto:aph at redhat.com) on [hotspot-compiler-dev](mailto:hotspot-compiler-dev at openjdk.java.net):_
>
> I don't doubt it, but the test code is so very complex that it can
> fall foul of heuristics given slightly changed circumstances. That's
> why good test cases are as simple as possible, and allow no room for
> variations because they do only one thing. Precise targeting should
> be the goal of HotSpot back-end test cases.
>

Test updated: all the operations to test are now in overloaded functions, `shift_with_op` and `shift_with_op_and_add`, which are called repeatedly and tested by `shift` and `shift_and_accumulate` respectively, in a loop.
Commands below are used to verify that `ssra` is accessed: $ cp hsdis-aarch64.so ./build/linux-aarch64-server-fastdebug/images/jdk/lib/ $ jtreg -verbose:all -J-Djavatest.maxOutputSize=50000000 test/hotspot/jtreg/compiler/vectorapi/TestVectorShiftImm.java | grep ssra The tests do use `ssra` on our two different platforms, Kunpeng916 and Kunpeng920: 2b4 + ssra V16, V20, #17 # vector (2S) 2bc + ssra V17, V20, #0 # vector (2S) 2c4 + ssra V18, V20, #21 # vector (2S) 310 + ssra V16, V20, #12 # vector (2S) 0x0000ffff9894e0f4: ssra v16.2s, v20.2s, #17 ;*invokespecial fromArray0Template {reexecute=0 rethrow=0 return_oop=0} 0x0000ffff9894e104: ssra v18.2s, v20.2s, #21 ;*invokestatic arrayAddress {reexecute=0 rethrow=0 return_oop=0} 0x0000ffff9894e150: ssra v16.2s, v20.2s, #12 ;*checkcast {reexecute=0 rethrow=0 return_oop=0} 2b0 + ssra V16, V20, #9 # vector (4H) 2b8 + ssra V17, V20, #0 # vector (4H) 2c0 + ssra V18, V20, #11 # vector (4H) 0x0000ffff9894ae70: ssra v16.4h, v20.4h, #9 ;*invokevirtual vspecies {reexecute=0 rethrow=0 return_oop=0} 0x0000ffff9894ae80: ssra v18.4h, v20.4h, #11 ;*invokestatic arrayAddress {reexecute=0 rethrow=0 return_oop=0} : # out( N1802 ) <- 298in( R29, + R25B70, ssra V16 ) #7 + a4c ssra V17 + , spill V20, #0 # vector (8B) mov ssra a4cV18R1 + , spill V20R11 -> [sp, #12], # spill size = 32, ssra V16, V20, d08#3 # vector (8B)B150 ... Any comments? Thanks. ------------- PR: https://git.openjdk.java.net/jdk/pull/2472 From vlivanov at openjdk.java.net Fri Feb 26 08:22:53 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Fri, 26 Feb 2021 08:22:53 GMT Subject: RFR: 8261954: Dependencies: Improve iteration over class hierarchy under context class [v2] In-Reply-To: References: Message-ID: <43tPmPoboooVq-05u97kPDJqwV7uesF8l5AmHy8kJwE=.b46992f1-eb9d-46a3-8048-04659d529da3@github.com> > Simplify `ClassHierarchyWalker::find_witness_anywhere()` which iterates over class hierarchy under context class searching for witnesses. 
> > Current implementation traverses the hierarchy in a breadth-first manner and keeps a stack-allocated array to keep a worklist. > But all the subclasses are already part of a singly linked list formed by `Klass::subklass()`/`next_sibling()`/`superklass()`. > > Proposed refactoring gets rid of the explicit worklist and switches to the traversal over the linked list (encapsulated into `ClassHierarchyIterator`). It performs depth-first pre-order hierarchy traversal. > > (There are some other minor refactorings applied in `ClassHierarchyWalker` along the way.) > > Testing: > - [x] hs-tier1 - hs-tier8 > - [x] additional verification that CHA decisions aren't affected Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: Move ClassHierarchyIterator::next() into CPP file ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2630/files - new: https://git.openjdk.java.net/jdk/pull/2630/files/ae78e51e..7237e9c5 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2630&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2630&range=00-01 Stats: 39 lines in 2 files changed: 21 ins; 17 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/2630.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2630/head:pull/2630 PR: https://git.openjdk.java.net/jdk/pull/2630 From vlivanov at openjdk.java.net Fri Feb 26 08:22:54 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Fri, 26 Feb 2021 08:22:54 GMT Subject: RFR: 8261954: Dependencies: Improve iteration over class hierarchy under context class [v2] In-Reply-To: References: Message-ID: On Fri, 19 Feb 2021 15:52:32 GMT, Erik ?sterlund wrote: >> Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: >> >> Move ClassHierarchyIterator::next() into CPP file > > Looks good! (move next() to cpp file if you want to) Thanks for the review, Erik. 
(Moved ClassHierarchyIterator::next() into CPP file as you suggested.) ------------- PR: https://git.openjdk.java.net/jdk/pull/2630 From vlivanov at openjdk.java.net Fri Feb 26 08:22:54 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Fri, 26 Feb 2021 08:22:54 GMT Subject: Integrated: 8261954: Dependencies: Improve iteration over class hierarchy under context class In-Reply-To: References: Message-ID: <0IXN29kR2y1UNU4HM_mpf6E9GTRuRNWRWHM2WeywQ9A=.f67b9582-79f3-4ffb-9621-5f87a68b00bd@github.com> On Thu, 18 Feb 2021 17:05:08 GMT, Vladimir Ivanov wrote: > Simplify `ClassHierarchyWalker::find_witness_anywhere()` which iterates over class hierarchy under context class searching for witnesses. > > Current implementation traverses the hierarchy in a breadth-first manner and keeps a stack-allocated array to keep a worklist. > But all the subclasses are already part of a singly linked list formed by `Klass::subklass()`/`next_sibling()`/`superklass()`. > > Proposed refactoring gets rid of the explicit worklist and switches to the traversal over the linked list (encapsulated into `ClassHierarchyIterator`). It performs depth-first pre-order hierarchy traversal. > > (There are some other minor refactorings applied in `ClassHierarchyWalker` along the way.) > > Testing: > - [x] hs-tier1 - hs-tier8 > - [x] additional verification that CHA decisions aren't affected This pull request has now been integrated. 
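The traversal described above can be sketched in Java. This is not HotSpot's `ClassHierarchyIterator` (which is C++ and walks `Klass` links); `Node`, `firstChild`, `nextSibling` and `parent` are hypothetical stand-ins for `subklass()`, `next_sibling()` and `superklass()`, used only to show why a depth-first pre-order walk over such links needs no explicit worklist.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a pre-order walk over first-child / next-sibling / parent
// links. All names here are hypothetical, not HotSpot or JDK APIs.
class HierarchySketch {
    static final class Node {
        final String name;
        Node parent, firstChild, nextSibling;

        Node(String name) { this.name = name; }

        // Prepends the child to the sibling list, so the most recently
        // added child is visited first.
        Node addChild(Node c) {
            c.parent = this;
            c.nextSibling = firstChild;
            firstChild = c;
            return this;
        }
    }

    // Returns the node after `current` in pre-order within the subtree of
    // `root`, or null once the subtree is exhausted.
    static Node next(Node root, Node current) {
        if (current.firstChild != null) {
            return current.firstChild;          // descend first
        }
        // Otherwise climb until some ancestor (below root) has a sibling.
        for (Node n = current; n != root; n = n.parent) {
            if (n.nextSibling != null) {
                return n.nextSibling;
            }
        }
        return null;
    }

    static List<String> preOrder(Node root) {
        List<String> out = new ArrayList<>();
        for (Node n = root; n != null; n = next(root, n)) {
            out.add(n.name);
        }
        return out;
    }
}
```

The only state carried between steps is the current node, which is what lets the stack-allocated worklist of the breadth-first version go away.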
Changeset: 0a4e710f Author: Vladimir Ivanov URL: https://git.openjdk.java.net/jdk/commit/0a4e710f Stats: 173 lines in 3 files changed: 68 ins; 69 del; 36 mod 8261954: Dependencies: Improve iteration over class hierarchy under context class Reviewed-by: kvn, coleenp, eosterlund ------------- PR: https://git.openjdk.java.net/jdk/pull/2630 From thartmann at openjdk.java.net Fri Feb 26 10:49:42 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Fri, 26 Feb 2021 10:49:42 GMT Subject: RFR: 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set [v9] In-Reply-To: References: Message-ID: On Thu, 25 Feb 2021 08:56:14 GMT, Xin Liu wrote: >> Add a flag _suppress_cr to outputStream. outstream objects won't emit any CR if it's set. >> Correct TypeInstPtr::dump2 to make sure it only emits klass name once. >> Remove the comment because Klass::oop_print_on() has emitted the address of oop. >> >> Before: >> 689 ConP === 0 [[ 821 ]] Oop:java/lang/Stringjava.lang.String >> {0x000000010159d7c8} - klass: public final synchronized 'java/lang/String' >> - string: "a" >> :Constant:exact * >> >> After: >> 689 ConP === 0 [[ 821 ]] Oop:java.lang.String {0x000000010159d7c8} - klass: public final synchronized 'java/lang/String' - string: "a":Constant:exact * > > Xin Liu has updated the pull request incrementally with one additional commit since the last revision: > > 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set > > update comments based on the review feedbacks. > move the unittest to test_stringUtil.cpp. Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/2178 From xliu at openjdk.java.net Fri Feb 26 10:49:42 2021 From: xliu at openjdk.java.net (Xin Liu) Date: Fri, 26 Feb 2021 10:49:42 GMT Subject: Integrated: 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set In-Reply-To: References: Message-ID: On Thu, 21 Jan 2021 08:47:13 GMT, Xin Liu wrote: > Add a flag _suppress_cr to outputStream. outstream objects won't emit any CR if it's set. > Correct TypeInstPtr::dump2 to make sure it only emits klass name once. > Remove the comment because Klass::oop_print_on() has emitted the address of oop. > > Before: > 689 ConP === 0 [[ 821 ]] Oop:java/lang/Stringjava.lang.String > {0x000000010159d7c8} - klass: public final synchronized 'java/lang/String' > - string: "a" > :Constant:exact * > > After: > 689 ConP === 0 [[ 821 ]] Oop:java.lang.String {0x000000010159d7c8} - klass: public final synchronized 'java/lang/String' - string: "a":Constant:exact * This pull request has now been integrated. Changeset: 76032781 Author: Xin Liu Committer: Tobias Hartmann URL: https://git.openjdk.java.net/jdk/commit/76032781 Stats: 55 lines in 2 files changed: 51 ins; 1 del; 3 mod 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set Reviewed-by: thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/2178 From vlivanov at openjdk.java.net Fri Feb 26 13:57:40 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Fri, 26 Feb 2021 13:57:40 GMT Subject: RFR: 8262096: Vector API fails to work due to VectorShape initialization exception [v4] In-Reply-To: References: <1IGTrGJrdBrsekizG1jLeCKLMkHiGxqKEyzLlQJkZa4=.decd6165-9d52-4e8e-973c-8036295cecff@github.com> Message-ID: On Fri, 26 Feb 2021 02:38:00 GMT, Jie Fu wrote: >> Hi all, >> >> Vector API fails to work when: >> - case 1: MaxVectorSize is set to <=8, or >> - case 2: C2 is disabled >> >> The reason is that {max/preferred} VectorShape initialization fails in both cases. 
>> And the root cause is that VectorSupport_GetMaxLaneCount [1] returns unreasonable values (0 for case 1 and -1 for case 2).
>>
>> Vector API should not depend on C2 to run.
>> It should work even if there is no JIT compiler, since it's a Java-level API.
>> So let's fix it.
>>
>> Testing:
>> - jdk/incubator/vector with -XX:MaxVectorSize=default/8 on Linux/x64
>>
>> Thanks.
>> Best regards,
>> Jie
>>
>> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/prims/vectorSupport.cpp#L364
>
> Jie Fu has updated the pull request incrementally with one additional commit since the last revision:
>
> Remove -XX:TieredStopAtLevel=3

IMO the fix should be in `src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorShape.java`.

JVM does the right job when it signals vector support is absent (by returning `-1`).

`jdk.incubator.vector` implementation should take that into account and choose a preferred shape for pure Java execution mode.

-------------

Changes requested by vlivanov (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/2722

From jiefu at openjdk.java.net Fri Feb 26 15:39:42 2021
From: jiefu at openjdk.java.net (Jie Fu)
Date: Fri, 26 Feb 2021 15:39:42 GMT
Subject: RFR: 8262096: Vector API fails to work due to VectorShape initialization exception [v4]
In-Reply-To:
References: <1IGTrGJrdBrsekizG1jLeCKLMkHiGxqKEyzLlQJkZa4=.decd6165-9d52-4e8e-973c-8036295cecff@github.com>
Message-ID: <4nqS6MYId9oNEztpqjrqKivgX7j0D_9hfqSdOXcbKrA=.4ecc4948-8924-4308-b2a3-c98491de3f8e@github.com>

On Fri, 26 Feb 2021 13:55:15 GMT, Vladimir Ivanov wrote:

> IMO the fix should be in `src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorShape.java`.
>
> JVM does the right job when it signals vector support is absent (by returning `-1`).
>
> `jdk.incubator.vector` implementation should take that into account and choose a preferred shape for pure Java execution mode.

Hi @iwanowww,

Thanks for your review.
From the point of view of the C2 compiler, you are right.
But the Java programmer may be confused if we got something like DoubleVector.SPECIES_PREFERRED.length() > VectorSupport.getMaxLaneCount(double.class). I'd like to keep DoubleVector.SPECIES_PREFERRED.length() <= VectorSupport.getMaxLaneCount(double.class) for Java programmers since the VectorSupport_GetMaxLaneCount is used to implement a Java API. What do you think? Thanks. ------------- PR: https://git.openjdk.java.net/jdk/pull/2722 From vlivanov at openjdk.java.net Fri Feb 26 15:50:41 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Fri, 26 Feb 2021 15:50:41 GMT Subject: RFR: 8262096: Vector API fails to work due to VectorShape initialization exception [v4] In-Reply-To: <4nqS6MYId9oNEztpqjrqKivgX7j0D_9hfqSdOXcbKrA=.4ecc4948-8924-4308-b2a3-c98491de3f8e@github.com> References: <1IGTrGJrdBrsekizG1jLeCKLMkHiGxqKEyzLlQJkZa4=.decd6165-9d52-4e8e-973c-8036295cecff@github.com> <4nqS6MYId9oNEztpqjrqKivgX7j0D_9hfqSdOXcbKrA=.4ecc4948-8924-4308-b2a3-c98491de3f8e@github.com> Message-ID: On Fri, 26 Feb 2021 15:37:08 GMT, Jie Fu wrote: > I'd like to keep DoubleVector.SPECIES_PREFERRED.length() <= VectorSupport.getMaxLaneCount(double.class) for Java programmers since the VectorSupport_GetMaxLaneCount is used to implement a Java API. It doesn't make much sense to me. `VectorSupport` is an internal API for `jdk.incubator.vector` to consume. It's `jdk.incubator.vector` job to interpret the result and adapt accordingly. ------------- PR: https://git.openjdk.java.net/jdk/pull/2722 From vlivanov at openjdk.java.net Fri Feb 26 16:10:41 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Fri, 26 Feb 2021 16:10:41 GMT Subject: RFR: 8259937: guarantee(loc != NULL) failed: missing saved register with native invoker [v4] In-Reply-To: References: <5M1DPgw-V-j9acOvHLewPoTaxF-a6-9HMIpIzHjqavU=.55831da6-f888-4dc0-beee-19869c5eb638@github.com> Message-ID: On Thu, 25 Feb 2021 10:06:47 GMT, Andrew Dinn wrote: >> Thanks for addressing the comments! Looks good. 
>
> @JornVernee I'm not clear that your response addresses my point. I'm concerned that a thread stack dump reported by serviceability code may contain an extra frame for the stub call. This could occur while the Java thread is still in native and it could also include the case where the native call re-enters into Java, i.e. the extra frame could appear at the top of the stack dump or interleaved between Java method frames.
>
> I don't see how that problem is mitigated by your suggestion that this only relates to Panama API use. Code which consumes any such stack dump (including 3rd party code) that might be affected by the presence of this extra frame will not care (or even be aware) that the native callout is a Panama call.
>
> Anyway, since no one from the serviceability team has noted this as a potential problem I'm ok to see the patch proceed.

Overall, the fix looks good.

At some point, there was no frame for native invoker set up and native state transitions were put inline in generated code, but that was rewritten.

Regarding the refactorings: I find the newly introduced `spill_register()`/`fill_register()` methods very confusing.
I'd prefer to see `spill_output_registers()`/`fill_output_registers()` instead and an assert in `NativeInvokerGenerator` constructor (akin to the one in `NativeInvokerGenerator::generate()` on x86_64): assert(_output_registers.length() <= 1 || (_output_registers.length() == 2 && !_output_registers.at(1)->is_valid()), "no multi-reg returns"); ------------- PR: https://git.openjdk.java.net/jdk/pull/2528 From vlivanov at openjdk.java.net Fri Feb 26 16:10:40 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Fri, 26 Feb 2021 16:10:40 GMT Subject: RFR: 8259937: guarantee(loc != NULL) failed: missing saved register with native invoker [v4] In-Reply-To: <5M1DPgw-V-j9acOvHLewPoTaxF-a6-9HMIpIzHjqavU=.55831da6-f888-4dc0-beee-19869c5eb638@github.com> References: <5M1DPgw-V-j9acOvHLewPoTaxF-a6-9HMIpIzHjqavU=.55831da6-f888-4dc0-beee-19869c5eb638@github.com> Message-ID: On Wed, 24 Feb 2021 16:12:59 GMT, Roland Westrelin wrote: >> We spotted this issue with Shenandoah and I managed to write a simple >> test case that reproduces it reliably with Shenandoah but the issue is >> independent of the GC. >> >> The loop in the test case calls a native invoker with an oop live in >> rbp. rbp is saved in the native invoker stub's frame. A safepoint is >> triggered from the safepoint check in the native invoker. The stack >> walking code sees that rbp contains an oop but can't find where that >> oop is stored. That's because stack walking updates the caller's frame >> with the location of rbp in the callee on calls to >> frame::sender(). But the current code sets the last java frame to be >> the compiled frame where rbp is live. So there's no call to >> frame::sender() to update the location rbp. The fix I propose is that >> the frame of the native invoker be visible by stack walking. On a >> safepoint, stack walking starts from the native invoker thread, then >> calls frame::sender() to move to the compiled frame. 
That causes rbp >> to be properly recorded with its location in the native invoker frame. >> >> Same problem affects both x86 and aarch64. I've tested this patch with: >> >> make run-test TEST="java/foreign" TEST_VM_OPTS="-Xcomp" JTREG="TIMEOUT_FACTOR=10" >> >> on both platforms. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > Revert "test & debug" > > This reverts commit cb9dd24c9fcccc6997e9fca874e2860f966b9576. Marked as reviewed by vlivanov (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/2528 From jiefu at openjdk.java.net Sat Feb 27 03:23:12 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Sat, 27 Feb 2021 03:23:12 GMT Subject: RFR: 8262096: Vector API fails to work due to VectorShape initialization exception [v5] In-Reply-To: <1IGTrGJrdBrsekizG1jLeCKLMkHiGxqKEyzLlQJkZa4=.decd6165-9d52-4e8e-973c-8036295cecff@github.com> References: <1IGTrGJrdBrsekizG1jLeCKLMkHiGxqKEyzLlQJkZa4=.decd6165-9d52-4e8e-973c-8036295cecff@github.com> Message-ID: <1Hd5G9-WTDa6JJoWajyvMoyAES1Kea2phVh6umYbov8=.bbf1a2f5-bb7b-40e2-ab66-5b6321b63a0d@github.com> > Hi all, > > Vector API fails to work when: > - case 1: MaxVectorSize is set to <=8, or > - case 2: C2 is disabled > > The reason is that {max/preferred} VectorShape initialization fails in both cases. > And the root cause is that VectorSupport_GetMaxLaneCount [1] returns unreasonable values (0 for case 1 and -1 for case 2). > > Vector API should not depend on C2 to run. > It should work even there is no JIT compiler since it's a Java-level api. > So let's fix it. > > Testing: > - jdk/incubator/vector with -XX:MaxVectorSize=default/8 on Linux/x64 > > Thanks. > Best regards, > Jie > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/prims/vectorSupport.cpp#L364 Jie Fu has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: - Fix in jdk/incubator/vector/VectorShape.java - Merge branch 'master' into JDK-8262096 - Revert changes - Remove -XX:TieredStopAtLevel=3 - Update the jtreg test - The numerator should be 8 (byte) - 8262096: Vector API fails to work due to VectorShape initialization exception ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2722/files - new: https://git.openjdk.java.net/jdk/pull/2722/files/bbe6150c..b67b232d Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2722&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2722&range=03-04 Stats: 7047 lines in 380 files changed: 3707 ins; 1675 del; 1665 mod Patch: https://git.openjdk.java.net/jdk/pull/2722.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2722/head:pull/2722 PR: https://git.openjdk.java.net/jdk/pull/2722 From jiefu at openjdk.java.net Sat Feb 27 03:26:41 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Sat, 27 Feb 2021 03:26:41 GMT Subject: RFR: 8262096: Vector API fails to work due to VectorShape initialization exception [v4] In-Reply-To: References: <1IGTrGJrdBrsekizG1jLeCKLMkHiGxqKEyzLlQJkZa4=.decd6165-9d52-4e8e-973c-8036295cecff@github.com> <4nqS6MYId9oNEztpqjrqKivgX7j0D_9hfqSdOXcbKrA=.4ecc4948-8924-4308-b2a3-c98491de3f8e@github.com> Message-ID: On Fri, 26 Feb 2021 15:48:18 GMT, Vladimir Ivanov wrote: > > I'd like to keep DoubleVector.SPECIES_PREFERRED.length() <= VectorSupport.getMaxLaneCount(double.class) for Java programmers since the VectorSupport_GetMaxLaneCount is used to implement a Java API. > > It doesn't make much sense to me. `VectorSupport` is an internal API for `jdk.incubator.vector` to consume. > It's `jdk.incubator.vector` job to interpret the result and adapt accordingly. Okay, I'm fine to fix it in jdk/incubator/vector/VectorShape.java if we don't keep something like that. 
For the updated fix, the {max/preferred} shape will be initialized as shape-64-bit if hotspot doesn't support vectorization. Testing: - jdk/incubator/vector with MaxVectorSize=default/8/4 on Linux/x64 - jdk/incubator/vector without C2 on Linux/x64 Any comments? Thanks. ------------- PR: https://git.openjdk.java.net/jdk/pull/2722 From whuang at openjdk.java.net Sat Feb 27 07:00:45 2021 From: whuang at openjdk.java.net (Wang Huang) Date: Sat, 27 Feb 2021 07:00:45 GMT Subject: RFR: 8261137: Optimization of Box nodes in uncommon_trap [v2] In-Reply-To: References: <8Riu9VCQLM7_vDp5DOMtLZK3yMLQzAkwlIKo4ab0F7Q=.662dbffe-c320-47ea-bc67-508e2c382b12@github.com> Message-ID: On Fri, 5 Feb 2021 07:29:00 GMT, Wang Huang wrote: >> JDK-8075052 has removed useless autobox. However, in some cases, the box is still saved. For instance: >> @Benchmark >> public void testMethod(Blackhole bh) { >> int sum = 0; >> for (int i = 0; i < data.length; i++) { >> Integer ii = Integer.valueOf(data[i]); >> if (i < data.length) { >> sum += ii.intValue(); >> } >> } >> bh.consume(sum); >> } >> Although the variable ii is only used at ii.intValue(), it cannot be eliminated as a result of being used by a hidden uncommon_trap. >> The uncommon_trap is generated by the optimized "if", because its condition is always true. >> >> We can postpone box in uncommon_trap in this situation. We treat box as a scalarized object by adding a SafePointScalarObjectNode in the uncommon_trap node, >> and deleting the use of box: >> >> There is no additional fail/error(s) of jtreg after this patch. 
>>
>> I adjusted my code and added a new benchmark:
>>
>> public class MyBenchmark {
>>
>>     static int[] data = new int[10000];
>>
>>     static {
>>         for(int i = 0; i < data.length; ++i) {
>>             data[i] = i * 1337 % 7331;
>>         }
>>     }
>>
>>     @Benchmark
>>     public void testMethod(Blackhole bh) {
>>         int sum = 0;
>>         for (int i = 0; i < data.length; i++) {
>>             Integer ii = Integer.valueOf(data[i]);
>>             black();
>>             if (i < 100000) {
>>                 sum += ii.intValue();
>>             }
>>         }
>>         bh.consume(sum);
>>     }
>>
>>     public void black(){}
>> }
>>
>>
>> aarch64:
>> baseline:
>> Benchmark                    Mode  Samples   Score  Score error  Units
>> o.s.MyBenchmark.testMethod   avgt       30  88.513        1.111  us/op
>>
>> opt:
>> Benchmark                    Mode  Samples   Score  Score error  Units
>> o.s.MyBenchmark.testMethod   avgt       30  52.776        0.096  us/op
>>
>> x86:
>> baseline:
>> Benchmark                    Mode  Samples   Score  Score error  Units
>> o.s.MyBenchmark.testMethod   avgt       30  81.066        3.156  us/op
>>
>> opt:
>> Benchmark                    Mode  Samples   Score  Score error  Units
>> o.s.MyBenchmark.testMethod   avgt       30  55.596        0.775  us/op
>
> Wang Huang has updated the pull request incrementally with one additional commit since the last revision:
>
> refactor codes

> Please add a test case which verifies that Box is scalarized by forking a process and checking the output of a run with the `-XX:+PrintEliminateAllocations` flag.
> You also need a test which triggers deoptimization and executes code for Box object reallocation/initialization or load from cache.
> A test should also verify that box object identity matches after deoptimization in the case when the object is loaded from cache.

OK. I will add these test cases.
-------------

PR: https://git.openjdk.java.net/jdk/pull/2401

From whuang at openjdk.java.net Sat Feb 27 07:00:46 2021
From: whuang at openjdk.java.net (Wang Huang)
Date: Sat, 27 Feb 2021 07:00:46 GMT
Subject: RFR: 8261137: Optimization of Box nodes in uncommon_trap [v2]
In-Reply-To:
References: <8Riu9VCQLM7_vDp5DOMtLZK3yMLQzAkwlIKo4ab0F7Q=.662dbffe-c320-47ea-bc67-508e2c382b12@github.com>
Message-ID:

On Tue, 9 Feb 2021 00:56:42 GMT, Xin Liu wrote:

>> Wang Huang has updated the pull request incrementally with one additional commit since the last revision:
>>
>> refactor codes
>
> src/hotspot/share/opto/callGenerator.cpp line 586:
>
>> 584: ciInstanceKlass* klass = call->as_CallStaticJava()->method()->holder();
>> 585: int n_fields = klass->nof_nonstatic_fields();
>> 586: assert(n_fields == 1, "sanity");
>
> I think you also need to check the only non-static field of klass must be a scalar.
> "sanity" is too concise. I think we should leave a message to say it's an auto-boxing class.

Yes. I will revise that. Thank you.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2401

From whuang at openjdk.java.net Sat Feb 27 07:00:46 2021
From: whuang at openjdk.java.net (Wang Huang)
Date: Sat, 27 Feb 2021 07:00:46 GMT
Subject: RFR: 8261137: Optimization of Box nodes in uncommon_trap [v2]
In-Reply-To:
References: <8Riu9VCQLM7_vDp5DOMtLZK3yMLQzAkwlIKo4ab0F7Q=.662dbffe-c320-47ea-bc67-508e2c382b12@github.com>
Message-ID:

On Mon, 8 Feb 2021 18:27:33 GMT, Vladimir Kozlov wrote:

>> Wang Huang has updated the pull request incrementally with one additional commit since the last revision:
>>
>> refactor codes
>
> src/hotspot/share/opto/callGenerator.cpp line 591:
>
>> 589: Node* sobj = new SafePointScalarObjectNode(gvn.type(res)->isa_oopptr(),
>> 590: #ifdef ASSERT
>> 591: NULL,
>
> I would suggest to record `call` node here treating it as allocation.

The prototype of the constructor is `SafePointScalarObjectNode(const TypeOopPtr* , AllocateNode* , uint, uint)`.
However, `call` here is a `CallStaticJavaNode *` instead of an `AllocateNode *`. Do you mean that we just pass the `call` as a record here?

-------------

PR: https://git.openjdk.java.net/jdk/pull/2401

From whuang at openjdk.java.net Sat Feb 27 07:00:49 2021
From: whuang at openjdk.java.net (Wang Huang)
Date: Sat, 27 Feb 2021 07:00:49 GMT
Subject: RFR: 8261137: Optimization of Box nodes in uncommon_trap [v5]
In-Reply-To:
References: <8Riu9VCQLM7_vDp5DOMtLZK3yMLQzAkwlIKo4ab0F7Q=.662dbffe-c320-47ea-bc67-508e2c382b12@github.com> <39-V5pqzlgXaCnL3KTIBXvHwrt35rn4WuuknDv8dcuU=.178fe7a3-4b2b-415c-aaa5-ad63598daeb3@github.com>
Message-ID:

On Wed, 24 Feb 2021 07:06:34 GMT, Tobias Hartmann wrote:

>> Wang Huang has updated the pull request incrementally with one additional commit since the last revision:
>>
>> add debuginfo optimization
>
> src/hotspot/share/opto/callGenerator.cpp line 561:
>
>> 559: if (resproj != nullptr && call->is_CallStaticJava() &&
>> 560: call->as_CallStaticJava()->is_boxing_method()) {
>> 561: Unique_Node_List debuginfo_node_list;
>
> Maybe rename this to `safepoints`.

Sure.

> src/hotspot/share/opto/callGenerator.cpp line 569:
>
>> 567: for (uint i = 0; i < dbg_start; i++) {
>> 568: if (sfpt->in(i) == resproj) {
>> 569: return;
>
> I think this code can be replaced by:
> if (!sfpt->is_Call() || !sfpt->as_Call()->has_non_debug_use(n)) {
>   safepoints.push(sfpt);
> } else {
>   ...
> }

Yes, I will do that.

> src/hotspot/share/opto/callGenerator.cpp line 656:
>
>> 654: }
>> 655:
>> 656: replace_box_to_scalar(call, callprojs.resproj);
>
> Should this be guarded by `C->eliminate_boxing()`?

Exactly. I will change that.

> src/hotspot/share/opto/callnode.hpp line 503:
>
>> 501: // It is relative to the last (youngest) jvms->_scloff.
>> 502: uint _n_fields; // Number of non-static fields of the scalarized object.
>> 503: bool _is_auto_box; // is the scalarized object is auto box.
>
> Typo in comment.
> Should be something like `// True if the scalarized object is an auto box`

Thank you for your review. I'll add this in my next patch.

> src/hotspot/share/opto/callGenerator.cpp line 583:
>
>> 581: while (debuginfo_node_list.size() > 0) {
>> 582: ProjNode* res = resproj->as_Proj();
>> 583: Node* debuginfo_node = debuginfo_node_list.pop();
>
> `debuginfo_node` -> `safepoint`

OK.

> src/hotspot/share/opto/callGenerator.cpp line 596:
>
>> 594: first_ind, n_fields, true);
>> 595: sobj->init_req(0, kit.root());
>> 596: debuginfo_node->add_req(call->in(res->_con));
>
> I don't understand why you are selecting the input based on the result projection field `res->_con`?

Thank you for your review. I use `sfpt->add_req(call->in(TypeFunc::Parms));` instead of this.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2401

From whuang at openjdk.java.net Sat Feb 27 07:10:42 2021
From: whuang at openjdk.java.net (Wang Huang)
Date: Sat, 27 Feb 2021 07:10:42 GMT
Subject: RFR: 8261137: Optimization of Box nodes in uncommon_trap [v5]
In-Reply-To:
References: <8Riu9VCQLM7_vDp5DOMtLZK3yMLQzAkwlIKo4ab0F7Q=.662dbffe-c320-47ea-bc67-508e2c382b12@github.com> <39-V5pqzlgXaCnL3KTIBXvHwrt35rn4WuuknDv8dcuU=.178fe7a3-4b2b-415c-aaa5-ad63598daeb3@github.com>
Message-ID: <2_sE6jBIKfHc2b-kJI0yyXMNJRyDSMLq8uVsvDCKgfw=.53813b84-000e-4601-8465-b4e894c03670@github.com>

On Wed, 24 Feb 2021 07:14:08 GMT, Tobias Hartmann wrote:

>> Wang Huang has updated the pull request incrementally with one additional commit since the last revision:
>>
>> add debuginfo optimization
>
> src/hotspot/share/opto/callGenerator.cpp line 587:
>
>> 585: ciInstanceKlass* klass = call->as_CallStaticJava()->method()->holder();
>> 586: int n_fields = klass->nof_nonstatic_fields();
>> 587: assert(n_fields == 1, "the klass must be an auto-boxing klass");
>
> This code can be put in `ifdef ASSERT` and `n_fields` below can be replaced by 1.
This code is similar to https://github.com/openjdk/jdk/pull/853#discussion_r522967411 so I put the `assert` here.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2401

From whuang at openjdk.java.net Sat Feb 27 07:17:22 2021
From: whuang at openjdk.java.net (Wang Huang)
Date: Sat, 27 Feb 2021 07:17:22 GMT
Subject: RFR: 8261137: Optimization of Box nodes in uncommon_trap [v6]
In-Reply-To: <8Riu9VCQLM7_vDp5DOMtLZK3yMLQzAkwlIKo4ab0F7Q=.662dbffe-c320-47ea-bc67-508e2c382b12@github.com>
References: <8Riu9VCQLM7_vDp5DOMtLZK3yMLQzAkwlIKo4ab0F7Q=.662dbffe-c320-47ea-bc67-508e2c382b12@github.com>
Message-ID:

> JDK-8075052 has removed useless autobox. However, in some cases, the box is still saved. For instance:
> @Benchmark
> public void testMethod(Blackhole bh) {
>     int sum = 0;
>     for (int i = 0; i < data.length; i++) {
>         Integer ii = Integer.valueOf(data[i]);
>         if (i < data.length) {
>             sum += ii.intValue();
>         }
>     }
>     bh.consume(sum);
> }
> Although the variable ii is only used at ii.intValue(), it cannot be eliminated as a result of being used by a hidden uncommon_trap.
> The uncommon_trap is generated by the optimized "if", because its condition is always true.
>
> We can postpone box in uncommon_trap in this situation. We treat box as a scalarized object by adding a SafePointScalarObjectNode in the uncommon_trap node,
> and deleting the use of box:
>
> There is no additional fail/error(s) of jtreg after this patch.
>
> I adjust my codes and add a new benchmark
>
> public class MyBenchmark {
>
>     static int[] data = new int[10000];
>
>     static {
>         for(int i = 0; i < data.length; ++i) {
>             data[i] = i * 1337 % 7331;
>         }
>     }
>
>     @Benchmark
>     public void testMethod(Blackhole bh) {
>         int sum = 0;
>         for (int i = 0; i < data.length; i++) {
>             Integer ii = Integer.valueOf(data[i]);
>             black();
>             if (i < 100000) {
>                 sum += ii.intValue();
>             }
>         }
>         bh.consume(sum);
>     }
>
>     public void black(){}
> }
>
>
> aarch64:
> base line:
> Benchmark                   Mode  Samples   Score  Score error  Units
> o.s.MyBenchmark.testMethod  avgt       30  88.513        1.111  us/op
>
> opt:
> Benchmark                   Mode  Samples   Score  Score error  Units
> o.s.MyBenchmark.testMethod  avgt       30  52.776        0.096  us/op
>
> x86:
> base line:
> Benchmark                   Mode  Samples   Score  Score error  Units
> o.s.MyBenchmark.testMethod  avgt       30  81.066        3.156  us/op
>
> opt:
> Benchmark                   Mode  Samples   Score  Score error  Units
> o.s.MyBenchmark.testMethod  avgt       30  55.596        0.775  us/op

Wang Huang has updated the pull request incrementally with one additional commit since the last revision:

  fix bugs

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/2401/files
  - new: https://git.openjdk.java.net/jdk/pull/2401/files/84290aeb..20324932

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2401&range=05
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2401&range=04-05

  Stats: 229 lines in 4 files changed: 210 ins; 6 del; 13 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2401.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2401/head:pull/2401

PR: https://git.openjdk.java.net/jdk/pull/2401

From vlivanov at openjdk.java.net Sat Feb 27 11:02:54 2021
From: vlivanov at openjdk.java.net (Vladimir Ivanov)
Date: Sat, 27 Feb 2021 11:02:54 GMT
Subject: RFR: 8262096: Vector API fails to work due to VectorShape initialization exception [v5]
In-Reply-To: <1Hd5G9-WTDa6JJoWajyvMoyAES1Kea2phVh6umYbov8=.bbf1a2f5-bb7b-40e2-ab66-5b6321b63a0d@github.com>
References: <1IGTrGJrdBrsekizG1jLeCKLMkHiGxqKEyzLlQJkZa4=.decd6165-9d52-4e8e-973c-8036295cecff@github.com> <1Hd5G9-WTDa6JJoWajyvMoyAES1Kea2phVh6umYbov8=.bbf1a2f5-bb7b-40e2-ab66-5b6321b63a0d@github.com>
Message-ID:

On Sat, 27 Feb 2021 03:23:12 GMT, Jie Fu wrote:

>> Hi all,
>>
>> Vector API fails to work when:
>> - case 1: MaxVectorSize is set to <=8, or
>> - case 2: C2 is disabled
>>
>> The reason is that {max/preferred} VectorShape initialization fails in both cases.
>> And the root cause is that VectorSupport_GetMaxLaneCount [1] returns unreasonable values (0 for case 1 and -1 for case 2). >> >> Vector API should not depend on C2 to run. >> It should work even there is no JIT compiler since it's a Java-level api. >> So let's fix it. >> >> Testing: >> - jdk/incubator/vector with -XX:MaxVectorSize=default/8 on Linux/x64 >> >> Thanks. >> Best regards, >> Jie >> >> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/prims/vectorSupport.cpp#L364 > > Jie Fu has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: > > - Fix in jdk/incubator/vector/VectorShape.java > - Merge branch 'master' into JDK-8262096 > - Revert changes > - Remove -XX:TieredStopAtLevel=3 > - Update the jtreg test > - The numerator should be 8 (byte) > - 8262096: Vector API fails to work due to VectorShape initialization exception > For the updated fix, the {max/preferred} shape will be initialized as shape-64-bit if hotspot doesn't support vectorization. Sounds reasonable. test/jdk/jdk/incubator/vector/PreferredSpeciesTest.java line 42: > 40: * @modules jdk.incubator.vector java.base/jdk.internal.vm.vector > 41: * @run testng/othervm -XX:MaxVectorSize=8 PreferredSpeciesTest > 42: * @run testng/othervm -XX:MaxVectorSize=4 PreferredSpeciesTest `-XX:MaxVectorSize` is C2-specific. It's better to specify either `-XX:-IgnoreUnrecognizedVMOptions` or `@requires vm.compiler2.enabled`. ------------- Marked as reviewed by vlivanov (Reviewer). 
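
The fallback discussed in this thread can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `VectorShape.java`: the class `ShapeFallbackSketch`, the method `shapeBits`, and the constant `FALLBACK_BITS` are hypothetical names. The only facts taken from the thread are that the VM query can report 0 or -1 lanes, and that the updated fix then initializes the {max/preferred} shape as the 64-bit shape.

```java
public class ShapeFallbackSketch {
    // Assumption from the thread: fall back to the 64-bit shape when
    // HotSpot reports no usable vector support.
    static final int FALLBACK_BITS = 64;

    // maxLaneCount: lane count reported by the VM for an element type
    //               (the thread mentions 0 and -1 as the failing values).
    // elementBits:  size of one lane in bits, e.g. 8 for byte.
    static int shapeBits(int maxLaneCount, int elementBits) {
        if (maxLaneCount <= 0) {
            // No vectorization support reported: use the fallback shape.
            return FALLBACK_BITS;
        }
        return maxLaneCount * elementBits;
    }

    public static void main(String[] args) {
        System.out.println(shapeBits(-1, 8)); // C2 disabled case
        System.out.println(shapeBits(0, 8));  // small MaxVectorSize case
        System.out.println(shapeBits(16, 8)); // 16 byte lanes
    }
}
```

How the real fix treats lane counts that would yield fewer than 64 bits is not stated in the thread, so the sketch leaves that case unhandled.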
PR: https://git.openjdk.java.net/jdk/pull/2722 From jiefu at openjdk.java.net Sat Feb 27 11:18:03 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Sat, 27 Feb 2021 11:18:03 GMT Subject: RFR: 8262096: Vector API fails to work due to VectorShape initialization exception [v5] In-Reply-To: References: <1IGTrGJrdBrsekizG1jLeCKLMkHiGxqKEyzLlQJkZa4=.decd6165-9d52-4e8e-973c-8036295cecff@github.com> <1Hd5G9-WTDa6JJoWajyvMoyAES1Kea2phVh6umYbov8=.bbf1a2f5-bb7b-40e2-ab66-5b6321b63a0d@github.com> Message-ID: On Sat, 27 Feb 2021 10:58:16 GMT, Vladimir Ivanov wrote: >> Jie Fu has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: >> >> - Fix in jdk/incubator/vector/VectorShape.java >> - Merge branch 'master' into JDK-8262096 >> - Revert changes >> - Remove -XX:TieredStopAtLevel=3 >> - Update the jtreg test >> - The numerator should be 8 (byte) >> - 8262096: Vector API fails to work due to VectorShape initialization exception > > test/jdk/jdk/incubator/vector/PreferredSpeciesTest.java line 42: > >> 40: * @modules jdk.incubator.vector java.base/jdk.internal.vm.vector >> 41: * @run testng/othervm -XX:MaxVectorSize=8 PreferredSpeciesTest >> 42: * @run testng/othervm -XX:MaxVectorSize=4 PreferredSpeciesTest > > `-XX:MaxVectorSize` is C2-specific. It's better to specify either `-XX:-IgnoreUnrecognizedVMOptions` or `@requires vm.compiler2.enabled`. `@requires vm.compiler2.enabled` had been added. Thanks. 
------------- PR: https://git.openjdk.java.net/jdk/pull/2722 From jiefu at openjdk.java.net Sat Feb 27 11:18:00 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Sat, 27 Feb 2021 11:18:00 GMT Subject: RFR: 8262096: Vector API fails to work due to VectorShape initialization exception [v6] In-Reply-To: <1IGTrGJrdBrsekizG1jLeCKLMkHiGxqKEyzLlQJkZa4=.decd6165-9d52-4e8e-973c-8036295cecff@github.com> References: <1IGTrGJrdBrsekizG1jLeCKLMkHiGxqKEyzLlQJkZa4=.decd6165-9d52-4e8e-973c-8036295cecff@github.com> Message-ID: > Hi all, > > Vector API fails to work when: > - case 1: MaxVectorSize is set to <=8, or > - case 2: C2 is disabled > > The reason is that {max/preferred} VectorShape initialization fails in both cases. > And the root cause is that VectorSupport_GetMaxLaneCount [1] returns unreasonable values (0 for case 1 and -1 for case 2). > > Vector API should not depend on C2 to run. > It should work even there is no JIT compiler since it's a Java-level api. > So let's fix it. > > Testing: > - jdk/incubator/vector with -XX:MaxVectorSize=default/8 on Linux/x64 > > Thanks. 
> Best regards, > Jie > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/prims/vectorSupport.cpp#L364 Jie Fu has updated the pull request incrementally with one additional commit since the last revision: Add requires vm.compiler2.enabled ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2722/files - new: https://git.openjdk.java.net/jdk/pull/2722/files/b67b232d..79402411 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2722&range=05 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2722&range=04-05 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/2722.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2722/head:pull/2722 PR: https://git.openjdk.java.net/jdk/pull/2722 From jiefu at openjdk.java.net Sun Feb 28 13:34:47 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Sun, 28 Feb 2021 13:34:47 GMT Subject: RFR: 8262096: Vector API fails to work due to VectorShape initialization exception [v5] In-Reply-To: References: <1IGTrGJrdBrsekizG1jLeCKLMkHiGxqKEyzLlQJkZa4=.decd6165-9d52-4e8e-973c-8036295cecff@github.com> <1Hd5G9-WTDa6JJoWajyvMoyAES1Kea2phVh6umYbov8=.bbf1a2f5-bb7b-40e2-ab66-5b6321b63a0d@github.com> Message-ID: On Sat, 27 Feb 2021 11:15:06 GMT, Jie Fu wrote: >> test/jdk/jdk/incubator/vector/PreferredSpeciesTest.java line 42: >> >>> 40: * @modules jdk.incubator.vector java.base/jdk.internal.vm.vector >>> 41: * @run testng/othervm -XX:MaxVectorSize=8 PreferredSpeciesTest >>> 42: * @run testng/othervm -XX:MaxVectorSize=4 PreferredSpeciesTest >> >> `-XX:MaxVectorSize` is C2-specific. It's better to specify either `-XX:-IgnoreUnrecognizedVMOptions` or `@requires vm.compiler2.enabled`. > > `@requires vm.compiler2.enabled` had been added. > Thanks. @PaulSandoz , are you also OK with the latest version? Thanks. 
-------------

PR: https://git.openjdk.java.net/jdk/pull/2722

From jbhateja at openjdk.java.net Sun Feb 28 18:40:07 2021
From: jbhateja at openjdk.java.net (Jatin Bhateja)
Date: Sun, 28 Feb 2021 18:40:07 GMT
Subject: RFR: 8262355: Support for AVX-512 opmask register allocation.
Message-ID: <3NqvqAfKOiHvDo7gvwLvi5_U_9Rz8DFBijVVf1wpXWk=.90d51fb9-c6d0-45be-89b7-60851c7a6681@github.com>

AVX-512 added 8 new 64-bit opmask registers [1]. These registers allow conditional execution and efficient merging of destination operands. At present, cross-instruction mask propagation is done using either a GPR (e.g. vmask_gen patterns in x86.ad) or a vector register (for propagating results of a vector comparison or vector load mask operations).

This base patch extends the register allocator to support allocation of opmask registers. This will facilitate mask propagation across instructions and thus enable emitting efficient instruction sequences on X86 targets supporting the AVX-512 feature.

We intend to build a robust optimization framework [2] based on this patch to emit optimized instruction sequences for masked/predicated vector operations on X86 targets supporting AVX-512.

Please review and share your feedback.

Summary of changes:

1) AD side changes: New register definitions, register classes, allocation classes, operand definitions and spill code handling for opmask registers.

2) Runtime: Save/restoration for opmask registers in 32 and 64 bit JVM.
a) For the 64-bit JVM, the space was already reserved in the frame layout, but the registers were not previously saved and restored at the designated offset (1088), so there is no extra space overhead apart from the save/restore cost.
b) For the 32-bit JVM, an additional 64 bytes are allocated apart from the FXSTORE area, along the lines of the storage for ZMM(16-31) and the YMM-Hi bank. There are a few regressions due to the extra space allocation, which we are investigating.

3) Replacing all hard-coded opmask references in macro-assembly routines: The opmask occurrences are pulled all the way up to the instruction patterns, and an unbounded opmask operand is added for them. This exposes these operands to the RA and the scheduler, which automatically facilitates spilling of live opmask registers across call sites.

4) Register class initializations related to Op_RegVMask during matcher startup.

5) Handling for mask-generating nodes: Currently the VectorMaskGen node uses a GPR to propagate the mask from the mask-generating DEF instruction to its USER instructions. Other mask-generating nodes, such as VectorCmpMask and VectorLoadMask, are not handled as part of this patch. Conditionally overriding two routines, ideal_reg and bottom_type, for mask-generating IDEAL nodes and modifying the instruction patterns to take the new opmask operands enables the instruction selector to associate the opmask register class with the USE/DEF operands of such MachNodes. This constrains the allocation set for these operands to the opmask registers (K1-K7).

6) Special handling in PhiNode: a flag is set during construction if any of its incoming nodes is a mask-generating node; this flag is then checked to return the appropriate ideal_reg and bottom_type corresponding to an opmask register.

[1] : Section 15.1.3 : https://software.intel.com/content/www/us/en/develop/download/intel-64-and-ia-32-architectures-software-developers-manual-volume-1-basic-architecture.html
[2] : http://cr.openjdk.java.net/~jbhateja/avx512_masked_operation_optimization/AVX-512_RA_Opmask_Support_VectorMask_Optimizations.pdf

-------------

Commit messages:
 - 8262355: Support for AVX-512 opmask register allocation.
Changes: https://git.openjdk.java.net/jdk/pull/2768/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2768&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8262355 Stats: 1040 lines in 33 files changed: 767 ins; 13 del; 260 mod Patch: https://git.openjdk.java.net/jdk/pull/2768.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2768/head:pull/2768 PR: https://git.openjdk.java.net/jdk/pull/2768 From jbhateja at openjdk.java.net Sun Feb 28 19:09:00 2021 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Sun, 28 Feb 2021 19:09:00 GMT Subject: RFR: 8262355: Support for AVX-512 opmask register allocation. [v2] In-Reply-To: <3NqvqAfKOiHvDo7gvwLvi5_U_9Rz8DFBijVVf1wpXWk=.90d51fb9-c6d0-45be-89b7-60851c7a6681@github.com> References: <3NqvqAfKOiHvDo7gvwLvi5_U_9Rz8DFBijVVf1wpXWk=.90d51fb9-c6d0-45be-89b7-60851c7a6681@github.com> Message-ID: > AVX-512 added 8 new 64 bit opmask registers[1] . These registers allow conditional execution and efficient merging of destination operands. At present cross instruction mask propagation is being done either using a GPR (e.g. vmask_gen patterns in x86.ad) or a vector register (for propagating results of a vector comparison or vector load mask operations). > > This base patch extends the register allocator to support allocation of opmask registers. This will facilitate mask propagation across instructions and thus enable emitting efficient instruction sequence over X86 targets supporting AVX-512 feature. > > We intend to build a robust optimization framework[2] based on this patch to emit optimized instruction sequence for masked/predicated vector operation for X86 targets supporting AVX-512. > > Please review and share your feedback. > > Summary of changes: > > 1) AD side changes: New register definitions, register classes, allocation classes, operand definitions and spill code handling for opmask registers. > > 2) Runtime: Save/restoration for opmask registers in 32 and 64 bit JVM. 
> a) For 64 bit JVM we were anyways reserving the space in the frame layout but earlier were not saving and restoring at designated offset(1088), hence no extra space overhead apart from save/restore cost. > b) For 32 bit JVM: Additional 64 byte are allocated apart from FXSTORE area on the lines of storage for ZMM(16-31) and YMM-Hi bank. There are few regressions due to extra space allocation which we are investigating. > > 3) Replacing all the hard-coded opmask references from macro-assembly routines: Pulling out the opmask occurrences all the way up to instruction pattern and adding an unbounded opmask operand for them. This exposes these operands to RA and scheduler; this will automatically facilitate spilling of live opmask registers across call sites. > > 4) Register class initializations related to Op_RegVMask during matcher startup. > > 5) Handling for mask generating node: Currently VectorMaskGen node uses a GPR to propagate mask across mask generating DEF instruction to its USER instructions. There are other mask generating nodes like VectorCmpMask, VectorLoadMask which are not handled as the part of this patch. Conditional overriding of two routines, ideal_reg and bottom_type for mask generating IDEAL nodes and modifying the instruction patterns to have new opmask operands enables instruction selector to associate opmask register class with USE/DEF operands for such MachNodes. This will constrain the allocation set for these operands to opmask registers(K1-K7). > > 6) Special handling for setting a flag in PhiNode during construction in case any of its incoming node is a mask generating node, this flag is then checked to return appropriate ideal_reg and bottom_type corresponding to an opmask registers. 
> > [1] : Section 15.1.3 : https://software.intel.com/content/www/us/en/develop/download/intel-64-and-ia-32-architectures-software-developers-manual-volume-1-basic-architecture.html > [2] : http://cr.openjdk.java.net/~jbhateja/avx512_masked_operation_optimization/AVX-512_RA_Opmask_Support_VectorMask_Optimizations.pdf Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: 8262355: Fix for AARCH64 build failure. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2768/files - new: https://git.openjdk.java.net/jdk/pull/2768/files/9e1c3e0d..69003aa6 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2768&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2768&range=00-01 Stats: 11 lines in 5 files changed: 1 ins; 2 del; 8 mod Patch: https://git.openjdk.java.net/jdk/pull/2768.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2768/head:pull/2768 PR: https://git.openjdk.java.net/jdk/pull/2768
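
For readers unfamiliar with what an opmask controls, the following scalar Java sketch emulates merge-masking, the "efficient merging of destination operands" mentioned in the description above. It is only an illustration of the semantics, not code from the patch or from HotSpot; the class `OpmaskSketch` and the method `maskedAdd` are hypothetical names, and the mask is modeled as the low bits of a `long`, one bit per lane.

```java
public class OpmaskSketch {
    // Emulates a merge-masked vector add over int lanes:
    // lane i receives a[i] + b[i] when bit i of mask is set,
    // otherwise it keeps the previous destination value dst[i].
    // Assumes all arrays have the same length, at most 64 lanes.
    static int[] maskedAdd(int[] dst, int[] a, int[] b, long mask) {
        int[] out = dst.clone();
        for (int i = 0; i < out.length; i++) {
            if (((mask >>> i) & 1L) != 0) {
                out[i] = a[i] + b[i];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] dst = {9, 9, 9, 9};
        int[] a = {1, 2, 3, 4};
        int[] b = {10, 20, 30, 40};
        // Mask 0b0101 selects lanes 0 and 2.
        System.out.println(java.util.Arrays.toString(maskedAdd(dst, a, b, 0b0101L)));
    }
}
```

On AVX-512 hardware the per-lane selection is done by a single instruction reading an opmask register, which is why keeping the mask in a dedicated register class, rather than a GPR or vector register, matters for the allocator.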