From yyang at openjdk.java.net Mon Nov 1 07:40:40 2021 From: yyang at openjdk.java.net (Yi Yang) Date: Mon, 1 Nov 2021 07:40:40 GMT Subject: RFR: 8274328: C2: Redundant CFG edges fixup in block ordering [v2] In-Reply-To: References: Message-ID: > I think Trace::fixup_blocks is redundant because PhaseCFG::fixup_flow will nevertheless fix up the CFG flow(i.e. flip successor blocks of IfNode) right after PhaseBlockLayout pass, we can remove this step when doing PhaseBlockLayout pass.(Testing: jtreg/compiler/c2, presubmit test) > > https://github.com/openjdk/jdk/blob/5ec1cdcaf39229a7d2457313600b0dc2bf8c6453/src/hotspot/share/opto/compile.cpp#L2765 > > https://github.com/openjdk/jdk/blob/5ec1cdcaf39229a7d2457313600b0dc2bf8c6453/src/hotspot/share/opto/block.cpp#L1679 > > https://github.com/openjdk/jdk/blob/5ec1cdcaf39229a7d2457313600b0dc2bf8c6453/src/hotspot/share/opto/block.cpp#L908-L916 Yi Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: - use swap(ref,ref) - Merge branch 'master' into blockordering - 8274328: C2: Redundant CFG edges fixup in block ordering ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5705/files - new: https://git.openjdk.java.net/jdk/pull/5705/files/46381c61..7d58a18f Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5705&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5705&range=00-01 Stats: 51547 lines in 1497 files changed: 35427 ins; 10271 del; 5849 mod Patch: https://git.openjdk.java.net/jdk/pull/5705.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5705/head:pull/5705 PR: https://git.openjdk.java.net/jdk/pull/5705 From yyang at openjdk.java.net Mon Nov 1 07:40:48 2021 From: yyang at openjdk.java.net (Yi Yang) Date: Mon, 1 Nov 2021 07:40:48 GMT Subject: RFR: 8274328: C2: Redundant CFG edges fixup in block ordering [v2] In-Reply-To: References: Message-ID: <1t3XeLHY3l2xukiaWFibWC1MoudA1SmW5Wb4IAKzUw0=.f2aeed7a-9617-4c8d-8e41-440499bbf6e3@github.com> On Thu, 28 Oct 2021 07:38:04 GMT, Tobias Hartmann wrote: >> Yi Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - use swap(ref,ref) >> - Merge branch 'master' into blockordering >> - 8274328: C2: Redundant CFG edges fixup in block ordering > > src/hotspot/share/opto/block.cpp line 916: > >> 914: ProjNode* tmp = proj0; >> 915: proj0 = proj1; >> 916: proj1 = tmp; > > `swap(proj0, proj1)` can be used here. Thanks Tobias for review. 
I have replaced above code with `swap(pointer ref,pointer ref)` ------------- PR: https://git.openjdk.java.net/jdk/pull/5705 From thartmann at openjdk.java.net Mon Nov 1 07:41:09 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 1 Nov 2021 07:41:09 GMT Subject: RFR: 8275868: ciReplay: Inlining fails with "unloaded signature classes" due to wrong protection domains In-Reply-To: <16836GhLOzOFB_YHIIphRQbJicN3caZ5P4xOUmcUG5g=.5004acaf-acd1-4db3-9fe6-ed0e8eb9968f@github.com> References: <16836GhLOzOFB_YHIIphRQbJicN3caZ5P4xOUmcUG5g=.5004acaf-acd1-4db3-9fe6-ed0e8eb9968f@github.com> Message-ID: On Thu, 28 Oct 2021 15:01:03 GMT, Christian Hagedorn wrote: > Replay compilation can fail to inline a method which was inlined in the normal run due to unresolved classes in the signature of an inlinee. The reason is that ciReplay is not resolving Java API classes with the protection domain of the holder class of the method to be replay compiled. Compiler replay is currently only resolving classes without a protection domain (i.e. an empty handle): > https://github.com/openjdk/jdk/blob/593401fe8b38bbb8d331a862818fe077af157fcb/src/hotspot/share/ci/ciReplay.cpp#L139-L142 > > A more detailed description can be found in the description of [JDK-8275868](https://bugs.openjdk.java.net/browse/JDK-8275868). > > This patch fixes that and takes the protection domain of the holder class of the method to be compiled to resolve all other classes used for ciReplay. The unloaded classes check is done in `ciMethod::has_unloaded_classes_in_signature()` and bypasses the whitelist introduced by JDK-8262912. However, this is fine since the inlining decision is enforced by the inlining information in the replay file. > > To test the various scenarios mentioned in the description of JDK-8275868, I've added some support to use `DumpReplay` to not require a crash. 
I parse the inlining information from the hotspot log file to check that ciReplay applies the same inlining decisions as the normal run. > > Thanks, > Christian Looks good to me too. Nice test! ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6159 From thartmann at openjdk.java.net Mon Nov 1 07:49:11 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 1 Nov 2021 07:49:11 GMT Subject: RFR: 8276066: Reset LoopPercentProfileLimit for x86 due to suboptimal performance In-Reply-To: References: Message-ID: On Thu, 28 Oct 2021 08:57:00 GMT, Jie Fu wrote: > I'll run this through our performance testing and report back. Performance results look good. Is this change still required after re-enabling post loop vectorization? ------------- PR: https://git.openjdk.java.net/jdk/pull/6142 From thartmann at openjdk.java.net Mon Nov 1 08:02:14 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 1 Nov 2021 08:02:14 GMT Subject: RFR: 8274328: C2: Redundant CFG edges fixup in block ordering [v2] In-Reply-To: References: Message-ID: On Mon, 1 Nov 2021 07:40:40 GMT, Yi Yang wrote: >> I think Trace::fixup_blocks is redundant because PhaseCFG::fixup_flow will nevertheless fix up the CFG flow(i.e. flip successor blocks of IfNode) right after PhaseBlockLayout pass, we can remove this step when doing PhaseBlockLayout pass.(Testing: jtreg/compiler/c2, presubmit test) >> >> https://github.com/openjdk/jdk/blob/5ec1cdcaf39229a7d2457313600b0dc2bf8c6453/src/hotspot/share/opto/compile.cpp#L2765 >> >> https://github.com/openjdk/jdk/blob/5ec1cdcaf39229a7d2457313600b0dc2bf8c6453/src/hotspot/share/opto/block.cpp#L1679 >> >> https://github.com/openjdk/jdk/blob/5ec1cdcaf39229a7d2457313600b0dc2bf8c6453/src/hotspot/share/opto/block.cpp#L908-L916 > > Yi Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: > > - use swap(ref,ref) > - Merge branch 'master' into blockordering > - 8274328: C2: Redundant CFG edges fixup in block ordering Thanks, looks good. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5705 From thartmann at openjdk.java.net Mon Nov 1 08:17:14 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 1 Nov 2021 08:17:14 GMT Subject: RFR: 8275847: Scheduling fails with "too many D-U pinch points" on small method [v2] In-Reply-To: <0VRE1Xz5B5o9M0DjdTd5KBL5YOXPcp8Od5vCpH96j34=.a0c38e4e-5a02-4adc-be8f-22c579f53d47@github.com> References: <5z-HFwTvcqo_tge6dIMU4VZo-0UkInXAIKuh5D-fkxI=.b5f7a31b-717f-47b5-bfcc-90dbb223075e@github.com> <0VRE1Xz5B5o9M0DjdTd5KBL5YOXPcp8Od5vCpH96j34=.a0c38e4e-5a02-4adc-be8f-22c579f53d47@github.com> Message-ID: On Fri, 29 Oct 2021 07:44:47 GMT, Nick Gasson wrote: >> Since around JDK 16 the following method cannot be compiled by C2 on AArch64: >> >> >> public double mergeSync() { return Math.log(Math.sin(value)); } >> >> >> (Reduced from a slightly larger benchmark.) >> >> >> 811 416 ! 3 Test::mergeSync (61 bytes) >> 813 417 ! 4 Test::mergeSync (61 bytes) >> 816 417 ! 4 Test::mergeSync (61 bytes) COMPILE SKIPPED: too many D-U pinch points (retry at different tier) >> 816 418 ! 1 Test::mergeSync (61 bytes) >> >> >> Scheduling::anti_do_def() will create temporary Nodes for each OptoReg killed by the MachProjs from the two runtime leaf calls. After SVE support was added these runtime calls kill more registers, and the number of new Nodes added by anti_do_def exceeds an internal limit (which is based on the LRG map size and roughly proportional to the method size). >> >> X86 has the same problem if OptoScheduling is enabled because of the wide AVX registers. >> >> The fix here is to ignore OptoRegs which correspond to the high slots of wide vectors (i.e. slots above 64 bits). 
The scheduler doesn't run on methods where C->max_vector_size() > 8, so we know these kills can't affect the scheduling result. >> >> The added test fails on the current JDK with: >> >> >> compiler.lib.ir_framework.shared.TestRunException: Could not compile public double >> compiler.c2.irTests.TestScheduleSmallMethod.testSmallMethodTwoRuntimeCalls(double) at level C2 >> after 10s. Last compilation level: 3 > > Nick Gasson has updated the pull request incrementally with one additional commit since the last revision: > > Remove dead uses of is_concrete Looks good to me but a second review would be good. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6131 From chagedorn at openjdk.java.net Mon Nov 1 08:26:22 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Mon, 1 Nov 2021 08:26:22 GMT Subject: RFR: 8275868: ciReplay: Inlining fails with "unloaded signature classes" due to wrong protection domains In-Reply-To: <16836GhLOzOFB_YHIIphRQbJicN3caZ5P4xOUmcUG5g=.5004acaf-acd1-4db3-9fe6-ed0e8eb9968f@github.com> References: <16836GhLOzOFB_YHIIphRQbJicN3caZ5P4xOUmcUG5g=.5004acaf-acd1-4db3-9fe6-ed0e8eb9968f@github.com> Message-ID: <3pR33lRVUeOkn2tz0wkGWLPjHUa2_h5eGuUvzwN8FWo=.5203ac14-c1fd-4943-8664-c74f8bd4981f@github.com> On Thu, 28 Oct 2021 15:01:03 GMT, Christian Hagedorn wrote: > Replay compilation can fail to inline a method which was inlined in the normal run due to unresolved classes in the signature of an inlinee. The reason is that ciReplay is not resolving Java API classes with the protection domain of the holder class of the method to be replay compiled. Compiler replay is currently only resolving classes without a protection domain (i.e. 
an empty handle): > https://github.com/openjdk/jdk/blob/593401fe8b38bbb8d331a862818fe077af157fcb/src/hotspot/share/ci/ciReplay.cpp#L139-L142 > > A more detailed description can be found in the description of [JDK-8275868](https://bugs.openjdk.java.net/browse/JDK-8275868). > > This patch fixes that and takes the protection domain of the holder class of the method to be compiled to resolve all other classes used for ciReplay. The unloaded classes check is done in `ciMethod::has_unloaded_classes_in_signature()` and bypasses the whitelist introduced by JDK-8262912. However, this is fine since the inlining decision is enforced by the inlining information in the replay file. > > To test the various scenarios mentioned in the description of JDK-8275868, I've added some support to use `DumpReplay` to not require a crash. I parse the inlining information from the hotspot log file to check that ciReplay applies the same inlining decisions as the normal run. > > Thanks, > Christian Thanks Tobias for your review! ------------- PR: https://git.openjdk.java.net/jdk/pull/6159 From chagedorn at openjdk.java.net Mon Nov 1 08:26:25 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Mon, 1 Nov 2021 08:26:25 GMT Subject: Integrated: 8275868: ciReplay: Inlining fails with "unloaded signature classes" due to wrong protection domains In-Reply-To: <16836GhLOzOFB_YHIIphRQbJicN3caZ5P4xOUmcUG5g=.5004acaf-acd1-4db3-9fe6-ed0e8eb9968f@github.com> References: <16836GhLOzOFB_YHIIphRQbJicN3caZ5P4xOUmcUG5g=.5004acaf-acd1-4db3-9fe6-ed0e8eb9968f@github.com> Message-ID: On Thu, 28 Oct 2021 15:01:03 GMT, Christian Hagedorn wrote: > Replay compilation can fail to inline a method which was inlined in the normal run due to unresolved classes in the signature of an inlinee. The reason is that ciReplay is not resolving Java API classes with the protection domain of the holder class of the method to be replay compiled. 
Compiler replay is currently only resolving classes without a protection domain (i.e. an empty handle): > https://github.com/openjdk/jdk/blob/593401fe8b38bbb8d331a862818fe077af157fcb/src/hotspot/share/ci/ciReplay.cpp#L139-L142 > > A more detailed description can be found in the description of [JDK-8275868](https://bugs.openjdk.java.net/browse/JDK-8275868). > > This patch fixes that and takes the protection domain of the holder class of the method to be compiled to resolve all other classes used for ciReplay. The unloaded classes check is done in `ciMethod::has_unloaded_classes_in_signature()` and bypasses the whitelist introduced by JDK-8262912. However, this is fine since the inlining decision is enforced by the inlining information in the replay file. > > To test the various scenarios mentioned in the description of JDK-8275868, I've added some support to use `DumpReplay` to not require a crash. I parse the inlining information from the hotspot log file to check that ciReplay applies the same inlining decisions as the normal run. > > Thanks, > Christian This pull request has now been integrated. Changeset: 5bb1992b Author: Christian Hagedorn URL: https://git.openjdk.java.net/jdk/commit/5bb1992b8408a0d196b1afa308bc00d007458dbd Stats: 438 lines in 5 files changed: 434 ins; 0 del; 4 mod 8275868: ciReplay: Inlining fails with "unloaded signature classes" due to wrong protection domains Reviewed-by: kvn, dlong, thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/6159 From jiefu at openjdk.java.net Mon Nov 1 08:49:09 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Mon, 1 Nov 2021 08:49:09 GMT Subject: RFR: 8276066: Reset LoopPercentProfileLimit for x86 due to suboptimal performance In-Reply-To: References: Message-ID: On Wed, 27 Oct 2021 14:26:49 GMT, Jie Fu wrote: > Hi all, > > I'd like to reset the value of `LoopPercentProfileLimit` (from 30 to the original 10) for x86 to fix performance degradation. 
>
> We observed that, for the same Java application, x86 is slower than aarch64.
> But according to some SPEC benchmark results, x86 should not be that much slower than aarch64.
>
> After some investigation, it seems that the slowness of x86 is caused by the different default settings of `LoopPercentProfileLimit` (30 for x86, but 10 for other platforms).
> If we change `LoopPercentProfileLimit` from 30 to 10, x86 runs faster.
>
> `LoopPercentProfileLimit` [1] was first added in JDK-8149421 and set to 30 for x86 and 10 for other platforms.
> Logically, the default value of `LoopPercentProfileLimit` was 10 for all platforms even before JDK-8149421.
> This is because when `LoopPercentProfileLimit=10`, `10.0` [2] equals `100.0 / LoopPercentProfileLimit` [3].
> So if we set `LoopPercentProfileLimit=10`, this unrolling rule [3] is the same as the original design before JDK-8149421.
>
> The most important fact is that, from the very beginning of the OpenJDK source code, the default value of `LoopPercentProfileLimit` has (logically) been 10 for all platforms.
> So I suggest resetting `LoopPercentProfileLimit` to the original value (10) for x86, just like on the other platforms.
>
> I've noted that the review thread mentioned that JDK-8149421 would be beneficial for some SPECjvm2008 benchmarks [4].
> I then ran SPECjvm2008 with `LoopPercentProfileLimit=10` and found no performance drop on x86.
> So it won't revert JDK-8149421's optimizations for SPECjvm2008.
>
> To show the potential improvement of this change, I've added a JMH test to the patch.
> According to this microbenchmark, performance improves by 1.25x to 2.0x.
>
> Any comments?
>
> Thanks.
> Best regards,
> Jie
>
>
> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/loopTransform.cpp#L908
> [2] https://github.com/openjdk/jdk8u/blob/master/hotspot/src/share/vm/opto/loopTransform.cpp#L673
> [3] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/loopTransform.cpp#L903
> [4] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2016-February/021205.html
>
> ratio
>
> before
>
> after
>
> I'll run this through our performance testing and report back.
>
> Performance results look good.
>
> Is this change still required after re-enabling post loop vectorization?

This is part of the loop unrolling rule, so I think it would be better to change it to improve performance on the current code base. Then all future optimizations on x86 can be evaluated against that improved version. We may also backport the change to other repos such as jdk11u. So I would suggest resetting it to 10 on x86. Thanks.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6142

From chagedorn at openjdk.java.net Mon Nov 1 10:38:16 2021
From: chagedorn at openjdk.java.net (Christian Hagedorn)
Date: Mon, 1 Nov 2021 10:38:16 GMT
Subject: RFR: 8271056: C2: "assert(no_dead_loop) failed: dead loop detected" due to cmoving identity
In-Reply-To:
References:
Message-ID:

On Fri, 29 Oct 2021 13:02:11 GMT, Christian Hagedorn wrote:

> In the testcase, an unsafe cmoving identity is applied in `PhiNode::Identity()` after parsing which replaces a loop phi in a dead loop creating a dead data loop which triggers the assertion.
The problem is that `PhiNode::Identity()` assumes that a cmoving identity is always safe because `PhiNode::Ideal()` handles unsafe cases and only leaves safe cases to `PhiNode::Identity()`: > https://github.com/openjdk/jdk/blob/4c3491bfa5f296b80c56a37cb4fffd6497323ac2/src/hotspot/share/opto/cfgnode.cpp#L2051-L2055 > > However, the fix for [JDK-8268883 ](https://github.com/openjdk/jdk17/commit/6d8fc7249a3a1a2350c462f9c4fe38377856392f)added the following additional condition to wait for the region to be processed: > https://github.com/openjdk/jdk/blob/4c3491bfa5f296b80c56a37cb4fffd6497323ac2/src/hotspot/share/opto/cfgnode.cpp#L2047-L2053 > > This skips the process of an unsafe case in `PhiNode::Ideal()` in the testcase. Afterwards, the unsafe case is replaced unconditionally in `PhiNode::Identity()` resulting in a dead data loop. > > I therefore propose to add the same check added in JDK-8268883 to `PhiNode::Identity()` to prevent that. > > Thanks, > Christian Thanks Vladimir for your review! ------------- PR: https://git.openjdk.java.net/jdk/pull/6172 From shade at openjdk.java.net Mon Nov 1 11:00:14 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 1 Nov 2021 11:00:14 GMT Subject: RFR: 8276105: C2: Conv(D|F)2(I|L)Nodes::Ideal should handle rounding correctly In-Reply-To: References: Message-ID: On Fri, 29 Oct 2021 19:19:58 GMT, Vladimir Kozlov wrote: >> Happens now in master: >> >> >> $ CONF=linux-x86-server-fastdebug make run-test TEST=compiler/loopopts/superword/CoLocatePack.java TEST_VM_OPTS="-XX:UseAVX=0 -XX:UseSSE=0" >> ... 
>> >> CompileCommand: compileonly compiler/loopopts/superword/CoLocatePack.test bool compileonly = true >> 191 ConvF2L === _ 714 [[ 193 ]] !jvms: CoLocatePack::test @ bci:30 (line 70) >> # To suppress the following error report, specify this argument >> # after -XX: or in .hotspotrc: SuppressErrorAt=/phaseX.cpp:1128 >> # >> # A fatal error has been detected by the Java Runtime Environment: >> # >> # Internal Error (/home/shade/trunks/jdk/src/hotspot/share/opto/phaseX.cpp:1128), pid=1717516, tid=1717532 >> # fatal error: modified node was not processed by IGVN.transform_old() >> >> >> After JDK-8266950 (always `strictfp`), the paths in `Conv(D|F)2(I|L)Nodes::Ideal`-s start to be taken more frequently to round float/double inputs when low SSE is enabled. On those paths, we call `set_req` to rewire current node, but we still return `NULL` from `::Ideal`. I believe that is incorrect, as per `node.cpp` explanation: `NULL` indicates no graph change was done, and `this` should be returned when modification happened. So GVN predictably barfs. >> >> Additional testing: >> - [x] Linux x86_32 `tier1` with `-XX:UseAVX=0 -XX:UseSSE=0` (now pass) >> - [x] Linux x86_32 `tier1` default (still pass) >> - [x] Linux x86_64 `tier1` default > > Correct. Thank you, @vnkozlov, @navyxliu. I think I need a second Hotspot (R)eviewer for this? ------------- PR: https://git.openjdk.java.net/jdk/pull/6176 From duke at openjdk.java.net Mon Nov 1 11:41:23 2021 From: duke at openjdk.java.net (Tobias Holenstein) Date: Mon, 1 Nov 2021 11:41:23 GMT Subject: RFR: JDK-8276036: The value of full_count in the message of insufficient codecache is wrong Message-ID: The value of full_count (number of times the code heap was full) in the message of insufficient codecache was 0 even though a codecache shortage occurred. This is fixed by simply incrementing the count before the printing. 
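
The ordering problem described above (printing `full_count` before it is incremented) can be illustrated with a minimal sketch. This is a toy Java model under stated assumptions, not the HotSpot code, which is C++: the class `FullCountDemo` and the method `firstReport` are hypothetical names, and only the order of "increment" and "print" mirrors the description of the fix.

```java
// Toy model of the reporting bug: a counter that is printed before it is
// incremented shows 0 for the first "code cache full" event.
// All names are hypothetical; this is not the HotSpot implementation.
final class FullCountDemo {

    // Returns the message produced for the first "full" event.
    static String firstReport(boolean incrementBeforePrint) {
        int fullCount = 0; // number of times the code heap was full so far

        if (incrementBeforePrint) {
            fullCount++;                              // fixed order: count the event first
            return "full_count=" + fullCount;
        }
        String message = "full_count=" + fullCount;   // buggy order: stale value is printed
        fullCount++;                                  // the increment arrives too late
        return message;
    }

    public static void main(String[] args) {
        System.out.println("print, then increment: " + firstReport(false));
        System.out.println("increment, then print: " + firstReport(true));
    }
}
```

With the buggy order the first report carries the value 0 even though a shortage has just occurred; with the fixed order it carries 1.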
------------- Commit messages: - JDK-8276036: The value of full_count in the message of insufficient codecache is wrong Changes: https://git.openjdk.java.net/jdk/pull/6185/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6185&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8276036 Stats: 6 lines in 1 file changed: 2 ins; 2 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/6185.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6185/head:pull/6185 PR: https://git.openjdk.java.net/jdk/pull/6185 From shade at openjdk.java.net Mon Nov 1 11:53:07 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 1 Nov 2021 11:53:07 GMT Subject: RFR: 8276157: C2: Compiler stack overflow during escape analysis on Linux x86_32 In-Reply-To: References: Message-ID: On Fri, 29 Oct 2021 10:06:26 GMT, Aleksey Shipilev wrote: > See the bug for test details and analysis. I believe we just legitimately run out of stack in `fastdebug` builds. The fix is to increase the default stack size a bit. Linux-S390, Windows-x86/AArch64 seems to do a similar thing. > > I can do a similar change in `globals_bsd_x86.hpp`, but that would be a blind change, as I don't have platforms to verify that change sanity. I would prefer to make a Linux-specific fix at this time. > > Additional testing: > - [x] Failing test now passes on Linux x86_32 > - [x] Linux x86_32 fastdebug `tier1` Thanks! I guess I need a second (R)eviewer for this. > Does EA find non-escaping allocations when the test passed (with bigger stack)? It is really hard to tell for this test, because it runs way too many configurations, and reducing the configurations makes the failure disappear. Digging through `-XX:+PrintEscapeAnalysis` output, I'd say EA does _not_ find more `NoEscape` objects, after the point where old builds just crash. 
> To actually fix the issue, we would need to rewrite the recursive method `ConnectionGraph::find_inst_mem()` as a non-recursive method using `Node_Stack` or other C2 structures. Please file an RFE. Maybe also add a check that it is not infinite recursion.

Filed [JDK-8276219](https://bugs.openjdk.java.net/browse/JDK-8276219).

-------------

PR: https://git.openjdk.java.net/jdk/pull/6167

From shade at openjdk.java.net Mon Nov 1 12:34:22 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Mon, 1 Nov 2021 12:34:22 GMT
Subject: RFR: 8273416: C2: assert(false) failed: bad AD file after JDK-8252372 with UseSSE={0,1}
In-Reply-To:
References:
Message-ID: <78RZFewSRwHKVWEWBHM0pOvo0ATASuVPNkkUuIVIyqg=.36796b6d-e142-4c83-81c0-3017213e8acc@github.com>

On Tue, 7 Sep 2021 10:10:08 GMT, Aleksey Shipilev wrote:

> See the bug report for the reproducer and failure message. I think the newly added `CastDD`/`CastFF` nodes should handle the extended `regDPR`/`regFPR` (which includes FPU "registers") instead of just `regD`/`regF` to avoid this mismatch error when `UseSSE < 2`.
>
> Unfortunately, we cannot just use `reg*PR` operands in existing match rules, because those operands are defined as `UseSSE < 2`, and using them as operands and `ideal_regs()` would break the matching on `UseSSE >= 2`. Therefore I had to add another pair of matches.
>
> Additional testing:
> - [x] Linux x86_32 `tier1` `-XX:UseAVX=0 -XX:UseSSE=0`
> - [x] Linux x86_32 `tier1` default
> - [x] Linux x86_64 `tier1` default

Thank you!

-------------

PR: https://git.openjdk.java.net/jdk/pull/5386

From shade at openjdk.java.net Mon Nov 1 12:34:22 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Mon, 1 Nov 2021 12:34:22 GMT
Subject: Integrated: 8273416: C2: assert(false) failed: bad AD file after JDK-8252372 with UseSSE={0,1}
In-Reply-To:
References:
Message-ID: <_0So6n6b5GGbWSovAdceKetyN7Zq1g5IWRxf3hTGE7k=.a163342b-5703-4d05-8ada-549e43b6c9e4@github.com>

On Tue, 7 Sep 2021 10:10:08 GMT, Aleksey Shipilev wrote:

> See the bug report for the reproducer and failure message. I think the newly added `CastDD`/`CastFF` nodes should handle the extended `regDPR`/`regFPR` (which includes FPU "registers") instead of just `regD`/`regF` to avoid this mismatch error when `UseSSE < 2`.
>
> Unfortunately, we cannot just use `reg*PR` operands in existing match rules, because those operands are defined as `UseSSE < 2`, and using them as operands and `ideal_regs()` would break the matching on `UseSSE >= 2`. Therefore I had to add another pair of matches.
>
> Additional testing:
> - [x] Linux x86_32 `tier1` `-XX:UseAVX=0 -XX:UseSSE=0`
> - [x] Linux x86_32 `tier1` default
> - [x] Linux x86_64 `tier1` default

This pull request has now been integrated.

Changeset: 89ade1d7
Author: Aleksey Shipilev
URL: https://git.openjdk.java.net/jdk/commit/89ade1d7e88ee005c3fd2136d84e298d94f9a67c
Stats: 22 lines in 2 files changed: 20 ins; 0 del; 2 mod

8273416: C2: assert(false) failed: bad AD file after JDK-8252372 with UseSSE={0,1}

Reviewed-by: kvn, roland

-------------

PR: https://git.openjdk.java.net/jdk/pull/5386

From duke at openjdk.java.net Mon Nov 1 13:12:10 2021
From: duke at openjdk.java.net (SUN Guoyun)
Date: Mon, 1 Nov 2021 13:12:10 GMT
Subject: RFR: 8275086: compiler/c2/irTests/TestPostParseCallDevirtualization.java fails when compiler1 is disabled [v3]
In-Reply-To: <1J1d_whrKEmMGGYm9ZfhuiQo7pq677jppXGfvEZrkNA=.821ff6a1-4d51-4069-b256-a1ca917d2030@github.com>
References: <1J1d_whrKEmMGGYm9ZfhuiQo7pq677jppXGfvEZrkNA=.821ff6a1-4d51-4069-b256-a1ca917d2030@github.com>
Message-ID: <6SJ9ehNwBgGjBTPl_ufG4bZivS_twxOPk136gYwxtLQ=.fa2e96ab-5d90-4447-9b9f-9f4acee6b0fa@github.com>

On Fri, 29 Oct 2021 09:41:37 GMT, SUN Guoyun wrote:

>> Hi all,
>> The jtreg test case compiler/c2/irTests/TestPostParseCallDevirtualization.java fails in fastdebug mode on the x86/aarch64/mips architectures when "--with-jvm-features=-compiler1" is used. The failure output is:
>>

>> One or more @IR rules failed:
>> 
>> Failed IR Rules (1)
>> ------------------
>> - Method "public int compiler.c2.irTests.TestPostParseCallDevirtualization.testMethodHandleCallWithCCP() throws java.lang.Throwable":
>>   * @IR rule 1: "@compiler.lib.ir_framework.IR(failOn={"#PRE#(\\\\d+(\\\\s){2}(CallStaticJava.*)+(\\\\s){2}===.*#IS_REPLACED#)", "invokeBasic"}, applyIf={}, applyIfAnd={}, applyIfOr={}, counts={"#PRE#(\\\\d+(\\\\s){2}(CallStaticJava.*)+(\\\\s){2}===.*#IS_REPLACED#)", "invokeStatic", "= 1"}, applyIfNot={})" 
>>     - failOn: Graph contains forbidden nodes:
>>         Regex 1: (\\d+(\\s){2}(CallStaticJava.*)+(\\s){2}===.*invokeBasic)
>>         Matched forbidden node:
>>           280  CallStaticJava  ===  5  6  7  8  1 ( 188  1  1  1  1  1  1 ) [[ 281  282  283  285 ]] # Static  java.lang.invoke.MethodHandle::invokeBasic
>>     - counts: Graph contains wrong number of nodes:
>>         Regex 1: (\\d+(\\s){2}(CallStaticJava.*)+(\\s){2}===.*invokeStatic)
>>         Expected 1 but found 0 nodes.
>> 
>>>>> Check stdout for compilation output of the failed methods
>> 
>>
>> This is a patch to fix this problem. Please help review it.
>>
>> Thanks,
>> Sun Guoyun
>
> SUN Guoyun has updated the pull request incrementally with one additional commit since the last revision:
>
>   8275086: compiler/c2/irTests/TestPostParseCallDevirtualization.java fails when compiler1 is disabled

Trying 20,000, as before, only makes the test case succeed occasionally. The core problem is that the warm-up should be for the function `java.lang.invoke.LambdaForm$DMH::invokeStatic` (that is, `mh2`), not for `testMethodHandleCallWithLoop` or `testMethodHandleCallWithCCP`. So I have a new patch, which I think is a minimal change for this test. It also changes the number of warm-up iterations to 5000, and with that the test succeeds every time. If we consider adding new IR test cases later, it might make more sense to modify the IR framework.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5903

From chagedorn at openjdk.java.net Mon Nov 1 13:32:20 2021
From: chagedorn at openjdk.java.net (Christian Hagedorn)
Date: Mon, 1 Nov 2021 13:32:20 GMT
Subject: RFR: 8276227: ciReplay: SIGSEGV if classfile for replay compilation is not present after JDK-8275868
Message-ID:

The fix for [JDK-8275868](https://bugs.openjdk.java.net/browse/JDK-8275868) does not handle the case where the classfile for the method to be replay compiled is not present. In that case, loading the klass fails. Afterwards, we try to access the protection domain of the klass that failed to load (i.e. a null pointer), which results in a segmentation fault. The fix is straightforward: only set the new protection domain if the klass was loaded successfully. I additionally changed the code so that we only try to set the protection domain when reading the first `instanceKlass` entry. This avoids some potential problems with older replay files where we do not have this additional first entry set by JDK-8275868.
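
The guard described above (only take the protection domain from the holder klass if it was loaded successfully) can be sketched as follows. This is a toy Java model under stated assumptions, not the ciReplay code, which is C++: `ReplayDomainDemo`, `Klass`, `tryLoad` and `domainAfterLoad` are hypothetical names, a plain string stands in for the protection domain, and the "first `instanceKlass` entry" condition is not modeled.

```java
// Toy model of the guard: only take the protection domain from the holder
// klass if loading it succeeded. Names are hypothetical, not ciReplay's API.
final class ReplayDomainDemo {

    static final class Klass {
        final String protectionDomain;

        Klass(String protectionDomain) {
            this.protectionDomain = protectionDomain;
        }
    }

    // Stand-in for class loading: returns null when the classfile is missing.
    static Klass tryLoad(boolean classfilePresent) {
        return classfilePresent ? new Klass("holder-domain") : null;
    }

    // Returns the protection domain that later class resolution would use.
    static String domainAfterLoad(boolean classfilePresent) {
        String domain = "";                    // models the empty handle
        Klass holder = tryLoad(classfilePresent);
        if (holder != null) {                  // the guard: skip on a failed load
            domain = holder.protectionDomain;
        }
        return domain;
    }

    public static void main(String[] args) {
        System.out.println("classfile present: '" + domainAfterLoad(true) + "'");
        System.out.println("classfile missing: '" + domainAfterLoad(false) + "'");
    }
}
```

Without the null check, the missing-classfile case would dereference a null reference, which is the Java analogue of the segmentation fault reported in the bug.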
Thanks, Christian ------------- Commit messages: - 8276227: ciReplay: SIGSEGV if classfile for replay compilation is not present after JDK-8275868 Changes: https://git.openjdk.java.net/jdk/pull/6189/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6189&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8276227 Stats: 9 lines in 1 file changed: 8 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/6189.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6189/head:pull/6189 PR: https://git.openjdk.java.net/jdk/pull/6189 From thartmann at openjdk.java.net Mon Nov 1 14:32:08 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 1 Nov 2021 14:32:08 GMT Subject: RFR: 8276105: C2: Conv(D|F)2(I|L)Nodes::Ideal should handle rounding correctly In-Reply-To: References: Message-ID: On Fri, 29 Oct 2021 16:38:11 GMT, Aleksey Shipilev wrote: > Happens now in master: > > > $ CONF=linux-x86-server-fastdebug make run-test TEST=compiler/loopopts/superword/CoLocatePack.java TEST_VM_OPTS="-XX:UseAVX=0 -XX:UseSSE=0" > ... > > CompileCommand: compileonly compiler/loopopts/superword/CoLocatePack.test bool compileonly = true > 191 ConvF2L === _ 714 [[ 193 ]] !jvms: CoLocatePack::test @ bci:30 (line 70) > # To suppress the following error report, specify this argument > # after -XX: or in .hotspotrc: SuppressErrorAt=/phaseX.cpp:1128 > # > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (/home/shade/trunks/jdk/src/hotspot/share/opto/phaseX.cpp:1128), pid=1717516, tid=1717532 > # fatal error: modified node was not processed by IGVN.transform_old() > > > After JDK-8266950 (always `strictfp`), the paths in `Conv(D|F)2(I|L)Nodes::Ideal`-s start to be taken more frequently to round float/double inputs when low SSE is enabled. On those paths, we call `set_req` to rewire current node, but we still return `NULL` from `::Ideal`. 
I believe that is incorrect, as per `node.cpp` explanation: `NULL` indicates no graph change was done, and `this` should be returned when modification happened. So GVN predictably barfs. > > Additional testing: > - [x] Linux x86_32 `tier1` with `-XX:UseAVX=0 -XX:UseSSE=0` (now pass) > - [x] Linux x86_32 `tier1` default (still pass) > - [x] Linux x86_64 `tier1` default Looks good. Is it worth adding the flag combinations to the test? src/hotspot/share/opto/convertnode.cpp line 148: > 146: Node *ConvD2LNode::Ideal(PhaseGVN *phase, bool can_reshape) { > 147: if (in(1)->Opcode() == Op_RoundDouble) { > 148: set_req(1,in(1)->in(1)); Whitespace after `,` is missing. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6176 From chagedorn at openjdk.java.net Mon Nov 1 15:10:46 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Mon, 1 Nov 2021 15:10:46 GMT Subject: RFR: 8276044: ciReplay: C1 does not dump a replay file when using DumpReplay as compile command option Message-ID: <362RCiftK90J2zK7S7n5uSlLv17hwj2QTl5S9aFvhj8=.6ee835b9-f9fe-4f98-84be-5a41fa7e2cd7@github.com> This patch adds support to dump replay files for C1 with the compile command `DumpReplay`. I added a test to verify that a replay file is dumped with C1 (and C2). 
Thanks, Christian ------------- Commit messages: - 8276044: ciReplay: C1 does not dump a replay file when using DumpReplay as compile command option Changes: https://git.openjdk.java.net/jdk/pull/6190/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6190&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8276044 Stats: 108 lines in 3 files changed: 102 ins; 0 del; 6 mod Patch: https://git.openjdk.java.net/jdk/pull/6190.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6190/head:pull/6190 PR: https://git.openjdk.java.net/jdk/pull/6190 From shade at openjdk.java.net Mon Nov 1 15:43:44 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 1 Nov 2021 15:43:44 GMT Subject: RFR: 8276105: C2: Conv(D|F)2(I|L)Nodes::Ideal should handle rounding correctly [v2] In-Reply-To: References: Message-ID: > Happens now in master: > > > $ CONF=linux-x86-server-fastdebug make run-test TEST=compiler/loopopts/superword/CoLocatePack.java TEST_VM_OPTS="-XX:UseAVX=0 -XX:UseSSE=0" > ... > > CompileCommand: compileonly compiler/loopopts/superword/CoLocatePack.test bool compileonly = true > 191 ConvF2L === _ 714 [[ 193 ]] !jvms: CoLocatePack::test @ bci:30 (line 70) > # To suppress the following error report, specify this argument > # after -XX: or in .hotspotrc: SuppressErrorAt=/phaseX.cpp:1128 > # > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (/home/shade/trunks/jdk/src/hotspot/share/opto/phaseX.cpp:1128), pid=1717516, tid=1717532 > # fatal error: modified node was not processed by IGVN.transform_old() > > > After JDK-8266950 (always `strictfp`), the paths in `Conv(D|F)2(I|L)Nodes::Ideal`-s start to be taken more frequently to round float/double inputs when low SSE is enabled. On those paths, we call `set_req` to rewire current node, but we still return `NULL` from `::Ideal`. 
I believe that is incorrect, as per `node.cpp` explanation: `NULL` indicates no graph change was done, and `this` should be returned when modification happened. So GVN predictably barfs. > > Additional testing: > - [x] Linux x86_32 `tier1` with `-XX:UseAVX=0 -XX:UseSSE=0` (now pass) > - [x] Linux x86_32 `tier1` default (still pass) > - [x] Linux x86_64 `tier1` default Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - Whitespace - Merge branch 'master' into JDK-8276105-c2-igvn - Fix ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6176/files - new: https://git.openjdk.java.net/jdk/pull/6176/files/fb3c6a16..a357fcb1 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6176&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6176&range=00-01 Stats: 14437 lines in 413 files changed: 11215 ins; 1512 del; 1710 mod Patch: https://git.openjdk.java.net/jdk/pull/6176.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6176/head:pull/6176 PR: https://git.openjdk.java.net/jdk/pull/6176 From shade at openjdk.java.net Mon Nov 1 15:43:48 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 1 Nov 2021 15:43:48 GMT Subject: RFR: 8276105: C2: Conv(D|F)2(I|L)Nodes::Ideal should handle rounding correctly [v2] In-Reply-To: References: Message-ID: On Mon, 1 Nov 2021 14:29:12 GMT, Tobias Hartmann wrote: > Looks good. Is it worth adding the flag combinations to the test? Honestly, the fact that we fail on that SuperWord packing tier1 test appears to be a pure luck. Also, x86_64 would default to UseSSE >= 2 anyway, so such a config would only make sense for x86_32 running in a quite old mode. So, instead of adding the test configuration to the test, I just added some `-XX:UseSSE=0 -XX:UseAVX=0` runs to our CIs. 
It is unlikely we would regress this particular place again. > src/hotspot/share/opto/convertnode.cpp line 148: > >> 146: Node *ConvD2LNode::Ideal(PhaseGVN *phase, bool can_reshape) { >> 147: if (in(1)->Opcode() == Op_RoundDouble) { >> 148: set_req(1,in(1)->in(1)); > > Whitespace after `,` is missing. Fixed, thanks. ------------- PR: https://git.openjdk.java.net/jdk/pull/6176 From kvn at openjdk.java.net Mon Nov 1 18:34:14 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 1 Nov 2021 18:34:14 GMT Subject: RFR: 8276227: ciReplay: SIGSEGV if classfile for replay compilation is not present after JDK-8275868 In-Reply-To: References: Message-ID: <7l_fY4kts96MPf2lC53GOJ8iqJskUt0CKnVA75s7IEc=.459997f3-81e2-4611-85e9-baf419a1b81f@github.com> On Mon, 1 Nov 2021 13:25:03 GMT, Christian Hagedorn wrote: > The fix for [JDK-8275868](https://bugs.openjdk.java.net/browse/JDK-8275868) does not handle the case when the classfile for the method to be replay compiled is not present. This will fail to load the klass. Afterwards, we are trying to access the protection domain of the failed to load klass (i.e. a null pointer) which results in a segmentation fault. The fix is straight forward to only set the new protection domain if the klass was loaded successfully. I additionally changed the code such that we are only trying to set the protection domain when reading the first `instanceKlass` entry. This avoids some potential problems with older replay files where we do not have this additional first entry set by JDK-8275868. > > Thanks, > Christian Good. Which test failed? ------------- Marked as reviewed by kvn (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/6189 From kvn at openjdk.java.net Mon Nov 1 18:39:14 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 1 Nov 2021 18:39:14 GMT Subject: RFR: 8276044: ciReplay: C1 does not dump a replay file when using DumpReplay as compile command option In-Reply-To: <362RCiftK90J2zK7S7n5uSlLv17hwj2QTl5S9aFvhj8=.6ee835b9-f9fe-4f98-84be-5a41fa7e2cd7@github.com> References: <362RCiftK90J2zK7S7n5uSlLv17hwj2QTl5S9aFvhj8=.6ee835b9-f9fe-4f98-84be-5a41fa7e2cd7@github.com> Message-ID: On Mon, 1 Nov 2021 15:02:36 GMT, Christian Hagedorn wrote: > This patch adds support to dump replay files for C1 with the compile command `DumpReplay`. I added a test to verify that a replay file is dumped with C1 (and C2). > > Thanks, > Christian Nice. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6190 From kvn at openjdk.java.net Mon Nov 1 18:52:14 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 1 Nov 2021 18:52:14 GMT Subject: RFR: 8276157: C2: Compiler stack overflow during escape analysis on Linux x86_32 In-Reply-To: References: Message-ID: <2QnPlzi-4fWMEeT7oaZE1IDqd0YWA_tU0peAvr5rk3c=.ed6f78cb-9c86-4f7d-b8ab-9352a48bab9d@github.com> On Fri, 29 Oct 2021 10:06:26 GMT, Aleksey Shipilev wrote: > See the bug for test details and analysis. I believe we just legitimately run out of stack in `fastdebug` builds. The fix is to increase the default stack size a bit. Linux-S390, Windows-x86/AArch64 seems to do a similar thing. > > I can do a similar change in `globals_bsd_x86.hpp`, but that would be a blind change, as I don't have platforms to verify that change sanity. I would prefer to make a Linux-specific fix at this time. > > Additional testing: > - [x] Failing test now passes on Linux x86_32 > - [x] Linux x86_32 fastdebug `tier1` Thank you for filing RFE. Yes, you need second review. 
------------- PR: https://git.openjdk.java.net/jdk/pull/6167 From kvn at openjdk.java.net Mon Nov 1 18:55:12 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 1 Nov 2021 18:55:12 GMT Subject: RFR: JDK-8276036: The value of full_count in the message of insufficient codecache is wrong In-Reply-To: References: Message-ID: On Mon, 1 Nov 2021 11:31:39 GMT, Tobias Holenstein wrote: > The value of full_count (number of times the code heap was full) in the message of insufficient codecache was 0 even though a codecache shortage occurred. This is fixed by simply incrementing the count before the printing. Good. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6185 From iveresov at openjdk.java.net Mon Nov 1 19:16:12 2021 From: iveresov at openjdk.java.net (Igor Veresov) Date: Mon, 1 Nov 2021 19:16:12 GMT Subject: RFR: 8275086: compiler/c2/irTests/TestPostParseCallDevirtualization.java fails when compiler1 is disabled In-Reply-To: References: Message-ID: On Tue, 12 Oct 2021 07:41:03 GMT, Christian Hagedorn wrote: >> Hi all, >> Jtreg test case compiler/c2/irTests/TestPostParseCallDevirtualization.java fails for fastdebug mode on x86/aarch64/mips architecture when "--with-jvm-features=-compiler1" be used. the failed info is: >> >>

>> One or more @IR rules failed:
>> 
>> Failed IR Rules (1)
>> ------------------
>> - Method "public int compiler.c2.irTests.TestPostParseCallDevirtualization.testMethodHandleCallWithCCP() throws java.lang.Throwable":
>>   * @IR rule 1: "@compiler.lib.ir_framework.IR(failOn={"#PRE#(\\\\d+(\\\\s){2}(CallStaticJava.*)+(\\\\s){2}===.*#IS_REPLACED#)", "invokeBasic"}, applyIf={}, applyIfAnd={}, applyIfOr={}, counts={"#PRE#(\\\\d+(\\\\s){2}(CallStaticJava.*)+(\\\\s){2}===.*#IS_REPLACED#)", "invokeStatic", "= 1"}, applyIfNot={})" 
>>     - failOn: Graph contains forbidden nodes:
>>         Regex 1: (\\d+(\\s){2}(CallStaticJava.*)+(\\s){2}===.*invokeBasic)
>>         Matched forbidden node:
>>           280  CallStaticJava  ===  5  6  7  8  1 ( 188  1  1  1  1  1  1 ) [[ 281  282  283  285 ]] # Static  java.lang.invoke.MethodHandle::invokeBasic
>>     - counts: Graph contains wrong number of nodes:
>>         Regex 1: (\\d+(\\s){2}(CallStaticJava.*)+(\\s){2}===.*invokeStatic)
>>         Expected 1 but found 0 nodes.
>> 
>>>>> Check stdout for compilation output of the failed methods
>> 
>> >> This is a patch to fix this problem. Please help review it. >> >> Thanks, >> Sun Guoyun > > Though this issue is about excluding C1, I think the IR framework generally does not handle the case if C2 is excluded in the build (i.e. client VM). It only bails out of IR matching if C2 is excluded by command line flags. I will file a bug for it. @chhagedorn, how do you think this PR should proceed? Would you consider fixing the framework to increase the warmup for the C2-only configuration? ------------- PR: https://git.openjdk.java.net/jdk/pull/5903 From dlong at openjdk.java.net Mon Nov 1 19:59:13 2021 From: dlong at openjdk.java.net (Dean Long) Date: Mon, 1 Nov 2021 19:59:13 GMT Subject: RFR: JDK-8276036: The value of full_count in the message of insufficient codecache is wrong In-Reply-To: References: Message-ID: <1-g33UlGmr-5-il8Y7DJLZNqdDyWPNg2uS2db6AR10w=.6fc38c7f-ff40-4481-a593-33f11408b041@github.com> On Mon, 1 Nov 2021 11:31:39 GMT, Tobias Holenstein wrote: > The value of full_count (number of times the code heap was full) in the message of insufficient codecache was 0 even though a codecache shortage occurred. This is fixed by simply incrementing the count before the printing. Marked as reviewed by dlong (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/6185 From duke at openjdk.java.net Mon Nov 1 21:00:29 2021 From: duke at openjdk.java.net (Evgeny Astigeevich) Date: Mon, 1 Nov 2021 21:00:29 GMT Subject: RFR: 8275729: Qualified method names in CodeHeap Analytics Message-ID: This PR changes nmethods names in `METHOD NAMES for CodeHeap` section to be qualified. 
Testing: - `make test TEST="gtest"`: Passed - `make run-test TEST="tier1"`: Passed - `make run-test TEST="tier2"`: Passed - `make run-test TEST=`: serviceability/dcmd/compiler/CodeHeapAnalyticsMethodNames.java ------------- Commit messages: - 8275729: Qualified method names in CodeHeap Analytics Changes: https://git.openjdk.java.net/jdk/pull/6200/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6200&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8275729 Stats: 75 lines in 2 files changed: 75 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/6200.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6200/head:pull/6200 PR: https://git.openjdk.java.net/jdk/pull/6200 From jiefu at openjdk.java.net Mon Nov 1 23:59:10 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Mon, 1 Nov 2021 23:59:10 GMT Subject: RFR: 8276066: Reset LoopPercentProfileLimit for x86 due to suboptimal performance In-Reply-To: References: Message-ID: On Wed, 27 Oct 2021 14:26:49 GMT, Jie Fu wrote: > Hi all, > > I'd like to reset the value of `LoopPercentProfileLimit` (from 30 to the original 10) for x86 to fix performance degradation. > > We had observed that for the same Java App, the performance of x86 is slower than that of aarch64. > But the x86's performance should not be so worse than the aarch64 according to some SPEC benchmark results. > > After some investigation, it seems that the slowness of x86 is caused by the different default settings of `LoopPercentProfileLimit` (30 for x86, but 10 for other platforms). > If we change `LoopPercentProfileLimit` from 30 to 10, x86 would run faster. > > In JDK-8149421, `LoopPercentProfileLimit` [1] was first added and set to be 30 for x86 and 10 for other platforms. > Logically, the default value of `LoopPercentProfileLimit` is 10 for all platforms even before JDK-8149421. > This is because when `LoopPercentProfileLimit=10`, `10.0` [2] equals `100.0 / LoopPercentProfileLimit` [3]. 
> So if we set `LoopPercentProfileLimit=10`, this unrolling rule [3] would be the same as the original design before JDK-8149421.
>
> One important fact is that, from the very beginning of the OpenJDK source code, the default value of `LoopPercentProfileLimit` has (logically) been 10 for all platforms.
> So I suggest resetting `LoopPercentProfileLimit` to the original value (10) for x86, just as on other platforms.
>
> I've noted that the review thread mentioned that JDK-8149421 would be beneficial for some SPECjvm2008 benchmarks [4].
> Then I ran SPECjvm2008 with `LoopPercentProfileLimit=10` and found that there is no performance drop on x86.
> So it won't revert JDK-8149421's opts for SPECjvm2008.
>
> To show the potential improvement of this change, I've made a jmh test in the patch.
> Performance can be improved by 1.25x ~ 2.0x according to this micro benchmark.
>
> Any comments?
>
> Thanks.
> Best regards,
> Jie
>
> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/loopTransform.cpp#L908
> [2] https://github.com/openjdk/jdk8u/blob/master/hotspot/src/share/vm/opto/loopTransform.cpp#L673
> [3] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/loopTransform.cpp#L903
> [4] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2016-February/021205.html
>
> [Images: ratio, before, after]

Hi all,

I tested our deep learning cluster using jdk17 yesterday.
The time of the computing stage can be further reduced by 5% ~ 6% with this patch.
So it's really worth making this change.
And we also plan to backport it if it is accepted in the jdk mainline.

If post loop vectorization requires a higher LoopPercentProfileLimit on x86, we can still re-tune it in that enhancement, just like on other platforms.
I also think it would be better to improve the baseline performance with `LoopPercentProfileLimit=10` when evaluating performance opts like post loop vectorization on x86.

Thanks.
Best regards, Jie ------------- PR: https://git.openjdk.java.net/jdk/pull/6142 From duke at openjdk.java.net Tue Nov 2 00:16:44 2021 From: duke at openjdk.java.net (TatWai Chong) Date: Tue, 2 Nov 2021 00:16:44 GMT Subject: RFR: 8275448: [REDO] AArch64: Implement string_compare intrinsic in SVE [v3] In-Reply-To: <3lLU_Cnsh439U5TjNx9ADxXkvMDnfot4TxHbh7paUQw=.5c7ec35a-8456-4fe6-8b4f-1d7627026d8d@github.com> References: <3lLU_Cnsh439U5TjNx9ADxXkvMDnfot4TxHbh7paUQw=.5c7ec35a-8456-4fe6-8b4f-1d7627026d8d@github.com> Message-ID: <3O9BeTBG4Z4q3up1VlKGb096qxN2dxSXxTG_FWrXNVE=.ab464e81-34ad-4e8f-aba4-543489015f64@github.com> > After JDK-8269559 was integrated there are failures in tier1 testing > across Mac OS X 11.4 (aarch64) machines. Please see JDK-8275263. > > This patch is NOT functional; rather, this tends to verify potential > toolchain issues as the original patch pass testing on other > platforms. > > In this patch, we remove new SVE-related matching rules and register > class introduced in the original patch to minimally affect the > non-SVE part. TatWai Chong has updated the pull request incrementally with one additional commit since the last revision: Add the matching rule in td file, enable control path in the code stub. 
-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6072/files
  - new: https://git.openjdk.java.net/jdk/pull/6072/files/c173d9c4..c5f61f57

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6072&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6072&range=01-02

  Stats: 163 lines in 3 files changed: 159 ins; 0 del; 4 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6072.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6072/head:pull/6072

PR: https://git.openjdk.java.net/jdk/pull/6072

From duke at openjdk.java.net Tue Nov 2 02:31:15 2021
From: duke at openjdk.java.net (Vamsi Parasa)
Date: Tue, 2 Nov 2021 02:31:15 GMT
Subject: RFR: 8275167: x86 intrinsic for unsignedMultiplyHigh [v2]
In-Reply-To:
References: <7IzrZdL0elgXbuisyLNYC2wkyOTe1RHUPuGRI7YsAQ4=.aed9dea3-4775-4592-b43e-c3e08e167f90@github.com>
Message-ID:

On Tue, 19 Oct 2021 20:34:55 GMT, Vamsi Parasa wrote:

>> Optimize the new Math.unsignedMultiplyHigh using the x86 mul instruction. This change shows a 1.87X improvement on a micro benchmark.
>
> Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision:
>
>   refactoring to remove code duplication by using a common routine for UMulHiLNode and MulHiLNode

Thank you for spotting the stale comment. It will be removed in another related commit that will be pushed soon...

-------------

PR: https://git.openjdk.java.net/jdk/pull/5933

From duke at openjdk.java.net Tue Nov 2 02:54:39 2021
From: duke at openjdk.java.net (王超)
Date: Tue, 2 Nov 2021 02:54:39 GMT
Subject: RFR: JDK-8275854: C2: assert(stride_con != 0) failed: missed some peephole opt [v6]
In-Reply-To:
References:
Message-ID:

> The `If subsume` optimization can eliminate a `LongCountedLoopEndNode` node by mistake, which leads to a crash in the `PhaseIdealLoop` optimization.
>
> For example, the tests of node 538 and node 553 become the same after the first `PhaseIdealLoop` optimization. Node 555 is the back edge to the loop, and node 553 will be replaced by a `LongCountedLoopEndNode` node.
>
> In the next `PhaseIdealLoop` optimization, node 538 finds that node 553 is redundant and subsumes it. The `PhaseIdealLoop` optimization then crashes, because there is no loop end node.
>
> There are two ways to fix the crash. The first, as done in this PR, is to exit the `IFNode subsume` optimization when the node is a `LongCountedLoopEndNode` node. The second possible fix is to exchange the dominating `IF` node with the `LongCountedLoopEndNode` node:
>
> diff --git a/src/hotspot/share/opto/ifnode.cpp b/src/hotspot/share/opto/ifnode.cpp
> index 38b40a6..31ff172 100644
> --- a/src/hotspot/share/opto/ifnode.cpp
> +++ b/src/hotspot/share/opto/ifnode.cpp
> @@ -1674,6 +1674,21 @@ Node* IfNode::simple_subsuming(PhaseIterGVN* igvn) {
>      }
>    }
>
> +  if (is_LongCountedLoopEnd()) {
> +    set_req(0, dom->in(0));
> +    set_req(1, dom->in(1));
> +    dom->set_req(0, pre);
> +    dom->set_req(1, igvn->intcon(is_always_true ? 1 : 0));
> +    Node* proj0 = raw_out(0);
> +    Node* proj1 = raw_out(1);
> +    Node* dom_proj0 = dom->raw_out(0);
> +    Node* dom_proj1 = dom->raw_out(1);
> +    dom_proj0->set_req(0, this);
> +    dom_proj1->set_req(0, this);
> +    proj0->set_req(0, dom);
> +    proj1->set_req(0, dom);
> +  }
> +
>    if (bol->outcnt() == 0) {
>      igvn->remove_dead_node(bol); // Kill the BoolNode.
>    }
> diff --git a/src/hotspot/share/opto/loopnode.cpp b/src/hotspot/share/opto/loopnode.cpp
> index 6f7e34d..7955722 100644
> --- a/src/hotspot/share/opto/loopnode.cpp
> +++ b/src/hotspot/share/opto/loopnode.cpp
> @@ -802,7 +802,7 @@ bool PhaseIdealLoop::transform_long_counted_loop(IdealLoopTree* loop, Node_List
>    Node* back_control = head->in(LoopNode::LoopBackControl);
>
>    // data nodes on back branch not supported
> -  if (back_control->outcnt() > 1) {
> +  if (back_control->outcnt() > 1 || back_control->Opcode() != Op_IfTrue) {
>      return false;
>    }

王超 has updated the pull request incrementally with one additional commit since the last revision:

  Specify vm option needs option 'othervm'

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6099/files
  - new: https://git.openjdk.java.net/jdk/pull/6099/files/8c81c883..ccfa6f10

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6099&range=05
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6099&range=04-05

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6099.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6099/head:pull/6099

PR: https://git.openjdk.java.net/jdk/pull/6099

From yyang at openjdk.java.net Tue Nov 2 05:52:15 2021
From: yyang at openjdk.java.net (Yi Yang)
Date: Tue, 2 Nov 2021 05:52:15 GMT
Subject: RFR: 8275729: Qualified method names in CodeHeap Analytics
In-Reply-To:
References:
Message-ID:

On Mon, 1 Nov 2021 20:51:39 GMT, Evgeny Astigeevich wrote:

> This PR changes nmethods names in `METHOD NAMES for CodeHeap` section to be qualified.
> Testing:
> - `make test TEST="gtest"`: Passed
> - `make run-test TEST="tier1"`: Passed
> - `make run-test TEST="tier2"`: Passed
> - `make run-test TEST=serviceability/dcmd/compiler/CodeHeapAnalyticsMethodNames.java`: Passed

This looks good now. The old output cannot tell us which class the method belongs to.
                                                    compiler   method
Addr(module)        offset         size             type  lvl  temp  blobType          Name
0x00007f08adc94010  (+0x00000010)  0x00000150( 0K)  c1    3    480   nMethod (deopt)   nmethod
0x00007f08adc94390  (+0x00000390)  0x000001b0( 0K)  c1    3    480   nMethod (active)  java.lang.String.isLatin1()Z
0x00007f08adc94710  (+0x00000710)  0x00000258( 0K)  c1    3    480   nMethod (active)  jdk.internal.util.Preconditions.checkIndex(IILjava/util/function/BiFunction;)I
0x00007f08adc94b90  (+0x00000b90)  0x000004e8( 1K)  c1    3    480   nMethod (deopt)   nmethod
0x00007f08adc95310  (+0x00001310)  0x00000298( 0K)  c1    3    480   nMethod (active)  java.lang.StringLatin1.charAt([BI)C
0x00007f08adc95790  (+0x00001790)  0x000001a0( 0K)  c1    3    480   nMethod (active)  java.lang.String.checkIndex(II)V
0x00007f08adc95b10  (+0x00001b10)  0x00000170( 0K)  c1    3    480   nMethod (active)  java.lang.String.coder()B
0x00007f08adc95e90  (+0x00001e90)  0x000003e8( 0K)  c1    3    480   nMethod (active)  java.lang.String.hashCode()I
0x00007f08adc96490  (+0x00002490)  0x00000130( 0K)  c1    3    480   nMethod (deopt)   nmethod
0x00007f08adc96790  (+0x00002790)  0x00000210( 0K)  c1    3    480   nMethod (active)  java.lang.String.length()I

-------------

Marked as reviewed by yyang (Committer).

PR: https://git.openjdk.java.net/jdk/pull/6200

From shade at openjdk.java.net Tue Nov 2 06:42:17 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Tue, 2 Nov 2021 06:42:17 GMT
Subject: RFR: 8276105: C2: Conv(D|F)2(I|L)Nodes::Ideal should handle rounding correctly [v2]
In-Reply-To:
References:
Message-ID:

On Mon, 1 Nov 2021 15:43:44 GMT, Aleksey Shipilev wrote:

>> Happens now in master:
>>
>>
>> $ CONF=linux-x86-server-fastdebug make run-test TEST=compiler/loopopts/superword/CoLocatePack.java TEST_VM_OPTS="-XX:UseAVX=0 -XX:UseSSE=0"
>> ...
>> >> CompileCommand: compileonly compiler/loopopts/superword/CoLocatePack.test bool compileonly = true >> 191 ConvF2L === _ 714 [[ 193 ]] !jvms: CoLocatePack::test @ bci:30 (line 70) >> # To suppress the following error report, specify this argument >> # after -XX: or in .hotspotrc: SuppressErrorAt=/phaseX.cpp:1128 >> # >> # A fatal error has been detected by the Java Runtime Environment: >> # >> # Internal Error (/home/shade/trunks/jdk/src/hotspot/share/opto/phaseX.cpp:1128), pid=1717516, tid=1717532 >> # fatal error: modified node was not processed by IGVN.transform_old() >> >> >> After JDK-8266950 (always `strictfp`), the paths in `Conv(D|F)2(I|L)Nodes::Ideal`-s start to be taken more frequently to round float/double inputs when low SSE is enabled. On those paths, we call `set_req` to rewire current node, but we still return `NULL` from `::Ideal`. I believe that is incorrect, as per `node.cpp` explanation: `NULL` indicates no graph change was done, and `this` should be returned when modification happened. So GVN predictably barfs. >> >> Additional testing: >> - [x] Linux x86_32 `tier1` with `-XX:UseAVX=0 -XX:UseSSE=0` (now pass) >> - [x] Linux x86_32 `tier1` default (still pass) >> - [x] Linux x86_64 `tier1` default > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Whitespace > - Merge branch 'master' into JDK-8276105-c2-igvn > - Fix All right, if anyone has any other comments, we can do those in follow-ups. 
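
For readers following the `Ideal` contract quoted above (return `NULL` when nothing changed, return `this` when the node was modified in place), here is a minimal toy sketch of why the mismatch matters. This is not HotSpot code: `ToyNode`, the `OP_*` constants, `ideal_buggy`, `ideal_fixed`, and `report_matches_change` are hypothetical names made up for this illustration, and the real patch is the one linked from the PR.

```cpp
#include <cassert>

// Toy stand-in for a C2 node. NOT the real HotSpot Node class.
struct ToyNode {
  int opcode;
  ToyNode* input;  // single input edge, enough for this illustration
};

// Hypothetical opcodes for the sketch.
enum { OP_ROUND = 1, OP_CONV = 2, OP_OTHER = 3 };

// Buggy shape: skips over a rounding input (modifying the node)
// but still returns nullptr, i.e. reports "no change".
ToyNode* ideal_buggy(ToyNode* n) {
  if (n->input != nullptr && n->input->opcode == OP_ROUND) {
    n->input = n->input->input;  // the node was rewired here...
  }
  return nullptr;                // ...yet the caller is told nothing happened
}

// Fixed shape: returns the node itself whenever it was modified in place.
ToyNode* ideal_fixed(ToyNode* n) {
  if (n->input != nullptr && n->input->opcode == OP_ROUND) {
    n->input = n->input->input;
    return n;                    // signal the in-place modification
  }
  return nullptr;                // genuinely unchanged
}

// Toy consistency check, loosely in the spirit of the IGVN verification
// that fired: true only if "returned non-null" agrees with "node changed".
bool report_matches_change(ToyNode* (*ideal)(ToyNode*), ToyNode* n) {
  assert(n != nullptr);
  ToyNode* before = n->input;
  ToyNode* result = ideal(n);
  bool changed = (n->input != before);
  return changed == (result != nullptr);
}
```

With a conversion node whose input is a rounding node, `report_matches_change(ideal_buggy, &conv)` is false while `report_matches_change(ideal_fixed, &conv)` is true; both variants perform the same rewiring, and only the return value differs.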
------------- PR: https://git.openjdk.java.net/jdk/pull/6176 From shade at openjdk.java.net Tue Nov 2 06:42:17 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 2 Nov 2021 06:42:17 GMT Subject: Integrated: 8276105: C2: Conv(D|F)2(I|L)Nodes::Ideal should handle rounding correctly In-Reply-To: References: Message-ID: On Fri, 29 Oct 2021 16:38:11 GMT, Aleksey Shipilev wrote: > Happens now in master: > > > $ CONF=linux-x86-server-fastdebug make run-test TEST=compiler/loopopts/superword/CoLocatePack.java TEST_VM_OPTS="-XX:UseAVX=0 -XX:UseSSE=0" > ... > > CompileCommand: compileonly compiler/loopopts/superword/CoLocatePack.test bool compileonly = true > 191 ConvF2L === _ 714 [[ 193 ]] !jvms: CoLocatePack::test @ bci:30 (line 70) > # To suppress the following error report, specify this argument > # after -XX: or in .hotspotrc: SuppressErrorAt=/phaseX.cpp:1128 > # > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (/home/shade/trunks/jdk/src/hotspot/share/opto/phaseX.cpp:1128), pid=1717516, tid=1717532 > # fatal error: modified node was not processed by IGVN.transform_old() > > > After JDK-8266950 (always `strictfp`), the paths in `Conv(D|F)2(I|L)Nodes::Ideal`-s start to be taken more frequently to round float/double inputs when low SSE is enabled. On those paths, we call `set_req` to rewire current node, but we still return `NULL` from `::Ideal`. I believe that is incorrect, as per `node.cpp` explanation: `NULL` indicates no graph change was done, and `this` should be returned when modification happened. So GVN predictably barfs. > > Additional testing: > - [x] Linux x86_32 `tier1` with `-XX:UseAVX=0 -XX:UseSSE=0` (now pass) > - [x] Linux x86_32 `tier1` default (still pass) > - [x] Linux x86_64 `tier1` default This pull request has now been integrated. 
Changeset: 0488ebdf Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/0488ebdf14dacfa79d98de16ed949c813dd88701 Stats: 16 lines in 1 file changed: 8 ins; 0 del; 8 mod 8276105: C2: Conv(D|F)2(I|L)Nodes::Ideal should handle rounding correctly Reviewed-by: kvn, thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/6176 From thartmann at openjdk.java.net Tue Nov 2 07:05:14 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 2 Nov 2021 07:05:14 GMT Subject: RFR: JDK-8276036: The value of full_count in the message of insufficient codecache is wrong In-Reply-To: References: Message-ID: On Mon, 1 Nov 2021 11:31:39 GMT, Tobias Holenstein wrote: > The value of full_count (number of times the code heap was full) in the message of insufficient codecache was 0 even though a codecache shortage occurred. This is fixed by simply incrementing the count before the printing. Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6185 From duke at openjdk.java.net Tue Nov 2 09:54:17 2021 From: duke at openjdk.java.net (TatWai Chong) Date: Tue, 2 Nov 2021 09:54:17 GMT Subject: RFR: 8275448: [REDO] AArch64: Implement string_compare intrinsic in SVE [v2] In-Reply-To: References: <3lLU_Cnsh439U5TjNx9ADxXkvMDnfot4TxHbh7paUQw=.5c7ec35a-8456-4fe6-8b4f-1d7627026d8d@github.com> Message-ID: On Thu, 28 Oct 2021 14:45:05 GMT, Tobias Hartmann wrote: >> TatWai Chong has updated the pull request incrementally with one additional commit since the last revision: >> >> Add the register class and description for this SVE intrinsic. > > All tests passed. @TobiHartmann, The latest patch is the final version. Could you re-run testing again? Many thanks. 
------------- PR: https://git.openjdk.java.net/jdk/pull/6072 From chagedorn at openjdk.java.net Tue Nov 2 10:32:46 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Tue, 2 Nov 2021 10:32:46 GMT Subject: RFR: 8276227: ciReplay: SIGSEGV if classfile for replay compilation is not present after JDK-8275868 [v2] In-Reply-To: References: Message-ID: > The fix for [JDK-8275868](https://bugs.openjdk.java.net/browse/JDK-8275868) does not handle the case when the classfile for the method to be replay compiled is not present. This will fail to load the klass. Afterwards, we are trying to access the protection domain of the failed to load klass (i.e. a null pointer) which results in a segmentation fault. The fix is straight forward to only set the new protection domain if the klass was loaded successfully. I additionally changed the code such that we are only trying to set the protection domain when reading the first `instanceKlass` entry. This avoids some potential problems with older replay files where we do not have this additional first entry set by JDK-8275868. 
> > Thanks, > Christian Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: add test ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6189/files - new: https://git.openjdk.java.net/jdk/pull/6189/files/4fe95743..1e608b85 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6189&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6189&range=00-01 Stats: 73 lines in 1 file changed: 73 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/6189.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6189/head:pull/6189 PR: https://git.openjdk.java.net/jdk/pull/6189 From chagedorn at openjdk.java.net Tue Nov 2 10:32:48 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Tue, 2 Nov 2021 10:32:48 GMT Subject: RFR: 8276227: ciReplay: SIGSEGV if classfile for replay compilation is not present after JDK-8275868 In-Reply-To: References: Message-ID: On Mon, 1 Nov 2021 13:25:03 GMT, Christian Hagedorn wrote: > The fix for [JDK-8275868](https://bugs.openjdk.java.net/browse/JDK-8275868) does not handle the case when the classfile for the method to be replay compiled is not present. This will fail to load the klass. Afterwards, we are trying to access the protection domain of the failed to load klass (i.e. a null pointer) which results in a segmentation fault. The fix is straight forward to only set the new protection domain if the klass was loaded successfully. I additionally changed the code such that we are only trying to set the protection domain when reading the first `instanceKlass` entry. This avoids some potential problems with older replay files where we do not have this additional first entry set by JDK-8275868. > > Thanks, > Christian Thanks Vladimir for your review! Unfortunately, none of our tests caught this. I observed this when I did some additional testing and had not specified the correct class paths. 
I've added a test which covers this. ------------- PR: https://git.openjdk.java.net/jdk/pull/6189 From duke at openjdk.java.net Tue Nov 2 10:35:15 2021 From: duke at openjdk.java.net (Tobias Holenstein) Date: Tue, 2 Nov 2021 10:35:15 GMT Subject: RFR: JDK-8276036: The value of full_count in the message of insufficient codecache is wrong In-Reply-To: References: Message-ID: On Mon, 1 Nov 2021 18:52:28 GMT, Vladimir Kozlov wrote: >> The value of full_count (number of times the code heap was full) in the message of insufficient codecache was 0 even though a codecache shortage occurred. This is fixed by simply incrementing the count before the printing. > > Good. @vnkozlov, @dean-long and @TobiHartmann thanks for the reviews! ------------- PR: https://git.openjdk.java.net/jdk/pull/6185 From chagedorn at openjdk.java.net Tue Nov 2 10:39:10 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Tue, 2 Nov 2021 10:39:10 GMT Subject: RFR: 8276044: ciReplay: C1 does not dump a replay file when using DumpReplay as compile command option In-Reply-To: <362RCiftK90J2zK7S7n5uSlLv17hwj2QTl5S9aFvhj8=.6ee835b9-f9fe-4f98-84be-5a41fa7e2cd7@github.com> References: <362RCiftK90J2zK7S7n5uSlLv17hwj2QTl5S9aFvhj8=.6ee835b9-f9fe-4f98-84be-5a41fa7e2cd7@github.com> Message-ID: On Mon, 1 Nov 2021 15:02:36 GMT, Christian Hagedorn wrote: > This patch adds support to dump replay files for C1 with the compile command `DumpReplay`. I added a test to verify that a replay file is dumped with C1 (and C2). > > Thanks, > Christian Thanks Vladimir for your review! 
------------- PR: https://git.openjdk.java.net/jdk/pull/6190 From thartmann at openjdk.java.net Tue Nov 2 10:49:10 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 2 Nov 2021 10:49:10 GMT Subject: RFR: 8276044: ciReplay: C1 does not dump a replay file when using DumpReplay as compile command option In-Reply-To: <362RCiftK90J2zK7S7n5uSlLv17hwj2QTl5S9aFvhj8=.6ee835b9-f9fe-4f98-84be-5a41fa7e2cd7@github.com> References: <362RCiftK90J2zK7S7n5uSlLv17hwj2QTl5S9aFvhj8=.6ee835b9-f9fe-4f98-84be-5a41fa7e2cd7@github.com> Message-ID: On Mon, 1 Nov 2021 15:02:36 GMT, Christian Hagedorn wrote: > This patch adds support to dump replay files for C1 with the compile command `DumpReplay`. I added a test to verify that a replay file is dumped with C1 (and C2). > > Thanks, > Christian Looks good. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6190 From chagedorn at openjdk.java.net Tue Nov 2 10:55:09 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Tue, 2 Nov 2021 10:55:09 GMT Subject: RFR: 8276044: ciReplay: C1 does not dump a replay file when using DumpReplay as compile command option In-Reply-To: <362RCiftK90J2zK7S7n5uSlLv17hwj2QTl5S9aFvhj8=.6ee835b9-f9fe-4f98-84be-5a41fa7e2cd7@github.com> References: <362RCiftK90J2zK7S7n5uSlLv17hwj2QTl5S9aFvhj8=.6ee835b9-f9fe-4f98-84be-5a41fa7e2cd7@github.com> Message-ID: <_9NefGlNeqzaFg9EodDy3SpPv6NYn7r2-ONSsrvD2w8=.e85c944a-564a-4d7c-964f-4d46fb014c2a@github.com> On Mon, 1 Nov 2021 15:02:36 GMT, Christian Hagedorn wrote: > This patch adds support to dump replay files for C1 with the compile command `DumpReplay`. I added a test to verify that a replay file is dumped with C1 (and C2). > > Thanks, > Christian Thanks Tobias for your review! 
------------- PR: https://git.openjdk.java.net/jdk/pull/6190 From thartmann at openjdk.java.net Tue Nov 2 10:55:14 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 2 Nov 2021 10:55:14 GMT Subject: RFR: 8276227: ciReplay: SIGSEGV if classfile for replay compilation is not present after JDK-8275868 [v2] In-Reply-To: References: Message-ID: On Tue, 2 Nov 2021 10:32:46 GMT, Christian Hagedorn wrote: >> The fix for [JDK-8275868](https://bugs.openjdk.java.net/browse/JDK-8275868) does not handle the case when the classfile for the method to be replay compiled is not present. This will fail to load the klass. Afterwards, we are trying to access the protection domain of the failed to load klass (i.e. a null pointer) which results in a segmentation fault. The fix is straight forward to only set the new protection domain if the klass was loaded successfully. I additionally changed the code such that we are only trying to set the protection domain when reading the first `instanceKlass` entry. This avoids some potential problems with older replay files where we do not have this additional first entry set by JDK-8275868. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > add test Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6189 From thartmann at openjdk.java.net Tue Nov 2 11:01:12 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 2 Nov 2021 11:01:12 GMT Subject: RFR: 8275729: Qualified method names in CodeHeap Analytics In-Reply-To: References: Message-ID: On Mon, 1 Nov 2021 20:51:39 GMT, Evgeny Astigeevich wrote: > This PR changes nmethods names in `METHOD NAMES for CodeHeap` section to be qualified. 
> Testing: > - `make test TEST="gtest"`: Passed > - `make run-test TEST="tier1"`: Passed > - `make run-test TEST="tier2"`: Passed > - `make run-test TEST=serviceability/dcmd/compiler/CodeHeapAnalyticsMethodNames.java`: Passed Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6200 From mdoerr at openjdk.java.net Tue Nov 2 11:36:11 2021 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Tue, 2 Nov 2021 11:36:11 GMT Subject: RFR: JDK-8276175: codestrings.validate_vm gtest still broken on ppc64 after JDK-8276046 In-Reply-To: References: Message-ID: <4iAT4IO_vKYP4f1bbI-jh3BSPK40gc_nCHKHqYZd9KI=.944cb6d9-bb02-4543-889b-fab377289614@github.com> On Fri, 29 Oct 2021 14:17:10 GMT, Thomas Stuefe wrote: > Frustratingly, JDK-8276046 failed to work because the #ifdef PPC was added to the existing group of #ifdefs at the very start of test_codestrings.cpp. > > PPC is not a primary macro however, it gets set via macros.hpp if one of PPC32 or PPC64 is set. Therefore this only works after the inclusion of macro.hpp (Works for ZERO and PRODUCT, since those are primary macros). > > This fix revives my original fix using the DISABLED_... moniker on test functions. That seems to be the default way to disable tests anyway. Yeah, lets get back to the previous solution which was already reviewed and tested. This change looks good and trivial. Thanks! ------------- Marked as reviewed by mdoerr (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/6174 From chagedorn at openjdk.java.net Tue Nov 2 12:28:19 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Tue, 2 Nov 2021 12:28:19 GMT Subject: RFR: 8275086: compiler/c2/irTests/TestPostParseCallDevirtualization.java fails when compiler1 is disabled [v3] In-Reply-To: <1J1d_whrKEmMGGYm9ZfhuiQo7pq677jppXGfvEZrkNA=.821ff6a1-4d51-4069-b256-a1ca917d2030@github.com> References: <1J1d_whrKEmMGGYm9ZfhuiQo7pq677jppXGfvEZrkNA=.821ff6a1-4d51-4069-b256-a1ca917d2030@github.com> Message-ID: <8swcAk7XCw0rkiyKOuZ2zo-gUbk-8aIBMS7lPexpXnA=.310188e3-3e38-4ea3-be75-1c6201f195b2@github.com> On Fri, 29 Oct 2021 09:41:37 GMT, SUN Guoyun wrote: >> Hi all, >> Jtreg test case compiler/c2/irTests/TestPostParseCallDevirtualization.java fails for fastdebug mode on x86/aarch64/mips architecture when "--with-jvm-features=-compiler1" be used. the failed info is: >> >>

>> One or more @IR rules failed:
>> 
>> Failed IR Rules (1)
>> ------------------
>> - Method "public int compiler.c2.irTests.TestPostParseCallDevirtualization.testMethodHandleCallWithCCP() throws java.lang.Throwable":
>>   * @IR rule 1: "@compiler.lib.ir_framework.IR(failOn={"#PRE#(\\\\d+(\\\\s){2}(CallStaticJava.*)+(\\\\s){2}===.*#IS_REPLACED#)", "invokeBasic"}, applyIf={}, applyIfAnd={}, applyIfOr={}, counts={"#PRE#(\\\\d+(\\\\s){2}(CallStaticJava.*)+(\\\\s){2}===.*#IS_REPLACED#)", "invokeStatic", "= 1"}, applyIfNot={})" 
>>     - failOn: Graph contains forbidden nodes:
>>         Regex 1: (\\d+(\\s){2}(CallStaticJava.*)+(\\s){2}===.*invokeBasic)
>>         Matched forbidden node:
>>           280  CallStaticJava  ===  5  6  7  8  1 ( 188  1  1  1  1  1  1 ) [[ 281  282  283  285 ]] # Static  java.lang.invoke.MethodHandle::invokeBasic
>>     - counts: Graph contains wrong number of nodes:
>>         Regex 1: (\\d+(\\s){2}(CallStaticJava.*)+(\\s){2}===.*invokeStatic)
>>         Expected 1 but found 0 nodes.
>> 
>>>>> Check stdout for compilation output of the failed methods
>> 
>> >> This is a patch to fix this problem. Please help review it. >> >> Thanks, >> Sun Guoyun > > SUN Guoyun has updated the pull request incrementally with one additional commit since the last revision: > > 8275086: compiler/c2/irTests/TestPostParseCallDevirtualization.java fails when compiler1 is disabled I think for now, going with ` @Warmup(5000)` seems reasonable. Some thoughts about the IR framework: - Default warmup: I agree that with `-XX:-TieredCompilation`, the warmup seems to be too short. I think it would make sense to use another default warmup value for C2-only configs if IR matching would be performed. Setting `@Warmup`, however, should always override it. I'm not sure what a good default value should be. Any thoughts @veresov? - On top of that, our CI currently only runs `-XX:-TieredCompilation` in combination with `CompileThreshold` which is not whitelisted. This means that IR matching is not performed in that case. That's the reason why we have not detected this bug here in our testing. I think the IR framework should be improved to just ignore any `CompileThreshold` flag settings and allow IR matching to be performed with `-XX:-TieredCompilation` in our CI. If others agree, I will file RFEs for both of these. ------------- PR: https://git.openjdk.java.net/jdk/pull/5903 From stuefe at openjdk.java.net Tue Nov 2 13:07:14 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Tue, 2 Nov 2021 13:07:14 GMT Subject: RFR: JDK-8276175: codestrings.validate_vm gtest still broken on ppc64 after JDK-8276046 In-Reply-To: References: Message-ID: On Fri, 29 Oct 2021 14:17:10 GMT, Thomas Stuefe wrote: > Frustratingly, JDK-8276046 failed to work because the #ifdef PPC was added to the existing group of #ifdefs at the very start of test_codestrings.cpp. > > PPC is not a primary macro however, it gets set via macros.hpp if one of PPC32 or PPC64 is set. 
Therefore this only works after the inclusion of macro.hpp (Works for ZERO and PRODUCT, since those are primary macros). > > This fix revives my original fix using the DISABLED_... moniker on test functions. That seems to be the default way to disable tests anyway. After talking with Martin, I commit this as trivial ------------- PR: https://git.openjdk.java.net/jdk/pull/6174 From stuefe at openjdk.java.net Tue Nov 2 13:07:15 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Tue, 2 Nov 2021 13:07:15 GMT Subject: Integrated: JDK-8276175: codestrings.validate_vm gtest still broken on ppc64 after JDK-8276046 In-Reply-To: References: Message-ID: <6i_MOOmRnn11f0JPJtDaaYl5LqIu0oGL47HnF_nWXpk=.9c34f411-135c-4d02-a49f-818df37364ce@github.com> On Fri, 29 Oct 2021 14:17:10 GMT, Thomas Stuefe wrote: > Frustratingly, JDK-8276046 failed to work because the #ifdef PPC was added to the existing group of #ifdefs at the very start of test_codestrings.cpp. > > PPC is not a primary macro however, it gets set via macros.hpp if one of PPC32 or PPC64 is set. Therefore this only works after the inclusion of macro.hpp (Works for ZERO and PRODUCT, since those are primary macros). > > This fix revives my original fix using the DISABLED_... moniker on test functions. That seems to be the default way to disable tests anyway. This pull request has now been integrated. 
Changeset: b889f2a8 Author: Thomas Stuefe URL: https://git.openjdk.java.net/jdk/commit/b889f2a88a5e182d2424b741d8fedf2c784442f1 Stats: 8 lines in 1 file changed: 5 ins; 3 del; 0 mod 8276175: codestrings.validate_vm gtest still broken on ppc64 after JDK-8276046 Reviewed-by: mdoerr ------------- PR: https://git.openjdk.java.net/jdk/pull/6174 From iveresov at openjdk.java.net Tue Nov 2 16:10:18 2021 From: iveresov at openjdk.java.net (Igor Veresov) Date: Tue, 2 Nov 2021 16:10:18 GMT Subject: RFR: 8275086: compiler/c2/irTests/TestPostParseCallDevirtualization.java fails when compiler1 is disabled [v3] In-Reply-To: <1J1d_whrKEmMGGYm9ZfhuiQo7pq677jppXGfvEZrkNA=.821ff6a1-4d51-4069-b256-a1ca917d2030@github.com> References: <1J1d_whrKEmMGGYm9ZfhuiQo7pq677jppXGfvEZrkNA=.821ff6a1-4d51-4069-b256-a1ca917d2030@github.com> Message-ID: On Fri, 29 Oct 2021 09:41:37 GMT, SUN Guoyun wrote: >> Hi all, >> Jtreg test case compiler/c2/irTests/TestPostParseCallDevirtualization.java fails for fastdebug mode on x86/aarch64/mips architecture when "--with-jvm-features=-compiler1" be used. the failed info is: >> >>

>> One or more @IR rules failed:
>> 
>> Failed IR Rules (1)
>> ------------------
>> - Method "public int compiler.c2.irTests.TestPostParseCallDevirtualization.testMethodHandleCallWithCCP() throws java.lang.Throwable":
>>   * @IR rule 1: "@compiler.lib.ir_framework.IR(failOn={"#PRE#(\\\\d+(\\\\s){2}(CallStaticJava.*)+(\\\\s){2}===.*#IS_REPLACED#)", "invokeBasic"}, applyIf={}, applyIfAnd={}, applyIfOr={}, counts={"#PRE#(\\\\d+(\\\\s){2}(CallStaticJava.*)+(\\\\s){2}===.*#IS_REPLACED#)", "invokeStatic", "= 1"}, applyIfNot={})" 
>>     - failOn: Graph contains forbidden nodes:
>>         Regex 1: (\\d+(\\s){2}(CallStaticJava.*)+(\\s){2}===.*invokeBasic)
>>         Matched forbidden node:
>>           280  CallStaticJava  ===  5  6  7  8  1 ( 188  1  1  1  1  1  1 ) [[ 281  282  283  285 ]] # Static  java.lang.invoke.MethodHandle::invokeBasic
>>     - counts: Graph contains wrong number of nodes:
>>         Regex 1: (\\d+(\\s){2}(CallStaticJava.*)+(\\s){2}===.*invokeStatic)
>>         Expected 1 but found 0 nodes.
>> 
>>>>> Check stdout for compilation output of the failed methods
>> 
>> >> This is a patch to fix this problem. Please help review it. >> >> Thanks, >> Sun Guoyun > > SUN Guoyun has updated the pull request incrementally with one additional commit since the last revision: > > 8275086: compiler/c2/irTests/TestPostParseCallDevirtualization.java fails when compiler1 is disabled Marked as reviewed by iveresov (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/5903 From iveresov at openjdk.java.net Tue Nov 2 16:10:19 2021 From: iveresov at openjdk.java.net (Igor Veresov) Date: Tue, 2 Nov 2021 16:10:19 GMT Subject: RFR: 8275086: compiler/c2/irTests/TestPostParseCallDevirtualization.java fails when compiler1 is disabled [v3] In-Reply-To: <8swcAk7XCw0rkiyKOuZ2zo-gUbk-8aIBMS7lPexpXnA=.310188e3-3e38-4ea3-be75-1c6201f195b2@github.com> References: <1J1d_whrKEmMGGYm9ZfhuiQo7pq677jppXGfvEZrkNA=.821ff6a1-4d51-4069-b256-a1ca917d2030@github.com> <8swcAk7XCw0rkiyKOuZ2zo-gUbk-8aIBMS7lPexpXnA=.310188e3-3e38-4ea3-be75-1c6201f195b2@github.com> Message-ID: On Tue, 2 Nov 2021 12:25:38 GMT, Christian Hagedorn wrote: >> SUN Guoyun has updated the pull request incrementally with one additional commit since the last revision: >> >> 8275086: compiler/c2/irTests/TestPostParseCallDevirtualization.java fails when compiler1 is disabled > > I think for now, going with ` @Warmup(5000)` seems reasonable. Some thoughts about the IR framework: > - Default warmup: I agree that with `-XX:-TieredCompilation`, the warmup seems to be too short. I think it would make sense to use another default warmup value for C2-only configs if IR matching would be performed. Setting `@Warmup`, however, should always override it. I'm not sure what a good default value should be. Any thoughts @veresov? > - On top of that, our CI currently only runs `-XX:-TieredCompilation` in combination with `CompileThreshold` which is not whitelisted. This means that IR matching is not performed in that case. That's the reason why we have not detected this bug here in our testing. 
I think the IR framework should be improved to just ignore any `CompileThreshold` flag settings and allow IR matching to be performed with `-XX:-TieredCompilation` in our CI. > > If others agree, I will file RFEs for both of these. @chhagedorn, ok, let's go with the current solution for now then. As for the default warmup, I would probably expose `CompilationPolicy::min_invocations()` through the WB API and do the warmup based on that. There are a lot of ways different flags may affect the thresholds, I think we just need an authoritative API point to tell us the minimum number of invocations. I would also disable various feedback mechanisms in the policy to make it more deterministic. So, `Tier4LoadFeedback=1000000, Tier3LoadFeedback=1000000, Tier3DelayOn=1000000, Tier0Delay=1000000, TieredCompileTaskTimeout=1000000`; or we should add a single option to disable all adaptive features. ------------- PR: https://git.openjdk.java.net/jdk/pull/5903 From simonis at openjdk.java.net Tue Nov 2 16:38:20 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Tue, 2 Nov 2021 16:38:20 GMT Subject: RFR: 8275729: Qualified method names in CodeHeap Analytics In-Reply-To: References: Message-ID: On Mon, 1 Nov 2021 20:51:39 GMT, Evgeny Astigeevich wrote: > This PR changes nmethods names in `METHOD NAMES for CodeHeap` section to be qualified. > Testing: > - `make test TEST="gtest"`: Passed > - `make run-test TEST="tier1"`: Passed > - `make run-test TEST="tier2"`: Passed > - `make run-test TEST=serviceability/dcmd/compiler/CodeHeapAnalyticsMethodNames.java`: Passed src/hotspot/share/code/codeHeapState.cpp line 2340: > 2338: > 2339: Klass* klass = method->method_holder(); > 2340: assert(klass->is_loader_alive(), "must be alive"); Are you sure `klass` is always valid here and that its class loader has to be alive (i.e. the corresponding class hasn't been unloaded in the meantime)? 
In [JDK-8275729](https://bugs.openjdk.java.net/browse/JDK-8275729) you say that the Top50 list already has qualified names, but as far as I know that information is already collected in the aggregation step, where it is safe. You now query this information in the reporting step. I know we had problems due to access to dead methods before (see [JDK-8219586: CodeHeap State Analytics processes dead nmethods](https://bugs.openjdk.java.net/browse/JDK-8219586)), and I just want to make sure we don't re-introduce such problems. Maybe @RealLucy or @fisk can have an additional look? ------------- PR: https://git.openjdk.java.net/jdk/pull/6200 From duke at openjdk.java.net Tue Nov 2 16:38:22 2021 From: duke at openjdk.java.net (Evgeny Astigeevich) Date: Tue, 2 Nov 2021 16:38:22 GMT Subject: Integrated: 8275729: Qualified method names in CodeHeap Analytics In-Reply-To: References: Message-ID: On Mon, 1 Nov 2021 20:51:39 GMT, Evgeny Astigeevich wrote: > This PR changes nmethod names in the `METHOD NAMES for CodeHeap` section to be qualified. > Testing: > - `make test TEST="gtest"`: Passed > - `make run-test TEST="tier1"`: Passed > - `make run-test TEST="tier2"`: Passed > - `make run-test TEST=serviceability/dcmd/compiler/CodeHeapAnalyticsMethodNames.java`: Passed This pull request has now been integrated.
Changeset: 8fc16f16 Author: Evgeny Astigeevich Committer: Paul Hohensee URL: https://git.openjdk.java.net/jdk/commit/8fc16f1605b396bfb9265a97bc126d435d6d5951 Stats: 75 lines in 2 files changed: 75 ins; 0 del; 0 mod 8275729: Qualified method names in CodeHeap Analytics Reviewed-by: yyang, thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/6200 From phh at openjdk.java.net Tue Nov 2 16:42:21 2021 From: phh at openjdk.java.net (Paul Hohensee) Date: Tue, 2 Nov 2021 16:42:21 GMT Subject: RFR: 8275729: Qualified method names in CodeHeap Analytics In-Reply-To: References: Message-ID: On Mon, 1 Nov 2021 20:51:39 GMT, Evgeny Astigeevich wrote: > This PR changes nmethods names in `METHOD NAMES for CodeHeap` section to be qualified. > Testing: > - `make test TEST="gtest"`: Passed > - `make run-test TEST="tier1"`: Passed > - `make run-test TEST="tier2"`: Passed > - `make run-test TEST=serviceability/dcmd/compiler/CodeHeapAnalyticsMethodNames.java`: Passed Volker, I sponsored this before you posted your review. Evgeny, if it's a problem, please file a bug. ------------- PR: https://git.openjdk.java.net/jdk/pull/6200 From simonis at openjdk.java.net Tue Nov 2 16:42:21 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Tue, 2 Nov 2021 16:42:21 GMT Subject: RFR: 8275729: Qualified method names in CodeHeap Analytics In-Reply-To: References: Message-ID: On Mon, 1 Nov 2021 20:51:39 GMT, Evgeny Astigeevich wrote: > This PR changes nmethods names in `METHOD NAMES for CodeHeap` section to be qualified. > Testing: > - `make test TEST="gtest"`: Passed > - `make run-test TEST="tier1"`: Passed > - `make run-test TEST="tier2"`: Passed > - `make run-test TEST=serviceability/dcmd/compiler/CodeHeapAnalyticsMethodNames.java`: Passed No problem, I know I was late :) But I also know that this is a sensitive area, so better double check... 
-------------

PR: https://git.openjdk.java.net/jdk/pull/6200

From duke at openjdk.java.net Tue Nov 2 17:08:17 2021
From: duke at openjdk.java.net (Evgeny Astigeevich)
Date: Tue, 2 Nov 2021 17:08:17 GMT
Subject: RFR: 8275729: Qualified method names in CodeHeap Analytics
In-Reply-To:
References:
Message-ID:

On Tue, 2 Nov 2021 16:34:34 GMT, Volker Simonis wrote:

>> This PR changes nmethod names in the `METHOD NAMES for CodeHeap` section to be qualified.
>> Testing:
>> - `make test TEST="gtest"`: Passed
>> - `make run-test TEST="tier1"`: Passed
>> - `make run-test TEST="tier2"`: Passed
>> - `make run-test TEST=serviceability/dcmd/compiler/CodeHeapAnalyticsMethodNames.java`: Passed
>
> src/hotspot/share/code/codeHeapState.cpp line 2340:
>
>> 2338:
>> 2339:     Klass* klass = method->method_holder();
>> 2340:     assert(klass->is_loader_alive(), "must be alive");
>
> Are you sure `klass` is always valid here and that its class loader has to be alive (i.e. the corresponding class hasn't been unloaded in the meantime)?
>
> In [JDK-8275729](https://bugs.openjdk.java.net/browse/JDK-8275729) you say that the Top50 list already has qualified names, but as far as I know that information is already collected in the aggregation step, where it is safe. You now query this information in the reporting step.
>
> I know we had problems due to access to dead methods before (see [JDK-8219586: CodeHeap State Analytics processes dead nmethods](https://bugs.openjdk.java.net/browse/JDK-8219586)), and I just want to make sure we don't re-introduce such problems.
>
> Maybe @RealLucy or @fisk can have an additional look?

@simonis
The code is guarded by checks:

    // access nmethod and Method fields only if we own the CodeCache_lock.
    // This fact is implicitly transported via nm != NULL.
    if (nmethod_access_is_safe(nm)) {
      ...
      bool get_name = (cbType == nMethod_inuse) || (cbType == nMethod_notused);
      ...
      if (get_name) {

I was thinking whether I should use `if (klass->is_loader_alive())` or `assert(klass->is_loader_alive())`. I chose the assert because if it is safe to access `Method` then its holder `Klass` must be alive.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6200

From duke at openjdk.java.net Tue Nov 2 18:14:23 2021
From: duke at openjdk.java.net (Evgeny Astigeevich)
Date: Tue, 2 Nov 2021 18:14:23 GMT
Subject: RFR: 8275729: Qualified method names in CodeHeap Analytics
In-Reply-To:
References:
Message-ID:

On Tue, 2 Nov 2021 05:49:30 GMT, Yi Yang wrote:

>> This PR changes nmethod names in the `METHOD NAMES for CodeHeap` section to be qualified.
>> Testing:
>> - `make test TEST="gtest"`: Passed
>> - `make run-test TEST="tier1"`: Passed
>> - `make run-test TEST="tier2"`: Passed
>> - `make run-test TEST=serviceability/dcmd/compiler/CodeHeapAnalyticsMethodNames.java`: Passed
>
> This looks good now. Old output cannot tell us which class the method belongs to.
>
> Old:
>
>     0x00007f6e91063010 (+0x00000010) 0x000000a0( 0K) none 0 480 nMethod (deopt) nmethod
>     0x00007f6e91063310 (+0x00000310) 0x000000f8( 0K) none 0 480 nMethod (active) name()Ljava/lang/String;
>     0x00007f6e91063610 (+0x00000610) 0x000000f8( 0K) none 0 480 nMethod (active) descriptor()Ljava/lang/module/ModuleDescriptor;
>     0x00007f6e91063910 (+0x00000910) 0x00000000( 0K) none 0 480 nMethod (active) getReferenceVolatile(Ljava/lang/Object;J)Ljava/lang/Object;
>     0x00007f6e91063d90 (+0x00000d90) 0x00000000( 0K) none 0 480 nMethod (active) hashCode()I
>     0x00007f6e91064190 (+0x00001190) 0x000000f8( 0K) c1 1 480 nMethod (active) name()Ljava/lang/String;
>     0x00007f6e91064490 (+0x00001490) 0x000000f8( 0K) c1 1 480 nMethod (active) modifiers()Ljava/util/Set;
>     0x00007f6e91064790 (+0x00001790) 0x000000f8( 0K) c1 1 480 nMethod (active) targets()Ljava/util/Set;
>     0x00007f6e91064a90 (+0x00001a90) 0x000000f8( 0K) c1 1 480 nMethod (active) source()Ljava/lang/String;
>     0x00007f6e91064d90 (+0x00001d90) 0x000000f8( 0K) c1 1 480 nMethod (active) isEmpty()Z
>
> New:
>
>     0x00007f08adc94010 (+0x00000010) 0x00000150( 0K) c1 3 480 nMethod (deopt) nmethod
>     0x00007f08adc94390 (+0x00000390) 0x000001b0( 0K) c1 3 480 nMethod (active) java.lang.String.isLatin1()Z
>     0x00007f08adc94710 (+0x00000710) 0x00000258( 0K) c1 3 480 nMethod (active) jdk.internal.util.Preconditions.checkIndex(IILjava/util/function/BiFunction;)I
>     0x00007f08adc94b90 (+0x00000b90) 0x000004e8( 1K) c1 3 480 nMethod (deopt) nmethod
>     0x00007f08adc95310 (+0x00001310) 0x00000298( 0K) c1 3 480 nMethod (active) java.lang.StringLatin1.charAt([BI)C
>     0x00007f08adc95790 (+0x00001790) 0x000001a0( 0K) c1 3 480 nMethod (active) java.lang.String.checkIndex(II)V
>     0x00007f08adc95b10 (+0x00001b10) 0x00000170( 0K) c1 3 480 nMethod (active) java.lang.String.coder()B
>     0x00007f08adc95e90 (+0x00001e90) 0x000003e8( 0K) c1 3 480 nMethod (active) java.lang.String.hashCode()I
>     0x00007f08adc96490 (+0x00002490) 0x00000130( 0K) c1 3 480 nMethod (deopt) nmethod
>     0x00007f08adc96790 (+0x00002790) 0x00000210( 0K) c1 3 480 nMethod (active) java.lang.String.length()I

Thanks for reviewing @kelthuzadx and @TobiHartmann.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6200

From dlong at openjdk.java.net Tue Nov 2 20:38:19 2021
From: dlong at openjdk.java.net (Dean Long)
Date: Tue, 2 Nov 2021 20:38:19 GMT
Subject: RFR: 8276044: ciReplay: C1 does not dump a replay file when using DumpReplay as compile command option
In-Reply-To: <362RCiftK90J2zK7S7n5uSlLv17hwj2QTl5S9aFvhj8=.6ee835b9-f9fe-4f98-84be-5a41fa7e2cd7@github.com>
References: <362RCiftK90J2zK7S7n5uSlLv17hwj2QTl5S9aFvhj8=.6ee835b9-f9fe-4f98-84be-5a41fa7e2cd7@github.com>
Message-ID:

On Mon, 1 Nov 2021 15:02:36 GMT, Christian Hagedorn wrote:

> This patch adds support to dump replay files for C1 with the compile command `DumpReplay`. I added a test to verify that a replay file is dumped with C1 (and C2).
>
> Thanks,
> Christian

Marked as reviewed by dlong (Reviewer).
------------- PR: https://git.openjdk.java.net/jdk/pull/6190 From dlong at openjdk.java.net Tue Nov 2 20:41:11 2021 From: dlong at openjdk.java.net (Dean Long) Date: Tue, 2 Nov 2021 20:41:11 GMT Subject: RFR: 8276227: ciReplay: SIGSEGV if classfile for replay compilation is not present after JDK-8275868 [v2] In-Reply-To: References: Message-ID: On Tue, 2 Nov 2021 10:32:46 GMT, Christian Hagedorn wrote: >> The fix for [JDK-8275868](https://bugs.openjdk.java.net/browse/JDK-8275868) does not handle the case when the classfile for the method to be replay compiled is not present. This will fail to load the klass. Afterwards, we are trying to access the protection domain of the failed to load klass (i.e. a null pointer) which results in a segmentation fault. The fix is straight forward to only set the new protection domain if the klass was loaded successfully. I additionally changed the code such that we are only trying to set the protection domain when reading the first `instanceKlass` entry. This avoids some potential problems with older replay files where we do not have this additional first entry set by JDK-8275868. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > add test Marked as reviewed by dlong (Reviewer). src/hotspot/share/ci/ciReplay.cpp line 890: > 888: // This also ensures that older replay files work. > 889: _protection_domain_initialized = true; > 890: I don't see how this helps older replay files. In fact, it seems like it could make replay for older replay files fail, if the first entry has a different protection domain than the main class. If we really want to preserve the old behavior of old replay files, then I think we need to add a version number or some other keyword so that we can tell if a replay file is old or not. However, in my opinion supporting old replay files should not be a goal. 
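
The version-number idea above can be sketched roughly as follows. This is only an illustration, not what ciReplay does in this change: the `version <n>` keyword, the function name `parse_replay_version`, and the -1 convention for unversioned files are all assumptions made up for the example.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: inspect the first line of a replay file.
// Returns n if the line has the assumed form "version <n>" with n >= 0,
// and -1 otherwise, which a reader could treat as "old, unversioned file".
int parse_replay_version(const std::string& first_line) {
  std::istringstream in(first_line);
  std::string keyword;
  int version = 0;
  if (in >> keyword >> version && keyword == "version" && version >= 0) {
    return version;
  }
  return -1;
}
```

With such a marker, a reader could choose the protection-domain handling per file format instead of guessing from the order of the entries.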
------------- PR: https://git.openjdk.java.net/jdk/pull/6189 From lucy at openjdk.java.net Tue Nov 2 20:58:20 2021 From: lucy at openjdk.java.net (Lutz Schmidt) Date: Tue, 2 Nov 2021 20:58:20 GMT Subject: RFR: 8275729: Qualified method names in CodeHeap Analytics In-Reply-To: References: Message-ID: On Mon, 1 Nov 2021 20:51:39 GMT, Evgeny Astigeevich wrote: > This PR changes nmethods names in `METHOD NAMES for CodeHeap` section to be qualified. > Testing: > - `make test TEST="gtest"`: Passed > - `make run-test TEST="tier1"`: Passed > - `make run-test TEST="tier2"`: Passed > - `make run-test TEST=serviceability/dcmd/compiler/CodeHeapAnalyticsMethodNames.java`: Passed To me, the change looks good. ------------- PR: https://git.openjdk.java.net/jdk/pull/6200 From lucy at openjdk.java.net Tue Nov 2 20:58:21 2021 From: lucy at openjdk.java.net (Lutz Schmidt) Date: Tue, 2 Nov 2021 20:58:21 GMT Subject: RFR: 8275729: Qualified method names in CodeHeap Analytics In-Reply-To: References: Message-ID: On Tue, 2 Nov 2021 17:03:50 GMT, Evgeny Astigeevich wrote: >> src/hotspot/share/code/codeHeapState.cpp line 2340: >> >>> 2338: >>> 2339: Klass* klass = method->method_holder(); >>> 2340: assert(klass->is_loader_alive(), "must be alive"); >> >> Are you sure `klass` is always valid here and that its class loader has to be alive (i.e. the corresponding class hasn't been unloaded in the meantime)? >> >> In [https://bugs.openjdk.java.net/browse/JDK-8275729](JDK-8275729) you say that the Top50 list already has qualified names but as far as I know, that information is already collected in the aggregation step where it is safe. You now query this information in the reporting step. >> >> I know we had problems due to access to dead methods before (see [JDK-8219586: CodeHeap State Analytics processes dead nmethods](https://bugs.openjdk.java.net/browse/JDK-8219586) and I just want to make sure we don't re-introduce such problems. >> >> Maybe @RealLucy or @fisk can have an additional look? 
>
> @simonis
> The code is guarded by checks:
>
>     // access nmethod and Method fields only if we own the CodeCache_lock.
>     // This fact is implicitly transported via nm != NULL.
>     if (nmethod_access_is_safe(nm)) {
>       ...
>       bool get_name = (cbType == nMethod_inuse) || (cbType == nMethod_notused);
>       ...
>       if (get_name) {
>
> I was thinking whether I should use `if (klass->is_loader_alive())` or `assert(klass->is_loader_alive())`. I chose the assert because if it is safe to access `Method` then its holder `Klass` must be alive.

Hi,
the code is safe. Not because of the checks cited by @eastig, but because print_names() is only called if the required locks (Compile_lock and CodeCache_lock) have been held continuously since the aggregation step. See src/hotspot/share/compiler/compileBroker.cpp. A lot of effort has been spent on making print_names() less restrictive, with no success.

Thanks for the enhancement.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6200

From dnsimon at openjdk.java.net Tue Nov 2 21:39:32 2021
From: dnsimon at openjdk.java.net (Doug Simon)
Date: Tue, 2 Nov 2021 21:39:32 GMT
Subject: RFR: 8276314: [JVMCI] check alignment of call displacement during code installation
Message-ID: <9Y-J6OfeTyxerdoYearxYLNZMzRW3NEVIWEeHFTcq9k=.38bdaf1d-ec1a-49f6-9442-5d4497d23c44@github.com>

This PR adds verification of code alignment invariants related to x64 call instructions during code installation.
This in turn allows a JVMCI compilation that generates a misaligned call to fail gracefully (i.e. bail out) instead of the VM crashing when it checks alignment before patching the displacement of a call instruction.
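
To illustrate the kind of invariant being verified, here is a minimal sketch. It assumes the usual x64 encoding of a direct call (one opcode byte followed by a 32-bit displacement) and assumes the invariant is that the displacement starts on a 4-byte boundary, so that it can be patched with a single aligned store. `InstallResult` and `check_call_site` are hypothetical names for this example, not the JVMCI code installer's API.

```cpp
#include <cstdint>

// Assumed x64 encoding: a direct call is a one-byte opcode followed by a
// 32-bit relative displacement.
constexpr std::uintptr_t kCallOpcodeSize = 1;
constexpr std::uintptr_t kDisplacementSize = 4;

// True if the displacement of a call located at call_addr starts on a
// 4-byte boundary.
constexpr bool is_displacement_aligned(std::uintptr_t call_addr) {
  return (call_addr + kCallOpcodeSize) % kDisplacementSize == 0;
}

// Hypothetical installer-side check: report a bailout at installation time
// instead of failing later when the call is patched.
enum class InstallResult { ok, bailout_misaligned_call };

constexpr InstallResult check_call_site(std::uintptr_t call_addr) {
  return is_displacement_aligned(call_addr)
             ? InstallResult::ok
             : InstallResult::bailout_misaligned_call;
}
```

Under these assumptions a call placed at an address ending in 0x3 or 0x7 passes the check, while one placed at a 4-byte-aligned address does not, because its displacement starts one byte past the boundary.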
------------- Commit messages: - ensure call displacement is aligned during code installation Changes: https://git.openjdk.java.net/jdk/pull/6218/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6218&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8276314 Stats: 16 lines in 3 files changed: 9 ins; 1 del; 6 mod Patch: https://git.openjdk.java.net/jdk/pull/6218.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6218/head:pull/6218 PR: https://git.openjdk.java.net/jdk/pull/6218 From kvn at openjdk.java.net Tue Nov 2 21:56:20 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 2 Nov 2021 21:56:20 GMT Subject: RFR: 8275729: Qualified method names in CodeHeap Analytics In-Reply-To: References: Message-ID: On Mon, 1 Nov 2021 20:51:39 GMT, Evgeny Astigeevich wrote: > This PR changes nmethods names in `METHOD NAMES for CodeHeap` section to be qualified. > Testing: > - `make test TEST="gtest"`: Passed > - `make run-test TEST="tier1"`: Passed > - `make run-test TEST="tier2"`: Passed > - `make run-test TEST=serviceability/dcmd/compiler/CodeHeapAnalyticsMethodNames.java`: Passed The change caused failure in our testing: https://bugs.openjdk.java.net/browse/JDK-8276429 @eastig I will assign it to you ------------- PR: https://git.openjdk.java.net/jdk/pull/6200 From kvn at openjdk.java.net Tue Nov 2 22:08:14 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 2 Nov 2021 22:08:14 GMT Subject: RFR: 8275729: Qualified method names in CodeHeap Analytics In-Reply-To: References: Message-ID: On Mon, 1 Nov 2021 20:51:39 GMT, Evgeny Astigeevich wrote: > This PR changes nmethods names in `METHOD NAMES for CodeHeap` section to be qualified. 
> Testing: > - `make test TEST="gtest"`: Passed > - `make run-test TEST="tier1"`: Passed > - `make run-test TEST="tier2"`: Passed > - `make run-test TEST=serviceability/dcmd/compiler/CodeHeapAnalyticsMethodNames.java`: Passed I don't think we need this assert just to print klass's name ------------- PR: https://git.openjdk.java.net/jdk/pull/6200 From kvn at openjdk.java.net Tue Nov 2 22:45:10 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 2 Nov 2021 22:45:10 GMT Subject: RFR: 8276314: [JVMCI] check alignment of call displacement during code installation In-Reply-To: <9Y-J6OfeTyxerdoYearxYLNZMzRW3NEVIWEeHFTcq9k=.38bdaf1d-ec1a-49f6-9442-5d4497d23c44@github.com> References: <9Y-J6OfeTyxerdoYearxYLNZMzRW3NEVIWEeHFTcq9k=.38bdaf1d-ec1a-49f6-9442-5d4497d23c44@github.com> Message-ID: On Tue, 2 Nov 2021 21:31:25 GMT, Doug Simon wrote: > This PR add verification of code alignment invariants related to x64 call instructions during code installation. > This in turn allows a JVMCI compilation that generates a misaligned call to fail gracefully (i.e. bailout) instead of the VM crashing when it checks alignment before patching the displacement of a call instruction. src/hotspot/cpu/x86/jvmciCodeInstaller_x86.cpp line 191: > 189: } > 190: default: > 191: JVMCI_ERROR("invalid _next_call_type value"); May be print `%d` invalid call type here too since you are changing code around. src/hotspot/cpu/x86/jvmciCodeInstaller_x86.cpp line 194: > 192: return; > 193: } > 194: if (os::is_MP() && !call->is_displacement_aligned()) { You are checking for `MP` in current era? Why not always require alignment? 
-------------

PR: https://git.openjdk.java.net/jdk/pull/6218

From duke at openjdk.java.net Tue Nov 2 22:52:17 2021
From: duke at openjdk.java.net (Evgeny Astigeevich)
Date: Tue, 2 Nov 2021 22:52:17 GMT
Subject: RFR: 8275729: Qualified method names in CodeHeap Analytics
In-Reply-To:
References:
Message-ID:

On Tue, 2 Nov 2021 22:05:01 GMT, Vladimir Kozlov wrote:

> I don't think we need this assert just to print klass's name. May be follow the code pattern for method's name and signature.

Agree. I'll submit PR with the code:

    Symbol* className = klass->name();
    const char* classNameS = (className == nullptr) ? nullptr : className->external_name();
    classNameS = (classNameS == nullptr) ? "" : classNameS;
    ast->print("%s.", classNameS);

-------------

PR: https://git.openjdk.java.net/jdk/pull/6200

From kvn at openjdk.java.net Tue Nov 2 23:00:17 2021
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Tue, 2 Nov 2021 23:00:17 GMT
Subject: RFR: 8275729: Qualified method names in CodeHeap Analytics
In-Reply-To:
References:
Message-ID:

On Mon, 1 Nov 2021 20:51:39 GMT, Evgeny Astigeevich wrote:

> This PR changes nmethods names in `METHOD NAMES for CodeHeap` section to be qualified.
> Testing:
> - `make test TEST="gtest"`: Passed
> - `make run-test TEST="tier1"`: Passed
> - `make run-test TEST="tier2"`: Passed
> - `make run-test TEST=serviceability/dcmd/compiler/CodeHeapAnalyticsMethodNames.java`: Passed

Yes, I am currently testing similar fix:

    - Klass* klass = method->method_holder();
    - assert(klass->is_loader_alive(), "must be alive");
    + Klass* methHolder = method->method_holder();
    + const char* methHolderS = (methHolder == NULL) ? NULL : methHolder->external_name();
    + methHolderS = (methHolderS == NULL) ? "" : methHolderS;

    - ast->print("%s.", klass->external_name());
    + ast->print("%s.", methHolderS);

Note, failed test is `closed` so I have to run testing.
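
For readers unfamiliar with the pattern being proposed, here is a standalone sketch of the same null-safe lookup outside HotSpot. `FakeKlass` and `printable_holder` are hypothetical names invented for this sketch, and the fallback string is left to the caller rather than guessed.

```cpp
#include <string>

// Hypothetical miniature of a class descriptor; not HotSpot's Klass.
struct FakeKlass {
    const char* name;  // may be nullptr
    const char* external_name() const { return name; }
};

// Returns a printable holder name. If either the holder or its name is
// missing, `fallback` (which must be non-null) is used instead, so the
// caller never passes a null pointer to a "%s" format.
std::string printable_holder(const FakeKlass* holder, const char* fallback) {
    const char* s = (holder == nullptr) ? nullptr : holder->external_name();
    return (s == nullptr) ? fallback : s;
}
```

The two-step shape mirrors the snippets above: first guard the object, then guard the string it returns.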
-------------

PR: https://git.openjdk.java.net/jdk/pull/6200

From duke at openjdk.java.net Tue Nov 2 23:08:19 2021
From: duke at openjdk.java.net (Evgeny Astigeevich)
Date: Tue, 2 Nov 2021 23:08:19 GMT
Subject: RFR: 8275729: Qualified method names in CodeHeap Analytics
In-Reply-To:
References:
Message-ID: <0uEXYSu_8Y8V_aiyIRPVm3ZTvLRqeOFsULFvjHrl5n8=.ed87a21f-f34c-4950-b7f0-8ac32916c670@github.com>

On Tue, 2 Nov 2021 22:57:23 GMT, Vladimir Kozlov wrote:

> Yes, I am currently testing similar fix:
>
> ```
> - Klass* klass = method->method_holder();
> - assert(klass->is_loader_alive(), "must be alive");
> + Klass* methHolder = method->method_holder();
> + const char* methHolderS = (methHolder == NULL) ? NULL : methHolder->external_name();
> + methHolderS = (methHolderS == NULL) ? "" : methHolderS;
>
> - ast->print("%s.", klass->external_name());
> + ast->print("%s.", methHolderS);
> ```
>
> Note, failed test is `closed` so I have to run testing.

Is NULL method holder an acceptable situation? Could it be a sign of a bug?

BTW, `Klass::external_name()` returns `` if `Klass::name()` is `NULL`.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6200

From kvn at openjdk.java.net Tue Nov 2 23:20:14 2021
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Tue, 2 Nov 2021 23:20:14 GMT
Subject: RFR: 8275729: Qualified method names in CodeHeap Analytics
In-Reply-To: <0uEXYSu_8Y8V_aiyIRPVm3ZTvLRqeOFsULFvjHrl5n8=.ed87a21f-f34c-4950-b7f0-8ac32916c670@github.com>
References: <0uEXYSu_8Y8V_aiyIRPVm3ZTvLRqeOFsULFvjHrl5n8=.ed87a21f-f34c-4950-b7f0-8ac32916c670@github.com>
Message-ID:

On Tue, 2 Nov 2021 23:03:22 GMT, Evgeny Astigeevich wrote:

> Is NULL method holder an acceptable situation? Could it be a sign of a bug?

You are right, all methods should have class holders. I just followed code pattern.

> BTW, `Klass::external_name()` returns `` if `Klass::name()` is `NULL`.

I see, you want to have the same string instead of ``. Reasonable. I will test your changes too.
File PR and I will review and post testing results. ------------- PR: https://git.openjdk.java.net/jdk/pull/6200 From kvn at openjdk.java.net Tue Nov 2 23:20:15 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 2 Nov 2021 23:20:15 GMT Subject: RFR: 8275729: Qualified method names in CodeHeap Analytics In-Reply-To: References: Message-ID: <0L9rPQiskq-xc2eMFhRazjcZUFsuQW31_kVBwC47UkA=.6c0b5ec1-8696-47b5-872a-f478866bf0d0@github.com> On Mon, 1 Nov 2021 20:51:39 GMT, Evgeny Astigeevich wrote: > This PR changes nmethods names in `METHOD NAMES for CodeHeap` section to be qualified. > Testing: > - `make test TEST="gtest"`: Passed > - `make run-test TEST="tier1"`: Passed > - `make run-test TEST="tier2"`: Passed > - `make run-test TEST=serviceability/dcmd/compiler/CodeHeapAnalyticsMethodNames.java`: Passed BTW, you need to update Copyright year in file. ------------- PR: https://git.openjdk.java.net/jdk/pull/6200 From jiefu at openjdk.java.net Tue Nov 2 23:52:08 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Tue, 2 Nov 2021 23:52:08 GMT Subject: RFR: 8276066: Reset LoopPercentProfileLimit for x86 due to suboptimal performance In-Reply-To: References: Message-ID: On Mon, 1 Nov 2021 07:46:01 GMT, Tobias Hartmann wrote: >>> I'll run this through our performance testing and report back. >> >> Thanks @TobiHartmann . > >> I'll run this through our performance testing and report back. > > Performance results look good. > > Is this change still required after re-enabling post loop vectorization? Hi @TobiHartmann , The `LoopPercentProfileLimit` was changed from 10 to 30 (logically) on x86 to enable post loop vectorization in JDK-8149421. However, post loop vectorization was disabled by JDK-8183103 without restoring the original value of `LoopPercentProfileLimit` to 10 on x86. So do you agree that we'd better restore `LoopPercentProfileLimit=10` if post loop vectorization is disabled? 
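
A small arithmetic aside on why 10 is "logically" the original value: the PR description states that the older code used the constant `10.0` where the current code uses `100.0 / LoopPercentProfileLimit`. The sketch below only evaluates that expression; `profile_limit_factor` is a hypothetical helper written for this note, not a HotSpot function.

```cpp
// Scale factor obtained from a percentage limit, i.e. the expression
// `100.0 / LoopPercentProfileLimit` from the PR description.
double profile_limit_factor(int loop_percent_profile_limit) {
    return 100.0 / loop_percent_profile_limit;
}
```

With a limit of 10 the factor is exactly 10.0, matching the old constant; with a limit of 30 it drops to roughly 3.33, which is the behavioural difference under discussion.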
According to the comment by @pfustc in JDK-8183390, JEP 417: Vector API (Third Incubator) [1] is needed to enable post loop vectorization. But I'm not sure whether `LoopPercentProfileLimit=30` is still the best choice for x86. Maybe, it should be re-tuned based on the new implementation too. So it doesn't matter to restore `LoopPercentProfileLimit=10` in the jdk mainline. And it seems hard to re-enable post loop vectorization for jdk versions without Vector API (e.g., jdk11). So for better performance experience on x86, I suggest restoring `LoopPercentProfileLimit=10` if post loop vectorization is hard to be re-enabled, which means we'd better restore `LoopPercentProfileLimit=10` for jdk11 too. What do you think? Thanks. [1] https://github.com/openjdk/jdk/pull/5873 ------------- PR: https://git.openjdk.java.net/jdk/pull/6142 From never at openjdk.java.net Tue Nov 2 23:55:10 2021 From: never at openjdk.java.net (Tom Rodriguez) Date: Tue, 2 Nov 2021 23:55:10 GMT Subject: RFR: 8276314: [JVMCI] check alignment of call displacement during code installation In-Reply-To: References: <9Y-J6OfeTyxerdoYearxYLNZMzRW3NEVIWEeHFTcq9k=.38bdaf1d-ec1a-49f6-9442-5d4497d23c44@github.com> Message-ID: On Tue, 2 Nov 2021 22:41:47 GMT, Vladimir Kozlov wrote: >> This PR add verification of code alignment invariants related to x64 call instructions during code installation. >> This in turn allows a JVMCI compilation that generates a misaligned call to fail gracefully (i.e. bailout) instead of the VM crashing when it checks alignment before patching the displacement of a call instruction. > > src/hotspot/cpu/x86/jvmciCodeInstaller_x86.cpp line 194: > >> 192: return; >> 193: } >> 194: if (os::is_MP() && !call->is_displacement_aligned()) { > > You are checking for `MP` in current era? Why not always require alignment? I agree. I think C2 has always aligned and C1 used to check is_MP but no longer does. 
Requiring JVMCI compilers to always align seems right, particularly since we don't expose `is_MP` through JVMCI. Graal and C1 actually appear to over-align the displacement by aligning it to BytesPerWord, while C2 always aligns to 4, which is all that is required. It's odd that the alignment check in NativeCall isn't simply `displacement_offset() % 4`. The existing check implies that it's ok to use a misaligned offset as long as it starts and ends within an 8-byte region, but I don't know that that would really work, and none of the compilers actually take advantage of it. That's probably beyond the scope of this PR. Actually `verify_alignment` checks that it's aligned on BytesPerInt, so maybe `is_displacement_aligned` should unify around that definition.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6218

From dlong at openjdk.java.net Wed Nov 3 00:56:09 2021
From: dlong at openjdk.java.net (Dean Long)
Date: Wed, 3 Nov 2021 00:56:09 GMT
Subject: RFR: 8276227: ciReplay: SIGSEGV if classfile for replay compilation is not present after JDK-8275868 [v2]
In-Reply-To:
References:
Message-ID:

On Mon, 1 Nov 2021 19:49:22 GMT, Dean Long wrote:

>> Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision:
>>
>> add test
>
> src/hotspot/share/ci/ciReplay.cpp line 890:
>
>> 888: // This also ensures that older replay files work.
>> 889: _protection_domain_initialized = true;
>> 890:
>
> I don't see how this helps older replay files. In fact, it seems like it could make replay for older replay files fail, if the first entry has a different protection domain than the main class.
>
> If we really want to preserve the old behavior of old replay files, then I think we need to add a version number or some other keyword so that we can tell if a replay file is old or not. However, in my opinion supporting old replay files should not be a goal.

I'm having second thoughts on not supporting old replay files.
It's easy enough to add a version number, which allows us to introduce incompatible changes without breaking old replay files. I'll probably introduce a version number with my fix for 8276095. ------------- PR: https://git.openjdk.java.net/jdk/pull/6189 From manc at openjdk.java.net Wed Nov 3 01:39:31 2021 From: manc at openjdk.java.net (Man Cao) Date: Wed, 3 Nov 2021 01:39:31 GMT Subject: RFR: 8276453: Undefined behavior in C1 LIR_OprDesc causes SEGV in fastdebug build [v2] In-Reply-To: References: Message-ID: > Hi all, > > Could anyone provide some feedback on this bug fix and refactoring change? See https://bugs.openjdk.java.net/browse/JDK-8276453 for more details. > If the direction of this change looks good, we can proceed removing the "UGLY HACK" in c1_LIR.hpp and refactor occurrences of "opr->fn()" to "opr.fn()". Man Cao has updated the pull request incrementally with one additional commit since the last revision: Fix errors related NULL value without --disable-warnings-as-errors ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6221/files - new: https://git.openjdk.java.net/jdk/pull/6221/files/a879d4df..b0ef5024 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6221&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6221&range=00-01 Stats: 6 lines in 4 files changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.java.net/jdk/pull/6221.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6221/head:pull/6221 PR: https://git.openjdk.java.net/jdk/pull/6221 From manc at openjdk.java.net Wed Nov 3 01:51:38 2021 From: manc at openjdk.java.net (Man Cao) Date: Wed, 3 Nov 2021 01:51:38 GMT Subject: RFR: 8276453: Undefined behavior in C1 LIR_OprDesc causes SEGV in fastdebug build [v3] In-Reply-To: References: Message-ID: > Hi all, > > Could anyone provide some feedback on this bug fix and refactoring change? See https://bugs.openjdk.java.net/browse/JDK-8276453 for more details. 
> If the direction of this change looks good, we can proceed removing the "UGLY HACK" in c1_LIR.hpp and refactor occurrences of "opr->fn()" to "opr.fn()". Man Cao has updated the pull request incrementally with one additional commit since the last revision: Remove constructor that takes int to fix build error ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6221/files - new: https://git.openjdk.java.net/jdk/pull/6221/files/b0ef5024..71c8528e Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6221&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6221&range=01-02 Stats: 4 lines in 2 files changed: 0 ins; 1 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/6221.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6221/head:pull/6221 PR: https://git.openjdk.java.net/jdk/pull/6221 From manc at openjdk.java.net Wed Nov 3 01:51:42 2021 From: manc at openjdk.java.net (Man Cao) Date: Wed, 3 Nov 2021 01:51:42 GMT Subject: RFR: 8276453: Undefined behavior in C1 LIR_OprDesc causes SEGV in fastdebug build [v2] In-Reply-To: References: Message-ID: On Wed, 3 Nov 2021 01:39:31 GMT, Man Cao wrote: >> Hi all, >> >> Could anyone provide some feedback on this bug fix and refactoring change? See https://bugs.openjdk.java.net/browse/JDK-8276453 for more details. >> If the direction of this change looks good, we can proceed removing the "UGLY HACK" in c1_LIR.hpp and refactor occurrences of "opr->fn()" to "opr.fn()". > > Man Cao has updated the pull request incrementally with one additional commit since the last revision: > > Fix errors related NULL value without --disable-warnings-as-errors @rasbold in case you'd like to follow this pull request. 
------------- PR: https://git.openjdk.java.net/jdk/pull/6221 From duke at openjdk.java.net Wed Nov 3 05:18:14 2021 From: duke at openjdk.java.net (SUN Guoyun) Date: Wed, 3 Nov 2021 05:18:14 GMT Subject: RFR: 8275086: compiler/c2/irTests/TestPostParseCallDevirtualization.java fails when compiler1 is disabled [v3] In-Reply-To: References: <1J1d_whrKEmMGGYm9ZfhuiQo7pq677jppXGfvEZrkNA=.821ff6a1-4d51-4069-b256-a1ca917d2030@github.com> <8swcAk7XCw0rkiyKOuZ2zo-gUbk-8aIBMS7lPexpXnA=.310188e3-3e38-4ea3-be75-1c6201f195b2@github.com> Message-ID: On Tue, 2 Nov 2021 16:07:04 GMT, Igor Veresov wrote: >> I think for now, going with ` @Warmup(5000)` seems reasonable. Some thoughts about the IR framework: >> - Default warmup: I agree that with `-XX:-TieredCompilation`, the warmup seems to be too short. I think it would make sense to use another default warmup value for C2-only configs if IR matching would be performed. Setting `@Warmup`, however, should always override it. I'm not sure what a good default value should be. Any thoughts @veresov? >> - On top of that, our CI currently only runs `-XX:-TieredCompilation` in combination with `CompileThreshold` which is not whitelisted. This means that IR matching is not performed in that case. That's the reason why we have not detected this bug here in our testing. I think the IR framework should be improved to just ignore any `CompileThreshold` flag settings and allow IR matching to be performed with `-XX:-TieredCompilation` in our CI. >> >> If others agree, I will file RFEs for both of these. > > @chhagedorn, ok, let's go with the current solution for now then. > As for the default warmup, I would probably expose `CompilationPolicy::min_invocations()` through the WB API and do the warmup based on that. There are a lot of ways different flags may affect the thresholds, I think we just need an authoritative API point to tell us the minimum number of invocations. 
I would also disable various feedback mechanisms in the policy to make it more deterministic. So, `Tier4LoadFeedback=1000000, Tier3LoadFeedback=1000000, Tier3DelayOn=1000000, Tier0Delay=1000000, TieredCompileTaskTimeout=1000000`; or we should add a single option to disable all adaptive features. @veresov Could you please sponsor it for me? ------------- PR: https://git.openjdk.java.net/jdk/pull/5903 From duke at openjdk.java.net Wed Nov 3 05:54:15 2021 From: duke at openjdk.java.net (SUN Guoyun) Date: Wed, 3 Nov 2021 05:54:15 GMT Subject: Integrated: 8275086: compiler/c2/irTests/TestPostParseCallDevirtualization.java fails when compiler1 is disabled In-Reply-To: References: Message-ID: On Tue, 12 Oct 2021 02:49:53 GMT, SUN Guoyun wrote: > Hi all, > Jtreg test case compiler/c2/irTests/TestPostParseCallDevirtualization.java fails for fastdebug mode on x86/aarch64/mips architecture when "--with-jvm-features=-compiler1" be used. the failed info is: > >

> One or more @IR rules failed:
> 
> Failed IR Rules (1)
> ------------------
> - Method "public int compiler.c2.irTests.TestPostParseCallDevirtualization.testMethodHandleCallWithCCP() throws java.lang.Throwable":
>   * @IR rule 1: "@compiler.lib.ir_framework.IR(failOn={"#PRE#(\\\\d+(\\\\s){2}(CallStaticJava.*)+(\\\\s){2}===.*#IS_REPLACED#)", "invokeBasic"}, applyIf={}, applyIfAnd={}, applyIfOr={}, counts={"#PRE#(\\\\d+(\\\\s){2}(CallStaticJava.*)+(\\\\s){2}===.*#IS_REPLACED#)", "invokeStatic", "= 1"}, applyIfNot={})" 
>     - failOn: Graph contains forbidden nodes:
>         Regex 1: (\\d+(\\s){2}(CallStaticJava.*)+(\\s){2}===.*invokeBasic)
>         Matched forbidden node:
>           280  CallStaticJava  ===  5  6  7  8  1 ( 188  1  1  1  1  1  1 ) [[ 281  282  283  285 ]] # Static  java.lang.invoke.MethodHandle::invokeBasic
>     - counts: Graph contains wrong number of nodes:
>         Regex 1: (\\d+(\\s){2}(CallStaticJava.*)+(\\s){2}===.*invokeStatic)
>         Expected 1 but found 0 nodes.
> 
>>>> Check stdout for compilation output of the failed methods
> 
> > This is a patch to fix this problem. Please help review it. > > Thanks, > Sun Guoyun This pull request has now been integrated. Changeset: 87b926eb Author: sunguoyun Committer: Igor Veresov URL: https://git.openjdk.java.net/jdk/commit/87b926ebb7f1e341da858f7a9892377586abc026 Stats: 4 lines in 1 file changed: 2 ins; 0 del; 2 mod 8275086: compiler/c2/irTests/TestPostParseCallDevirtualization.java fails when compiler1 is disabled Reviewed-by: iveresov ------------- PR: https://git.openjdk.java.net/jdk/pull/5903 From manc at openjdk.java.net Wed Nov 3 06:34:40 2021 From: manc at openjdk.java.net (Man Cao) Date: Wed, 3 Nov 2021 06:34:40 GMT Subject: RFR: 8276453: Undefined behavior in C1 LIR_OprDesc causes SEGV in fastdebug build [v4] In-Reply-To: References: Message-ID: > Hi all, > > Could anyone provide some feedback on this bug fix and refactoring change? See https://bugs.openjdk.java.net/browse/JDK-8276453 for more details. > If the direction of this change looks good, we can proceed removing the "UGLY HACK" in c1_LIR.hpp and refactor occurrences of "opr->fn()" to "opr.fn()". 
Man Cao has updated the pull request incrementally with one additional commit since the last revision: Fix build errors on non-x86 or non-Linux environments ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6221/files - new: https://git.openjdk.java.net/jdk/pull/6221/files/71c8528e..ae81097a Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6221&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6221&range=02-03 Stats: 15 lines in 10 files changed: 0 ins; 1 del; 14 mod Patch: https://git.openjdk.java.net/jdk/pull/6221.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6221/head:pull/6221 PR: https://git.openjdk.java.net/jdk/pull/6221 From thartmann at openjdk.java.net Wed Nov 3 06:56:07 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Wed, 3 Nov 2021 06:56:07 GMT Subject: RFR: 8275448: [REDO] AArch64: Implement string_compare intrinsic in SVE [v2] In-Reply-To: References: <3lLU_Cnsh439U5TjNx9ADxXkvMDnfot4TxHbh7paUQw=.5c7ec35a-8456-4fe6-8b4f-1d7627026d8d@github.com> Message-ID: On Tue, 2 Nov 2021 09:51:27 GMT, TatWai Chong wrote: >> All tests passed. > > @TobiHartmann, The latest patch is the final version. Could you re-run testing again? Many thanks. @tatwaichong All tests passed. ------------- PR: https://git.openjdk.java.net/jdk/pull/6072 From thartmann at openjdk.java.net Wed Nov 3 07:06:10 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Wed, 3 Nov 2021 07:06:10 GMT Subject: RFR: 8276066: Reset LoopPercentProfileLimit for x86 due to suboptimal performance In-Reply-To: References: Message-ID: On Wed, 27 Oct 2021 14:26:49 GMT, Jie Fu wrote: > Hi all, > > I'd like to reset the value of `LoopPercentProfileLimit` (from 30 to the original 10) for x86 to fix performance degradation. > > We had observed that for the same Java App, the performance of x86 is slower than that of aarch64. 
> But the x86's performance should not be so worse than the aarch64 according to some SPEC benchmark results. > > After some investigation, it seems that the slowness of x86 is caused by the different default settings of `LoopPercentProfileLimit` (30 for x86, but 10 for other platforms). > If we change `LoopPercentProfileLimit` from 30 to 10, x86 would run faster. > > In JDK-8149421, `LoopPercentProfileLimit` [1] was first added and set to be 30 for x86 and 10 for other platforms. > Logically, the default value of `LoopPercentProfileLimit` is 10 for all platforms even before JDK-8149421. > This is because when `LoopPercentProfileLimit=10`, `10.0` [2] equals `100.0 / LoopPercentProfileLimit` [3]. > So if we set `LoopPercentProfileLimit=10`, this unrolling rule [3] would be the same as the original design before JDK-8149421. > > One most important fact is that from the very beginning of OpenJDK source code, the default value of `LoopPercentProfileLimit` (logically) is 10 for all platforms. > So I suggest resetting `LoopPercentProfileLimit` to the original value (10) for x86, just as other platforms. > > I've noted that the review thread mentioned that JDK-8149421 would be beneficial for some SPECjvm2008 benchmarks [4]. > Then I run SPECjvm2008 with `LoopPercentProfileLimit=10` finding that there is no performance drop on x86. > So it won't revert JDK-8149421's opts for SPECjvm2008. > > To show the potential improvement of this change, I've made a jmh test in the patch. > Performance can be improved by 1.25x ~ 2.0x according to this micro benchmark. > > Any comments? > > Thanks. 
> Best regards,
> Jie
>
>
> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/loopTransform.cpp#L908
> [2] https://github.com/openjdk/jdk8u/blob/master/hotspot/src/share/vm/opto/loopTransform.cpp#L673
> [3] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/loopTransform.cpp#L903
> [4] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2016-February/021205.html
>
> [figures: ratio, before, after]

> So do you agree that we'd better restore LoopPercentProfileLimit=10 if post loop vectorization is disabled?

Yes, that seems reasonable to me. We should re-evaluate the default value once post loop vectorization is fixed.

Changes requested by thartmann (Reviewer).

test/micro/org/openjdk/bench/vm/compiler/LoopUnroll.java line 37:

> 35: @Fork(value=1)
> 36: public class LoopUnroll {
> 37: @Param({"16", "32", "64", "128", "256", "512", "1024"})

We use four whitespace indentation for Java code.

-------------

Marked as reviewed by thartmann (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/6167 From jiefu at openjdk.java.net Wed Nov 3 07:24:35 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Wed, 3 Nov 2021 07:24:35 GMT Subject: RFR: 8276066: Reset LoopPercentProfileLimit for x86 due to suboptimal performance [v2] In-Reply-To: References: Message-ID: > Hi all, > > I'd like to reset the value of `LoopPercentProfileLimit` (from 30 to the original 10) for x86 to fix performance degradation. > > We had observed that for the same Java App, the performance of x86 is slower than that of aarch64. > But the x86's performance should not be so worse than the aarch64 according to some SPEC benchmark results. > > After some investigation, it seems that the slowness of x86 is caused by the different default settings of `LoopPercentProfileLimit` (30 for x86, but 10 for other platforms). > If we change `LoopPercentProfileLimit` from 30 to 10, x86 would run faster. > > In JDK-8149421, `LoopPercentProfileLimit` [1] was first added and set to be 30 for x86 and 10 for other platforms. > Logically, the default value of `LoopPercentProfileLimit` is 10 for all platforms even before JDK-8149421. > This is because when `LoopPercentProfileLimit=10`, `10.0` [2] equals `100.0 / LoopPercentProfileLimit` [3]. > So if we set `LoopPercentProfileLimit=10`, this unrolling rule [3] would be the same as the original design before JDK-8149421. > > One most important fact is that from the very beginning of OpenJDK source code, the default value of `LoopPercentProfileLimit` (logically) is 10 for all platforms. > So I suggest resetting `LoopPercentProfileLimit` to the original value (10) for x86, just as other platforms. > > I've noted that the review thread mentioned that JDK-8149421 would be beneficial for some SPECjvm2008 benchmarks [4]. > Then I run SPECjvm2008 with `LoopPercentProfileLimit=10` finding that there is no performance drop on x86. > So it won't revert JDK-8149421's opts for SPECjvm2008. 
>
> To show the potential improvement of this change, I've made a jmh test in the patch.
> Performance can be improved by 1.25x ~ 2.0x according to this micro benchmark.
>
> Any comments?
>
> Thanks.
> Best regards,
> Jie
>
>
> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/loopTransform.cpp#L908
> [2] https://github.com/openjdk/jdk8u/blob/master/hotspot/src/share/vm/opto/loopTransform.cpp#L673
> [3] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/loopTransform.cpp#L903
> [4] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2016-February/021205.html
>
> [figures: ratio, before, after]

Jie Fu has updated the pull request incrementally with one additional commit since the last revision:

  Fix indentation for Java code by using 4 whitespace

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6142/files
  - new: https://git.openjdk.java.net/jdk/pull/6142/files/66c9a5a5..e824ac62

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6142&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6142&range=00-01

  Stats: 49 lines in 1 file changed: 8 ins; 8 del; 33 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6142.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6142/head:pull/6142

PR: https://git.openjdk.java.net/jdk/pull/6142

From jiefu at openjdk.java.net Wed Nov 3 07:24:36 2021
From: jiefu at openjdk.java.net (Jie Fu)
Date: Wed, 3 Nov 2021 07:24:36 GMT
Subject: RFR: 8276066: Reset LoopPercentProfileLimit for x86 due to suboptimal performance [v2]
In-Reply-To:
References:
Message-ID:

On Wed, 3 Nov 2021 07:02:52 GMT, Tobias Hartmann wrote:

> We use four whitespace indentation for Java code.

Fixed.
Thanks for your review.
------------- PR: https://git.openjdk.java.net/jdk/pull/6142 From chagedorn at openjdk.java.net Wed Nov 3 08:12:11 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Wed, 3 Nov 2021 08:12:11 GMT Subject: RFR: 8276227: ciReplay: SIGSEGV if classfile for replay compilation is not present after JDK-8275868 [v2] In-Reply-To: References: Message-ID: On Wed, 3 Nov 2021 00:52:28 GMT, Dean Long wrote: >> src/hotspot/share/ci/ciReplay.cpp line 890: >> >>> 888: // This also ensures that older replay files work. >>> 889: _protection_domain_initialized = true; >>> 890: >> >> I don't see how this helps older replay files. In fact, it seems like it could make replay for older replay files fail, if the first entry has a different protection domain than the main class. >> >> If we really want to preserve the old behavior of old replay files, then I think we need to add a version number or some other keyword so that we can tell if a replay file is old or not. However, in my opinion supporting old replay files should not be a goal. > > I'm having second thoughts on not supporting old replay files. It's easy enough to add a version number, which allows us to introduce incompatible changes without breaking old replay files. I'll probably introduce a version number with my fix for 8276095. You're right. A version number would solve this completely. This implementation is more robust than the previous one but not complete. Here we only try to set the protection domain once where in the previous implementation, we would have picked the first non-null protection domain found (which could happen after looking at many classes). Do you want to revisit this code with the introduction of version numbers in 8276095 and we proceed with this temporary fix? 
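
To make the version-number idea concrete, here is a rough sketch of how a reader could tell old replay files from new ones. The header format (an optional first line `version <N>`) and the function `replay_file_version` are assumptions made up for this sketch; the thread does not specify what the eventual format in 8276095 will look like.

```cpp
#include <sstream>
#include <string>

// Assumed header format: an optional first line "version <N>".
// A file whose first line is anything else is treated as version 0,
// i.e. an "old" replay file written before version numbers existed.
int replay_file_version(const std::string& first_line) {
    std::istringstream in(first_line);
    std::string keyword;
    int version = 0;
    if (in >> keyword >> version && keyword == "version") {
        return version;
    }
    return 0;
}
```

The reader could then branch on the result, keeping the previous protection-domain behaviour for version 0 and applying the stricter handling only to newer files.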
------------- PR: https://git.openjdk.java.net/jdk/pull/6189 From chagedorn at openjdk.java.net Wed Nov 3 08:47:19 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Wed, 3 Nov 2021 08:47:19 GMT Subject: RFR: 8276044: ciReplay: C1 does not dump a replay file when using DumpReplay as compile command option In-Reply-To: <362RCiftK90J2zK7S7n5uSlLv17hwj2QTl5S9aFvhj8=.6ee835b9-f9fe-4f98-84be-5a41fa7e2cd7@github.com> References: <362RCiftK90J2zK7S7n5uSlLv17hwj2QTl5S9aFvhj8=.6ee835b9-f9fe-4f98-84be-5a41fa7e2cd7@github.com> Message-ID: On Mon, 1 Nov 2021 15:02:36 GMT, Christian Hagedorn wrote: > This patch adds support to dump replay files for C1 with the compile command `DumpReplay`. I added a test to verify that a replay file is dumped with C1 (and C2). > > Thanks, > Christian Thanks Dean for your review! ------------- PR: https://git.openjdk.java.net/jdk/pull/6190 From chagedorn at openjdk.java.net Wed Nov 3 08:47:21 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Wed, 3 Nov 2021 08:47:21 GMT Subject: Integrated: 8276044: ciReplay: C1 does not dump a replay file when using DumpReplay as compile command option In-Reply-To: <362RCiftK90J2zK7S7n5uSlLv17hwj2QTl5S9aFvhj8=.6ee835b9-f9fe-4f98-84be-5a41fa7e2cd7@github.com> References: <362RCiftK90J2zK7S7n5uSlLv17hwj2QTl5S9aFvhj8=.6ee835b9-f9fe-4f98-84be-5a41fa7e2cd7@github.com> Message-ID: On Mon, 1 Nov 2021 15:02:36 GMT, Christian Hagedorn wrote: > This patch adds support to dump replay files for C1 with the compile command `DumpReplay`. I added a test to verify that a replay file is dumped with C1 (and C2). > > Thanks, > Christian This pull request has now been integrated. 
Changeset: 7439b59b Author: Christian Hagedorn URL: https://git.openjdk.java.net/jdk/commit/7439b59b5a6816269b16d210ef10779fc9def8e2 Stats: 108 lines in 3 files changed: 102 ins; 0 del; 6 mod 8276044: ciReplay: C1 does not dump a replay file when using DumpReplay as compile command option Reviewed-by: kvn, thartmann, dlong ------------- PR: https://git.openjdk.java.net/jdk/pull/6190 From chagedorn at openjdk.java.net Wed Nov 3 08:49:16 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Wed, 3 Nov 2021 08:49:16 GMT Subject: RFR: 8275086: compiler/c2/irTests/TestPostParseCallDevirtualization.java fails when compiler1 is disabled [v3] In-Reply-To: <8swcAk7XCw0rkiyKOuZ2zo-gUbk-8aIBMS7lPexpXnA=.310188e3-3e38-4ea3-be75-1c6201f195b2@github.com> References: <1J1d_whrKEmMGGYm9ZfhuiQo7pq677jppXGfvEZrkNA=.821ff6a1-4d51-4069-b256-a1ca917d2030@github.com> <8swcAk7XCw0rkiyKOuZ2zo-gUbk-8aIBMS7lPexpXnA=.310188e3-3e38-4ea3-be75-1c6201f195b2@github.com> Message-ID: On Tue, 2 Nov 2021 12:25:38 GMT, Christian Hagedorn wrote: >> SUN Guoyun has updated the pull request incrementally with one additional commit since the last revision: >> >> 8275086: compiler/c2/irTests/TestPostParseCallDevirtualization.java fails when compiler1 is disabled > > I think for now, going with ` @Warmup(5000)` seems reasonable. Some thoughts about the IR framework: > - Default warmup: I agree that with `-XX:-TieredCompilation`, the warmup seems to be too short. I think it would make sense to use another default warmup value for C2-only configs if IR matching would be performed. Setting `@Warmup`, however, should always override it. I'm not sure what a good default value should be. Any thoughts @veresov? > - On top of that, our CI currently only runs `-XX:-TieredCompilation` in combination with `CompileThreshold` which is not whitelisted. This means that IR matching is not performed in that case. That's the reason why we have not detected this bug here in our testing. 
I think the IR framework should be improved to just ignore any `CompileThreshold` flag settings and allow IR matching to be performed with `-XX:-TieredCompilation` in our CI. > > If others agree, I will file RFEs for both of these. > @chhagedorn, ok, let's go with the current solution for now then. As for the default warmup, I would probably expose `CompilationPolicy::min_invocations()` through the WB API and do the warmup based on that. There are a lot of ways different flags may affect the thresholds, I think we just need an authoritative API point to tell us the minimum number of invocations. I would also disable various feedback mechanisms in the policy to make it more deterministic. So, `Tier4LoadFeedback=1000000, Tier3LoadFeedback=1000000, Tier3DelayOn=1000000, Tier0Delay=1000000, TieredCompileTaskTimeout=1000000`; or we should add a single option to disable all adaptive features. Exposing this method sounds indeed more robust than a custom special handling within the IR framework. I will also consider setting the adaptive flags for the test VM to make it more deterministic. Thanks for your input! I will file the RFE accordingly. ------------- PR: https://git.openjdk.java.net/jdk/pull/5903 From shade at openjdk.java.net Wed Nov 3 09:10:15 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 3 Nov 2021 09:10:15 GMT Subject: RFR: 8276157: C2: Compiler stack overflow during escape analysis on Linux x86_32 In-Reply-To: References: Message-ID: On Fri, 29 Oct 2021 10:06:26 GMT, Aleksey Shipilev wrote: > See the bug for test details and analysis. I believe we just legitimately run out of stack in `fastdebug` builds. The fix is to increase the default stack size a bit. Linux-S390, Windows-x86/AArch64 seems to do a similar thing. > > I can do a similar change in `globals_bsd_x86.hpp`, but that would be a blind change, as I don't have platforms to verify that change sanity. I would prefer to make a Linux-specific fix at this time. 
> > Additional testing: > - [x] Failing test now passes on Linux x86_32 > - [x] Linux x86_32 fastdebug `tier1` Thank you! ------------- PR: https://git.openjdk.java.net/jdk/pull/6167 From shade at openjdk.java.net Wed Nov 3 09:10:16 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 3 Nov 2021 09:10:16 GMT Subject: Integrated: 8276157: C2: Compiler stack overflow during escape analysis on Linux x86_32 In-Reply-To: References: Message-ID: On Fri, 29 Oct 2021 10:06:26 GMT, Aleksey Shipilev wrote: > See the bug for test details and analysis. I believe we just legitimately run out of stack in `fastdebug` builds. The fix is to increase the default stack size a bit. Linux-S390, Windows-x86/AArch64 seems to do a similar thing. > > I can do a similar change in `globals_bsd_x86.hpp`, but that would be a blind change, as I don't have platforms to verify that change sanity. I would prefer to make a Linux-specific fix at this time. > > Additional testing: > - [x] Failing test now passes on Linux x86_32 > - [x] Linux x86_32 fastdebug `tier1` This pull request has now been integrated. Changeset: 465d350d Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/465d350d0b3cac277a58b9f8ece196c1cde68e80 Stats: 7 lines in 1 file changed: 6 ins; 0 del; 1 mod 8276157: C2: Compiler stack overflow during escape analysis on Linux x86_32 Reviewed-by: kvn, thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/6167 From thartmann at openjdk.java.net Wed Nov 3 11:34:16 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Wed, 3 Nov 2021 11:34:16 GMT Subject: RFR: 8276066: Reset LoopPercentProfileLimit for x86 due to suboptimal performance [v2] In-Reply-To: References: Message-ID: On Wed, 3 Nov 2021 07:24:35 GMT, Jie Fu wrote: >> Hi all, >> >> I'd like to reset the value of `LoopPercentProfileLimit` (from 30 to the original 10) for x86 to fix performance degradation. 
>> >> We had observed that for the same Java App, the performance of x86 is slower than that of aarch64. >> But the x86's performance should not be so worse than the aarch64 according to some SPEC benchmark results. >> >> After some investigation, it seems that the slowness of x86 is caused by the different default settings of `LoopPercentProfileLimit` (30 for x86, but 10 for other platforms). >> If we change `LoopPercentProfileLimit` from 30 to 10, x86 would run faster. >> >> In JDK-8149421, `LoopPercentProfileLimit` [1] was first added and set to be 30 for x86 and 10 for other platforms. >> Logically, the default value of `LoopPercentProfileLimit` is 10 for all platforms even before JDK-8149421. >> This is because when `LoopPercentProfileLimit=10`, `10.0` [2] equals `100.0 / LoopPercentProfileLimit` [3]. >> So if we set `LoopPercentProfileLimit=10`, this unrolling rule [3] would be the same as the original design before JDK-8149421. >> >> One most important fact is that from the very beginning of OpenJDK source code, the default value of `LoopPercentProfileLimit` (logically) is 10 for all platforms. >> So I suggest resetting `LoopPercentProfileLimit` to the original value (10) for x86, just as other platforms. >> >> I've noted that the review thread mentioned that JDK-8149421 would be beneficial for some SPECjvm2008 benchmarks [4]. >> Then I run SPECjvm2008 with `LoopPercentProfileLimit=10` finding that there is no performance drop on x86. >> So it won't revert JDK-8149421's opts for SPECjvm2008. >> >> To show the potential improvement of this change, I've made a jmh test in the patch. >> Performance can be improved by 1.25x ~ 2.0x according to this micro benchmark. >> >> Any comments? >> >> Thanks. 
>> Best regards,
>> Jie
>>
>> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/loopTransform.cpp#L908
>> [2] https://github.com/openjdk/jdk8u/blob/master/hotspot/src/share/vm/opto/loopTransform.cpp#L673
>> [3] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/loopTransform.cpp#L903
>> [4] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2016-February/021205.html
>>
>> [Images attached to the pull request: ratio, before, after]
>
> Jie Fu has updated the pull request incrementally with one additional commit since the last revision:
>
>   Fix indentation for Java code by using 4 whitespace

Looks good to me.

-------------

Marked as reviewed by thartmann (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6142

From jiefu at openjdk.java.net Wed Nov 3 12:09:09 2021
From: jiefu at openjdk.java.net (Jie Fu)
Date: Wed, 3 Nov 2021 12:09:09 GMT
Subject: RFR: 8276066: Reset LoopPercentProfileLimit for x86 due to suboptimal performance [v2]
In-Reply-To:
References:
Message-ID:

On Wed, 3 Nov 2021 11:30:58 GMT, Tobias Hartmann wrote:

> Looks good to me.

Thanks @TobiHartmann. Will push it tomorrow if there is no objection.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6142

From duke at openjdk.java.net Wed Nov 3 14:25:24 2021
From: duke at openjdk.java.net (Evgeny Astigeevich)
Date: Wed, 3 Nov 2021 14:25:24 GMT
Subject: RFR: 8275729: Qualified method names in CodeHeap Analytics
In-Reply-To: <0L9rPQiskq-xc2eMFhRazjcZUFsuQW31_kVBwC47UkA=.6c0b5ec1-8696-47b5-872a-f478866bf0d0@github.com>
References: <0L9rPQiskq-xc2eMFhRazjcZUFsuQW31_kVBwC47UkA=.6c0b5ec1-8696-47b5-872a-f478866bf0d0@github.com>
Message-ID: <0Vys6k97aZS6oGOUF7u72xRU-rgnqkW0vMDM4zHd2l8=.c0a59ab6-8204-4d68-8123-ba1780f05045@github.com>

On Tue, 2 Nov 2021 23:17:30 GMT, Vladimir Kozlov wrote:

>> This PR changes nmethod names in the `METHOD NAMES for CodeHeap` section to be qualified.
>> Testing: >> - `make test TEST="gtest"`: Passed >> - `make run-test TEST="tier1"`: Passed >> - `make run-test TEST="tier2"`: Passed >> - `make run-test TEST=serviceability/dcmd/compiler/CodeHeapAnalyticsMethodNames.java`: Passed > > BTW, you need to update Copyright year in file. @vnkozlov Created PR https://github.com/openjdk/jdk/pull/6234 ------------- PR: https://git.openjdk.java.net/jdk/pull/6200 From duke at openjdk.java.net Wed Nov 3 14:27:22 2021 From: duke at openjdk.java.net (Evgeny Astigeevich) Date: Wed, 3 Nov 2021 14:27:22 GMT Subject: RFR: 8276429: CodeHeapState::print_names() fails with "assert(klass->is_loader_alive()) failed: must be alive" Message-ID: This PR fixes `applications/kitchensink/Kitchensink.java` regression introduced by JDK-8275729. The requirement for a method holder to be alive is relaxed to the holder not to be NULL. If holder's name is not available the format of the string used for the name is the same as for unavailable method's name and signature, instead of the default string: ``. 
Testing: - `make run-test TEST=tier1_serviceability`: Passed - `make run-test TEST=hotspot_tier2_serviceability`: Passed - `make run-test TEST=serviceability/dcmd/compiler/CodeHeapAnalyticsMethodNames.java`: Passed ------------- Commit messages: - 8276429: CodeHeapState::print_names() fails with "assert(klass->is_loader_alive()) failed: must be alive" Changes: https://git.openjdk.java.net/jdk/pull/6234/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6234&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8276429 Stats: 5 lines in 1 file changed: 1 ins; 1 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/6234.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6234/head:pull/6234 PR: https://git.openjdk.java.net/jdk/pull/6234 From kvn at openjdk.java.net Wed Nov 3 14:58:16 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 3 Nov 2021 14:58:16 GMT Subject: RFR: 8276429: CodeHeapState::print_names() fails with "assert(klass->is_loader_alive()) failed: must be alive" In-Reply-To: References: Message-ID: On Wed, 3 Nov 2021 14:19:29 GMT, Evgeny Astigeevich wrote: > This PR fixes `applications/kitchensink/Kitchensink.java` regression introduced by JDK-8275729. > The requirement for a method holder to be alive is relaxed to the holder not to be NULL. > If holder's name is not available the format of the string used for the name is the same as for unavailable method's name and signature, instead of the default string: ``. > Testing: > - `make run-test TEST=tier1_serviceability`: Passed > - `make run-test TEST=hotspot_tier2_serviceability`: Passed > - `make run-test TEST=serviceability/dcmd/compiler/CodeHeapAnalyticsMethodNames.java`: Passed Good. I will test it before approval. 
------------- PR: https://git.openjdk.java.net/jdk/pull/6234 From chagedorn at openjdk.java.net Wed Nov 3 15:35:19 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Wed, 3 Nov 2021 15:35:19 GMT Subject: RFR: JDK-8276036: The value of full_count in the message of insufficient codecache is wrong In-Reply-To: References: Message-ID: <9Zz8MY6Hu7cXvMfYT-nxEpTbcWhxPCgcu9nSoxQ3krg=.4dbc5ca7-14ec-4857-9e79-6b0e7cc8e572@github.com> On Mon, 1 Nov 2021 11:31:39 GMT, Tobias Holenstein wrote: > The value of full_count (number of times the code heap was full) in the message of insufficient codecache was 0 even though a codecache shortage occurred. This is fixed by simply incrementing the count before the printing. Marked as reviewed by chagedorn (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/6185 From duke at openjdk.java.net Wed Nov 3 15:35:20 2021 From: duke at openjdk.java.net (Tobias Holenstein) Date: Wed, 3 Nov 2021 15:35:20 GMT Subject: Integrated: JDK-8276036: The value of full_count in the message of insufficient codecache is wrong In-Reply-To: References: Message-ID: On Mon, 1 Nov 2021 11:31:39 GMT, Tobias Holenstein wrote: > The value of full_count (number of times the code heap was full) in the message of insufficient codecache was 0 even though a codecache shortage occurred. This is fixed by simply incrementing the count before the printing. This pull request has now been integrated. 
Changeset: 61cb4bc6
Author:    Tobias Holenstein
Committer: Christian Hagedorn
URL:       https://git.openjdk.java.net/jdk/commit/61cb4bc6b0252536364a86f38ff2e5c8c7ab610b
Stats:     6 lines in 1 file changed: 2 ins; 2 del; 2 mod

8276036: The value of full_count in the message of insufficient codecache is wrong

Reviewed-by: kvn, dlong, thartmann, chagedorn

-------------

PR: https://git.openjdk.java.net/jdk/pull/6185

From dnsimon at openjdk.java.net Wed Nov 3 16:03:16 2021
From: dnsimon at openjdk.java.net (Doug Simon)
Date: Wed, 3 Nov 2021 16:03:16 GMT
Subject: RFR: 8276314: [JVMCI] check alignment of call displacement during code installation
In-Reply-To:
References: <9Y-J6OfeTyxerdoYearxYLNZMzRW3NEVIWEeHFTcq9k=.38bdaf1d-ec1a-49f6-9442-5d4497d23c44@github.com>
Message-ID:

On Tue, 2 Nov 2021 23:50:06 GMT, Tom Rodriguez wrote:

>> src/hotspot/cpu/x86/jvmciCodeInstaller_x86.cpp line 194:
>>
>>> 192: return;
>>> 193: }
>>> 194: if (os::is_MP() && !call->is_displacement_aligned()) {
>>
>> You are checking for `MP` in current era? Why not always require alignment?
>
> I agree. I think C2 has always aligned and C1 used to check is_MP but no longer does. Requiring JVMCI compilers to always align seems right, particularly since we don't expose `is_MP` through JVMCI. Graal and C1 actually appear to over-align the displacement by aligning it to BytesPerWord, while C2 always aligns to 4, which is all that is required. It's odd that the alignment check in NativeCall isn't simply `displacement_offset() % 4`. The existing check implies that it's ok to use a misaligned offset as long as it starts and ends within an 8 byte region, but I don't know that that would really work and none of the compilers actually take advantage of it. That's probably beyond the scope of this PR. Actually `verify_alignment` checks that it's aligned on BytesPerInt, so maybe `is_displacement_aligned` should unify around that definition.

I will remove the `is_MP()` calls.
>while C2 always aligns to 4

I'm having trouble finding where that is done - can you please point it out?

-------------

PR: https://git.openjdk.java.net/jdk/pull/6218

From never at openjdk.java.net Wed Nov 3 16:08:21 2021
From: never at openjdk.java.net (Tom Rodriguez)
Date: Wed, 3 Nov 2021 16:08:21 GMT
Subject: RFR: 8276314: [JVMCI] check alignment of call displacement during code installation
In-Reply-To:
References: <9Y-J6OfeTyxerdoYearxYLNZMzRW3NEVIWEeHFTcq9k=.38bdaf1d-ec1a-49f6-9442-5d4497d23c44@github.com>
Message-ID:

On Wed, 3 Nov 2021 16:00:30 GMT, Doug Simon wrote:

>> I agree. I think C2 has always aligned and C1 used to check is_MP but no longer does. Requiring JVMCI compilers to always align seems right, particularly since we don't expose `is_MP` through JVMCI. Graal and C1 actually appear to over-align the displacement by aligning it to BytesPerWord, while C2 always aligns to 4, which is all that is required. It's odd that the alignment check in NativeCall isn't simply `displacement_offset() % 4`. The existing check implies that it's ok to use a misaligned offset as long as it starts and ends within an 8 byte region, but I don't know that that would really work and none of the compilers actually take advantage of it. That's probably beyond the scope of this PR. Actually `verify_alignment` checks that it's aligned on BytesPerInt, so maybe `is_displacement_aligned` should unify around that definition.
>
> I will remove the `is_MP()` calls.
>
>>while C2 always aligns to 4
>
> I'm having trouble finding where that is done - can you please point it out?

It's deeply hidden. In x86_64.ad, it specifies the call alignment as [4](https://github.com/tkrodriguez/jdk/blob/master/src/hotspot/cpu/x86/x86_64.ad#L13028) and in [compute_padding](https://github.com/tkrodriguez/jdk/blob/master/src/hotspot/cpu/x86/x86_64.ad#L493) it uses that alignment value to align the offset of the displacement.
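The two alignment rules compared in this exchange, and the padding step described for `compute_padding`, can be sketched as follows. This is a standalone illustration, not the HotSpot implementation: the function names are hypothetical, and the only facts assumed are the ones stated in the thread (a 4-byte displacement, a strict 4-byte rule, and a looser "starts and ends within an 8 byte region" rule).

```cpp
#include <cstdint>

// Strict rule: the 4-byte call displacement starts on a 4-byte boundary.
bool aligned_to_4(std::uintptr_t disp_addr) {
    return disp_addr % 4 == 0;
}

// Looser rule: the first and last displacement bytes fall within the same
// 8-byte region, which also accepts some addresses that are not 4-aligned.
bool within_8_byte_region(std::uintptr_t disp_addr) {
    return disp_addr / 8 == (disp_addr + 3) / 8;
}

// Number of padding bytes to emit before an instruction at insn_addr so that
// its displacement, located disp_offset bytes into the instruction, lands on
// an `alignment` boundary.
unsigned padding_for(std::uintptr_t insn_addr, unsigned disp_offset, unsigned alignment) {
    std::uintptr_t disp_addr = insn_addr + disp_offset;
    return static_cast<unsigned>((alignment - disp_addr % alignment) % alignment);
}
```

For example, a displacement at address 0x1001 satisfies the looser rule but not the strict one, which is the kind of gap the thread suggests closing by unifying the checks.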
------------- PR: https://git.openjdk.java.net/jdk/pull/6218 From kvn at openjdk.java.net Wed Nov 3 16:10:19 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 3 Nov 2021 16:10:19 GMT Subject: RFR: 8276066: Reset LoopPercentProfileLimit for x86 due to suboptimal performance [v2] In-Reply-To: References: Message-ID: On Wed, 3 Nov 2021 07:24:35 GMT, Jie Fu wrote: >> Hi all, >> >> I'd like to reset the value of `LoopPercentProfileLimit` (from 30 to the original 10) for x86 to fix performance degradation. >> >> We had observed that for the same Java App, the performance of x86 is slower than that of aarch64. >> But the x86's performance should not be so worse than the aarch64 according to some SPEC benchmark results. >> >> After some investigation, it seems that the slowness of x86 is caused by the different default settings of `LoopPercentProfileLimit` (30 for x86, but 10 for other platforms). >> If we change `LoopPercentProfileLimit` from 30 to 10, x86 would run faster. >> >> In JDK-8149421, `LoopPercentProfileLimit` [1] was first added and set to be 30 for x86 and 10 for other platforms. >> Logically, the default value of `LoopPercentProfileLimit` is 10 for all platforms even before JDK-8149421. >> This is because when `LoopPercentProfileLimit=10`, `10.0` [2] equals `100.0 / LoopPercentProfileLimit` [3]. >> So if we set `LoopPercentProfileLimit=10`, this unrolling rule [3] would be the same as the original design before JDK-8149421. >> >> One most important fact is that from the very beginning of OpenJDK source code, the default value of `LoopPercentProfileLimit` (logically) is 10 for all platforms. >> So I suggest resetting `LoopPercentProfileLimit` to the original value (10) for x86, just as other platforms. >> >> I've noted that the review thread mentioned that JDK-8149421 would be beneficial for some SPECjvm2008 benchmarks [4]. >> Then I run SPECjvm2008 with `LoopPercentProfileLimit=10` finding that there is no performance drop on x86. 
>> So it won't revert JDK-8149421's optimizations for SPECjvm2008.
>>
>> To show the potential improvement of this change, I've made a JMH test in the patch.
>> Performance can be improved by 1.25x ~ 2.0x according to this micro benchmark.
>>
>> Any comments?
>>
>> Thanks.
>> Best regards,
>> Jie
>>
>> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/loopTransform.cpp#L908
>> [2] https://github.com/openjdk/jdk8u/blob/master/hotspot/src/share/vm/opto/loopTransform.cpp#L673
>> [3] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/loopTransform.cpp#L903
>> [4] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2016-February/021205.html
>>
>> [Images attached to the pull request: ratio, before, after]
>
> Jie Fu has updated the pull request incrementally with one additional commit since the last revision:
>
>   Fix indentation for Java code by using 4 whitespace

I agree.

-------------

Marked as reviewed by kvn (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6142

From kvn at openjdk.java.net Wed Nov 3 16:42:15 2021
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Wed, 3 Nov 2021 16:42:15 GMT
Subject: RFR: 8276314: [JVMCI] check alignment of call displacement during code installation
In-Reply-To:
References: <9Y-J6OfeTyxerdoYearxYLNZMzRW3NEVIWEeHFTcq9k=.38bdaf1d-ec1a-49f6-9442-5d4497d23c44@github.com>
Message-ID: <1stq-i_ulBd0oXG1FSe8nw8wCAPbOMeDgBeTe11rlAA=.c131b71d-2d37-467a-8b41-0f5e0eacb3fb@github.com>

On Wed, 3 Nov 2021 16:05:34 GMT, Tom Rodriguez wrote:

>> I will remove the `is_MP()` calls.
>>
>>>while C2 always aligns to 4
>>
>> I'm having trouble finding where that is done - can you please point it out?
>
> It's deeply hidden. In x86_64.ad, it specifies the call alignment as [4](https://github.com/tkrodriguez/jdk/blob/master/src/hotspot/cpu/x86/x86_64.ad#L13028) and in [compute_padding](https://github.com/tkrodriguez/jdk/blob/master/src/hotspot/cpu/x86/x86_64.ad#L493) it uses that alignment value to align the offset of the displacement.
Should we fix C1 and JVMCI to do the same as C2? As a separate RFE.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6218

From never at openjdk.java.net Wed Nov 3 19:16:11 2021
From: never at openjdk.java.net (Tom Rodriguez)
Date: Wed, 3 Nov 2021 19:16:11 GMT
Subject: RFR: 8276314: [JVMCI] check alignment of call displacement during code installation
In-Reply-To: <1stq-i_ulBd0oXG1FSe8nw8wCAPbOMeDgBeTe11rlAA=.c131b71d-2d37-467a-8b41-0f5e0eacb3fb@github.com>
References: <9Y-J6OfeTyxerdoYearxYLNZMzRW3NEVIWEeHFTcq9k=.38bdaf1d-ec1a-49f6-9442-5d4497d23c44@github.com> <1stq-i_ulBd0oXG1FSe8nw8wCAPbOMeDgBeTe11rlAA=.c131b71d-2d37-467a-8b41-0f5e0eacb3fb@github.com>
Message-ID: <5qSxKXsOc_c7ZmRMUuF0Z9Ku94T_b4BmQ4NN-kZE-hQ=.4c424442-83b7-4f0a-98d5-833ecde1423c@github.com>

On Wed, 3 Nov 2021 16:39:22 GMT, Vladimir Kozlov wrote:

>> It's deeply hidden. In x86_64.ad, it specifies the call alignment as [4](https://github.com/tkrodriguez/jdk/blob/master/src/hotspot/cpu/x86/x86_64.ad#L13028) and in [compute_padding](https://github.com/tkrodriguez/jdk/blob/master/src/hotspot/cpu/x86/x86_64.ad#L493) it uses that alignment value to align the offset of the displacement.
>
> Should we fix C1 and JVMCI to do the same as C2? As a separate RFE.

There's nothing to fix on the JVMCI side other than ensuring that we're asserting the correct alignment restrictions. Reducing the alignment to 4 instead of 8 would be a pure Graal change.
------------- PR: https://git.openjdk.java.net/jdk/pull/6218 From dnsimon at openjdk.java.net Wed Nov 3 19:33:18 2021 From: dnsimon at openjdk.java.net (Doug Simon) Date: Wed, 3 Nov 2021 19:33:18 GMT Subject: RFR: 8276314: [JVMCI] check alignment of call displacement during code installation In-Reply-To: <5qSxKXsOc_c7ZmRMUuF0Z9Ku94T_b4BmQ4NN-kZE-hQ=.4c424442-83b7-4f0a-98d5-833ecde1423c@github.com> References: <9Y-J6OfeTyxerdoYearxYLNZMzRW3NEVIWEeHFTcq9k=.38bdaf1d-ec1a-49f6-9442-5d4497d23c44@github.com> <1stq-i_ulBd0oXG1FSe8nw8wCAPbOMeDgBeTe11rlAA=.c131b71d-2d37-467a-8b41-0f5e0eacb3fb@github.com> <5qSxKXsOc_c7ZmRMUuF0Z9Ku94T_b4BmQ4NN-kZE-hQ=.4c424442-83b7-4f0a-98d5-833ecde1423c@github.com> Message-ID: On Wed, 3 Nov 2021 19:13:06 GMT, Tom Rodriguez wrote: >> Should we fix C1 and JVMCI to do the same as C2? As separate RFE. > > There's nothing to fix on the JVMCI side other than ensuring that we're asserting the correct alignment restrictions. Reducing the alignment to 4 instead of 8 would be a pure Graal change. > It's deeply hidden Thanks - doubt I would ever have found that in reasonable time. ------------- PR: https://git.openjdk.java.net/jdk/pull/6218 From kvn at openjdk.java.net Wed Nov 3 19:33:18 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 3 Nov 2021 19:33:18 GMT Subject: RFR: 8276314: [JVMCI] check alignment of call displacement during code installation In-Reply-To: References: <9Y-J6OfeTyxerdoYearxYLNZMzRW3NEVIWEeHFTcq9k=.38bdaf1d-ec1a-49f6-9442-5d4497d23c44@github.com> <1stq-i_ulBd0oXG1FSe8nw8wCAPbOMeDgBeTe11rlAA=.c131b71d-2d37-467a-8b41-0f5e0eacb3fb@github.com> <5qSxKXsOc_c7ZmRMUuF0Z9Ku94T_b4BmQ4NN-kZE-hQ=.4c424442-83b7-4f0a-98d5-833ecde1423c@github.com> Message-ID: On Wed, 3 Nov 2021 19:30:05 GMT, Doug Simon wrote: >> There's nothing to fix on the JVMCI side other than ensuring that we're asserting the correct alignment restrictions. Reducing the alignment to 4 instead of 8 would be a pure Graal change. 
> >> It's deeply hidden > > Thanks - doubt I would ever have found that in reasonable time. I meant C1 and Graal. Yes, in JVMCI we should just check correct alignment. ------------- PR: https://git.openjdk.java.net/jdk/pull/6218 From dlong at openjdk.java.net Wed Nov 3 19:45:09 2021 From: dlong at openjdk.java.net (Dean Long) Date: Wed, 3 Nov 2021 19:45:09 GMT Subject: RFR: 8276227: ciReplay: SIGSEGV if classfile for replay compilation is not present after JDK-8275868 [v2] In-Reply-To: References: Message-ID: On Wed, 3 Nov 2021 08:09:33 GMT, Christian Hagedorn wrote: >> I'm having second thoughts on not supporting old replay files. It's easy enough to add a version number, which allows us to introduce incompatible changes without breaking old replay files. I'll probably introduce a version number with my fix for 8276095. > > You're right. A version number would solve this completely. This implementation is more robust than the previous one but not complete. Here we only try to set the protection domain once where in the previous implementation, we would have picked the first non-null protection domain found (which could happen after looking at many classes). Do you want to revisit this code with the introduction of version numbers in 8276095 and we proceed with this temporary fix? Yes, go ahead with your fix. ------------- PR: https://git.openjdk.java.net/jdk/pull/6189 From manc at openjdk.java.net Wed Nov 3 20:01:37 2021 From: manc at openjdk.java.net (Man Cao) Date: Wed, 3 Nov 2021 20:01:37 GMT Subject: RFR: 8276453: Undefined behavior in C1 LIR_OprDesc causes SEGV in fastdebug build [v5] In-Reply-To: References: Message-ID: > Hi all, > > Could anyone provide some feedback on this bug fix and refactoring change? See https://bugs.openjdk.java.net/browse/JDK-8276453 for more details. > If the direction of this change looks good, we can proceed removing the "UGLY HACK" in c1_LIR.hpp and refactor occurrences of "opr->fn()" to "opr.fn()". 
Man Cao has updated the pull request incrementally with one additional commit since the last revision: Fix aarch64 and arm builds ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6221/files - new: https://git.openjdk.java.net/jdk/pull/6221/files/ae81097a..d881f81d Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6221&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6221&range=03-04 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/6221.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6221/head:pull/6221 PR: https://git.openjdk.java.net/jdk/pull/6221 From lucy at openjdk.java.net Wed Nov 3 21:10:10 2021 From: lucy at openjdk.java.net (Lutz Schmidt) Date: Wed, 3 Nov 2021 21:10:10 GMT Subject: RFR: 8276429: CodeHeapState::print_names() fails with "assert(klass->is_loader_alive()) failed: must be alive" In-Reply-To: References: Message-ID: <1mfeNTCSnekHmyg_SPPNH5DRpfMdVpVM52tsSqhnsT8=.b2f84c6d-9493-485c-b245-210006c2daad@github.com> On Wed, 3 Nov 2021 14:19:29 GMT, Evgeny Astigeevich wrote: > This PR fixes `applications/kitchensink/Kitchensink.java` regression introduced by JDK-8275729. > The requirement for a method holder to be alive is relaxed to the holder not to be NULL. > If holder's name is not available the format of the string used for the name is the same as for unavailable method's name and signature, instead of the default string: ``. > Testing: > - `make run-test TEST=tier1_serviceability`: Passed > - `make run-test TEST=hotspot_tier2_serviceability`: Passed > - `make run-test TEST=serviceability/dcmd/compiler/CodeHeapAnalyticsMethodNames.java`: Passed class name retrieval now follows the same steps as method name and method signature. You should be on the safe side now. I can't formally approve without knowing Vladimir's secret tests complete OK. 
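The relaxed requirement described for 8276429 (the holder only has to be non-NULL, and an unavailable class name gets the same kind of placeholder as an unavailable method name) might be sketched like this. Everything here is an assumption made for illustration: the `Holder` struct, the `::` separator, and the placeholder text are hypothetical, and the actual strings used by `CodeHeapState::print_names()` are not shown in this thread.

```cpp
#include <string>

// Hypothetical stand-in for a method holder; this is not the HotSpot Klass type.
struct Holder {
    const char* name;  // may be nullptr when the name cannot be retrieved
};

// Placeholder invented for this sketch; it is not the string HotSpot prints.
const char* const kUnavailable = "<name unavailable>";

// Builds "holder::method", substituting the same placeholder for whichever
// part is missing instead of requiring the holder to be alive.
std::string qualified_name(const Holder* holder, const char* method_name) {
    const char* klass =
        (holder != nullptr && holder->name != nullptr) ? holder->name : kUnavailable;
    const char* method = (method_name != nullptr) ? method_name : kUnavailable;
    return std::string(klass) + "::" + method;
}
```

The point of the sketch is only that the class-name path and the method-name path share one fallback, so a missing holder degrades the output rather than tripping an assertion.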
------------- PR: https://git.openjdk.java.net/jdk/pull/6234 From dnsimon at openjdk.java.net Wed Nov 3 21:31:40 2021 From: dnsimon at openjdk.java.net (Doug Simon) Date: Wed, 3 Nov 2021 21:31:40 GMT Subject: RFR: 8276314: [JVMCI] check alignment of call displacement during code installation [v2] In-Reply-To: <9Y-J6OfeTyxerdoYearxYLNZMzRW3NEVIWEeHFTcq9k=.38bdaf1d-ec1a-49f6-9442-5d4497d23c44@github.com> References: <9Y-J6OfeTyxerdoYearxYLNZMzRW3NEVIWEeHFTcq9k=.38bdaf1d-ec1a-49f6-9442-5d4497d23c44@github.com> Message-ID: <4YqHhWcNmhmWm2QAK5_bllkzfOgXnrRGT46WyVIMBxU=.b5de63c7-3a60-4345-8975-a330a0aa358e@github.com> > This PR add verification of code alignment invariants related to x64 call instructions during code installation. > This in turn allows a JVMCI compilation that generates a misaligned call to fail gracefully (i.e. bailout) instead of the VM crashing when it checks alignment before patching the displacement of a call instruction. Doug Simon has updated the pull request incrementally with one additional commit since the last revision: consolidated verify_aligned with is_displacement_aligned enhanced error message for invalid _next_call_type value removed os::is_MP test ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6218/files - new: https://git.openjdk.java.net/jdk/pull/6218/files/e1479108..b21dc4eb Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6218&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6218&range=00-01 Stats: 9 lines in 3 files changed: 1 ins; 3 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/6218.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6218/head:pull/6218 PR: https://git.openjdk.java.net/jdk/pull/6218 From kvn at openjdk.java.net Wed Nov 3 22:00:12 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 3 Nov 2021 22:00:12 GMT Subject: RFR: 8276429: CodeHeapState::print_names() fails with "assert(klass->is_loader_alive()) failed: must be alive" In-Reply-To: 
References: Message-ID: <1tgad_jDdu7bvK1zDuMRp1Be3Mk0zCSAugOD8tfHGSI=.71785f92-5cb8-4c6e-b255-cbee84e66f2f@github.com> On Wed, 3 Nov 2021 14:19:29 GMT, Evgeny Astigeevich wrote: > This PR fixes `applications/kitchensink/Kitchensink.java` regression introduced by JDK-8275729. > The requirement for a method holder to be alive is relaxed to the holder not to be NULL. > If holder's name is not available the format of the string used for the name is the same as for unavailable method's name and signature, instead of the default string: ``. > Testing: > - `make run-test TEST=tier1_serviceability`: Passed > - `make run-test TEST=hotspot_tier2_serviceability`: Passed > - `make run-test TEST=serviceability/dcmd/compiler/CodeHeapAnalyticsMethodNames.java`: Passed After infrastructure hiccup tests finally passed. Please integrate and I will sponsor. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6234 From kvn at openjdk.java.net Wed Nov 3 22:09:16 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 3 Nov 2021 22:09:16 GMT Subject: RFR: 8276314: [JVMCI] check alignment of call displacement during code installation [v2] In-Reply-To: <4YqHhWcNmhmWm2QAK5_bllkzfOgXnrRGT46WyVIMBxU=.b5de63c7-3a60-4345-8975-a330a0aa358e@github.com> References: <9Y-J6OfeTyxerdoYearxYLNZMzRW3NEVIWEeHFTcq9k=.38bdaf1d-ec1a-49f6-9442-5d4497d23c44@github.com> <4YqHhWcNmhmWm2QAK5_bllkzfOgXnrRGT46WyVIMBxU=.b5de63c7-3a60-4345-8975-a330a0aa358e@github.com> Message-ID: On Wed, 3 Nov 2021 21:31:40 GMT, Doug Simon wrote: >> This PR add verification of code alignment invariants related to x64 call instructions during code installation. >> This in turn allows a JVMCI compilation that generates a misaligned call to fail gracefully (i.e. bailout) instead of the VM crashing when it checks alignment before patching the displacement of a call instruction. 
> > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > consolidated verify_aligned with is_displacement_aligned > enhanced error message for invalid _next_call_type value > removed os::is_MP test ------------- PR: https://git.openjdk.java.net/jdk/pull/6218 From dnsimon at openjdk.java.net Wed Nov 3 22:29:33 2021 From: dnsimon at openjdk.java.net (Doug Simon) Date: Wed, 3 Nov 2021 22:29:33 GMT Subject: RFR: 8276314: [JVMCI] check alignment of call displacement during code installation [v3] In-Reply-To: <9Y-J6OfeTyxerdoYearxYLNZMzRW3NEVIWEeHFTcq9k=.38bdaf1d-ec1a-49f6-9442-5d4497d23c44@github.com> References: <9Y-J6OfeTyxerdoYearxYLNZMzRW3NEVIWEeHFTcq9k=.38bdaf1d-ec1a-49f6-9442-5d4497d23c44@github.com> Message-ID: > This PR add verification of code alignment invariants related to x64 call instructions during code installation. > This in turn allows a JVMCI compilation that generates a misaligned call to fail gracefully (i.e. bailout) instead of the VM crashing when it checks alignment before patching the displacement of a call instruction. 
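The check described above can be illustrated with a minimal Java sketch. It is not the JVMCI code: the class and method names are hypothetical, and it assumes (the thread does not say so) that an x64 direct call is one opcode byte followed by a 4-byte displacement that must be 4-byte aligned so it can be patched later. The point is only that a misaligned call is reported as a bailout reason instead of tripping an assertion.

```java
class CallAlignmentSketch {
    // Assumption (not stated in the thread): one opcode byte, then a 4-byte
    // displacement that must be 4-byte aligned to be patched safely.
    static final int OPCODE_SIZE = 1;
    static final int DISPLACEMENT_SIZE = 4;

    // True if the displacement of a call starting at callOffset (relative to
    // a suitably aligned code start) falls on a 4-byte boundary.
    static boolean isDisplacementAligned(long callOffset) {
        return (callOffset + OPCODE_SIZE) % DISPLACEMENT_SIZE == 0;
    }

    // Hypothetical installer step: return a bailout reason instead of
    // crashing. Returns null when the call site is acceptable.
    static String checkCallSite(long callOffset) {
        if (!isDisplacementAligned(callOffset)) {
            return "unaligned call displacement at offset " + callOffset;
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(checkCallSite(3)); // displacement starts at offset 4
        System.out.println(checkCallSite(8)); // displacement starts at offset 9
    }
}
```

Returning a reason string keeps the decision with the caller, which can then abandon the compilation rather than the whole VM.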
Doug Simon has updated the pull request incrementally with one additional commit since the last revision: fix compilation error ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6218/files - new: https://git.openjdk.java.net/jdk/pull/6218/files/b21dc4eb..807d20f1 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6218&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6218&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/6218.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6218/head:pull/6218 PR: https://git.openjdk.java.net/jdk/pull/6218 From jiefu at openjdk.java.net Wed Nov 3 22:49:18 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Wed, 3 Nov 2021 22:49:18 GMT Subject: RFR: 8276066: Reset LoopPercentProfileLimit for x86 due to suboptimal performance [v2] In-Reply-To: References: Message-ID: On Wed, 3 Nov 2021 16:07:37 GMT, Vladimir Kozlov wrote: > I agree. Thanks @vnkozlov . ------------- PR: https://git.openjdk.java.net/jdk/pull/6142 From jiefu at openjdk.java.net Wed Nov 3 22:49:20 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Wed, 3 Nov 2021 22:49:20 GMT Subject: Integrated: 8276066: Reset LoopPercentProfileLimit for x86 due to suboptimal performance In-Reply-To: References: Message-ID: On Wed, 27 Oct 2021 14:26:49 GMT, Jie Fu wrote: > Hi all, > > I'd like to reset the value of `LoopPercentProfileLimit` (from 30 to the original 10) for x86 to fix performance degradation. > > We had observed that for the same Java App, the performance of x86 is slower than that of aarch64. > But the x86's performance should not be so worse than the aarch64 according to some SPEC benchmark results. > > After some investigation, it seems that the slowness of x86 is caused by the different default settings of `LoopPercentProfileLimit` (30 for x86, but 10 for other platforms). > If we change `LoopPercentProfileLimit` from 30 to 10, x86 would run faster. 
> > In JDK-8149421, `LoopPercentProfileLimit` [1] was first added and set to be 30 for x86 and 10 for other platforms. > Logically, the default value of `LoopPercentProfileLimit` is 10 for all platforms even before JDK-8149421. > This is because when `LoopPercentProfileLimit=10`, `10.0` [2] equals `100.0 / LoopPercentProfileLimit` [3]. > So if we set `LoopPercentProfileLimit=10`, this unrolling rule [3] would be the same as the original design before JDK-8149421. > > One most important fact is that from the very beginning of OpenJDK source code, the default value of `LoopPercentProfileLimit` (logically) is 10 for all platforms. > So I suggest resetting `LoopPercentProfileLimit` to the original value (10) for x86, just as other platforms. > > I've noted that the review thread mentioned that JDK-8149421 would be beneficial for some SPECjvm2008 benchmarks [4]. > Then I run SPECjvm2008 with `LoopPercentProfileLimit=10` finding that there is no performance drop on x86. > So it won't revert JDK-8149421's opts for SPECjvm2008. > > To show the potential improvement of this change, I've made a jmh test in the patch. > Performance can be improved by 1.25x ~ 2.0x according to this micro benchmark. > > Any comments? > > Thanks. > Best regards, > Jie > > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/loopTransform.cpp#L908 > [2] https://github.com/openjdk/jdk8u/blob/master/hotspot/src/share/vm/opto/loopTransform.cpp#L673 > [3] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/loopTransform.cpp#L903 > [4] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2016-February/021205.html > > ratio > > before > > after This pull request has now been integrated. 
Changeset: 0ab910d6 Author: Jie Fu URL: https://git.openjdk.java.net/jdk/commit/0ab910d626a05106e1366438aeb5b16e16374c2f Stats: 89 lines in 2 files changed: 88 ins; 0 del; 1 mod 8276066: Reset LoopPercentProfileLimit for x86 due to suboptimal performance Reviewed-by: thartmann, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/6142 From kvn at openjdk.java.net Wed Nov 3 23:06:11 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 3 Nov 2021 23:06:11 GMT Subject: RFR: 8276314: [JVMCI] check alignment of call displacement during code installation [v3] In-Reply-To: References: <9Y-J6OfeTyxerdoYearxYLNZMzRW3NEVIWEeHFTcq9k=.38bdaf1d-ec1a-49f6-9442-5d4497d23c44@github.com> Message-ID: On Wed, 3 Nov 2021 22:29:33 GMT, Doug Simon wrote: >> This PR add verification of code alignment invariants related to x64 call instructions during code installation. >> This in turn allows a JVMCI compilation that generates a misaligned call to fail gracefully (i.e. bailout) instead of the VM crashing when it checks alignment before patching the displacement of a call instruction. > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > fix compilation error Good. Please, run mach5 testing before push. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6218 From dlong at openjdk.java.net Wed Nov 3 23:25:27 2021 From: dlong at openjdk.java.net (Dean Long) Date: Wed, 3 Nov 2021 23:25:27 GMT Subject: RFR: 8275670: ciReplay: java.lang.NoClassDefFoundError when trying to load java/lang/invoke/LambdaForm$MH Message-ID: The invokedynamic support added by 8271911 was incomplete. In particular, it failed to search through argL fields. This change fixes that, adds a test, and also cleans up the code a bit. 
-------------

Commit messages:
 - cleanup test
 - search through argL fields

Changes: https://git.openjdk.java.net/jdk/pull/6243/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6243&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8275670
  Stats: 116 lines in 3 files changed: 90 ins; 1 del; 25 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6243.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6243/head:pull/6243

PR: https://git.openjdk.java.net/jdk/pull/6243

From duke at openjdk.java.net Thu Nov 4 01:19:17 2021
From: duke at openjdk.java.net (王超)
Date: Thu, 4 Nov 2021 01:19:17 GMT
Subject: RFR: JDK-8275854: C2: assert(stride_con != 0) failed: missed some peephole opt [v6]
In-Reply-To:
References:
Message-ID:

On Tue, 2 Nov 2021 02:54:39 GMT, 王超 wrote:

>> The `If subsume` optimization eliminates a `LongCountedLoopEndNode` node by mistake, which makes the `PhaseIdealLoop` optimization crash.
>>
>> For example, the tests of node 538 and node 553 become the same after the first `PhaseIdealLoop` optimization. Node 555 is the back edge to the loop, and node 553 will be replaced by a `LongCountedLoopEndNode` node.
>> [image]
>>
>> In the next `PhaseIdealLoop` optimization, node 538 finds that node 553 is redundant and subsumes it. The `PhaseIdealLoop` optimization then crashes because there is no loop end node.
>> [image]
>>
>> There are two ways to fix the crash. The first, taken in this PR, is to exit the `IfNode` subsume optimization when the node is a `LongCountedLoopEndNode`.
>> The second possible fix is to exchange the dominating `If` node with the `LongCountedLoopEndNode` node:
>>
>> diff --git a/src/hotspot/share/opto/ifnode.cpp b/src/hotspot/share/opto/ifnode.cpp
>> index 38b40a6..31ff172 100644
>> --- a/src/hotspot/share/opto/ifnode.cpp
>> +++ b/src/hotspot/share/opto/ifnode.cpp
>> @@ -1674,6 +1674,21 @@ Node* IfNode::simple_subsuming(PhaseIterGVN* igvn) {
>>      }
>>    }
>>
>> +  if (is_LongCountedLoopEnd()) {
>> +    set_req(0, dom->in(0));
>> +    set_req(1, dom->in(1));
>> +    dom->set_req(0, pre);
>> +    dom->set_req(1, igvn->intcon(is_always_true ? 1 : 0));
>> +    Node* proj0 = raw_out(0);
>> +    Node* proj1 = raw_out(1);
>> +    Node* dom_proj0 = dom->raw_out(0);
>> +    Node* dom_proj1 = dom->raw_out(1);
>> +    dom_proj0->set_req(0, this);
>> +    dom_proj1->set_req(0, this);
>> +    proj0->set_req(0, dom);
>> +    proj1->set_req(0, dom);
>> +  }
>> +
>>    if (bol->outcnt() == 0) {
>>      igvn->remove_dead_node(bol); // Kill the BoolNode.
>>    }
>> diff --git a/src/hotspot/share/opto/loopnode.cpp b/src/hotspot/share/opto/loopnode.cpp
>> index 6f7e34d..7955722 100644
>> --- a/src/hotspot/share/opto/loopnode.cpp
>> +++ b/src/hotspot/share/opto/loopnode.cpp
>> @@ -802,7 +802,7 @@ bool PhaseIdealLoop::transform_long_counted_loop(IdealLoopTree* loop, Node_List
>>    Node* back_control = head->in(LoopNode::LoopBackControl);
>>
>>    // data nodes on back branch not supported
>> -  if (back_control->outcnt() > 1) {
>> +  if (back_control->outcnt() > 1 || back_control->Opcode() != Op_IfTrue) {
>>      return false;
>>    }
>
> 王超 has updated the pull request incrementally with one additional commit since the last revision:
>
>   Specify vm option needs option 'othervm'

Ping: could I get more reviews of this PR?
------------- PR: https://git.openjdk.java.net/jdk/pull/6099 From kvn at openjdk.java.net Thu Nov 4 06:32:16 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 4 Nov 2021 06:32:16 GMT Subject: RFR: 8275670: ciReplay: java.lang.NoClassDefFoundError when trying to load java/lang/invoke/LambdaForm$MH In-Reply-To: References: Message-ID: On Wed, 3 Nov 2021 23:15:41 GMT, Dean Long wrote: > The invokedynamic support added by 8271911 was incomplete. In particular, it failed to search through argL fields. This change fixes that, adds a test, and also cleans up the code a bit. Good. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6243 From njian at openjdk.java.net Thu Nov 4 07:26:40 2021 From: njian at openjdk.java.net (Ningsheng Jian) Date: Thu, 4 Nov 2021 07:26:40 GMT Subject: RFR: 8276151: AArch64: Incorrect result for double to int vector conversion Message-ID: Current NEON vector double to integer conversion generates code to convert double to long first and then narrow to integer, which does not follow Java language spec [1], and will get incorrect results for double values larger than Integer.MAX_VALUE or less than Integer.MIN_VALUE. For those too large/small values, result should be the largest/smallest representable value of type int, but converting to long and then narrowing to int will get different results. There's no direct double to int vector conversion NEON instruction, so we simply do it with scalar conversion. Also update compiler/vectorapi test cases to cover some corner cases. 
[1] https://docs.oracle.com/javase/specs/jls/se17/html/jls-5.html#jls-5.1.3 ------------- Commit messages: - 8276151: AArch64: Incorrect result for double to int vector conversion Changes: https://git.openjdk.java.net/jdk/pull/6247/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6247&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8276151 Stats: 169 lines in 4 files changed: 139 ins; 1 del; 29 mod Patch: https://git.openjdk.java.net/jdk/pull/6247.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6247/head:pull/6247 PR: https://git.openjdk.java.net/jdk/pull/6247 From dlong at openjdk.java.net Thu Nov 4 08:38:19 2021 From: dlong at openjdk.java.net (Dean Long) Date: Thu, 4 Nov 2021 08:38:19 GMT Subject: RFR: 8275670: ciReplay: java.lang.NoClassDefFoundError when trying to load java/lang/invoke/LambdaForm$MH In-Reply-To: References: Message-ID: On Wed, 3 Nov 2021 23:15:41 GMT, Dean Long wrote: > The invokedynamic support added by 8271911 was incomplete. In particular, it failed to search through argL fields. This change fixes that, adds a test, and also cleans up the code a bit. Thanks Vladimir. ------------- PR: https://git.openjdk.java.net/jdk/pull/6243 From chagedorn at openjdk.java.net Thu Nov 4 08:54:09 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Thu, 4 Nov 2021 08:54:09 GMT Subject: RFR: 8276227: ciReplay: SIGSEGV if classfile for replay compilation is not present after JDK-8275868 [v2] In-Reply-To: References: Message-ID: On Wed, 3 Nov 2021 19:42:39 GMT, Dean Long wrote: >> You're right. A version number would solve this completely. This implementation is more robust than the previous one but not complete. Here we only try to set the protection domain once where in the previous implementation, we would have picked the first non-null protection domain found (which could happen after looking at many classes). 
Do you want to revisit this code with the introduction of version numbers in 8276095 and we proceed with this temporary fix? > > Yes, go ahead with your fix. Great, thanks Dean for your review! ------------- PR: https://git.openjdk.java.net/jdk/pull/6189 From chagedorn at openjdk.java.net Thu Nov 4 08:57:19 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Thu, 4 Nov 2021 08:57:19 GMT Subject: Integrated: 8276227: ciReplay: SIGSEGV if classfile for replay compilation is not present after JDK-8275868 In-Reply-To: References: Message-ID: <54kmM6W4izuYCB69Y40e_91M3qaTtxLPd7Hu-AEcr0E=.2d620d79-5cef-4e52-91f8-6e8262f12f8e@github.com> On Mon, 1 Nov 2021 13:25:03 GMT, Christian Hagedorn wrote: > The fix for [JDK-8275868](https://bugs.openjdk.java.net/browse/JDK-8275868) does not handle the case when the classfile for the method to be replay compiled is not present. This will fail to load the klass. Afterwards, we are trying to access the protection domain of the failed to load klass (i.e. a null pointer) which results in a segmentation fault. The fix is straight forward to only set the new protection domain if the klass was loaded successfully. I additionally changed the code such that we are only trying to set the protection domain when reading the first `instanceKlass` entry. This avoids some potential problems with older replay files where we do not have this additional first entry set by JDK-8275868. > > Thanks, > Christian This pull request has now been integrated. 
Changeset: a1f4c428 Author: Christian Hagedorn URL: https://git.openjdk.java.net/jdk/commit/a1f4c428ba1b78a4e18afb87c94a5c731a5aa706 Stats: 82 lines in 2 files changed: 81 ins; 0 del; 1 mod 8276227: ciReplay: SIGSEGV if classfile for replay compilation is not present after JDK-8275868 Reviewed-by: kvn, thartmann, dlong ------------- PR: https://git.openjdk.java.net/jdk/pull/6189 From aph at openjdk.java.net Thu Nov 4 09:25:09 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Thu, 4 Nov 2021 09:25:09 GMT Subject: RFR: 8276151: AArch64: Incorrect result for double to int vector conversion In-Reply-To: References: Message-ID: On Thu, 4 Nov 2021 07:19:55 GMT, Ningsheng Jian wrote: > Current NEON vector double to integer conversion generates code to convert double to long first and then narrow to integer, which does not follow Java language spec [1], and will get incorrect results for double values larger than Integer.MAX_VALUE or less than Integer.MIN_VALUE. For those too large/small values, result should be the largest/smallest representable value of type int, but converting to long and then narrowing to int will get different results. > > There's no direct double to int vector conversion NEON instruction, so we simply do it with scalar conversion. > > Also update compiler/vectorapi test cases to cover some corner cases. > > [1] https://docs.oracle.com/javase/specs/jls/se17/html/jls-5.html#jls-5.1.3 src/hotspot/cpu/aarch64/aarch64_neon.ad line 518: > 516: ins_encode %{ > 517: __ ins(as_FloatRegister($dst$$reg), __ D, as_FloatRegister($src$$reg), 0, 1); > 518: // Converting from double to int directly follows Java specification. Suggestion: // We can't use fcvtzs(vector, integer) here because we need saturation arithmetic. See JDK-8276151. 
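
To make the saturation point concrete, here is a minimal Java sketch (not part of the patch; the class and method names are hypothetical). It contrasts the direct double-to-int narrowing defined by JLS 5.1.3 with a double-to-long conversion followed by a long-to-int narrowing, the two-step sequence the PR description says the old NEON code used.

```java
class D2ISaturationSketch {
    // Direct narrowing per JLS 5.1.3: out-of-range values saturate to
    // Integer.MAX_VALUE or Integer.MIN_VALUE, and NaN becomes 0.
    static int direct(double d) {
        return (int) d;
    }

    // Two-step conversion: double -> long saturates at the long range, but
    // long -> int simply keeps the low 32 bits.
    static int viaLong(double d) {
        return (int) (long) d;
    }

    public static void main(String[] args) {
        double[] inputs = { 1.5, 3.0e9, -3.0e9, 1.0e19, Double.NaN };
        for (double d : inputs) {
            System.out.println(d + ": direct=" + direct(d) + ", viaLong=" + viaLong(d));
        }
    }
}
```

For inputs inside the int range both paths agree. They diverge once the value exceeds it: for 3.0e9 the direct conversion yields `Integer.MAX_VALUE`, while the two-step path wraps around to a negative number.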
------------- PR: https://git.openjdk.java.net/jdk/pull/6247 From njian at openjdk.java.net Thu Nov 4 09:56:40 2021 From: njian at openjdk.java.net (Ningsheng Jian) Date: Thu, 4 Nov 2021 09:56:40 GMT Subject: RFR: 8276151: AArch64: Incorrect result for double to int vector conversion [v2] In-Reply-To: References: Message-ID: > Current NEON vector double to integer conversion generates code to convert double to long first and then narrow to integer, which does not follow Java language spec [1], and will get incorrect results for double values larger than Integer.MAX_VALUE or less than Integer.MIN_VALUE. For those too large/small values, result should be the largest/smallest representable value of type int, but converting to long and then narrowing to int will get different results. > > There's no direct double to int vector conversion NEON instruction, so we simply do it with scalar conversion. > > Also update compiler/vectorapi test cases to cover some corner cases. > > [1] https://docs.oracle.com/javase/specs/jls/se17/html/jls-5.html#jls-5.1.3 Ningsheng Jian has updated the pull request incrementally with one additional commit since the last revision: Address review comments from Andrew. 
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6247/files - new: https://git.openjdk.java.net/jdk/pull/6247/files/0950e686..46b60e2b Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6247&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6247&range=00-01 Stats: 4 lines in 2 files changed: 2 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/6247.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6247/head:pull/6247 PR: https://git.openjdk.java.net/jdk/pull/6247 From njian at openjdk.java.net Thu Nov 4 09:56:43 2021 From: njian at openjdk.java.net (Ningsheng Jian) Date: Thu, 4 Nov 2021 09:56:43 GMT Subject: RFR: 8276151: AArch64: Incorrect result for double to int vector conversion [v2] In-Reply-To: References: Message-ID: On Thu, 4 Nov 2021 09:21:43 GMT, Andrew Haley wrote: >> Ningsheng Jian has updated the pull request incrementally with one additional commit since the last revision: >> >> Address review comments from Andrew. > > src/hotspot/cpu/aarch64/aarch64_neon.ad line 518: > >> 516: ins_encode %{ >> 517: __ ins(as_FloatRegister($dst$$reg), __ D, as_FloatRegister($src$$reg), 0, 1); >> 518: // Converting from double to int directly follows Java specification. > > Suggestion: > > // We can't use fcvtzs(vector, integer) here because we need saturation arithmetic. See JDK-8276151. Thank you @theRealAph ! Updated. 
------------- PR: https://git.openjdk.java.net/jdk/pull/6247 From aph at openjdk.java.net Thu Nov 4 10:10:10 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Thu, 4 Nov 2021 10:10:10 GMT Subject: RFR: 8276151: AArch64: Incorrect result for double to int vector conversion [v2] In-Reply-To: References: Message-ID: On Thu, 4 Nov 2021 09:56:40 GMT, Ningsheng Jian wrote: >> Current NEON vector double to integer conversion generates code to convert double to long first and then narrow to integer, which does not follow Java language spec [1], and will get incorrect results for double values larger than Integer.MAX_VALUE or less than Integer.MIN_VALUE. For those too large/small values, result should be the largest/smallest representable value of type int, but converting to long and then narrowing to int will get different results. >> >> There's no direct double to int vector conversion NEON instruction, so we simply do it with scalar conversion. >> >> Also update compiler/vectorapi test cases to cover some corner cases. >> >> [1] https://docs.oracle.com/javase/specs/jls/se17/html/jls-5.html#jls-5.1.3 > > Ningsheng Jian has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments from Andrew. Marked as reviewed by aph (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/6247 From chagedorn at openjdk.java.net Thu Nov 4 10:32:14 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Thu, 4 Nov 2021 10:32:14 GMT Subject: RFR: 8275670: ciReplay: java.lang.NoClassDefFoundError when trying to load java/lang/invoke/LambdaForm$MH In-Reply-To: References: Message-ID: On Wed, 3 Nov 2021 23:15:41 GMT, Dean Long wrote: > The invokedynamic support added by 8271911 was incomplete. In particular, it failed to search through argL fields. This change fixes that, adds a test, and also cleans up the code a bit. Looks good! ------------- Marked as reviewed by chagedorn (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/6243

From simonis at openjdk.java.net Thu Nov 4 12:18:46 2021
From: simonis at openjdk.java.net (Volker Simonis)
Date: Thu, 4 Nov 2021 12:18:46 GMT
Subject: RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow [v4]
In-Reply-To:
References:
Message-ID:

> Currently, when running with `-XX:-OmitStackTraceInFastThrow`, C2 has no way to create implicit exceptions like AIOOBE, NullPointerException, etc. in compiled code. This means that such methods will always be deoptimized and re-executed in the interpreter whenever such exceptions happen.
>
> If implicit exceptions are used for normal control flow, that can have a dramatic impact on performance. A prominent example of such code is [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274):
>
>     public static boolean isAlpha(int c) {
>         try {
>             return IS_ALPHA[c];
>         } catch (ArrayIndexOutOfBoundsException ex) {
>             return false;
>         }
>     }
>
> ### Solution
>
> Instead of deoptimizing and resorting to the interpreter, we can generate code which allocates and initializes the corresponding exceptions right in compiled code. This results in a roughly tenfold performance improvement for the above code:
>
>     -XX:-OmitStackTraceInFastThrow -XX:-OptimizeImplicitExceptions
>     Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>     ImplicitExceptions.bench                     0.0  avgt    5      1.430 ±    0.353  ns/op
>     ImplicitExceptions.bench                    0.33  avgt    5   3563.038 ±   77.358  ns/op
>     ImplicitExceptions.bench                    0.66  avgt    5   8609.693 ± 1205.104  ns/op
>     ImplicitExceptions.bench                    1.00  avgt    5  12842.401 ± 1022.728  ns/op
>
>     -XX:-OmitStackTraceInFastThrow -XX:+OptimizeImplicitExceptions
>     Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>     ImplicitExceptions.bench                     0.0  avgt    5      1.432 ±    0.352  ns/op
>     ImplicitExceptions.bench                    0.33  avgt    5    355.723 ±   16.641  ns/op
>     ImplicitExceptions.bench                    0.66  avgt    5    887.068 ±  166.728  ns/op
>     ImplicitExceptions.bench                    1.00  avgt    5   1274.418 ±   88.235  ns/op
>
> ### Implementation details
>
> - The new optimization is guarded by the option `OptimizeImplicitExceptions`, which is on by default.
> - In `GraphKit::builtin_throw()` we can't simply use `CallGenerator::for_direct_call()` to create a `DirectCallGenerator` for the call to the exception's `` function because `DirectCallGenerator` assumes in various places that calls are only issued at `invoke*` bytecodes. This is not true in general for bytecodes which can cause an implicit exception.
> - Instead, we manually wire up the call based on the code in `DirectCallGenerator::generate()`.
> - We use a trick similar to the one for method handle intrinsics, where the callee from the bytecode is replaced by a direct call and this fact is recorded in the call's `_override_symbolic_info` field. For calling constructors of implicit exceptions I've introduced the new field `_implicit_exception_init`. This field is also used in various assertions to prevent queries for the bytecode's symbolic method information, which doesn't exist because we're not at an `invoke*` bytecode at the place where we generate the call.
> - The PR contains a micro-benchmark which compares the old and the new implementation for [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274). Except for the trivial case where the exception probability is 0 (i.e. no exceptions happen at all), the new implementation is about 10 times faster.

Volker Simonis has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains four commits:

 - Added jtreg test and extended the Whitebox API to export decompile, deopt and trap counters
 - Fix special case where we're creating an implicit exception for a regular invoke* bytecode
 - Minor updates as requested by @TheRealMDoerr
 - 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow

-------------

Changes: https://git.openjdk.java.net/jdk/pull/5488/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5488&range=03
  Stats: 747 lines in 15 files changed: 739 ins; 0 del; 8 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5488.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5488/head:pull/5488

PR: https://git.openjdk.java.net/jdk/pull/5488

From simonis at openjdk.java.net Thu Nov 4 12:35:11 2021
From: simonis at openjdk.java.net (Volker Simonis)
Date: Thu, 4 Nov 2021 12:35:11 GMT
Subject: RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow [v4]
In-Reply-To:
References:
Message-ID:

On Thu, 4 Nov 2021 12:18:46 GMT, Volker Simonis wrote:

>> Currently, when running with `-XX:-OmitStackTraceInFastThrow`, C2 has no way to create implicit exceptions like AIOOBE, NullPointerException, etc. in compiled code. This means that such methods will always be deoptimized and re-executed in the interpreter whenever such exceptions happen.
>>
>> If implicit exceptions are used for normal control flow, that can have a dramatic impact on performance. A prominent example of such code is [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274):
>>
>>     public static boolean isAlpha(int c) {
>>         try {
>>             return IS_ALPHA[c];
>>         } catch (ArrayIndexOutOfBoundsException ex) {
>>             return false;
>>         }
>>     }
>>
>> ### Solution
>>
>> Instead of deoptimizing and resorting to the interpreter, we can generate code which allocates and initializes the corresponding exceptions right in compiled code. This results in a roughly tenfold performance improvement for the above code:
>>
>>     -XX:-OmitStackTraceInFastThrow -XX:-OptimizeImplicitExceptions
>>     Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>>     ImplicitExceptions.bench                     0.0  avgt    5      1.430 ±    0.353  ns/op
>>     ImplicitExceptions.bench                    0.33  avgt    5   3563.038 ±   77.358  ns/op
>>     ImplicitExceptions.bench                    0.66  avgt    5   8609.693 ± 1205.104  ns/op
>>     ImplicitExceptions.bench                    1.00  avgt    5  12842.401 ± 1022.728  ns/op
>>
>>     -XX:-OmitStackTraceInFastThrow -XX:+OptimizeImplicitExceptions
>>     Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>>     ImplicitExceptions.bench                     0.0  avgt    5      1.432 ±    0.352  ns/op
>>     ImplicitExceptions.bench                    0.33  avgt    5    355.723 ±   16.641  ns/op
>>     ImplicitExceptions.bench                    0.66  avgt    5    887.068 ±  166.728  ns/op
>>     ImplicitExceptions.bench                    1.00  avgt    5   1274.418 ±   88.235  ns/op
>>
>> ### Implementation details
>>
>> - The new optimization is guarded by the option `OptimizeImplicitExceptions`, which is on by default.
>> - In `GraphKit::builtin_throw()` we can't simply use `CallGenerator::for_direct_call()` to create a `DirectCallGenerator` for the call to the exception's `` function because `DirectCallGenerator` assumes in various places that calls are only issued at `invoke*` bytecodes. This is not true in general for bytecodes which can cause an implicit exception.
>> - Instead, we manually wire up the call based on the code in `DirectCallGenerator::generate()`.
>> - We use a trick similar to the one for method handle intrinsics, where the callee from the bytecode is replaced by a direct call and this fact is recorded in the call's `_override_symbolic_info` field. For calling constructors of implicit exceptions I've introduced the new field `_implicit_exception_init`. This field is also used in various assertions to prevent queries for the bytecode's symbolic method information, which doesn't exist because we're not at an `invoke*` bytecode at the place where we generate the call.
>> - The PR contains a micro-benchmark which compares the old and the new implementation for [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274). Except for the trivial case where the exception probability is 0 (i.e. no exceptions happen at all), the new implementation is about 10 times faster.
>
> Volker Simonis has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits:
>
>  - Added jtreg test and extended the Whitebox API to export decompile, deopt and trap counters
>  - Fix special case where we're creating an implicit exception for a regular invoke* bytecode
>  - Minor updates as requested by @TheRealMDoerr
>  - 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow

Hi, sorry for the delay.

I've had a look at the IR Test Framework, but I didn't find it to be the best fit for this change. I also wanted to have a test which works in both product and debug builds. So I have instead extended the Whitebox API to expose the decompile, deopt and trap counters. I think (and hope) this functionality will be helpful for others in the future.
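
As background for the test discussion, the control-flow pattern this change targets can be sketched in a few lines of Java. This is only an illustration: the table size of 128 and the way the table is filled are assumptions, not Tomcat's actual code, and the class name is hypothetical. The first method relies on the implicit `ArrayIndexOutOfBoundsException`, as in the quoted `isAlpha()`; the second returns the same answers with an explicit range check, so no implicit exception is ever created.

```java
class IsAlphaSketch {
    // Assumption: a 128-entry lookup table covering the ASCII range.
    private static final int ARRAY_SIZE = 128;
    private static final boolean[] IS_ALPHA = new boolean[ARRAY_SIZE];

    static {
        for (int c = 0; c < ARRAY_SIZE; c++) {
            IS_ALPHA[c] = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
        }
    }

    // Pattern quoted in the PR: an out-of-range index is handled by catching
    // the implicit exception thrown by the array access.
    static boolean isAlphaWithException(int c) {
        try {
            return IS_ALPHA[c];
        } catch (ArrayIndexOutOfBoundsException ex) {
            return false;
        }
    }

    // Same result with an explicit range check and no exception.
    static boolean isAlphaWithRangeCheck(int c) {
        return c >= 0 && c < ARRAY_SIZE && IS_ALPHA[c];
    }

    public static void main(String[] args) {
        for (int c : new int[] { 'a', 'Z', '1', -1, 1000 }) {
            System.out.println(c + ": " + isAlphaWithException(c) + " / " + isAlphaWithRangeCheck(c));
        }
    }
}
```

How often the first variant takes the exception path depends on the input, which is what the `exceptionProbability` parameter in the benchmark tables varies.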

The test itself got quite elaborate, partly because different built-in exceptions are currently profiled and compiled differently (see [JDK-8275908: Record null_check traps for calls and array_check traps in the interpreter](https://bugs.openjdk.java.net/browse/JDK-8275908)). The current jtreg test can also serve as a test for JDK-8275908 once it is fixed (one just has to set the `JDK8275908_fixed` field to `true`).

As I've mentioned before, I ran a full set of jtreg and JCK tests together with some benchmark suites on a special build that has `-XX:-OmitStackTraceInFastThrow` as the default, and couldn't find any issue.

Please take a look,
Volker

-------------

PR: https://git.openjdk.java.net/jdk/pull/5488

From duke at openjdk.java.net Thu Nov 4 15:05:16 2021
From: duke at openjdk.java.net (Evgeny Astigeevich)
Date: Thu, 4 Nov 2021 15:05:16 GMT
Subject: Integrated: 8276429: CodeHeapState::print_names() fails with "assert(klass->is_loader_alive()) failed: must be alive"
In-Reply-To:
References:
Message-ID:

On Wed, 3 Nov 2021 14:19:29 GMT, Evgeny Astigeevich wrote:

> This PR fixes the `applications/kitchensink/Kitchensink.java` regression introduced by JDK-8275729.
> The requirement for a method holder to be alive is relaxed: the holder only needs to be non-NULL.
> If the holder's name is not available, the string used for the name has the same format as for an unavailable method name and signature, instead of the default string: ``.
> Testing:
> - `make run-test TEST=tier1_serviceability`: Passed
> - `make run-test TEST=hotspot_tier2_serviceability`: Passed
> - `make run-test TEST=serviceability/dcmd/compiler/CodeHeapAnalyticsMethodNames.java`: Passed

This pull request has now been integrated.
Changeset: 5acff753
Author:    Evgeny Astigeevich
Committer: Vladimir Kozlov
URL:       https://git.openjdk.java.net/jdk/commit/5acff75379a4ad0acfcfc6a64fcc4b588ef048c7
Stats:     5 lines in 1 file changed: 1 ins; 1 del; 3 mod

8276429: CodeHeapState::print_names() fails with "assert(klass->is_loader_alive()) failed: must be alive"

Reviewed-by: kvn

-------------

PR: https://git.openjdk.java.net/jdk/pull/6234

From simonis at openjdk.java.net Thu Nov 4 16:02:48 2021
From: simonis at openjdk.java.net (Volker Simonis)
Date: Thu, 4 Nov 2021 16:02:48 GMT
Subject: RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow [v5]
In-Reply-To:
References:
Message-ID:

> Currently, if running with `-XX:-OmitStackTraceInFastThrow`, C2 has no possibility to create implicit exceptions like AIOOBE, NullPointerExceptions, etc. in compiled code. This means that such methods will always be deoptimized and re-executed in the interpreter if such exceptions are happening.
>
> If implicit exceptions are used for normal control flow, that can have a dramatic impact on performance. A prominent example for such code is [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274):
>
>     public static boolean isAlpha(int c) {
>         try {
>             return IS_ALPHA[c];
>         } catch (ArrayIndexOutOfBoundsException ex) {
>             return false;
>         }
>     }
>
>
> ### Solution
>
> Instead of deoptimizing and resorting to the interpreter, we can generate code which allocates and initializes the corresponding exceptions right in compiled code. This results in a ten-times performance improvement for the above code:
>
>     -XX:-OmitStackTraceInFastThrow -XX:-OptimizeImplicitExceptions
>     Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>     ImplicitExceptions.bench                     0.0  avgt    5      1.430 ±    0.353  ns/op
>     ImplicitExceptions.bench                    0.33  avgt    5   3563.038 ±   77.358  ns/op
>     ImplicitExceptions.bench                    0.66  avgt    5   8609.693 ± 1205.104  ns/op
>     ImplicitExceptions.bench                    1.00  avgt    5  12842.401 ± 1022.728  ns/op
>
>     -XX:-OmitStackTraceInFastThrow -XX:+OptimizeImplicitExceptions
>     Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>     ImplicitExceptions.bench                     0.0  avgt    5      1.432 ±    0.352  ns/op
>     ImplicitExceptions.bench                    0.33  avgt    5    355.723 ±   16.641  ns/op
>     ImplicitExceptions.bench                    0.66  avgt    5    887.068 ±  166.728  ns/op
>     ImplicitExceptions.bench                    1.00  avgt    5   1274.418 ±   88.235  ns/op
>
>
> ### Implementation details
>
> - The new optimization is guarded by the option `OptimizeImplicitExceptions` which is on by default.
> - In `GraphKit::builtin_throw()` we can't simply use `CallGenerator::for_direct_call()` to create a `DirectCallGenerator` for the call to the exception's `<init>` function because `DirectCallGenerator` assumes in various places that calls are only issued at `invoke*` bytecodes. This is not true in general for bytecode which can cause an implicit exception.
> - Instead, we manually wire up the call based on the code in `DirectCallGenerator::generate()`.
> - We use a similar trick like for method handle intrinsics where the callee from the bytecode is replaced by a direct call and this fact is recorded in the call's `_override_symbolic_info` field. For calling constructors of implicit exceptions I've introduced the new field `_implicit_exception_init`. This field is also used in various assertions to prevent queries for the bytecode's symbolic method information which doesn't exist because we're not at an `invoke*` bytecode at the place where we generate the call.
> - The PR contains a micro-benchmark which compares the old and the new implementation for [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274). Except for the trivial case where the exception probability is 0 (i.e.
no exceptions are happening at all) the new implementation is about 10 times faster.

Volker Simonis has updated the pull request incrementally with one additional commit since the last revision:

  Fix build issue for minimal/zero build

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/5488/files
  - new: https://git.openjdk.java.net/jdk/pull/5488/files/8043f8d0..bdf37bf2

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5488&range=04
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5488&range=03-04

  Stats: 8 lines in 1 file changed: 8 ins; 0 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5488.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5488/head:pull/5488

PR: https://git.openjdk.java.net/jdk/pull/5488

From simonis at openjdk.java.net Thu Nov 4 16:28:52 2021
From: simonis at openjdk.java.net (Volker Simonis)
Date: Thu, 4 Nov 2021 16:28:52 GMT
Subject: RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow [v6]
In-Reply-To:
References:
Message-ID: <0II1AGxTmud7vWbpWeKGm_vPr_yqFuVaauWEzBN_pMw=.f09fc7be-1833-431b-8353-161b9dad3cf4@github.com>

Volker Simonis has updated the pull request incrementally with one additional commit since the last revision:

  Add new WhiteBox functionality to sun/hotspot/WhiteBox.java as well to avoid warnings in the tests which are still using it.
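
For readers who want to try the control-flow pattern from the 8273563 description locally, the following is a minimal sketch and not the JMH micro-benchmark contained in the PR. It reuses the quoted `isAlpha()` shape; the class name `ImplicitExceptionSketch`, the ASCII-only `IS_ALPHA` table, the `run()` helper and the naive `System.nanoTime()` timing are all assumptions made for illustration, so the numbers it prints are not comparable to the tables in the description.

```java
import java.util.Random;

// Hypothetical, simplified stand-in for the pattern discussed in the PR description.
// This is NOT the PR's JMH benchmark; all names here are illustrative assumptions.
class ImplicitExceptionSketch {
    // ASCII-only lookup table, loosely modelled on the quoted isAlpha() example.
    static final boolean[] IS_ALPHA = new boolean[128];
    static {
        for (int c = 'a'; c <= 'z'; c++) IS_ALPHA[c] = true;
        for (int c = 'A'; c <= 'Z'; c++) IS_ALPHA[c] = true;
    }

    // Exception-driven variant, in the shape quoted in the PR description:
    // an out-of-range index raises an implicit ArrayIndexOutOfBoundsException.
    static boolean isAlpha(int c) {
        try {
            return IS_ALPHA[c];
        } catch (ArrayIndexOutOfBoundsException ex) {
            return false;
        }
    }

    // Calls isAlpha() `iterations` times; roughly `exceptionProbability` of the
    // inputs are out of range. Returns how many calls reported an alphabetic char.
    static int run(double exceptionProbability, int iterations, long seed) {
        Random random = new Random(seed);
        int hits = 0;
        for (int i = 0; i < iterations; i++) {
            int c = random.nextDouble() < exceptionProbability
                    ? IS_ALPHA.length + random.nextInt(1000) // out of range, triggers AIOOBE
                    : 'a' + random.nextInt(26);               // in range, always alphabetic
            if (isAlpha(c)) hits++;
        }
        return hits;
    }

    public static void main(String[] args) {
        // Naive timing loop; a real measurement should use JMH as the PR does.
        for (double p : new double[] {0.0, 0.33, 0.66, 1.0}) {
            long start = System.nanoTime();
            int hits = run(p, 1_000_000, 42L);
            long elapsed = System.nanoTime() - start;
            System.out.printf("p=%.2f hits=%d avg=%.1f ns/call%n", p, hits, elapsed / 1_000_000.0);
        }
    }
}
```

Running this with and without `-XX:-OmitStackTraceInFastThrow` should make the cost of exception-driven control flow visible; the `OptimizeImplicitExceptions` option only exists in a build that contains this PR.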
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5488/files - new: https://git.openjdk.java.net/jdk/pull/5488/files/bdf37bf2..99db7e54 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5488&range=05 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5488&range=04-05 Stats: 30 lines in 1 file changed: 30 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/5488.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5488/head:pull/5488 PR: https://git.openjdk.java.net/jdk/pull/5488 From psandoz at openjdk.java.net Thu Nov 4 16:50:13 2021 From: psandoz at openjdk.java.net (Paul Sandoz) Date: Thu, 4 Nov 2021 16:50:13 GMT Subject: RFR: 8276151: AArch64: Incorrect result for double to int vector conversion [v2] In-Reply-To: References: Message-ID: On Thu, 4 Nov 2021 09:56:40 GMT, Ningsheng Jian wrote: >> Current NEON vector double to integer conversion generates code to convert double to long first and then narrow to integer, which does not follow Java language spec [1], and will get incorrect results for double values larger than Integer.MAX_VALUE or less than Integer.MIN_VALUE. For those too large/small values, result should be the largest/smallest representable value of type int, but converting to long and then narrowing to int will get different results. >> >> There's no direct double to int vector conversion NEON instruction, so we simply do it with scalar conversion. >> >> Also update compiler/vectorapi test cases to cover some corner cases. >> >> [1] https://docs.oracle.com/javase/specs/jls/se17/html/jls-5.html#jls-5.1.3 > > Ningsheng Jian has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments from Andrew. Java changes look good. What are your thoughts about applying the changes to all `VectorCastShape*Test.java`? Since I presume this could also impact ARM SVE and other architectures. 
The duplication could be removed by placing the repeated code in a super class. This could be a follow on update if needed after integration of https://github.com/openjdk/jdk/pull/5873. ------------- Marked as reviewed by psandoz (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6247 From duke at openjdk.java.net Thu Nov 4 17:05:37 2021 From: duke at openjdk.java.net (Takuya Kiriyama) Date: Thu, 4 Nov 2021 17:05:37 GMT Subject: RFR: 8276036: The value of full_count in the message of insufficient codecache is wrong Message-ID: Could you please review the 8276036 bug fixes? This bug is caused by the wrong place to add the value of full_count. The initial value of full_count is 0, so it needs to be added before outputting the message. ------------- Commit messages: - 8276036: The value of full_count in the message of insufficient codecache is wrong - 8276036: The value of full_count in the message of insufficient codecache is wrong - 8276036: The value of full_count in the message of insufficient codecache is wrong Changes: https://git.openjdk.java.net/jdk/pull/6129/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6129&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8276036 Stats: 133 lines in 2 files changed: 131 ins; 2 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/6129.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6129/head:pull/6129 PR: https://git.openjdk.java.net/jdk/pull/6129 From duke at openjdk.java.net Thu Nov 4 17:05:38 2021 From: duke at openjdk.java.net (Takuya Kiriyama) Date: Thu, 4 Nov 2021 17:05:38 GMT Subject: RFR: 8276036: The value of full_count in the message of insufficient codecache is wrong In-Reply-To: References: Message-ID: <8AfBiTLCRBg_MjDGPs5Gaaw2E1qX2bbsS1J-AApRXfs=.867099ec-6ce1-43e2-8171-41a957198045@github.com> On Wed, 27 Oct 2021 16:07:03 GMT, Dalibor Topic wrote: >> Could you please review the 8276036 bug fixes? >> >> This bug is caused by the wrong place to add the value of full_count. 
>> The initial value of full_count is 0, so it needs to be added before outputting the message. > > Hi, please contact me at dalibor.topic at oracle.com so that I can verify your account. @robilad robilad Thank you for your reply. I sent an email to you yesterday. Please check it later. ------------- PR: https://git.openjdk.java.net/jdk/pull/6129 From robilad at openjdk.java.net Thu Nov 4 17:05:37 2021 From: robilad at openjdk.java.net (Dalibor Topic) Date: Thu, 4 Nov 2021 17:05:37 GMT Subject: RFR: 8276036: The value of full_count in the message of insufficient codecache is wrong In-Reply-To: References: Message-ID: On Wed, 27 Oct 2021 02:35:29 GMT, Takuya Kiriyama wrote: > Could you please review the 8276036 bug fixes? > > This bug is caused by the wrong place to add the value of full_count. > The initial value of full_count is 0, so it needs to be added before outputting the message. Hi, please contact me at dalibor.topic at oracle.com so that I can verify your account. ------------- PR: https://git.openjdk.java.net/jdk/pull/6129 From sviswanathan at openjdk.java.net Thu Nov 4 18:01:33 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Thu, 4 Nov 2021 18:01:33 GMT Subject: RFR: 8276025: Hotspot's libsvml.so may conflict with user dependency Message-ID: This patch removes conflicts with libsvml.so distributed with Intel's MKL library: Renames exported symbols from __svml to __jsvml. Renames library from libsvml.so to libjsvml.so. Updates the stubGenerator_x86_64.cpp accordingly to load libjsvml.so and the renamed symbols. Updates tests to look for the new library. Please review. 
Best Regards, Sandhya ------------- Commit messages: - load svml test fixes - update tests - 8276025: Hotspot's libsvml.so may conflict with user dependency Changes: https://git.openjdk.java.net/jdk/pull/6265/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6265&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8276025 Stats: 391989 lines in 123 files changed: 192755 ins; 192755 del; 6479 mod Patch: https://git.openjdk.java.net/jdk/pull/6265.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6265/head:pull/6265 PR: https://git.openjdk.java.net/jdk/pull/6265 From kvn at openjdk.java.net Thu Nov 4 18:15:11 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 4 Nov 2021 18:15:11 GMT Subject: RFR: 8276025: Hotspot's libsvml.so may conflict with user dependency In-Reply-To: References: Message-ID: On Thu, 4 Nov 2021 17:48:56 GMT, Sandhya Viswanathan wrote: > This patch removes conflicts with libsvml.so distributed with Intel's MKL library: > Renames exported symbols from __svml to __jsvml. > Renames library from libsvml.so to libjsvml.so. > Updates the stubGenerator_x86_64.cpp accordingly to load libjsvml.so and the renamed symbols. > Updates tests to look for the new library. > > Please review. > > Best Regards, > Sandhya Looks good. I will run tests. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6265 From sviswanathan at openjdk.java.net Thu Nov 4 18:18:09 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Thu, 4 Nov 2021 18:18:09 GMT Subject: RFR: 8276025: Hotspot's libsvml.so may conflict with user dependency In-Reply-To: References: Message-ID: On Thu, 4 Nov 2021 18:11:41 GMT, Vladimir Kozlov wrote: >> This patch removes conflicts with libsvml.so distributed with Intel's MKL library: >> Renames exported symbols from __svml to __jsvml. >> Renames library from libsvml.so to libjsvml.so. 
>> Updates the stubGenerator_x86_64.cpp accordingly to load libjsvml.so and the renamed symbols.
>> Updates tests to look for the new library.
>>
>> Please review.
>>
>> Best Regards,
>> Sandhya
>
> Looks good. I will run tests.

Thanks a lot @vnkozlov.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6265

From kvn at openjdk.java.net Thu Nov 4 18:24:12 2021
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Thu, 4 Nov 2021 18:24:12 GMT
Subject: RFR: 8276025: Hotspot's libsvml.so may conflict with user dependency
In-Reply-To:
References:
Message-ID:

On Thu, 4 Nov 2021 17:48:56 GMT, Sandhya Viswanathan wrote:

> This patch removes conflicts with libsvml.so distributed with Intel's MKL library:
> Renames exported symbols from __svml to __jsvml.
> Renames library from libsvml.so to libjsvml.so.
> Updates the stubGenerator_x86_64.cpp accordingly to load libjsvml.so and the renamed symbols.
> Updates tests to look for the new library.
>
> Please review.
>
> Best Regards,
> Sandhya

For completeness, maybe rename the files too: jsvml_*.S and jsvml_*.S.inc

-------------

PR: https://git.openjdk.java.net/jdk/pull/6265

From sviswanathan at openjdk.java.net Thu Nov 4 19:49:08 2021
From: sviswanathan at openjdk.java.net (Sandhya Viswanathan)
Date: Thu, 4 Nov 2021 19:49:08 GMT
Subject: RFR: 8276025: Hotspot's libsvml.so may conflict with user dependency [v2]
In-Reply-To:
References:
Message-ID:

> This patch removes conflicts with libsvml.so distributed with Intel's MKL library:
> Renames exported symbols from __svml to __jsvml.
> Renames library from libsvml.so to libjsvml.so.
> Updates the stubGenerator_x86_64.cpp accordingly to load libjsvml.so and the renamed symbols.
> Updates tests to look for the new library.
>
> Please review.
> > Best Regards, > Sandhya Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: change filename to jsvml* ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6265/files - new: https://git.openjdk.java.net/jdk/pull/6265/files/7c488c10..70d962ae Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6265&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6265&range=00-01 Stats: 0 lines in 72 files changed: 0 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/6265.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6265/head:pull/6265 PR: https://git.openjdk.java.net/jdk/pull/6265 From dlong at openjdk.java.net Thu Nov 4 19:53:12 2021 From: dlong at openjdk.java.net (Dean Long) Date: Thu, 4 Nov 2021 19:53:12 GMT Subject: RFR: 8275670: ciReplay: java.lang.NoClassDefFoundError when trying to load java/lang/invoke/LambdaForm$MH In-Reply-To: References: Message-ID: On Wed, 3 Nov 2021 23:15:41 GMT, Dean Long wrote: > The invokedynamic support added by 8271911 was incomplete. In particular, it failed to search through argL fields. This change fixes that, adds a test, and also cleans up the code a bit. Thanks Christian. ------------- PR: https://git.openjdk.java.net/jdk/pull/6243 From dlong at openjdk.java.net Thu Nov 4 19:56:26 2021 From: dlong at openjdk.java.net (Dean Long) Date: Thu, 4 Nov 2021 19:56:26 GMT Subject: Integrated: 8275670: ciReplay: java.lang.NoClassDefFoundError when trying to load java/lang/invoke/LambdaForm$MH In-Reply-To: References: Message-ID: <6K90-W6vkmb5fOkyfSUyS58GsQfuWLQ85Y1Dx8AbEJA=.94c98b66-6d02-46e8-a5fe-889cc5d79a66@github.com> On Wed, 3 Nov 2021 23:15:41 GMT, Dean Long wrote: > The invokedynamic support added by 8271911 was incomplete. In particular, it failed to search through argL fields. This change fixes that, adds a test, and also cleans up the code a bit. This pull request has now been integrated. 
Changeset: dcf36f87 Author: Dean Long URL: https://git.openjdk.java.net/jdk/commit/dcf36f87f87d52490c1f0434c2fca99fc8fd90a2 Stats: 116 lines in 3 files changed: 90 ins; 1 del; 25 mod 8275670: ciReplay: java.lang.NoClassDefFoundError when trying to load java/lang/invoke/LambdaForm$MH Reviewed-by: kvn, chagedorn ------------- PR: https://git.openjdk.java.net/jdk/pull/6243 From kvn at openjdk.java.net Thu Nov 4 20:08:12 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 4 Nov 2021 20:08:12 GMT Subject: RFR: 8276025: Hotspot's libsvml.so may conflict with user dependency [v2] In-Reply-To: References: Message-ID: On Thu, 4 Nov 2021 19:49:08 GMT, Sandhya Viswanathan wrote: >> This patch removes conflicts with libsvml.so distributed with Intel's MKL library: >> Renames exported symbols from __svml to __jsvml. >> Renames library from libsvml.so to libjsvml.so. >> Updates the stubGenerator_x86_64.cpp accordingly to load libjsvml.so and the renamed symbols. >> Updates tests to look for the new library. >> >> Please review. >> >> Best Regards, >> Sandhya > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > change filename to jsvml* Good. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6265 From erikj at openjdk.java.net Thu Nov 4 20:13:10 2021 From: erikj at openjdk.java.net (Erik Joelsson) Date: Thu, 4 Nov 2021 20:13:10 GMT Subject: RFR: 8276025: Hotspot's libsvml.so may conflict with user dependency [v2] In-Reply-To: References: Message-ID: On Thu, 4 Nov 2021 19:49:08 GMT, Sandhya Viswanathan wrote: >> This patch removes conflicts with libsvml.so distributed with Intel's MKL library: >> Renames exported symbols from __svml to __jsvml. >> Renames library from libsvml.so to libjsvml.so. >> Updates the stubGenerator_x86_64.cpp accordingly to load libjsvml.so and the renamed symbols. >> Updates tests to look for the new library. >> >> Please review. 
>>
>> Best Regards,
>> Sandhya
>
> Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision:
>
> change filename to jsvml*

Build changes look good.

-------------

Marked as reviewed by erikj (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6265

From kvn at openjdk.java.net Thu Nov 4 20:48:12 2021
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Thu, 4 Nov 2021 20:48:12 GMT
Subject: RFR: 8276025: Hotspot's libsvml.so may conflict with user dependency [v2]
In-Reply-To:
References:
Message-ID:

On Thu, 4 Nov 2021 19:49:08 GMT, Sandhya Viswanathan wrote:

>> This patch removes conflicts with libsvml.so distributed with Intel's MKL library:
>> Renames exported symbols from __svml to __jsvml.
>> Renames library from libsvml.so to libjsvml.so.
>> Updates the stubGenerator_x86_64.cpp accordingly to load libjsvml.so and the renamed symbols.
>> Updates tests to look for the new library.
>>
>> Please review.
>>
>> Best Regards,
>> Sandhya
>
> Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision:
>
> change filename to jsvml*

We also need to update one closed test we have before integration.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6265

From psandoz at openjdk.java.net Thu Nov 4 21:05:10 2021
From: psandoz at openjdk.java.net (Paul Sandoz)
Date: Thu, 4 Nov 2021 21:05:10 GMT
Subject: RFR: 8276025: Hotspot's libsvml.so may conflict with user dependency [v2]
In-Reply-To:
References:
Message-ID:

On Thu, 4 Nov 2021 19:49:08 GMT, Sandhya Viswanathan wrote:

>> This patch removes conflicts with libsvml.so distributed with Intel's MKL library:
>> Renames exported symbols from __svml to __jsvml.
>> Renames library from libsvml.so to libjsvml.so.
>> Updates the stubGenerator_x86_64.cpp accordingly to load libjsvml.so and the renamed symbols.
>> Updates tests to look for the new library.
>>
>> Please review.
>> >> Best Regards, >> Sandhya > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > change filename to jsvml* I did a case insensitive regex search `[^j]svml` over the checked out PR and did not find anything relevant that was missed. ------------- Marked as reviewed by psandoz (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6265 From ihse at openjdk.java.net Thu Nov 4 22:07:11 2021 From: ihse at openjdk.java.net (Magnus Ihse Bursie) Date: Thu, 4 Nov 2021 22:07:11 GMT Subject: RFR: 8276025: Hotspot's libsvml.so may conflict with user dependency [v2] In-Reply-To: References: Message-ID: On Thu, 4 Nov 2021 19:49:08 GMT, Sandhya Viswanathan wrote: >> This patch removes conflicts with libsvml.so distributed with Intel's MKL library: >> Renames exported symbols from __svml to __jsvml. >> Renames library from libsvml.so to libjsvml.so. >> Updates the stubGenerator_x86_64.cpp accordingly to load libjsvml.so and the renamed symbols. >> Updates tests to look for the new library. >> >> Please review. >> >> Best Regards, >> Sandhya > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > change filename to jsvml* Marked as reviewed by ihse (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/6265 From duke at openjdk.java.net Thu Nov 4 22:18:11 2021 From: duke at openjdk.java.net (TatWai Chong) Date: Thu, 4 Nov 2021 22:18:11 GMT Subject: RFR: 8275448: [REDO] AArch64: Implement string_compare intrinsic in SVE [v2] In-Reply-To: References: <3lLU_Cnsh439U5TjNx9ADxXkvMDnfot4TxHbh7paUQw=.5c7ec35a-8456-4fe6-8b4f-1d7627026d8d@github.com> Message-ID: On Wed, 3 Nov 2021 06:53:31 GMT, Tobias Hartmann wrote: >> @TobiHartmann, The latest patch is the final version. Could you re-run testing again? Many thanks. > > @tatwaichong All tests passed. 
@TobiHartmann @dholmes-ora, This patch is more or less identical to the original patch (https://github.com/openjdk/jdk/pull/5129). Now, this has passed all tests mentioned above. Do these tests cover failures that appeared on the original patch (https://bugs.openjdk.java.net/browse/JDK-8275263)? If these failures have disappeared, it may be safe to submit? ------------- PR: https://git.openjdk.java.net/jdk/pull/6072 From kvn at openjdk.java.net Fri Nov 5 00:59:09 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 5 Nov 2021 00:59:09 GMT Subject: RFR: 8276025: Hotspot's libsvml.so may conflict with user dependency [v2] In-Reply-To: References: Message-ID: On Thu, 4 Nov 2021 18:15:08 GMT, Sandhya Viswanathan wrote: >> Looks good. I will run tests. > > Thanks a lot @vnkozlov. @sviswa7 testing passed you can integrate. ------------- PR: https://git.openjdk.java.net/jdk/pull/6265 From njian at openjdk.java.net Fri Nov 5 02:34:09 2021 From: njian at openjdk.java.net (Ningsheng Jian) Date: Fri, 5 Nov 2021 02:34:09 GMT Subject: RFR: 8276151: AArch64: Incorrect result for double to int vector conversion [v2] In-Reply-To: References: Message-ID: On Thu, 4 Nov 2021 16:47:21 GMT, Paul Sandoz wrote: >> Ningsheng Jian has updated the pull request incrementally with one additional commit since the last revision: >> >> Address review comments from Andrew. > > Java changes look good. > > What are your thoughts about applying the changes to all `VectorCastShape*Test.java`? Since I presume this could also impact ARM SVE and other architectures. > > The duplication could be removed by placing the repeated code in a super class. This could be a follow on update if needed after integration of https://github.com/openjdk/jdk/pull/5873. Thank you @PaulSandoz for the review! > > What are your thoughts about applying the changes to all `VectorCastShape*Test.java`? Since I presume this could also impact ARM SVE and other architectures. 
>
> The duplication could be removed by placing the repeated code in a super class. This could be a follow on update if needed after integration of #5873.

Since SVE supports 128-bit vectors, the test could also cover ARM SVE. I was also thinking about expanding the tests to other vector sizes, but since we have JDK-8259610 recorded, it would be better to eventually fix the existing vectorapi vector conversion tests there.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6247

From duke at openjdk.java.net Fri Nov 5 03:13:09 2021
From: duke at openjdk.java.net (Fei Gao)
Date: Fri, 5 Nov 2021 03:13:09 GMT
Subject: RFR: 8275317: AArch64: Support some type conversion vectorization in SLP
In-Reply-To:
References:
Message-ID:

On Thu, 28 Oct 2021 03:39:42 GMT, Fei Gao wrote:

> Current SLP vectorizer in C2 compiler doesn't support type conversion
> operations. But AArch64 has vector type conversion instructions in
> both NEON and SVE.
>
> The type conversion involves two kinds of scenarios, conversion between
> the same data sizes and conversion between different data sizes. If we
> want to support casts between different data sizes, we need to amend
> the code part for identifying adjacent memory references and the code
> part for justifying if the combination is profitable. I suppose it
> would be easier to review if we split the whole task to support type
> conversion into two separate patches, one for the same data sizes and
> the other one for different data sizes. The goal of this patch is just
> to support conversions within the same data size, including:
> int -> float
> float -> int
> long -> double
> double -> long
>
> A typical test case:
>
> for (int i = start; i < limit; i++) {
> b[i] = (float) a[i];
> }
>
> To implement it, the patch completed the necessary instructions and
> matching rules in the backend and added implementation for SLP in
> the middle end.
> > The percentage of performance uplift on aarch64 system: > Mode: avgt > Cnt: 15 > Metric: (ns/op) > > benchmark percentage change [(After-Before)/Before] > VectorLoop.convertD2L -48.46% > VectorLoop.convertF2I -55.67% > VectorLoop.convertI2F -55.27% > VectorLoop.convertL2D -48.75% Hi, the patch mostly benefits AArch64, but the changes are all shared code. Could anyone help review it ? ------------- PR: https://git.openjdk.java.net/jdk/pull/6145 From sviswanathan at openjdk.java.net Fri Nov 5 03:34:18 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Fri, 5 Nov 2021 03:34:18 GMT Subject: RFR: 8276025: Hotspot's libsvml.so may conflict with user dependency [v2] In-Reply-To: References: Message-ID: <-gwQAH9K9_dyJZ6cjH_wbyvEP_VEyUVnXcq4cBjsqQQ=.00c317de-b82b-4db7-a44f-da0b71e4aae3@github.com> On Fri, 5 Nov 2021 00:56:05 GMT, Vladimir Kozlov wrote: >> Thanks a lot @vnkozlov. > > @sviswa7 testing passed you can integrate. Thanks a lot @vnkozlov for testing and review. Thanks @erikj79 @PaulSandoz @magicus for the review. ------------- PR: https://git.openjdk.java.net/jdk/pull/6265 From sviswanathan at openjdk.java.net Fri Nov 5 03:34:20 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Fri, 5 Nov 2021 03:34:20 GMT Subject: Integrated: 8276025: Hotspot's libsvml.so may conflict with user dependency In-Reply-To: References: Message-ID: On Thu, 4 Nov 2021 17:48:56 GMT, Sandhya Viswanathan wrote: > This patch removes conflicts with libsvml.so distributed with Intel's MKL library: > Renames exported symbols from __svml to __jsvml. > Renames library from libsvml.so to libjsvml.so. > Updates the stubGenerator_x86_64.cpp accordingly to load libjsvml.so and the renamed symbols. > Updates tests to look for the new library. > > Please review. > > Best Regards, > Sandhya This pull request has now been integrated. 
Changeset: 9ad4d3d0 Author: Sandhya Viswanathan URL: https://git.openjdk.java.net/jdk/commit/9ad4d3d06bb356436d69af07726ef6727c500f59 Stats: 391989 lines in 123 files changed: 192755 ins; 192755 del; 6479 mod 8276025: Hotspot's libsvml.so may conflict with user dependency Reviewed-by: kvn, erikj, psandoz, ihse ------------- PR: https://git.openjdk.java.net/jdk/pull/6265 From rbackman at openjdk.java.net Fri Nov 5 06:27:26 2021 From: rbackman at openjdk.java.net (Rickard =?UTF-8?B?QsOkY2ttYW4=?=) Date: Fri, 5 Nov 2021 06:27:26 GMT Subject: RFR: 8268882: C2: assert(n->outcnt() != 0 || C->top() == n || n->is_Proj()) failed: No dead instructions after post-alloc Message-ID: Also delete Phi nodes with no uses. ------------- Commit messages: - 8268882: C2: assert(n->outcnt() != 0 || C->top() == n || n->is_Proj()) failed: No dead instructions after post-alloc Changes: https://git.openjdk.java.net/jdk/pull/6270/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6270&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8268882 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/6270.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6270/head:pull/6270 PR: https://git.openjdk.java.net/jdk/pull/6270 From rbackman at openjdk.java.net Fri Nov 5 06:27:27 2021 From: rbackman at openjdk.java.net (Rickard =?UTF-8?B?QsOkY2ttYW4=?=) Date: Fri, 5 Nov 2021 06:27:27 GMT Subject: RFR: 8268882: C2: assert(n->outcnt() != 0 || C->top() == n || n->is_Proj()) failed: No dead instructions after post-alloc In-Reply-To: References: Message-ID: On Fri, 5 Nov 2021 06:19:25 GMT, Rickard B?ckman wrote: > Also delete Phi nodes with no uses. The cause of hitting this assert seems to be as follows: In regalloc the compiler decides to split a live range, generates a bunch of Phi nodes. Later after coalescing, the need to spill has gone away and Phi nodes are now starting to be deleted. 
Before this change, this problematic Phi node is left in the graph because it has two inputs, two loadcon of the same value. However, regalloc only checks for reference equality when determining whether a Phi node can be deleted due to a single unique input. Adding this check to also delete Phi nodes with no outs in this later stage makes sure these Phi nodes get deleted as well. ------------- PR: https://git.openjdk.java.net/jdk/pull/6270 From njian at openjdk.java.net Fri Nov 5 07:49:16 2021 From: njian at openjdk.java.net (Ningsheng Jian) Date: Fri, 5 Nov 2021 07:49:16 GMT Subject: Integrated: 8276151: AArch64: Incorrect result for double to int vector conversion In-Reply-To: References: Message-ID: On Thu, 4 Nov 2021 07:19:55 GMT, Ningsheng Jian wrote: > Current NEON vector double to integer conversion generates code to convert double to long first and then narrow to integer, which does not follow Java language spec [1], and will get incorrect results for double values larger than Integer.MAX_VALUE or less than Integer.MIN_VALUE. For those too large/small values, result should be the largest/smallest representable value of type int, but converting to long and then narrowing to int will get different results. > > There's no direct double to int vector conversion NEON instruction, so we simply do it with scalar conversion. > > Also update compiler/vectorapi test cases to cover some corner cases. > > [1] https://docs.oracle.com/javase/specs/jls/se17/html/jls-5.html#jls-5.1.3 This pull request has now been integrated.
Changeset: 96c396b7 Author: Ningsheng Jian URL: https://git.openjdk.java.net/jdk/commit/96c396b701e290fc3e1124b1c862b41e02e9c1d9 Stats: 171 lines in 4 files changed: 141 ins; 1 del; 29 mod 8276151: AArch64: Incorrect result for double to int vector conversion Reviewed-by: aph, psandoz ------------- PR: https://git.openjdk.java.net/jdk/pull/6247 From chagedorn at openjdk.java.net Fri Nov 5 08:30:12 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Fri, 5 Nov 2021 08:30:12 GMT Subject: RFR: 8276036: The value of full_count in the message of insufficient codecache is wrong In-Reply-To: <8AfBiTLCRBg_MjDGPs5Gaaw2E1qX2bbsS1J-AApRXfs=.867099ec-6ce1-43e2-8171-41a957198045@github.com> References: <8AfBiTLCRBg_MjDGPs5Gaaw2E1qX2bbsS1J-AApRXfs=.867099ec-6ce1-43e2-8171-41a957198045@github.com> Message-ID: <7I15YtAoKdAe2zolVfAHf_8KIYAF0jba2WB8NuUVH6s=.b950c2d9-0fb5-429e-bac4-f283b72156f2@github.com> On Fri, 29 Oct 2021 09:50:28 GMT, Takuya Kiriyama wrote: >> Hi, please contact me at dalibor.topic at oracle.com so that I can verify your account. > > @robilad robilad Thank you for your reply. > I sent an email to you yesterday. Please check it later. Hi @tkiriyama, this bug was already fixed and integrated by @tobiasholenstein a few days ago: https://github.com/openjdk/jdk/pull/6185 https://github.com/openjdk/jdk/commit/61cb4bc6b0252536364a86f38ff2e5c8c7ab610b Since your account was only verified yesterday, an RFR email could not be sent out to the email list to inform people about your fix. Please make sure that you always assign JBS issues to yourself when you intend to work on them (the issue was assigned to @tobiasholenstein). If you do not have an account, reach out to someone with an account to reserve/sponsor it for you. This avoids duplicated work. Nevertheless, it is nice that you also found a test to verify your fix. I suggest filing a follow-up RFE to add your test for JDK-8276036 separately.
Thanks, Christian ------------- PR: https://git.openjdk.java.net/jdk/pull/6129 From chagedorn at openjdk.java.net Fri Nov 5 08:30:13 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Fri, 5 Nov 2021 08:30:13 GMT Subject: RFR: 8276036: The value of full_count in the message of insufficient codecache is wrong In-Reply-To: References: Message-ID: On Wed, 27 Oct 2021 02:35:29 GMT, Takuya Kiriyama wrote: > Could you please review the 8276036 bug fixes? > > This bug is caused by the wrong place to add the value of full_count. > The initial value of full_count is 0, so it needs to be added before outputting the message. test/hotspot/jtreg/compiler/codecache/CodeCacheFullCountTest.java line 63: > 61: > 62: OutputAnalyzer javaOutput = new OutputAnalyzer(javaProcess); > 63: String stdout = javaOutput.getStdout(); Use `ProcessTools.executeProcess(pb)` instead: OutputAnalyzer oa = ProcessTools.executeProcess(pb); oa.shouldHaveExitValue(0); String stdout = oa.getStdout(); ------------- PR: https://git.openjdk.java.net/jdk/pull/6129 From neliasso at openjdk.java.net Fri Nov 5 09:14:09 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Fri, 5 Nov 2021 09:14:09 GMT Subject: RFR: 8268882: C2: assert(n->outcnt() != 0 || C->top() == n || n->is_Proj()) failed: No dead instructions after post-alloc In-Reply-To: References: Message-ID: On Fri, 5 Nov 2021 06:19:25 GMT, Rickard Bäckman wrote: > Also delete Phi nodes with no uses. Looks good! ------------- Marked as reviewed by neliasso (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6270 From chagedorn at openjdk.java.net Fri Nov 5 09:25:12 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Fri, 5 Nov 2021 09:25:12 GMT Subject: RFR: 8268882: C2: assert(n->outcnt() != 0 || C->top() == n || n->is_Proj()) failed: No dead instructions after post-alloc In-Reply-To: References: Message-ID: On Fri, 5 Nov 2021 06:19:25 GMT, Rickard Bäckman wrote: > Also delete Phi nodes with no uses.
Looks good to me, too! Do you also have a test case for it? ------------- Marked as reviewed by chagedorn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6270 From chagedorn at openjdk.java.net Fri Nov 5 13:43:30 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Fri, 5 Nov 2021 13:43:30 GMT Subject: RFR: 8275326: C2: assert(no_dead_loop) failed: dead loop detected Message-ID: In the test case, we apply the following optimization in `PhiNode::Ideal()` for the memory phi 989 that is on a dead path but still has both its inputs set to non-top nodes: https://github.com/openjdk/jdk/blob/3c0faa73522bd004b66cb9e477f43e15a29842e6/src/hotspot/share/opto/cfgnode.cpp#L2269-L2270 ![Screenshot from 2021-11-05 11-57-49](https://user-images.githubusercontent.com/17833009/140502849-9f00fd62-9714-4f54-8f98-f22f74d11430.png) In this process, we create `11853 Phi` for the new `11850 MergeMem` which is going to replace `989 Phi` (`this`). We then transform `11853 Phi` before returning: https://github.com/openjdk/jdk/blob/3c0faa73522bd004b66cb9e477f43e15a29842e6/src/hotspot/share/opto/cfgnode.cpp#L2314 During `Ideal()` for `11853 Phi`, we transform `11769 MergeMem` into top (because the base memory is top) and use this as new input instead: https://github.com/openjdk/jdk/blob/3c0faa73522bd004b66cb9e477f43e15a29842e6/src/hotspot/share/opto/cfgnode.cpp#L2230-L2240 But even if the `MergeMem` node would not be transformed into top, the slice itself could be top (L2237) and we would still replace the phi input with top. This replacement by top will fold the `11853 Phi` and we will build a cycle `11850 MergeMem` <-> `1064 StoreB` because `989 Phi` will be replaced by `11850 MergeMem`. This results in the assertion failure. I tried some approaches by marking `11853 Phi` and/or `989 Phi` to specially treat them during the optimizations in `Ideal()` (e.g. skipping `989 Phi` during the dead loop detection etc.) 
or to improve the dead loop detection before applying the `MergeMem` optimization in `Ideal()`. But that seemed rather complicated/fragile. I therefore propose to simply not transform the newly created phi nodes directly but wait instead for IGVN to revisit them again. This allows the `this` phi to be replaced with the new `MergeMem` node and the dead loop detection will work correctly when processing the new phis again later in IGVN. I could only reproduce this bug with the replay file for the attached test case in the JBS issue. The test case itself did not trigger with repeated runs with `StressIGVN` + `RepeatCompilation`. Thanks, Christian ------------- Commit messages: - C2: assert(no_dead_loop) failed: dead loop detected Changes: https://git.openjdk.java.net/jdk/pull/6276/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6276&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8275326 Stats: 10 lines in 1 file changed: 0 ins; 7 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/6276.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6276/head:pull/6276 PR: https://git.openjdk.java.net/jdk/pull/6276 From kvn at openjdk.java.net Fri Nov 5 16:12:10 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 5 Nov 2021 16:12:10 GMT Subject: RFR: 8275326: C2: assert(no_dead_loop) failed: dead loop detected In-Reply-To: References: Message-ID: On Fri, 5 Nov 2021 13:00:00 GMT, Christian Hagedorn wrote: > In the test case, we apply the following optimization in `PhiNode::Ideal()` for the memory phi 989 that is on a dead path but still has both its inputs set to non-top nodes: > https://github.com/openjdk/jdk/blob/3c0faa73522bd004b66cb9e477f43e15a29842e6/src/hotspot/share/opto/cfgnode.cpp#L2269-L2270 > ![Screenshot from 2021-11-05 11-57-49](https://user-images.githubusercontent.com/17833009/140502849-9f00fd62-9714-4f54-8f98-f22f74d11430.png) > > In this process, we create `11853 Phi` for the new `11850 MergeMem` which is going to 
replace `989 Phi` (`this`). We then transform `11853 Phi` before returning: > https://github.com/openjdk/jdk/blob/3c0faa73522bd004b66cb9e477f43e15a29842e6/src/hotspot/share/opto/cfgnode.cpp#L2314 > > During `Ideal()` for `11853 Phi`, we transform `11769 MergeMem` into top (because the base memory is top) and use this as new input instead: > https://github.com/openjdk/jdk/blob/3c0faa73522bd004b66cb9e477f43e15a29842e6/src/hotspot/share/opto/cfgnode.cpp#L2230-L2240 > > But even if the `MergeMem` node would not be transformed into top, the slice itself could be top (L2237) and we would still replace the phi input with top. This replacement by top will fold the `11853 Phi` and we will build a cycle `11850 MergeMem` <-> `1064 StoreB` because `989 Phi` will be replaced by `11850 MergeMem`. This results in the assertion failure. > > I tried some approaches by marking `11853 Phi` and/or `989 Phi` to specially treat them during the optimizations in `Ideal()` (e.g. skipping `989 Phi` during the dead loop detection etc.) or to improve the dead loop detection before applying the `MergeMem` optimization in `Ideal()`. But that seemed rather complicated/fragile. > > I therefore propose to simply not transform the newly created phi nodes directly but wait instead for IGVN to revisit them again. This allows the `this` phi to be replaced with the new `MergeMem` node and the dead loop detection will work correctly when processing the new phis again later in IGVN. > > I could only reproduce this bug with the replay file for the attached test case in the JBS issue. The test case itself did not trigger with repeated runs with `StressIGVN` + `RepeatCompilation`. > > Thanks, > Christian Okay. ------------- Marked as reviewed by kvn (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/6276 From aph at openjdk.java.net Fri Nov 5 17:32:25 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Fri, 5 Nov 2021 17:32:25 GMT Subject: RFR: 8276563: Undefined Behaviour in class Assembler Message-ID: The HotSpot code base contains a number of instances of Undefined Behavior, which can cause all manner of unpleasant surprises. The UB to which this patch relates is in class `Assembler`, in which instances are pointers to (nonexistent) objects defined as, for example, typedef RegisterImpl *Register; const Register r10 = ((Register)10); Registers have accessors, e.g.: `int RegisterImpl::encoding() const { return (intptr_t)this; }` This works by an accident of implementation: it is not legal C++. The most obvious way to fix this UB bug is to make instances of `Register` point to something, and to use pointer subtraction to find the encoding: (simplified for clarity) extern RegisterImpl all_Registers[num_Registers]; int RegisterImpl::encoding() const { return this - all_Registers; } After this patch there is slightly more work to be done when assembling code, but it's merely the subtraction of a constant in `encoding()`, and the difference in execution time is so small (and the startup variance so large) that I have been unable to measure it, even after averaging 100 runs. It does lead to an increase of about 1% in the size of the stripped libjvm.so, but I think that can be recovered by a subsequent patch. An alternative way to implement this would be to make the encoding a byte-wide field in `RegisterImpl` and define `encoding()` this way: `int RegisterImpl::encoding() const { return _encoding; }` This would result in smaller code, but I suspect slower. If this change is accepted, I propose that all instances of this pattern in HotSpot be treated similarly.
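The pointer-subtraction scheme described above can be sketched in a self-contained form. This is only an illustration, not the actual patch: `RegisterImpl`, `all_Registers`, `num_Registers` and `as_Register` follow names used in this thread, while the register count of 32, the bounds checks, and the body of `as_Register` are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>

// Simplified stand-in for HotSpot's RegisterImpl (illustrative only).
class RegisterImpl {
 public:
  // The encoding is the index of this object within all_Registers.
  int encoding() const;
};

typedef const RegisterImpl* Register;

constexpr int num_Registers = 32;            // assumed register count
RegisterImpl all_Registers[num_Registers];   // real objects for Registers to point at

inline int RegisterImpl::encoding() const {
  // Subtracting pointers into the same array is well-defined,
  // unlike casting an arbitrary integer to a pointer and back.
  std::ptrdiff_t index = this - all_Registers;
  assert(index >= 0 && index < num_Registers);
  return static_cast<int>(index);
}

// Build a Register from an encoding by indexing the array (assumed helper body).
inline Register as_Register(int encoding) {
  assert(encoding >= 0 && encoding < num_Registers);
  return &all_Registers[encoding];
}

const Register r0  = as_Register(0);
const Register r10 = as_Register(10);
```

With these definitions, `r10->encoding()` yields 10 by computing the distance from the start of the array, so call sites that use `->` on a `Register` would not need to change.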
------------- Commit messages: - 8276563: Undefined Behaviour in class Assembler Changes: https://git.openjdk.java.net/jdk/pull/6280/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6280&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8276563 Stats: 116 lines in 5 files changed: 67 ins; 22 del; 27 mod Patch: https://git.openjdk.java.net/jdk/pull/6280.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6280/head:pull/6280 PR: https://git.openjdk.java.net/jdk/pull/6280 From kvn at openjdk.java.net Fri Nov 5 18:04:32 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 5 Nov 2021 18:04:32 GMT Subject: RFR: 8268882: C2: assert(n->outcnt() != 0 || C->top() == n || n->is_Proj()) failed: No dead instructions after post-alloc In-Reply-To: References: Message-ID: On Fri, 5 Nov 2021 06:19:25 GMT, Rickard Bäckman wrote: > Also delete Phi nodes with no uses. Marked as reviewed by kvn (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/6270 From kvn at openjdk.java.net Fri Nov 5 18:09:36 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 5 Nov 2021 18:09:36 GMT Subject: RFR: 8275847: Scheduling fails with "too many D-U pinch points" on small method [v2] In-Reply-To: <0VRE1Xz5B5o9M0DjdTd5KBL5YOXPcp8Od5vCpH96j34=.a0c38e4e-5a02-4adc-be8f-22c579f53d47@github.com> References: <5z-HFwTvcqo_tge6dIMU4VZo-0UkInXAIKuh5D-fkxI=.b5f7a31b-717f-47b5-bfcc-90dbb223075e@github.com> <0VRE1Xz5B5o9M0DjdTd5KBL5YOXPcp8Od5vCpH96j34=.a0c38e4e-5a02-4adc-be8f-22c579f53d47@github.com> Message-ID: On Fri, 29 Oct 2021 07:44:47 GMT, Nick Gasson wrote: >> Since around JDK 16 the following method cannot be compiled by C2 on AArch64: >> >> >> public double mergeSync() { return Math.log(Math.sin(value)); } >> >> >> (Reduced from a slightly larger benchmark.) >> >> >> 811 416 ! 3 Test::mergeSync (61 bytes) >> 813 417 ! 4 Test::mergeSync (61 bytes) >> 816 417 !
4 Test::mergeSync (61 bytes) COMPILE SKIPPED: too many D-U pinch points (retry at different tier) >> 816 418 ! 1 Test::mergeSync (61 bytes) >> >> >> Scheduling::anti_do_def() will create temporary Nodes for each OptoReg killed by the MachProjs from the two runtime leaf calls. After SVE support was added these runtime calls kill more registers, and the number of new Nodes added by anti_do_def exceeds an internal limit (which is based on the LRG map size and roughly proportional to the method size). >> >> X86 has the same problem if OptoScheduling is enabled because of the wide AVX registers. >> >> The fix here is to ignore OptoRegs which correspond to the high slots of wide vectors (i.e. slots above 64 bits). The scheduler doesn't run on methods where C->max_vector_size() > 8, so we know these kills can't affect the scheduling result. >> >> The added test fails on the current JDK with: >> >> >> compiler.lib.ir_framework.shared.TestRunException: Could not compile public double >> compiler.c2.irTests.TestScheduleSmallMethod.testSmallMethodTwoRuntimeCalls(double) at level C2 >> after 10s. Last compilation level: 3 > > Nick Gasson has updated the pull request incrementally with one additional commit since the last revision: > > Remove dead uses of is_concrete Seems fine to me but I need to run testing before approval. 
------------- PR: https://git.openjdk.java.net/jdk/pull/6131 From kvn at openjdk.java.net Fri Nov 5 20:15:49 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 5 Nov 2021 20:15:49 GMT Subject: RFR: 8275847: Scheduling fails with "too many D-U pinch points" on small method [v2] In-Reply-To: <0VRE1Xz5B5o9M0DjdTd5KBL5YOXPcp8Od5vCpH96j34=.a0c38e4e-5a02-4adc-be8f-22c579f53d47@github.com> References: <5z-HFwTvcqo_tge6dIMU4VZo-0UkInXAIKuh5D-fkxI=.b5f7a31b-717f-47b5-bfcc-90dbb223075e@github.com> <0VRE1Xz5B5o9M0DjdTd5KBL5YOXPcp8Od5vCpH96j34=.a0c38e4e-5a02-4adc-be8f-22c579f53d47@github.com> Message-ID: On Fri, 29 Oct 2021 07:44:47 GMT, Nick Gasson wrote: >> Since around JDK 16 the following method cannot be compiled by C2 on AArch64: >> >> >> public double mergeSync() { return Math.log(Math.sin(value)); } >> >> >> (Reduced from a slightly larger benchmark.) >> >> >> 811 416 ! 3 Test::mergeSync (61 bytes) >> 813 417 ! 4 Test::mergeSync (61 bytes) >> 816 417 ! 4 Test::mergeSync (61 bytes) COMPILE SKIPPED: too many D-U pinch points (retry at different tier) >> 816 418 ! 1 Test::mergeSync (61 bytes) >> >> >> Scheduling::anti_do_def() will create temporary Nodes for each OptoReg killed by the MachProjs from the two runtime leaf calls. After SVE support was added these runtime calls kill more registers, and the number of new Nodes added by anti_do_def exceeds an internal limit (which is based on the LRG map size and roughly proportional to the method size). >> >> X86 has the same problem if OptoScheduling is enabled because of the wide AVX registers. >> >> The fix here is to ignore OptoRegs which correspond to the high slots of wide vectors (i.e. slots above 64 bits). The scheduler doesn't run on methods where C->max_vector_size() > 8, so we know these kills can't affect the scheduling result. 
>> >> The added test fails on the current JDK with: >> >> >> compiler.lib.ir_framework.shared.TestRunException: Could not compile public double >> compiler.c2.irTests.TestScheduleSmallMethod.testSmallMethodTwoRuntimeCalls(double) at level C2 >> after 10s. Last compilation level: 3 > > Nick Gasson has updated the pull request incrementally with one additional commit since the last revision: > > Remove dead uses of is_concrete Testing results look good. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6131 From duke at openjdk.java.net Fri Nov 5 20:31:39 2021 From: duke at openjdk.java.net (Mai =?UTF-8?B?xJDhurduZw==?= =?UTF-8?B?IA==?= =?UTF-8?B?UXXDom4=?= Anh) Date: Fri, 5 Nov 2021 20:31:39 GMT Subject: RFR: 8276563: Undefined Behaviour in class Assembler In-Reply-To: References: Message-ID: On Fri, 5 Nov 2021 17:20:14 GMT, Andrew Haley wrote: > The HotSpot code base contains a number of instances of Undefined Behavior, which can cause all manner of unpleasant surprises. > The UB to which this patch relates is in class `Assembler`, in which instances are pointers to (nonexistent) objects defined as, for example, > > > typedef RegisterImpl *Register; > const Register r10 = ((Register)10); > > > Registers have accessors, e.g.: > > ` int RegisterImpl::encoding() const { return (intptr_t)this; }` > > This works by an accident of implementation: it is not legal C++. 
> > The most obvious way to this UB bug is to make instances of `Register` point to something, and to use pointer subtraction to find the encoding: (simplified for clarity) > > > extern RegisterImpl all_Registers[num_Registers]; > int RegisterImpl::encoding() const { return this - all_Registers; } > > > After this patch there is slightly more work to be done when assembling code but it's merely the subtraction of a constant in `encoding()` and the difference in execution time is so small (and the startup variance so large) that I have been unable to measure it, even after averaging 100 runs. It does lead to an increase of about 1% in the size of the stripped libjvm.so, but I think that can be recovered by a subsequent patch. > > An alternative way to implement this would be to make the encoding a byte-wide field in `RegisterImpl` and define encoding() this way: > > ` int RegisterImpl::encoding() const { return _encoding; }` > > This would result in smaller code, but I suspect slower. > > If this change is accepted, I propose that all instances of this pattern in HotSpot be treated similarly. src/hotspot/cpu/aarch64/register_aarch64.hpp line 39: > 37: #define REGISTER_IMPL_DECLARATION(type, name) \ > 38: inline const type as_ ## type(int encoding) { \ > 39: assert(encoding <= name::number_of_declared_registers, "invalid register"); \ Should this be `<` instead of `<=`? src/hotspot/cpu/aarch64/register_aarch64.hpp line 76: > 74: > 75: // derived registers, offsets, and addresses > 76: Register successor() const { return as_Register(encoding() + 1); } Can we simply return `this + 1` here? Thank you very much. 
------------- PR: https://git.openjdk.java.net/jdk/pull/6280 From kvn at openjdk.java.net Fri Nov 5 22:25:42 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 5 Nov 2021 22:25:42 GMT Subject: RFR: 8274328: C2: Redundant CFG edges fixup in block ordering [v2] In-Reply-To: References: Message-ID: On Mon, 1 Nov 2021 07:40:40 GMT, Yi Yang wrote: >> I think Trace::fixup_blocks is redundant because PhaseCFG::fixup_flow will nevertheless fix up the CFG flow(i.e. flip successor blocks of IfNode) right after PhaseBlockLayout pass, we can remove this step when doing PhaseBlockLayout pass.(Testing: jtreg/compiler/c2, presubmit test) >> >> https://github.com/openjdk/jdk/blob/5ec1cdcaf39229a7d2457313600b0dc2bf8c6453/src/hotspot/share/opto/compile.cpp#L2765 >> >> https://github.com/openjdk/jdk/blob/5ec1cdcaf39229a7d2457313600b0dc2bf8c6453/src/hotspot/share/opto/block.cpp#L1679 >> >> https://github.com/openjdk/jdk/blob/5ec1cdcaf39229a7d2457313600b0dc2bf8c6453/src/hotspot/share/opto/block.cpp#L908-L916 > > Yi Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - use swap(ref,ref) > - Merge branch 'master' into blockordering > - 8274328: C2: Redundant CFG edges fixup in block ordering Looks good. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5705 From kvn at openjdk.java.net Fri Nov 5 22:25:43 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 5 Nov 2021 22:25:43 GMT Subject: RFR: 8274328: C2: Redundant CFG edges fixup in block ordering [v2] In-Reply-To: References: Message-ID: On Thu, 28 Oct 2021 07:44:09 GMT, Tobias Hartmann wrote: > That looks good to me but I'm not an expert in that code. I submitted some testing and it all passed. @TobiHartmann please add links to testing in RFE. 
------------- PR: https://git.openjdk.java.net/jdk/pull/5705 From stuefe at openjdk.java.net Sat Nov 6 07:32:38 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Sat, 6 Nov 2021 07:32:38 GMT Subject: RFR: 8276563: Undefined Behaviour in class Assembler In-Reply-To: References: Message-ID: <10jJ8k1MS8x86ooWkC7Iy-WkbkM8oM_Z5b5wAqm1g3M=.de897574-d42d-4f2f-a8de-f1c6648bf706@github.com> On Fri, 5 Nov 2021 17:20:14 GMT, Andrew Haley wrote: > The HotSpot code base contains a number of instances of Undefined Behavior, which can cause all manner of unpleasant surprises. > The UB to which this patch relates is in class `Assembler`, in which instances are pointers to (nonexistent) objects defined as, for example, > > > typedef RegisterImpl *Register; > const Register r10 = ((Register)10); > > > Registers have accessors, e.g.: > > ` int RegisterImpl::encoding() const { return (intptr_t)this; }` > > This works by an accident of implementation: it is not legal C++. > > The most obvious way to this UB bug is to make instances of `Register` point to something, and to use pointer subtraction to find the encoding: (simplified for clarity) > > > extern RegisterImpl all_Registers[num_Registers]; > int RegisterImpl::encoding() const { return this - all_Registers; } > > > After this patch there is slightly more work to be done when assembling code but it's merely the subtraction of a constant in `encoding()` and the difference in execution time is so small (and the startup variance so large) that I have been unable to measure it, even after averaging 100 runs. It does lead to an increase of about 1% in the size of the stripped libjvm.so, but I think that can be recovered by a subsequent patch. > > An alternative way to implement this would be to make the encoding a byte-wide field in `RegisterImpl` and define encoding() this way: > > ` int RegisterImpl::encoding() const { return _encoding; }` > > This would result in smaller code, but I suspect slower. 
> > If this change is accepted, I propose that all instances of this pattern in HotSpot be treated similarly. Just an idea, but could you make the register number a constant template argument? struct Register { virtual int encoding() const = 0; }; template <int regnum> struct RegisterImpl : public Register { int encoding() const override { return regnum; } }; static const RegisterImpl<0> r0; static const RegisterImpl<1> r1; Compiled with gcc using -O3 on x64 it seems to use the constants directly as long as both Register and RegisterImpl are fully visible: https://gist.github.com/tstuefe/cd98163226b6cbaef44ea75d97c732b1. ------------- PR: https://git.openjdk.java.net/jdk/pull/6280 From aph at openjdk.java.net Sat Nov 6 09:09:38 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Sat, 6 Nov 2021 09:09:38 GMT Subject: RFR: 8276563: Undefined Behaviour in class Assembler In-Reply-To: <10jJ8k1MS8x86ooWkC7Iy-WkbkM8oM_Z5b5wAqm1g3M=.de897574-d42d-4f2f-a8de-f1c6648bf706@github.com> References: <10jJ8k1MS8x86ooWkC7Iy-WkbkM8oM_Z5b5wAqm1g3M=.de897574-d42d-4f2f-a8de-f1c6648bf706@github.com> Message-ID: <-3fY9GoewccvnXpO5QoUjvYTgTxz2fYtUNKralDNq1c=.eca82d89-1d45-4f9a-94e0-6202de6c9d71@github.com> On Sat, 6 Nov 2021 07:29:12 GMT, Thomas Stuefe wrote: > Just an idea, but could you make the register number a constant template argument? OK, I'll kick that around. ------------- PR: https://git.openjdk.java.net/jdk/pull/6280 From aph at openjdk.java.net Sat Nov 6 09:09:39 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Sat, 6 Nov 2021 09:09:39 GMT Subject: RFR: 8276563: Undefined Behaviour in class Assembler In-Reply-To: References: Message-ID: On Fri, 5 Nov 2021 20:27:02 GMT, Mai Đặng Quân Anh wrote: >> The HotSpot code base contains a number of instances of Undefined Behavior, which can cause all manner of unpleasant surprises.
>> The UB to which this patch relates is in class `Assembler`, in which instances are pointers to (nonexistent) objects defined as, for example, >> >> >> typedef RegisterImpl *Register; >> const Register r10 = ((Register)10); >> >> >> Registers have accessors, e.g.: >> >> ` int RegisterImpl::encoding() const { return (intptr_t)this; }` >> >> This works by an accident of implementation: it is not legal C++. >> >> The most obvious way to this UB bug is to make instances of `Register` point to something, and to use pointer subtraction to find the encoding: (simplified for clarity) >> >> >> extern RegisterImpl all_Registers[num_Registers]; >> int RegisterImpl::encoding() const { return this - all_Registers; } >> >> >> After this patch there is slightly more work to be done when assembling code but it's merely the subtraction of a constant in `encoding()` and the difference in execution time is so small (and the startup variance so large) that I have been unable to measure it, even after averaging 100 runs. It does lead to an increase of about 1% in the size of the stripped libjvm.so, but I think that can be recovered by a subsequent patch. >> >> An alternative way to implement this would be to make the encoding a byte-wide field in `RegisterImpl` and define encoding() this way: >> >> ` int RegisterImpl::encoding() const { return _encoding; }` >> >> This would result in smaller code, but I suspect slower. >> >> If this change is accepted, I propose that all instances of this pattern in HotSpot be treated similarly. > > src/hotspot/cpu/aarch64/register_aarch64.hpp line 39: > >> 37: #define REGISTER_IMPL_DECLARATION(type, name) \ >> 38: inline const type as_ ## type(int encoding) { \ >> 39: assert(encoding <= name::number_of_declared_registers, "invalid register"); \ > > Should this be `<` instead of `<=`? 
No, because `noreg` is a pointer one past the end of the array, > src/hotspot/cpu/aarch64/register_aarch64.hpp line 76: > >> 74: >> 75: // derived registers, offsets, and addresses >> 76: Register successor() const { return as_Register(encoding() + 1); } > > Can we simply return `this + 1` here? > Thank you very much. Oh yeah, good point. ------------- PR: https://git.openjdk.java.net/jdk/pull/6280 From jvernee at openjdk.java.net Sat Nov 6 16:19:37 2021 From: jvernee at openjdk.java.net (Jorn Vernee) Date: Sat, 6 Nov 2021 16:19:37 GMT Subject: RFR: 8276563: Undefined Behaviour in class Assembler In-Reply-To: References: Message-ID: On Fri, 5 Nov 2021 17:20:14 GMT, Andrew Haley wrote: > The HotSpot code base contains a number of instances of Undefined Behavior, which can cause all manner of unpleasant surprises. > The UB to which this patch relates is in class `Assembler`, in which instances are pointers to (nonexistent) objects defined as, for example, > > > typedef RegisterImpl *Register; > const Register r10 = ((Register)10); > > > Registers have accessors, e.g.: > > ` int RegisterImpl::encoding() const { return (intptr_t)this; }` > > This works by an accident of implementation: it is not legal C++. > > The most obvious way to this UB bug is to make instances of `Register` point to something, and to use pointer subtraction to find the encoding: (simplified for clarity) > > > extern RegisterImpl all_Registers[num_Registers]; > int RegisterImpl::encoding() const { return this - all_Registers; } > > > After this patch there is slightly more work to be done when assembling code but it's merely the subtraction of a constant in `encoding()` and the difference in execution time is so small (and the startup variance so large) that I have been unable to measure it, even after averaging 100 runs. It does lead to an increase of about 1% in the size of the stripped libjvm.so, but I think that can be recovered by a subsequent patch. 
> > An alternative way to implement this would be to make the encoding a byte-wide field in `RegisterImpl` and define encoding() this way: > > ` int RegisterImpl::encoding() const { return _encoding; }` > > This would result in smaller code, but I suspect slower. > > If this change is accepted, I propose that all instances of this pattern in HotSpot be treated similarly. It's not clear to me why `Register` is implemented as a pointer in the first place, instead of a class with a single `int` or `intptr_t` field for the encoding. There is a comment about that in register.hpp, but it doesn't offer an explanation: https://github.com/openjdk/jdk/blob/2653cfbf0f316183ea23dd234896b44f7dd6eae0/src/hotspot/share/asm/register.hpp#L37-L41 I wonder if changing the implementation to define `AbstractRegister` as: class AbstractRegister { int _value; protected: int value() { return _value; } }; have a class `Register` that extend from `AbstractRegister`, and remove all the `typedef RegisterImpl* Register` and similar, wouldn't be a viable solution as well? The implementations of `encoding()` could simply call `value()` after doing a validity check (they already do in most cases). To save us from having to change `->` to `.` everywhere, an overloaded `operator->` could be added to each implementation class that just returns `this` ([Godbolt](https://godbolt.org/z/Tajhc6s83)). 
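The value-based alternative suggested above might look roughly like the following. This is a minimal sketch under stated assumptions, not code from HotSpot or from the linked Godbolt example: the constructors, the `const` qualifiers, the `number_of_registers` bound of 32, and the validity check are all hypothetical; only `AbstractRegister`, `_value`, `value()`, `encoding()` and the overloaded `operator->` come from the message.

```cpp
#include <cassert>

// Base class that stores the encoding by value instead of in a pointer.
class AbstractRegister {
  int _value;
 protected:
  constexpr explicit AbstractRegister(int value) : _value(value) {}
  constexpr int value() const { return _value; }
};

// Hypothetical value-type Register replacing "typedef RegisterImpl* Register".
class Register : public AbstractRegister {
 public:
  static constexpr int number_of_registers = 32;  // assumed bound

  constexpr explicit Register(int encoding) : AbstractRegister(encoding) {}

  // Validity check first, then return the stored value.
  int encoding() const {
    assert(value() >= 0 && value() < number_of_registers);
    return value();
  }

  // Returning 'this' lets existing call sites keep writing reg->encoding().
  const Register* operator->() const { return this; }
};

constexpr Register r0(0);
constexpr Register r10(10);
```

Both `r10.encoding()` and `r10->encoding()` compile against this sketch, which is the point of the `operator->` overload: the `->` syntax at existing call sites could stay as it is while no integer-to-pointer cast is involved.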
------------- PR: https://git.openjdk.java.net/jdk/pull/6280 From stuefe at openjdk.java.net Sat Nov 6 17:10:33 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Sat, 6 Nov 2021 17:10:33 GMT Subject: RFR: 8276563: Undefined Behaviour in class Assembler In-Reply-To: <-3fY9GoewccvnXpO5QoUjvYTgTxz2fYtUNKralDNq1c=.eca82d89-1d45-4f9a-94e0-6202de6c9d71@github.com> References: <10jJ8k1MS8x86ooWkC7Iy-WkbkM8oM_Z5b5wAqm1g3M=.de897574-d42d-4f2f-a8de-f1c6648bf706@github.com> <-3fY9GoewccvnXpO5QoUjvYTgTxz2fYtUNKralDNq1c=.eca82d89-1d45-4f9a-94e0-6202de6c9d71@github.com> Message-ID: On Sat, 6 Nov 2021 09:04:07 GMT, Andrew Haley wrote: > > Just an idea, but could you make the register number a constant template argument? > > OK, I'll kick that around. Hmm, I realize my proposal works well if a specific register is used, but not if it is hidden behind a generic `Register*` pointer. In the former case, the compiler inlines the constant into the code. In the latter case, it calls `virtual RegisterImpl::encoding()`, so you get one vtable access and one subroutine call. ------------- PR: https://git.openjdk.java.net/jdk/pull/6280 From stuefe at openjdk.java.net Sat Nov 6 17:18:31 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Sat, 6 Nov 2021 17:18:31 GMT Subject: RFR: 8276563: Undefined Behaviour in class Assembler In-Reply-To: References: Message-ID: On Sat, 6 Nov 2021 16:15:44 GMT, Jorn Vernee wrote: > It's not clear to me why `Register` is implemented as a pointer in the first place, instead of a class with a single `int` or `intptr_t` field for the encoding. There is a comment about that in register.hpp, but it doesn't offer an explanation: I believe the point is to encode the register number in the `this` pointer. If it were a member, you'd have to dereference the pointer to access the member value, so one more load instruction. 
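For reference, the two well-defined alternatives from the PR description (pointer subtraction into a real array, or a byte-wide field that costs one load) could look roughly like the sketch below. Every name here is hypothetical and the register count is an assumption; this is not the code in the patch.

```cpp
#include <cassert>
#include <cstddef>

// All names below are hypothetical stand-ins, not the HotSpot declarations.
constexpr int num_registers = 32;  // assumed register count

// Alternative 1: a register is a pointer into a real array, and the
// encoding is recovered by pointer subtraction (no memory load).
class SubRegisterImpl {
 public:
  int encoding() const;
};

SubRegisterImpl all_sub_registers[num_registers];

int SubRegisterImpl::encoding() const {
  // Only well defined when `this` points into all_sub_registers.
  std::ptrdiff_t index = this - all_sub_registers;
  assert(0 <= index && index < num_registers);
  return static_cast<int>(index);
}

typedef const SubRegisterImpl* SubRegister;
const SubRegister sub_r10 = &all_sub_registers[10];

// Alternative 2: the encoding is a byte-wide field, so encoding() is a
// one-byte load through the pointer.
class FieldRegisterImpl {
  unsigned char _encoding;
 public:
  explicit FieldRegisterImpl(unsigned char encoding) : _encoding(encoding) {}
  int encoding() const { return _encoding; }
};

typedef const FieldRegisterImpl* FieldRegister;
const FieldRegisterImpl field_r10_impl(10);
const FieldRegister field_r10 = &field_r10_impl;
```

Both variants keep `Register` as a pointer type, so call sites using `->` stay unchanged; the difference is only whether `encoding()` subtracts a constant or reads a byte.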
------------- PR: https://git.openjdk.java.net/jdk/pull/6280 From jvernee at openjdk.java.net Sat Nov 6 17:42:33 2021 From: jvernee at openjdk.java.net (Jorn Vernee) Date: Sat, 6 Nov 2021 17:42:33 GMT Subject: RFR: 8276563: Undefined Behaviour in class Assembler In-Reply-To: References: Message-ID: On Sat, 6 Nov 2021 17:15:15 GMT, Thomas Stuefe wrote: > > It's not clear to me why `Register` is implemented as a pointer in the first place, instead of a class with a single `int` or `intptr_t` field for the encoding. There is a comment about that in register.hpp, but it doesn't offer an explanation: > > I believe the point is to encode the register number in the `this` pointer. If it were a member, you'd have to dereference the pointer to access the member value, so one more load instruction. Yes, if we would keep using pointers we'd get a dereference, but I'm saying: let's get rid of the pointers and use straight up values. No dereferences. AFAIU compilers would treat this the same as the current scheme. Or, perhaps even better: I've seen compiler constant folding be perturbed in the past by integer <-> pointer casts (I guess it just bails out in some cases), and also, using an `int` would halve the footprint of arrays of registers on platforms where pointers are 64-bits. ------------- PR: https://git.openjdk.java.net/jdk/pull/6280 From jvernee at openjdk.java.net Sat Nov 6 17:52:39 2021 From: jvernee at openjdk.java.net (Jorn Vernee) Date: Sat, 6 Nov 2021 17:52:39 GMT Subject: RFR: 8276563: Undefined Behaviour in class Assembler In-Reply-To: References: Message-ID: On Fri, 5 Nov 2021 17:20:14 GMT, Andrew Haley wrote: > The HotSpot code base contains a number of instances of Undefined Behavior, which can cause all manner of unpleasant surprises. 
> The UB to which this patch relates is in class `Assembler`, in which instances are pointers to (nonexistent) objects defined as, for example, > > > typedef RegisterImpl *Register; > const Register r10 = ((Register)10); > > > Registers have accessors, e.g.: > > ` int RegisterImpl::encoding() const { return (intptr_t)this; }` > > This works by an accident of implementation: it is not legal C++. > > The most obvious way to fix this UB bug is to make instances of `Register` point to something, and to use pointer subtraction to find the encoding: (simplified for clarity) > > > extern RegisterImpl all_Registers[num_Registers]; > int RegisterImpl::encoding() const { return this - all_Registers; } > > > After this patch there is slightly more work to be done when assembling code but it's merely the subtraction of a constant in `encoding()` and the difference in execution time is so small (and the startup variance so large) that I have been unable to measure it, even after averaging 100 runs. It does lead to an increase of about 1% in the size of the stripped libjvm.so, but I think that can be recovered by a subsequent patch. > > An alternative way to implement this would be to make the encoding a byte-wide field in `RegisterImpl` and define encoding() this way: > > ` int RegisterImpl::encoding() const { return _encoding; }` > > This would result in smaller code, but I suspect slower. > > If this change is accepted, I propose that all instances of this pattern in HotSpot be treated similarly. In other words: in the current scheme we pass around an integer disguised as a pointer, and then have to cast it to an integer to use it. That seems silly. Let's just pass around the integer instead, wrapped in a value-object, which the compiler should treat the same as if it were just the integer.
------------- PR: https://git.openjdk.java.net/jdk/pull/6280 From stuefe at openjdk.java.net Sun Nov 7 06:52:40 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Sun, 7 Nov 2021 06:52:40 GMT Subject: RFR: 8276563: Undefined Behaviour in class Assembler In-Reply-To: References: Message-ID: On Sat, 6 Nov 2021 17:38:49 GMT, Jorn Vernee wrote: > Or, perhaps even better: I've seen compiler constant folding be perturbed in the past by integer <-> pointer casts (I guess it just bails out in some cases), and also, using an `int` would halve the footprint of arrays of registers on platforms where pointers are 64-bits. Now I get it. I like your proposal; it's way simpler. I did a small test on x64: https://gist.github.com/tstuefe/c97dff7624a1469a7295dda51ed9a265 As you say, the compiler passes the encoding value directly, using (in my example) just the 16-bit portion of RDI since I used a short. The only odd thing was that when using a constant object known at the use site (case B), I would have expected an immediate, but the constant encoding gets loaded from the text segment instead. Wrt `->` to `.`, I dislike operator overloading so I would just do the changes instead. But that's up to Andrew. Cheers, Thomas ------------- PR: https://git.openjdk.java.net/jdk/pull/6280 From duke at openjdk.java.net Sun Nov 7 07:02:34 2021 From: duke at openjdk.java.net (Mai Đặng Quân Anh) Date: Sun, 7 Nov 2021 07:02:34 GMT Subject: RFR: 8276563: Undefined Behaviour in class Assembler In-Reply-To: References: Message-ID: On Sun, 7 Nov 2021 06:49:18 GMT, Thomas Stuefe wrote: > The only odd thing was that when using a constant object known at the use site (case B), I would have expected an immediate, but the constant encoding gets loaded from the text segment instead. I believe that to achieve constant folding here you should mark the constructor of the `Register` class as `constexpr`, and maybe the `rxxx` variable too just to make sure.
Cheers. ------------- PR: https://git.openjdk.java.net/jdk/pull/6280 From duke at openjdk.java.net Sun Nov 7 07:43:33 2021 From: duke at openjdk.java.net (Mai Đặng Quân Anh) Date: Sun, 7 Nov 2021 07:43:33 GMT Subject: RFR: 8276563: Undefined Behaviour in class Assembler In-Reply-To: References: Message-ID: On Fri, 5 Nov 2021 17:20:14 GMT, Andrew Haley wrote: > The HotSpot code base contains a number of instances of Undefined Behavior, which can cause all manner of unpleasant surprises. > The UB to which this patch relates is in class `Assembler`, in which instances are pointers to (nonexistent) objects defined as, for example, > > > typedef RegisterImpl *Register; > const Register r10 = ((Register)10); > > > Registers have accessors, e.g.: > > ` int RegisterImpl::encoding() const { return (intptr_t)this; }` > > This works by an accident of implementation: it is not legal C++. > > The most obvious way to fix this UB bug is to make instances of `Register` point to something, and to use pointer subtraction to find the encoding: (simplified for clarity) > > > extern RegisterImpl all_Registers[num_Registers]; > int RegisterImpl::encoding() const { return this - all_Registers; } > > > After this patch there is slightly more work to be done when assembling code but it's merely the subtraction of a constant in `encoding()` and the difference in execution time is so small (and the startup variance so large) that I have been unable to measure it, even after averaging 100 runs. It does lead to an increase of about 1% in the size of the stripped libjvm.so, but I think that can be recovered by a subsequent patch. > > An alternative way to implement this would be to make the encoding a byte-wide field in `RegisterImpl` and define encoding() this way: > > ` int RegisterImpl::encoding() const { return _encoding; }` > > This would result in smaller code, but I suspect slower.
> > If this change is accepted, I propose that all instances of this pattern in HotSpot be treated similarly. Hi, I have investigated Jorn's proposal and found that, to achieve the desired performance, we would need to rely on the compiler inlining the methods of the `Register` class themselves. If it fails to do so, the caller has to materialise the object on the stack to obtain the `this` pointer, and the callee has to read the object back from the stack to process it ([Godbolt](https://godbolt.org/z/o4v4es4qj)). Cheers. ------------- PR: https://git.openjdk.java.net/jdk/pull/6280 From aph at openjdk.java.net Sun Nov 7 11:44:31 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Sun, 7 Nov 2021 11:44:31 GMT Subject: RFR: 8276563: Undefined Behaviour in class Assembler In-Reply-To: References: <10jJ8k1MS8x86ooWkC7Iy-WkbkM8oM_Z5b5wAqm1g3M=.de897574-d42d-4f2f-a8de-f1c6648bf706@github.com> <-3fY9GoewccvnXpO5QoUjvYTgTxz2fYtUNKralDNq1c=.eca82d89-1d45-4f9a-94e0-6202de6c9d71@github.com> Message-ID: On Sat, 6 Nov 2021 17:07:59 GMT, Thomas Stuefe wrote: > > > Just an idea, but could you make the register number a constant template argument? > > > > > > OK, I'll kick that around. > > Hmm, I realize my proposal works well if a specific register is used, but not if it is hidden behind a generic `Register*` pointer. In the former case, the compiler inlines the constant into the code. In the latter case, it calls `virtual RegisterImpl::encoding()`, so you get one vtable access and one subroutine call. Ah. Oh well. ------------- PR: https://git.openjdk.java.net/jdk/pull/6280 From aph at openjdk.java.net Sun Nov 7 11:49:34 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Sun, 7 Nov 2021 11:49:34 GMT Subject: RFR: 8276563: Undefined Behaviour in class Assembler In-Reply-To: References: Message-ID: On Sat, 6 Nov 2021 17:38:49 GMT, Jorn Vernee wrote: > I believe the point is to encode the register number in the `this` pointer.
If it were a member, you'd have to dereference the pointer to access the member value, so one more load instruction. That's exactly the point. We have two alternatives, either a subtraction or a load, and it's hard to do much better than that. There is one other idea, though it's rather hacky: 64-align the array of `Register`s, then use an AND operation to get the encoding. The trouble with that is that it's not really portable, just portable-ish. ------------- PR: https://git.openjdk.java.net/jdk/pull/6280 From aph at openjdk.java.net Sun Nov 7 11:52:37 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Sun, 7 Nov 2021 11:52:37 GMT Subject: RFR: 8276563: Undefined Behaviour in class Assembler In-Reply-To: References: Message-ID: On Sat, 6 Nov 2021 17:49:37 GMT, Jorn Vernee wrote: > In other words: in the current scheme we pass around an integer disguised as a pointer, and then have to cast it to an integer to use it. That seems silly. Let's just pass around the integer instead, wrapped in a value-object, which the compiler should treat the same as if it were just the integer. That's worth exploring, but I'm reluctant to start overloading `->`. I'll have a look. ------------- PR: https://git.openjdk.java.net/jdk/pull/6280 From jvernee at openjdk.java.net Sun Nov 7 13:12:36 2021 From: jvernee at openjdk.java.net (Jorn Vernee) Date: Sun, 7 Nov 2021 13:12:36 GMT Subject: RFR: 8276563: Undefined Behaviour in class Assembler In-Reply-To: References: Message-ID: On Sun, 7 Nov 2021 06:59:25 GMT, Mai Đặng Quân Anh wrote: > The only odd thing was that when using a constant object known at the use site (case B), I would have expected an immediate, but the constant encoding gets loaded from the text segment instead. Yeah... I found that the constructor has to be marked `constexpr` to make it work with GCC (as @merykitty also found). The value also gets emitted into `.rodata` in that case.
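A small sketch of what marking the constructor `constexpr` might look like. The class layout, the `short` field (chosen to mirror the gist mentioned earlier in the thread) and the `emit` helpers are assumptions for illustration, not the code from the gist or from HotSpot.

```cpp
#include <cassert>

// Hypothetical value-style register; the names are illustrative only.
class Register {
  short _encoding;
 public:
  constexpr explicit Register(short encoding) : _encoding(encoding) {}
  constexpr short encoding() const { return _encoding; }
};

// A constexpr object: its encoding is usable in constant expressions.
constexpr Register r10(10);

// Checked by the compiler, so no run-time load is needed to know the value.
static_assert(r10.encoding() == 10, "encoding is a compile-time constant");

// Made-up "instruction word" builder that ORs the encoding into an opcode.
int emit(Register reg) {
  assert(reg.encoding() >= 0 && reg.encoding() < 32);  // assumed register count
  return 0x100 | reg.encoding();
}

int emit_r10() { return emit(r10); }
```

Whether a given compiler then folds the constant into an immediate at the call site still depends on inlining and optimization level; the `static_assert` only shows that the value is available at compile time.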
------------- PR: https://git.openjdk.java.net/jdk/pull/6280 From yyang at openjdk.java.net Mon Nov 8 02:22:39 2021 From: yyang at openjdk.java.net (Yi Yang) Date: Mon, 8 Nov 2021 02:22:39 GMT Subject: RFR: 8274328: C2: Redundant CFG edges fixup in block ordering [v2] In-Reply-To: References: Message-ID: <2laiwqTiAGwYIEZVbmkIdunVaF4HyZ_vuLttotT9mD4=.c7f553d0-943c-4b63-b03b-1301514c4c2a@github.com> On Mon, 1 Nov 2021 07:59:27 GMT, Tobias Hartmann wrote: >> Yi Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - use swap(ref,ref) >> - Merge branch 'master' into blockordering >> - 8274328: C2: Redundant CFG edges fixup in block ordering > > Thanks, looks good. Thanks @TobiHartmann and @vnkozlov for reviews! ------------- PR: https://git.openjdk.java.net/jdk/pull/5705 From yyang at openjdk.java.net Mon Nov 8 02:22:40 2021 From: yyang at openjdk.java.net (Yi Yang) Date: Mon, 8 Nov 2021 02:22:40 GMT Subject: Integrated: 8274328: C2: Redundant CFG edges fixup in block ordering In-Reply-To: References: Message-ID: On Sun, 26 Sep 2021 10:40:43 GMT, Yi Yang wrote: > I think Trace::fixup_blocks is redundant because PhaseCFG::fixup_flow will nevertheless fix up the CFG flow(i.e. flip successor blocks of IfNode) right after PhaseBlockLayout pass, we can remove this step when doing PhaseBlockLayout pass.(Testing: jtreg/compiler/c2, presubmit test) > > https://github.com/openjdk/jdk/blob/5ec1cdcaf39229a7d2457313600b0dc2bf8c6453/src/hotspot/share/opto/compile.cpp#L2765 > > https://github.com/openjdk/jdk/blob/5ec1cdcaf39229a7d2457313600b0dc2bf8c6453/src/hotspot/share/opto/block.cpp#L1679 > > https://github.com/openjdk/jdk/blob/5ec1cdcaf39229a7d2457313600b0dc2bf8c6453/src/hotspot/share/opto/block.cpp#L908-L916 This pull request has now been integrated. 
Changeset: 44047f84 Author: Yi Yang URL: https://git.openjdk.java.net/jdk/commit/44047f849fad157dac5df788aa5a2c1838e4aaf7 Stats: 56 lines in 2 files changed: 6 ins; 45 del; 5 mod 8274328: C2: Redundant CFG edges fixup in block ordering Reviewed-by: thartmann, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/5705 From yyang at openjdk.java.net Mon Nov 8 02:42:35 2021 From: yyang at openjdk.java.net (Yi Yang) Date: Mon, 8 Nov 2021 02:42:35 GMT Subject: RFR: 8271202: C1: assert(false) failed: live_in set of first block must be empty In-Reply-To: References: Message-ID: On Wed, 28 Jul 2021 06:57:26 GMT, Yi Yang wrote: > Hi, I'm trying to fix [JDK-8271202](https://bugs.openjdk.java.net/browse/JDK-8271202). A local variable (smallinvoc) is defined in B3 and only used in B14, so it ought to have a short lifetime. But its lifetime has been unconditionally extended because -XX:+DeoptimizeALot is set (**Would just removing this be a simpler and safer fix? Not sure if it's acceptable**), making it propagate to almost the whole remaining IR. > > https://github.com/openjdk/jdk/blob/ecd445562f8355704a041f9eca0e87dc85a7f44c/src/hotspot/share/ci/ciMethod.cpp#L373-L379 > > ![image](https://user-images.githubusercontent.com/5010047/127277954-2a64d87e-2981-4d74-8001-c7efeb000a10.png) > > > A virtual register (v603) that represents this variable is in the B13 live_in set, which propagates to the B1 live_out set. > > When B1 merges state with B16 and B19, it finds that this variable in new_state (B16) is empty, so B1 invalidates the corresponding local slot. > > https://github.com/openjdk/jdk/blob/ecd445562f8355704a041f9eca0e87dc85a7f44c/src/hotspot/share/c1/c1_Instruction.cpp#L826-L838 > > I think we should invalidate this slot only when their types mismatch. Otherwise, no Phi is generated and the B19 live_gen set does not contain this variable, so the variable stays alive in B1 live_in.
B1 live_in will eventually backward propagate to B20 live_in set, it avoids being killed by B19 live_gen, which causes the crash. > > > Block 1 > live_in: > 603 616 617 618 619 620 621 622 > live_out: > 603 616 617 618 619 620 621 622 > live_gen: > 620 > live_kill: > 648 649 650 > > Block 16 > live_in: > 603 616 617 618 619 620 621 622 > live_out: > 603 616 617 618 619 620 621 622 > live_gen: > 616 617 618 619 620 621 622 > live_kill: > 620 654 655 656 657 > > Block 19 > live_in: > 603 > live_out: > 603 616 617 618 619 620 621 622 > live_gen: > > live_kill: > 0 1 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 > > > Block 20 > live_in: > 603 > live_out: > 603 > live_gen: > > live_kill: > 577 578 Thank you all for reviewing this PR, I will investigate more based on review feedback, will get back to you on this. ------------- PR: https://git.openjdk.java.net/jdk/pull/4916 From ngasson at openjdk.java.net Mon Nov 8 06:41:36 2021 From: ngasson at openjdk.java.net (Nick Gasson) Date: Mon, 8 Nov 2021 06:41:36 GMT Subject: RFR: 8275847: Scheduling fails with "too many D-U pinch points" on small method [v2] In-Reply-To: <0VRE1Xz5B5o9M0DjdTd5KBL5YOXPcp8Od5vCpH96j34=.a0c38e4e-5a02-4adc-be8f-22c579f53d47@github.com> References: <5z-HFwTvcqo_tge6dIMU4VZo-0UkInXAIKuh5D-fkxI=.b5f7a31b-717f-47b5-bfcc-90dbb223075e@github.com> <0VRE1Xz5B5o9M0DjdTd5KBL5YOXPcp8Od5vCpH96j34=.a0c38e4e-5a02-4adc-be8f-22c579f53d47@github.com> Message-ID: On Fri, 29 Oct 2021 07:44:47 GMT, Nick Gasson wrote: >> Since around JDK 16 the following method cannot be compiled by C2 on AArch64: >> >> >> public double mergeSync() { return Math.log(Math.sin(value)); } >> >> >> (Reduced from a slightly larger benchmark.) >> >> >> 811 416 ! 3 Test::mergeSync (61 bytes) >> 813 417 ! 4 Test::mergeSync (61 bytes) >> 816 417 ! 4 Test::mergeSync (61 bytes) COMPILE SKIPPED: too many D-U pinch points (retry at different tier) >> 816 418 ! 
1 Test::mergeSync (61 bytes) >> >> >> Scheduling::anti_do_def() will create temporary Nodes for each OptoReg killed by the MachProjs from the two runtime leaf calls. After SVE support was added these runtime calls kill more registers, and the number of new Nodes added by anti_do_def exceeds an internal limit (which is based on the LRG map size and roughly proportional to the method size). >> >> X86 has the same problem if OptoScheduling is enabled because of the wide AVX registers. >> >> The fix here is to ignore OptoRegs which correspond to the high slots of wide vectors (i.e. slots above 64 bits). The scheduler doesn't run on methods where C->max_vector_size() > 8, so we know these kills can't affect the scheduling result. >> >> The added test fails on the current JDK with: >> >> >> compiler.lib.ir_framework.shared.TestRunException: Could not compile public double >> compiler.c2.irTests.TestScheduleSmallMethod.testSmallMethodTwoRuntimeCalls(double) at level C2 >> after 10s. Last compilation level: 3 > > Nick Gasson has updated the pull request incrementally with one additional commit since the last revision: > > Remove dead uses of is_concrete Thanks for the reviews. 
The Windows x64 build failure looks to be spurious, I re-ran the GitHub actions here: https://github.com/nick-arm/jdk/runs/4133872978 ------------- PR: https://git.openjdk.java.net/jdk/pull/6131 From ngasson at openjdk.java.net Mon Nov 8 06:43:48 2021 From: ngasson at openjdk.java.net (Nick Gasson) Date: Mon, 8 Nov 2021 06:43:48 GMT Subject: Integrated: 8275847: Scheduling fails with "too many D-U pinch points" on small method In-Reply-To: <5z-HFwTvcqo_tge6dIMU4VZo-0UkInXAIKuh5D-fkxI=.b5f7a31b-717f-47b5-bfcc-90dbb223075e@github.com> References: <5z-HFwTvcqo_tge6dIMU4VZo-0UkInXAIKuh5D-fkxI=.b5f7a31b-717f-47b5-bfcc-90dbb223075e@github.com> Message-ID: On Wed, 27 Oct 2021 06:25:22 GMT, Nick Gasson wrote: > Since around JDK 16 the following method cannot be compiled by C2 on AArch64: > > > public double mergeSync() { return Math.log(Math.sin(value)); } > > > (Reduced from a slightly larger benchmark.) > > > 811 416 ! 3 Test::mergeSync (61 bytes) > 813 417 ! 4 Test::mergeSync (61 bytes) > 816 417 ! 4 Test::mergeSync (61 bytes) COMPILE SKIPPED: too many D-U pinch points (retry at different tier) > 816 418 ! 1 Test::mergeSync (61 bytes) > > > Scheduling::anti_do_def() will create temporary Nodes for each OptoReg killed by the MachProjs from the two runtime leaf calls. After SVE support was added these runtime calls kill more registers, and the number of new Nodes added by anti_do_def exceeds an internal limit (which is based on the LRG map size and roughly proportional to the method size). > > X86 has the same problem if OptoScheduling is enabled because of the wide AVX registers. > > The fix here is to ignore OptoRegs which correspond to the high slots of wide vectors (i.e. slots above 64 bits). The scheduler doesn't run on methods where C->max_vector_size() > 8, so we know these kills can't affect the scheduling result. 
> > The added test fails on the current JDK with: > > > compiler.lib.ir_framework.shared.TestRunException: Could not compile public double > compiler.c2.irTests.TestScheduleSmallMethod.testSmallMethodTwoRuntimeCalls(double) at level C2 > after 10s. Last compilation level: 3 This pull request has now been integrated. Changeset: 3934fe54 Author: Nick Gasson URL: https://git.openjdk.java.net/jdk/commit/3934fe54b4c3e51add6d3fe1f145e5aebfe3b2fc Stats: 89 lines in 4 files changed: 74 ins; 10 del; 5 mod 8275847: Scheduling fails with "too many D-U pinch points" on small method Reviewed-by: thartmann, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/6131 From thartmann at openjdk.java.net Mon Nov 8 07:51:43 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 8 Nov 2021 07:51:43 GMT Subject: RFR: 8274328: C2: Redundant CFG edges fixup in block ordering [v2] In-Reply-To: References: Message-ID: On Mon, 1 Nov 2021 07:59:27 GMT, Tobias Hartmann wrote: >> Yi Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - use swap(ref,ref) >> - Merge branch 'master' into blockordering >> - 8274328: C2: Redundant CFG edges fixup in block ordering > > Thanks, looks good. > @TobiHartmann please add links to testing in RFE. Done. 
------------- PR: https://git.openjdk.java.net/jdk/pull/5705 From thartmann at openjdk.java.net Mon Nov 8 07:57:46 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 8 Nov 2021 07:57:46 GMT Subject: RFR: 8275448: [REDO] AArch64: Implement string_compare intrinsic in SVE [v3] In-Reply-To: <3O9BeTBG4Z4q3up1VlKGb096qxN2dxSXxTG_FWrXNVE=.ab464e81-34ad-4e8f-aba4-543489015f64@github.com> References: <3lLU_Cnsh439U5TjNx9ADxXkvMDnfot4TxHbh7paUQw=.5c7ec35a-8456-4fe6-8b4f-1d7627026d8d@github.com> <3O9BeTBG4Z4q3up1VlKGb096qxN2dxSXxTG_FWrXNVE=.ab464e81-34ad-4e8f-aba4-543489015f64@github.com> Message-ID: On Tue, 2 Nov 2021 00:16:44 GMT, TatWai Chong wrote: >> After JDK-8269559 was integrated there are failures in tier1 testing >> across Mac OS X 11.4 (aarch64) machines. Please see JDK-8275263. >> >> This patch is NOT functional; rather, this tends to verify potential >> toolchain issues as the original patch pass testing on other >> platforms. >> >> In this patch, we remove new SVE-related matching rules and register >> class introduced in the original patch to minimally affect the >> non-SVE part. > > TatWai Chong has updated the pull request incrementally with one additional commit since the last revision: > > Add the matching rule in td file, enable control path in the code stub. Yes, I executed the same testing (and much more) that triggered the failures with your original test. But I'll re-run just to make sure. 
------------- PR: https://git.openjdk.java.net/jdk/pull/6072 From duke at openjdk.java.net Mon Nov 8 08:18:33 2021 From: duke at openjdk.java.net (Takuya Kiriyama) Date: Mon, 8 Nov 2021 08:18:33 GMT Subject: RFR: 8276036: The value of full_count in the message of insufficient codecache is wrong In-Reply-To: <7I15YtAoKdAe2zolVfAHf_8KIYAF0jba2WB8NuUVH6s=.b950c2d9-0fb5-429e-bac4-f283b72156f2@github.com> References: <8AfBiTLCRBg_MjDGPs5Gaaw2E1qX2bbsS1J-AApRXfs=.867099ec-6ce1-43e2-8171-41a957198045@github.com> <7I15YtAoKdAe2zolVfAHf_8KIYAF0jba2WB8NuUVH6s=.b950c2d9-0fb5-429e-bac4-f283b72156f2@github.com> Message-ID: On Fri, 5 Nov 2021 08:26:47 GMT, Christian Hagedorn wrote: >> @robilad robilad Thank you for your reply. >> I sent an email to you yesterday. Please check it later. > > Hi @tkiriyama, this bug was already fixed and integrated by @tobiasholenstein a few days a ago: > https://github.com/openjdk/jdk/pull/6185 > https://github.com/openjdk/jdk/commit/61cb4bc6b0252536364a86f38ff2e5c8c7ab610b > > Since your account was only verified yesterday, an RFR email could not be sent out to the email list to inform people about your fix. Please make sure that you always assign JBS issues to you when you intend to work on them (the issue was assigned to @tobiasholenstein). If you do not have an account, reach out to someone with an account to reserve/sponsor it for you. This avoids duplicated work. > > Nevertheless, it is nice that you also found a test to verify your fix. I suggest to file a follow-up RFE to add your test for JDK-8276036 separately. > > Thanks, > Christian Hi, @chhagedorn , thank you for your valuable comments. I will try that. 
------------- PR: https://git.openjdk.java.net/jdk/pull/6129 From thartmann at openjdk.java.net Mon Nov 8 08:38:41 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 8 Nov 2021 08:38:41 GMT Subject: RFR: 8275326: C2: assert(no_dead_loop) failed: dead loop detected In-Reply-To: References: Message-ID: <0fckxfADj9rXHXaOGFIMOFXWY5-xVArobIn5RomTTq0=.76980c82-3c8e-41b8-be27-13e3516331ad@github.com> On Fri, 5 Nov 2021 13:00:00 GMT, Christian Hagedorn wrote: > In the test case, we apply the following optimization in `PhiNode::Ideal()` for the memory phi 989 that is on a dead path but still has both its inputs set to non-top nodes: > https://github.com/openjdk/jdk/blob/3c0faa73522bd004b66cb9e477f43e15a29842e6/src/hotspot/share/opto/cfgnode.cpp#L2269-L2270 > ![Screenshot from 2021-11-05 11-57-49](https://user-images.githubusercontent.com/17833009/140502849-9f00fd62-9714-4f54-8f98-f22f74d11430.png) > > In this process, we create `11853 Phi` for the new `11850 MergeMem` which is going to replace `989 Phi` (`this`). We then transform `11853 Phi` before returning: > https://github.com/openjdk/jdk/blob/3c0faa73522bd004b66cb9e477f43e15a29842e6/src/hotspot/share/opto/cfgnode.cpp#L2314 > > During `Ideal()` for `11853 Phi`, we transform `11769 MergeMem` into top (because the base memory is top) and use this as new input instead: > https://github.com/openjdk/jdk/blob/3c0faa73522bd004b66cb9e477f43e15a29842e6/src/hotspot/share/opto/cfgnode.cpp#L2230-L2240 > > But even if the `MergeMem` node would not be transformed into top, the slice itself could be top (L2237) and we would still replace the phi input with top. This replacement by top will fold the `11853 Phi` and we will build a cycle `11850 MergeMem` <-> `1064 StoreB` because `989 Phi` will be replaced by `11850 MergeMem`. This results in the assertion failure. > > I tried some approaches by marking `11853 Phi` and/or `989 Phi` to specially treat them during the optimizations in `Ideal()` (e.g. 
skipping `989 Phi` during the dead loop detection etc.) or to improve the dead loop detection before applying the `MergeMem` optimization in `Ideal()`. But that seemed rather complicated/fragile. > > I therefore propose to simply not transform the newly created phi nodes directly but wait instead for IGVN to revisit them again. This allows the `this` phi to be replaced with the new `MergeMem` node and the dead loop detection will work correctly when processing the new phis again later in IGVN. > > I could only reproduce this bug with the replay file for the attached test case in the JBS issue. The test case itself did not trigger with repeated runs with `StressIGVN` + `RepeatCompilation`. > > Thanks, > Christian Looks good to me. Just wondering, if we run this during GVN, could it happen that the phis are never transformed because they are not registered for IGVN? Please add a `noreg-*` label to the bug. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6276 From chagedorn at openjdk.java.net Mon Nov 8 09:18:14 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Mon, 8 Nov 2021 09:18:14 GMT Subject: RFR: 8275326: C2: assert(no_dead_loop) failed: dead loop detected [v2] In-Reply-To: References: Message-ID: > In the test case, we apply the following optimization in `PhiNode::Ideal()` for the memory phi 989 that is on a dead path but still has both its inputs set to non-top nodes: > https://github.com/openjdk/jdk/blob/3c0faa73522bd004b66cb9e477f43e15a29842e6/src/hotspot/share/opto/cfgnode.cpp#L2269-L2270 > ![Screenshot from 2021-11-05 11-57-49](https://user-images.githubusercontent.com/17833009/140502849-9f00fd62-9714-4f54-8f98-f22f74d11430.png) > > In this process, we create `11853 Phi` for the new `11850 MergeMem` which is going to replace `989 Phi` (`this`). 
We then transform `11853 Phi` before returning: > https://github.com/openjdk/jdk/blob/3c0faa73522bd004b66cb9e477f43e15a29842e6/src/hotspot/share/opto/cfgnode.cpp#L2314 > > During `Ideal()` for `11853 Phi`, we transform `11769 MergeMem` into top (because the base memory is top) and use this as new input instead: > https://github.com/openjdk/jdk/blob/3c0faa73522bd004b66cb9e477f43e15a29842e6/src/hotspot/share/opto/cfgnode.cpp#L2230-L2240 > > But even if the `MergeMem` node would not be transformed into top, the slice itself could be top (L2237) and we would still replace the phi input with top. This replacement by top will fold the `11853 Phi` and we will build a cycle `11850 MergeMem` <-> `1064 StoreB` because `989 Phi` will be replaced by `11850 MergeMem`. This results in the assertion failure. > > I tried some approaches by marking `11853 Phi` and/or `989 Phi` to specially treat them during the optimizations in `Ideal()` (e.g. skipping `989 Phi` during the dead loop detection etc.) or to improve the dead loop detection before applying the `MergeMem` optimization in `Ideal()`. But that seemed rather complicated/fragile. > > I therefore propose to simply not transform the newly created phi nodes directly but wait instead for IGVN to revisit them again. This allows the `this` phi to be replaced with the new `MergeMem` node and the dead loop detection will work correctly when processing the new phis again later in IGVN. > > I could only reproduce this bug with the replay file for the attached test case in the JBS issue. The test case itself did not trigger with repeated runs with `StressIGVN` + `RepeatCompilation`. 
> > Thanks, > Christian Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: handle GVN ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6276/files - new: https://git.openjdk.java.net/jdk/pull/6276/files/2fde1b7f..4de74cf7 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6276&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6276&range=00-01 Stats: 6 lines in 1 file changed: 5 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/6276.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6276/head:pull/6276 PR: https://git.openjdk.java.net/jdk/pull/6276 From chagedorn at openjdk.java.net Mon Nov 8 09:18:15 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Mon, 8 Nov 2021 09:18:15 GMT Subject: RFR: 8275326: C2: assert(no_dead_loop) failed: dead loop detected [v2] In-Reply-To: <0fckxfADj9rXHXaOGFIMOFXWY5-xVArobIn5RomTTq0=.76980c82-3c8e-41b8-be27-13e3516331ad@github.com> References: <0fckxfADj9rXHXaOGFIMOFXWY5-xVArobIn5RomTTq0=.76980c82-3c8e-41b8-be27-13e3516331ad@github.com> Message-ID: On Mon, 8 Nov 2021 08:35:24 GMT, Tobias Hartmann wrote: > Looks good to me. Just wondering, if we run this during GVN, could it happen that the phis are never transformed because they are not registered for IGVN? Thanks Tobias for your review. I think you're right - we could potentially miss a transformation when this happens during GVN. I've pushed an update by using `phase->record_for_igvn()` in the GVN case. > Please add a `noreg-*` label to the bug. Done, thanks! 
------------- PR: https://git.openjdk.java.net/jdk/pull/6276 From thartmann at openjdk.java.net Mon Nov 8 11:28:32 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 8 Nov 2021 11:28:32 GMT Subject: RFR: 8275326: C2: assert(no_dead_loop) failed: dead loop detected [v2] In-Reply-To: References: Message-ID: On Mon, 8 Nov 2021 09:18:14 GMT, Christian Hagedorn wrote: >> In the test case, we apply the following optimization in `PhiNode::Ideal()` for the memory phi 989 that is on a dead path but still has both its inputs set to non-top nodes: >> https://github.com/openjdk/jdk/blob/3c0faa73522bd004b66cb9e477f43e15a29842e6/src/hotspot/share/opto/cfgnode.cpp#L2269-L2270 >> ![Screenshot from 2021-11-05 11-57-49](https://user-images.githubusercontent.com/17833009/140502849-9f00fd62-9714-4f54-8f98-f22f74d11430.png) >> >> In this process, we create `11853 Phi` for the new `11850 MergeMem` which is going to replace `989 Phi` (`this`). We then transform `11853 Phi` before returning: >> https://github.com/openjdk/jdk/blob/3c0faa73522bd004b66cb9e477f43e15a29842e6/src/hotspot/share/opto/cfgnode.cpp#L2314 >> >> During `Ideal()` for `11853 Phi`, we transform `11769 MergeMem` into top (because the base memory is top) and use this as new input instead: >> https://github.com/openjdk/jdk/blob/3c0faa73522bd004b66cb9e477f43e15a29842e6/src/hotspot/share/opto/cfgnode.cpp#L2230-L2240 >> >> But even if the `MergeMem` node would not be transformed into top, the slice itself could be top (L2237) and we would still replace the phi input with top. This replacement by top will fold the `11853 Phi` and we will build a cycle `11850 MergeMem` <-> `1064 StoreB` because `989 Phi` will be replaced by `11850 MergeMem`. This results in the assertion failure. >> >> I tried some approaches by marking `11853 Phi` and/or `989 Phi` to specially treat them during the optimizations in `Ideal()` (e.g. skipping `989 Phi` during the dead loop detection etc.) 
or to improve the dead loop detection before applying the `MergeMem` optimization in `Ideal()`. But that seemed rather complicated/fragile. >> >> I therefore propose to simply not transform the newly created phi nodes directly but wait instead for IGVN to revisit them again. This allows the `this` phi to be replaced with the new `MergeMem` node and the dead loop detection will work correctly when processing the new phis again later in IGVN. >> >> I could only reproduce this bug with the replay file for the attached test case in the JBS issue. The test case itself did not trigger with repeated runs with `StressIGVN` + `RepeatCompilation`. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > handle GVN Thanks, that looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6276 From chagedorn at openjdk.java.net Mon Nov 8 11:33:38 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Mon, 8 Nov 2021 11:33:38 GMT Subject: RFR: 8275326: C2: assert(no_dead_loop) failed: dead loop detected [v2] In-Reply-To: References: Message-ID: On Mon, 8 Nov 2021 09:18:14 GMT, Christian Hagedorn wrote: >> In the test case, we apply the following optimization in `PhiNode::Ideal()` for the memory phi 989 that is on a dead path but still has both its inputs set to non-top nodes: >> https://github.com/openjdk/jdk/blob/3c0faa73522bd004b66cb9e477f43e15a29842e6/src/hotspot/share/opto/cfgnode.cpp#L2269-L2270 >> ![Screenshot from 2021-11-05 11-57-49](https://user-images.githubusercontent.com/17833009/140502849-9f00fd62-9714-4f54-8f98-f22f74d11430.png) >> >> In this process, we create `11853 Phi` for the new `11850 MergeMem` which is going to replace `989 Phi` (`this`). 
We then transform `11853 Phi` before returning: >> https://github.com/openjdk/jdk/blob/3c0faa73522bd004b66cb9e477f43e15a29842e6/src/hotspot/share/opto/cfgnode.cpp#L2314 >> >> During `Ideal()` for `11853 Phi`, we transform `11769 MergeMem` into top (because the base memory is top) and use this as new input instead: >> https://github.com/openjdk/jdk/blob/3c0faa73522bd004b66cb9e477f43e15a29842e6/src/hotspot/share/opto/cfgnode.cpp#L2230-L2240 >> >> But even if the `MergeMem` node would not be transformed into top, the slice itself could be top (L2237) and we would still replace the phi input with top. This replacement by top will fold the `11853 Phi` and we will build a cycle `11850 MergeMem` <-> `1064 StoreB` because `989 Phi` will be replaced by `11850 MergeMem`. This results in the assertion failure. >> >> I tried some approaches by marking `11853 Phi` and/or `989 Phi` to specially treat them during the optimizations in `Ideal()` (e.g. skipping `989 Phi` during the dead loop detection etc.) or to improve the dead loop detection before applying the `MergeMem` optimization in `Ideal()`. But that seemed rather complicated/fragile. >> >> I therefore propose to simply not transform the newly created phi nodes directly but wait instead for IGVN to revisit them again. This allows the `this` phi to be replaced with the new `MergeMem` node and the dead loop detection will work correctly when processing the new phis again later in IGVN. >> >> I could only reproduce this bug with the replay file for the attached test case in the JBS issue. The test case itself did not trigger with repeated runs with `StressIGVN` + `RepeatCompilation`. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > handle GVN Thanks Tobias! 
------------- PR: https://git.openjdk.java.net/jdk/pull/6276 From thartmann at openjdk.java.net Mon Nov 8 12:29:34 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 8 Nov 2021 12:29:34 GMT Subject: RFR: 8271056: C2: "assert(no_dead_loop) failed: dead loop detected" due to cmoving identity In-Reply-To: References: Message-ID: On Fri, 29 Oct 2021 13:02:11 GMT, Christian Hagedorn wrote: > In the testcase, an unsafe cmoving identity is applied in `PhiNode::Identity()` after parsing which replaces a loop phi in a dead loop creating a dead data loop which triggers the assertion. The problem is that `PhiNode::Identity()` assumes that a cmoving identity is always safe because `PhiNode::Ideal()` handles unsafe cases and only leaves safe cases to `PhiNode::Identity()`: > https://github.com/openjdk/jdk/blob/4c3491bfa5f296b80c56a37cb4fffd6497323ac2/src/hotspot/share/opto/cfgnode.cpp#L2051-L2055 > > However, the fix for [JDK-8268883 ](https://github.com/openjdk/jdk17/commit/6d8fc7249a3a1a2350c462f9c4fe38377856392f)added the following additional condition to wait for the region to be processed: > https://github.com/openjdk/jdk/blob/4c3491bfa5f296b80c56a37cb4fffd6497323ac2/src/hotspot/share/opto/cfgnode.cpp#L2047-L2053 > > This skips the process of an unsafe case in `PhiNode::Ideal()` in the testcase. Afterwards, the unsafe case is replaced unconditionally in `PhiNode::Identity()` resulting in a dead data loop. > > I therefore propose to add the same check added in JDK-8268883 to `PhiNode::Identity()` to prevent that. > > Thanks, > Christian Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/6172 From chagedorn at openjdk.java.net Mon Nov 8 12:51:43 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Mon, 8 Nov 2021 12:51:43 GMT Subject: RFR: 8271056: C2: "assert(no_dead_loop) failed: dead loop detected" due to cmoving identity In-Reply-To: References: Message-ID: On Fri, 29 Oct 2021 13:02:11 GMT, Christian Hagedorn wrote: > In the testcase, an unsafe cmoving identity is applied in `PhiNode::Identity()` after parsing which replaces a loop phi in a dead loop creating a dead data loop which triggers the assertion. The problem is that `PhiNode::Identity()` assumes that a cmoving identity is always safe because `PhiNode::Ideal()` handles unsafe cases and only leaves safe cases to `PhiNode::Identity()`: > https://github.com/openjdk/jdk/blob/4c3491bfa5f296b80c56a37cb4fffd6497323ac2/src/hotspot/share/opto/cfgnode.cpp#L2051-L2055 > > However, the fix for [JDK-8268883 ](https://github.com/openjdk/jdk17/commit/6d8fc7249a3a1a2350c462f9c4fe38377856392f)added the following additional condition to wait for the region to be processed: > https://github.com/openjdk/jdk/blob/4c3491bfa5f296b80c56a37cb4fffd6497323ac2/src/hotspot/share/opto/cfgnode.cpp#L2047-L2053 > > This skips the process of an unsafe case in `PhiNode::Ideal()` in the testcase. Afterwards, the unsafe case is replaced unconditionally in `PhiNode::Identity()` resulting in a dead data loop. > > I therefore propose to add the same check added in JDK-8268883 to `PhiNode::Identity()` to prevent that. > > Thanks, > Christian Thanks Tobias for your review! 
------------- PR: https://git.openjdk.java.net/jdk/pull/6172 From chagedorn at openjdk.java.net Mon Nov 8 12:51:45 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Mon, 8 Nov 2021 12:51:45 GMT Subject: Integrated: 8271056: C2: "assert(no_dead_loop) failed: dead loop detected" due to cmoving identity In-Reply-To: References: Message-ID: On Fri, 29 Oct 2021 13:02:11 GMT, Christian Hagedorn wrote: > In the testcase, an unsafe cmoving identity is applied in `PhiNode::Identity()` after parsing which replaces a loop phi in a dead loop creating a dead data loop which triggers the assertion. The problem is that `PhiNode::Identity()` assumes that a cmoving identity is always safe because `PhiNode::Ideal()` handles unsafe cases and only leaves safe cases to `PhiNode::Identity()`: > https://github.com/openjdk/jdk/blob/4c3491bfa5f296b80c56a37cb4fffd6497323ac2/src/hotspot/share/opto/cfgnode.cpp#L2051-L2055 > > However, the fix for [JDK-8268883 ](https://github.com/openjdk/jdk17/commit/6d8fc7249a3a1a2350c462f9c4fe38377856392f)added the following additional condition to wait for the region to be processed: > https://github.com/openjdk/jdk/blob/4c3491bfa5f296b80c56a37cb4fffd6497323ac2/src/hotspot/share/opto/cfgnode.cpp#L2047-L2053 > > This skips the process of an unsafe case in `PhiNode::Ideal()` in the testcase. Afterwards, the unsafe case is replaced unconditionally in `PhiNode::Identity()` resulting in a dead data loop. > > I therefore propose to add the same check added in JDK-8268883 to `PhiNode::Identity()` to prevent that. > > Thanks, > Christian This pull request has now been integrated. 
Changeset: 54481394 Author: Christian Hagedorn URL: https://git.openjdk.java.net/jdk/commit/54481394a3b7d36b2326e22e4aa910a3e8041b5c Stats: 87 lines in 2 files changed: 85 ins; 0 del; 2 mod 8271056: C2: "assert(no_dead_loop) failed: dead loop detected" due to cmoving identity Reviewed-by: kvn, thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/6172 From aph at openjdk.java.net Mon Nov 8 14:21:37 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Mon, 8 Nov 2021 14:21:37 GMT Subject: RFR: 8276563: Undefined Behaviour in class Assembler In-Reply-To: References: Message-ID: On Sun, 7 Nov 2021 11:49:36 GMT, Andrew Haley wrote: > > In other words: in the current scheme we pass around an integer disguised as a pointer, and then have to cast it to an integer to use it. That seems silly. Let's just pass around the integer instead, wrapped in a value-object, which the compiler should treat the same as if it were just the integer. > > That's worth exploring, but I'm reluctant to start overloading `->`. I'll have a look. So I had a good look, and I think this idea is not going to fly without substantial changes elsewhere. The trickiest problem I've found so far is `GrowableArray`, which assumes that `Register` is a pointer type. None of these problems is insurmountable, of course, but it does mean that a change like this spills into many more places in shared code. Oh, the other problem is that variables of type `Register` may be assigned, which means that the field containing the encoding can no longer be `const`, and there are `const` problems elsewhere too. 
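For illustration only, the value-object idea discussed above could look roughly like the following. This is a minimal sketch, not HotSpot code: `RegisterValue` and `first_valid` are hypothetical names, and `std::vector` merely stands in for `GrowableArray`. It shows why the encoding field cannot be `const` once variables of the type are assigned, and how an overloaded `->` could keep `r->encoding()` call sites compiling.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for Register: the integer encoding is wrapped in a
// value object instead of being disguised as a pointer.
class RegisterValue {
  int _encoding;  // deliberately not const, so instances remain assignable

 public:
  constexpr explicit RegisterValue(int encoding = -1) : _encoding(encoding) {}

  constexpr int encoding() const { return _encoding; }
  constexpr bool is_valid() const { return _encoding >= 0; }

  // Overloading -> lets existing `r->encoding()` style call sites compile.
  constexpr const RegisterValue* operator->() const { return this; }

  friend constexpr bool operator==(RegisterValue a, RegisterValue b) {
    return a._encoding == b._encoding;
  }
};

// Returns the first valid register in the list, or an invalid one if none.
// std::vector is only a stand-in for GrowableArray in this sketch.
inline RegisterValue first_valid(const std::vector<RegisterValue>& regs) {
  RegisterValue result;  // starts out invalid (encoding -1)
  for (RegisterValue r : regs) {
    if (r->is_valid()) {
      result = r;  // this assignment would not compile with a const field
      break;
    }
  }
  return result;
}
```

Because the object holds nothing but an `int`, passing it by value should cost the same as passing the integer itself, which is the property the proposal relies on.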
------------- PR: https://git.openjdk.java.net/jdk/pull/6280 From jvernee at openjdk.java.net Mon Nov 8 15:02:44 2021 From: jvernee at openjdk.java.net (Jorn Vernee) Date: Mon, 8 Nov 2021 15:02:44 GMT Subject: RFR: 8276563: Undefined Behaviour in class Assembler In-Reply-To: References: Message-ID: On Mon, 8 Nov 2021 14:17:15 GMT, Andrew Haley wrote: > > > In other words: in the current scheme we pass around an integer disguised as a pointer, and then have to cast it to an integer to use it. That seems silly. Let's just pass around the integer instead, wrapped in a value-object, which the compiler should treat the same as if it were just the integer. > > > > > > That's worth exploring, but I'm reluctant to start overloading `->`. I'll have a look. > > So I had a good look, and I think this idea is not going to fly without substantial changes elsewhere. The trickiest problem I've found so far is `GrowableArray`, which assumes that `Register` is a pointer type. None of these problems is insurmountable, of course, but it does mean that a change like this spills into many more places in shared code. I ran into problems before where `GrowableArray::print` was assuming the values it held were pointer-sized. Is that what you're talking about as well? I think a long-term fix there might be to implement some kind of `print_on` template function with specializations for different types, which `GrowableArray` then uses instead of trying to cast each element to an `intptr_t`. > Oh, the other problem is that variables of type `Register` may be assigned, which means that the field containing the encoding can no longer be `const`, and there are `const` problems elsewhere too. Okay, I'd assume not having the encoding be `const` would be fine, as long as `Register` is passed around by value, i.e. only some code's private copy of a `Register` instance would be overwritten (same as if it was a pointer). --- Anyway, thanks for trying this out!
If the required changes are too much (making backporting harder as well) I think going with a more focused fix is the way to go, as well. Maybe we can come back for a broader refactoring another day. ------------- PR: https://git.openjdk.java.net/jdk/pull/6280 From aph at openjdk.java.net Mon Nov 8 16:41:40 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Mon, 8 Nov 2021 16:41:40 GMT Subject: RFR: 8276563: Undefined Behaviour in class Assembler In-Reply-To: References: Message-ID: On Mon, 8 Nov 2021 14:59:18 GMT, Jorn Vernee wrote: > Okay, I'd assume not having the encoding be `const` would be fine, as long as `Register` is passed around by-value. i.e. only some code's private copy of a `Register` instance would be overwritten (same as if it was a pointer). > > Any ways, thanks for trying this out! If the required changes are too much (making backporting harder as well) I think going with a more focused fix is the way to go, as well. Maybe we can come back for a broader refactoring another day. Sorry, my last comment was a bit misleading. I should not have posted so soon. I have something that works now with a bunch of overloads and `constexpr` references, etc., but it's all a bit too much. ------------- PR: https://git.openjdk.java.net/jdk/pull/6280 From phedlin at openjdk.java.net Mon Nov 8 16:45:51 2021 From: phedlin at openjdk.java.net (Patric Hedlin) Date: Mon, 8 Nov 2021 16:45:51 GMT Subject: RFR: 8276108: Wrong instruction generation in aarch64 backend Message-ID: <9P964HSPi8geJ3GVJwbsGo_t_lASstJkxsrX76Xv8K8=.898bf726-d71b-4a90-a6b7-83895eec3494@github.com> C1 code generation on AArch64 may produce bad LDR/STR immediate offset instructions when the actual operand (datum) size is unknown. This change will alter the code generated for the problematic immediate offset to use the register offset version (requiring additional instructions). Contributed by Nick Gasson. 
Added assert in Address::encode() to emphasise the use of a valid immediate (in base_plus_offset). Added clarifying comment to Address::offset_ok_for_immed() emphasising favouring of the scaled unsigned 12-bit encoding for aligned offsets. ------------- Commit messages: - Replacing worst case assumption with proper(?) type size calc. - 8276108: Wrong instruction generation in aarch64 backend Changes: https://git.openjdk.java.net/jdk/pull/6212/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6212&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8276108 Stats: 30 lines in 3 files changed: 13 ins; 1 del; 16 mod Patch: https://git.openjdk.java.net/jdk/pull/6212.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6212/head:pull/6212 PR: https://git.openjdk.java.net/jdk/pull/6212 From phedlin at openjdk.java.net Mon Nov 8 16:45:51 2021 From: phedlin at openjdk.java.net (Patric Hedlin) Date: Mon, 8 Nov 2021 16:45:51 GMT Subject: RFR: 8276108: Wrong instruction generation in aarch64 backend In-Reply-To: <9P964HSPi8geJ3GVJwbsGo_t_lASstJkxsrX76Xv8K8=.898bf726-d71b-4a90-a6b7-83895eec3494@github.com> References: <9P964HSPi8geJ3GVJwbsGo_t_lASstJkxsrX76Xv8K8=.898bf726-d71b-4a90-a6b7-83895eec3494@github.com> Message-ID: <7t8vNmxySP5b9lfHjSoaSXMj-AnAISWkHXEk-PNoIJQ=.0d15faf4-7d8d-44f7-9a41-f324eaf078c7@github.com> On Tue, 2 Nov 2021 14:02:48 GMT, Patric Hedlin wrote: > C1 code generation on AArch64 may produce bad LDR/STR immediate offset instructions when the actual operand (datum) size is unknown. This change will alter the code generated for the problematic immediate offset to use the register offset version (requiring additional instructions). > > Contributed by Nick Gasson. > > Added assert in Address::encode() to emphasise the use of a valid immediate (in base_plus_offset). > > Added clarifying comment to Address::offset_ok_for_immed() emphasising favouring of the scaled unsigned 12-bit encoding for aligned offsets. 
Testing tier1-5 src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 482: > 480: "must be, was: %ld, %d", _offset, size); > 481: unsigned mask = (1 << size) - 1; > 482: if (_offset < 0 || _offset & mask) { Prefer(?): (_offset & mask) != 0 ------------- PR: https://git.openjdk.java.net/jdk/pull/6212 From aph at openjdk.java.net Mon Nov 8 18:25:35 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Mon, 8 Nov 2021 18:25:35 GMT Subject: RFR: 8276108: Wrong instruction generation in aarch64 backend In-Reply-To: <9P964HSPi8geJ3GVJwbsGo_t_lASstJkxsrX76Xv8K8=.898bf726-d71b-4a90-a6b7-83895eec3494@github.com> References: <9P964HSPi8geJ3GVJwbsGo_t_lASstJkxsrX76Xv8K8=.898bf726-d71b-4a90-a6b7-83895eec3494@github.com> Message-ID: On Tue, 2 Nov 2021 14:02:48 GMT, Patric Hedlin wrote: > C1 code generation on AArch64 may produce bad LDR/STR immediate offset instructions when the actual operand (datum) size is unknown. This change will alter the code generated for the problematic immediate offset to use the register offset version (requiring additional instructions). > > Contributed by Nick Gasson. > > Added assert in Address::encode() to emphasise the use of a valid immediate (in base_plus_offset). > > Added clarifying comment to Address::offset_ok_for_immed() emphasising favouring of the scaled unsigned 12-bit encoding for aligned offsets. We surely need a reproducer for this one. 
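To make the alignment check in the quoted hunk concrete, here is a small sketch of the kind of test it performs. It is written under stated assumptions: `fits_scaled_uimm12` is a hypothetical helper, not the actual `Address::offset_ok_for_immed()`, and it only models the scaled unsigned 12-bit immediate form mentioned in the PR description. `size_log2` is the log2 of the access size in bytes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns true if `offset` can be encoded as a scaled
// unsigned 12-bit immediate for an access of (1 << size_log2) bytes.
inline bool fits_scaled_uimm12(int64_t offset, unsigned size_log2) {
  const int64_t mask = (int64_t{1} << size_log2) - 1;
  // Negative or misaligned offsets cannot use the scaled form.
  if (offset < 0 || (offset & mask) != 0) {
    return false;
  }
  // After scaling, the immediate must fit in 12 bits.
  return (offset >> size_log2) < (int64_t{1} << 12);
}
```

The same byte offset can be acceptable for one operand size and not for another (offset 4 is fine for a 4-byte access but misaligned for an 8-byte one), which is why an unknown operand size makes the choice of encoding unsafe.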
------------- PR: https://git.openjdk.java.net/jdk/pull/6212 From aph at openjdk.java.net Mon Nov 8 18:25:36 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Mon, 8 Nov 2021 18:25:36 GMT Subject: RFR: 8276108: Wrong instruction generation in aarch64 backend In-Reply-To: <7t8vNmxySP5b9lfHjSoaSXMj-AnAISWkHXEk-PNoIJQ=.0d15faf4-7d8d-44f7-9a41-f324eaf078c7@github.com> References: <9P964HSPi8geJ3GVJwbsGo_t_lASstJkxsrX76Xv8K8=.898bf726-d71b-4a90-a6b7-83895eec3494@github.com> <7t8vNmxySP5b9lfHjSoaSXMj-AnAISWkHXEk-PNoIJQ=.0d15faf4-7d8d-44f7-9a41-f324eaf078c7@github.com> Message-ID: On Mon, 8 Nov 2021 16:36:49 GMT, Patric Hedlin wrote: >> C1 code generation on AArch64 may produce bad LDR/STR immediate offset instructions when the actual operand (datum) size is unknown. This change will alter the code generated for the problematic immediate offset to use the register offset version (requiring additional instructions). >> >> Contributed by Nick Gasson. >> >> Added assert in Address::encode() to emphasise the use of a valid immediate (in base_plus_offset). >> >> Added clarifying comment to Address::offset_ok_for_immed() emphasising favouring of the scaled unsigned 12-bit encoding for aligned offsets. > > src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 482: > >> 480: "must be, was: %ld, %d", _offset, size); >> 481: unsigned mask = (1 << size) - 1; >> 482: if (_offset < 0 || _offset & mask) { > > Prefer(?): (_offset & mask) != 0 This diff is baffling. What changed? ------------- PR: https://git.openjdk.java.net/jdk/pull/6212 From manc at openjdk.java.net Mon Nov 8 18:40:38 2021 From: manc at openjdk.java.net (Man Cao) Date: Mon, 8 Nov 2021 18:40:38 GMT Subject: RFR: 8276453: Undefined behavior in C1 LIR_OprDesc causes SEGV in fastdebug build [v5] In-Reply-To: References: Message-ID: On Wed, 3 Nov 2021 20:01:37 GMT, Man Cao wrote: >> Hi all, >> >> Could anyone provide some feedback on this bug fix and refactoring change? 
See https://bugs.openjdk.java.net/browse/JDK-8276453 for more details. >> If the direction of this change looks good, we can proceed removing the "UGLY HACK" in c1_LIR.hpp and refactor occurrences of "opr->fn()" to "opr.fn()". > > Man Cao has updated the pull request incrementally with one additional commit since the last revision: > > Fix aarch64 and arm builds Friendly ping. Could anyone take a look? The bug has been bumped to P2. ------------- PR: https://git.openjdk.java.net/jdk/pull/6221 From mdoerr at openjdk.java.net Mon Nov 8 21:17:40 2021 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Mon, 8 Nov 2021 21:17:40 GMT Subject: RFR: 8271202: C1: assert(false) failed: live_in set of first block must be empty In-Reply-To: References: Message-ID: On Wed, 28 Jul 2021 06:57:26 GMT, Yi Yang wrote: > Hi, I'm trying to fix [JDK-8271202](https://bugs.openjdk.java.net/browse/JDK-8271202). A local variable (smallinvoc) is defined in B3 and only used in B14, so it ought to have a short lifetime. But its lifetime has been unconditionally extended since -XX:+DeoptimizeALot (**Just removing this may also be a simpler and safer fix? Not sure if it's acceptable**), making it propagate to almost the whole remaining IR. > > https://github.com/openjdk/jdk/blob/ecd445562f8355704a041f9eca0e87dc85a7f44c/src/hotspot/share/ci/ciMethod.cpp#L373-L379 > > ![image](https://user-images.githubusercontent.com/5010047/127277954-2a64d87e-2981-4d74-8001-c7efeb000a10.png) > > > A virtual register (v603) that represents this variable is located in the B13 live_in set, which propagated to the B1 live_out set. > > When B1 merges state with B16 and B19, it found that this variable in new_state (B16) was empty, so B1 invalidates the corresponding local slot. > > https://github.com/openjdk/jdk/blob/ecd445562f8355704a041f9eca0e87dc85a7f44c/src/hotspot/share/c1/c1_Instruction.cpp#L826-L838 > > I think we should invalidate this slot only when their types are mismatched.
Otherwise, Phi will not be generated, B19 live_gen set will not contain this variable, because of which this variable is alive in B1 live_in. B1 live_in will eventually backward propagate to B20 live_in set, it avoids being killed by B19 live_gen, which causes the crash. > > > Block 1 > live_in: > 603 616 617 618 619 620 621 622 > live_out: > 603 616 617 618 619 620 621 622 > live_gen: > 620 > live_kill: > 648 649 650 > > Block 16 > live_in: > 603 616 617 618 619 620 621 622 > live_out: > 603 616 617 618 619 620 621 622 > live_gen: > 616 617 618 619 620 621 622 > live_kill: > 620 654 655 656 657 > > Block 19 > live_in: > 603 > live_out: > 603 616 617 618 619 620 621 622 > live_gen: > > live_kill: > 0 1 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 > > > Block 20 > live_in: > 603 > live_out: > 603 > live_gen: > > live_kill: > 577 578 Thanks a lot for analyzing it further. Christian has added a comment to the bug how it can be reproduced, but I haven't found time for it, yet. My first guess is that the problem could be that we don't invalidate phi functions recursively. The phi we're currently invalidating may feed into another phi in another block which would also require invalidation. If that's the case, we could either implement recursive invalidation or detect it and bail out. ------------- PR: https://git.openjdk.java.net/jdk/pull/4916 From dlong at openjdk.java.net Mon Nov 8 23:47:33 2021 From: dlong at openjdk.java.net (Dean Long) Date: Mon, 8 Nov 2021 23:47:33 GMT Subject: RFR: 8276453: Undefined behavior in C1 LIR_OprDesc causes SEGV in fastdebug build [v5] In-Reply-To: References: Message-ID: On Wed, 3 Nov 2021 20:01:37 GMT, Man Cao wrote: >> Hi all, >> >> Could anyone provide some feedback on this bug fix and refactoring change? See https://bugs.openjdk.java.net/browse/JDK-8276453 for more details. 
>> If the direction of this change looks good, we can proceed removing the "UGLY HACK" in c1_LIR.hpp and refactor occurrences of "opr->fn()" to "opr.fn()". > > Man Cao has updated the pull request incrementally with one additional commit since the last revision: > > Fix aarch64 and arm builds This looks like a similar problem to JDK-8276563. It would be nice if both problems had the same solution, one that minimizes unnecessary renames and does not significantly impact performance. ------------- PR: https://git.openjdk.java.net/jdk/pull/6221 From manc at openjdk.java.net Tue Nov 9 00:05:34 2021 From: manc at openjdk.java.net (Man Cao) Date: Tue, 9 Nov 2021 00:05:34 GMT Subject: RFR: 8276453: Undefined behavior in C1 LIR_OprDesc causes SEGV in fastdebug build [v5] In-Reply-To: References: Message-ID: On Wed, 3 Nov 2021 20:01:37 GMT, Man Cao wrote: >> Hi all, >> >> Could anyone provide some feedback on this bug fix and refactoring change? See https://bugs.openjdk.java.net/browse/JDK-8276453 for more details. >> If the direction of this change looks good, we can proceed removing the "UGLY HACK" in c1_LIR.hpp and refactor occurrences of "opr->fn()" to "opr.fn()". > > Man Cao has updated the pull request incrementally with one additional commit since the last revision: > > Fix aarch64 and arm builds Thanks for the feedback. For JDK-8276563, the operations on "this" seem limited to the <, >, +, - operators, which are well defined. However, for this PR, the biggest problem is the "&" operator on "this", in: `bool check_value_mask(intptr_t mask, intptr_t masked_value) const { return (value() & mask) == masked_value; }` The compiler can make various optimizations if it believes the "this" pointer is aligned. PS for other reviewers: Looking at the change in "src/hotspot/share/c1/c1_LIR.hpp" is the quickest way to get a high-level idea of this change.
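As a rough sketch of the direction described above (calling `opr.fn()` on a value rather than `opr->fn()` on a fake pointer), the mask test can operate on an integer stored in the object, so no assumptions about the alignment of `this` are involved. Everything here except the body of `check_value_mask` is hypothetical: `OprValue`, `kTagMask`, `kTagged`, and `is_tagged` are illustrative names, and the one-bit tag layout is an assumption, not the real `LIR_OprDesc` encoding.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical operand handle: the tag bits live in an integer held by value.
class OprValue {
  intptr_t _value;

 public:
  constexpr explicit OprValue(intptr_t value) : _value(value) {}

  constexpr intptr_t value() const { return _value; }

  // Same shape as the method quoted in the message, but value() reads a
  // stored integer instead of reinterpreting the `this` pointer.
  constexpr bool check_value_mask(intptr_t mask, intptr_t masked_value) const {
    return (value() & mask) == masked_value;
  }
};

// Assumed tag layout for this sketch only: low bit set means "tagged".
constexpr intptr_t kTagMask = 0x1;
constexpr intptr_t kTagged = 0x1;

inline bool is_tagged(OprValue opr) {
  return opr.check_value_mask(kTagMask, kTagged);
}
```

With the bits stored in a member, a compiler that assumes object pointers are suitably aligned has nothing to fold away in the `&` expression.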
------------- PR: https://git.openjdk.java.net/jdk/pull/6221 From duke at openjdk.java.net Tue Nov 9 02:05:01 2021 From: duke at openjdk.java.net (Mai Đặng Quân Anh) Date: Tue, 9 Nov 2021 02:05:01 GMT Subject: RFR: 8276162: Optimise unsigned comparison pattern [v2] In-Reply-To: References: Message-ID: > This patch changes operations in the form `x +- Integer.MIN_VALUE <=> y +- Integer.MIN_VALUE`, which is a pattern used to do unsigned comparisons, into `x u<=> y`. > > In addition to being basic operations, they may be utilised to implement range checks such as the methods in `jdk.internal.util.Preconditions`, or in places where the compiler cannot deduce the non-negativeness of the bound as in `java.util.ArrayList`. > > Thank you very much. Mai Đặng Quân Anh has updated the pull request incrementally with one additional commit since the last revision: fix copyright declaration ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6101/files - new: https://git.openjdk.java.net/jdk/pull/6101/files/426abe8c..69842dfe Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6101&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6101&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/6101.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6101/head:pull/6101 PR: https://git.openjdk.java.net/jdk/pull/6101 From yyang at openjdk.java.net Tue Nov 9 02:06:43 2021 From: yyang at openjdk.java.net (Yi Yang) Date: Tue, 9 Nov 2021 02:06:43 GMT Subject: Withdrawn: 8271202: C1: assert(false) failed: live_in set of first block must be empty In-Reply-To: References: Message-ID: On Wed, 28 Jul 2021 06:57:26 GMT, Yi Yang wrote: > Hi, I'm trying to fix [JDK-8271202](https://bugs.openjdk.java.net/browse/JDK-8271202). A local variable (smallinvoc) is defined in B3 and only used in B14, so it ought to have a short lifetime.
But its lifetime has been unconditionally extended since -XX:+DeoptimizeALot(**Just removing this may be also a simpler and safer fix? Not sure if it's acceptable**), making it propagate to almost the whole remaing IR. > > https://github.com/openjdk/jdk/blob/ecd445562f8355704a041f9eca0e87dc85a7f44c/src/hotspot/share/ci/ciMethod.cpp#L373-L379 > > ![image](https://user-images.githubusercontent.com/5010047/127277954-2a64d87e-2981-4d74-8001-c7efeb000a10.png) > > > A virtual register(v603) that represents this variable is located in B13 live_in set, which propagated to B1 live_out set. > > When B1 merges state with B16 and B19, it found that this variable in new_state(B16) was empty, so B1 invalidates the corresponding local slot. > > https://github.com/openjdk/jdk/blob/ecd445562f8355704a041f9eca0e87dc85a7f44c/src/hotspot/share/c1/c1_Instruction.cpp#L826-L838 > > I think we should invalidate this slot only when their types are mismatched. Otherwise, Phi will not be generated, B19 live_gen set will not contain this variable, because of which this variable is alive in B1 live_in. B1 live_in will eventually backward propagate to B20 live_in set, it avoids being killed by B19 live_gen, which causes the crash. > > > Block 1 > live_in: > 603 616 617 618 619 620 621 622 > live_out: > 603 616 617 618 619 620 621 622 > live_gen: > 620 > live_kill: > 648 649 650 > > Block 16 > live_in: > 603 616 617 618 619 620 621 622 > live_out: > 603 616 617 618 619 620 621 622 > live_gen: > 616 617 618 619 620 621 622 > live_kill: > 620 654 655 656 657 > > Block 19 > live_in: > 603 > live_out: > 603 616 617 618 619 620 621 622 > live_gen: > > live_kill: > 0 1 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 > > > Block 20 > live_in: > 603 > live_out: > 603 > live_gen: > > live_kill: > 577 578 This pull request has been closed without being integrated. 
-------------

PR: https://git.openjdk.java.net/jdk/pull/4916

From stuefe at openjdk.java.net Tue Nov 9 06:03:36 2021
From: stuefe at openjdk.java.net (Thomas Stuefe)
Date: Tue, 9 Nov 2021 06:03:36 GMT
Subject: RFR: 8276563: Undefined Behaviour in class Assembler
In-Reply-To:
References:
Message-ID:

On Mon, 8 Nov 2021 16:38:30 GMT, Andrew Haley wrote:

> > Okay, I'd assume not having the encoding be `const` would be fine, as long as `Register` is passed around by-value. i.e. only some code's private copy of a `Register` instance would be overwritten (same as if it was a pointer).
> > Anyway, thanks for trying this out! If the required changes are too much (making backporting harder as well) I think going with a more focused fix is the way to go, as well. Maybe we can come back for a broader refactoring another day.
>
> Sorry, my last comment was a bit misleading. I should not have posted so soon. I have something that works now with a bunch of overloads and `constexpr` references, etc., but it's all a bit too much.

I thought some more about this, and so far I like your first approach (using an array, and maybe not subtracting but ANDing) best. It's a pity that piggybacking on "this" is UB since it is exactly what one wants here.

I know this is probably not what you want, but I just mention it: how about making Register a plain type (e.g. int)? We could replace calls to `r->encoding` and `r->value()` with just `r`. Then, to add functionality and to group it nicely, make today's Register subclasses utility wrappers which wrap around such a Register. You'd have to create those things on the fly when you need them, but at least the compiler would optimize them away.

I know that would mean a ton of changes. But it would be exactly what you want to express, would not be UB, would be portable and easy to understand.
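For illustration, the "plain type plus on-the-fly wrapper" idea from the message above might look roughly like this. It is a minimal sketch under stated assumptions: a register is just its integer encoding, the register file has 32 entries, and `Reg`, `RegView`, `kNumRegs`, and `noreg_sketch` are hypothetical names that do not exist in HotSpot.

```cpp
#include <cassert>

// Assumption: the register *is* its encoding, so no pointer tricks are needed.
typedef int Reg;

constexpr int kNumRegs = 32;        // hypothetical size of the register file
constexpr Reg r10 = 10;             // encoding doubles as the value
constexpr Reg noreg_sketch = -1;    // hypothetical "no register" marker

// Hypothetical utility wrapper, created on the fly where helpers are wanted.
// It holds only the encoding, so a compiler can be expected to optimize it away.
struct RegView {
  Reg r;
  constexpr explicit RegView(Reg reg) : r(reg) {}
  constexpr bool is_valid() const { return 0 <= r && r < kNumRegs; }
  constexpr Reg successor() const { return (r + 1) % kNumRegs; }
};
```

A call site would then write `RegView(r10).is_valid()` instead of `r10->is_valid()`, and use `r10` directly wherever the encoding is needed.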
------------- PR: https://git.openjdk.java.net/jdk/pull/6280 From phedlin at openjdk.java.net Tue Nov 9 08:41:35 2021 From: phedlin at openjdk.java.net (Patric Hedlin) Date: Tue, 9 Nov 2021 08:41:35 GMT Subject: RFR: 8276108: Wrong instruction generation in aarch64 backend In-Reply-To: References: <9P964HSPi8geJ3GVJwbsGo_t_lASstJkxsrX76Xv8K8=.898bf726-d71b-4a90-a6b7-83895eec3494@github.com> <7t8vNmxySP5b9lfHjSoaSXMj-AnAISWkHXEk-PNoIJQ=.0d15faf4-7d8d-44f7-9a41-f324eaf078c7@github.com> Message-ID: On Mon, 8 Nov 2021 18:22:52 GMT, Andrew Haley wrote: >> src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 482: >> >>> 480: "must be, was: %ld, %d", _offset, size); >>> 481: unsigned mask = (1 << size) - 1; >>> 482: if (_offset < 0 || _offset & mask) { >> >> Prefer(?): (_offset & mask) != 0 > > This diff is baffling. What changed? Hardly baffling. The section was changed to conform with style and indentation in the rest of the function. ------------- PR: https://git.openjdk.java.net/jdk/pull/6212 From aph at openjdk.java.net Tue Nov 9 09:42:36 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Tue, 9 Nov 2021 09:42:36 GMT Subject: RFR: 8276108: Wrong instruction generation in aarch64 backend In-Reply-To: References: <9P964HSPi8geJ3GVJwbsGo_t_lASstJkxsrX76Xv8K8=.898bf726-d71b-4a90-a6b7-83895eec3494@github.com> <7t8vNmxySP5b9lfHjSoaSXMj-AnAISWkHXEk-PNoIJQ=.0d15faf4-7d8d-44f7-9a41-f324eaf078c7@github.com> Message-ID: On Tue, 9 Nov 2021 08:38:45 GMT, Patric Hedlin wrote: >> This diff is baffling. What changed? > > Hardly baffling. The section was changed to conform with style and indentation in the rest of the function. OK, so there is no material change, it's just a cleanup. 
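The explicit form of the mask test suggested in the review, `(_offset & mask) != 0`, can be read as an alignment check. The sketch below is an illustration only, assuming `size` is a shift amount (as the `1 << size` in the quoted diff implies); `offset_is_scaled_nonneg` is a hypothetical helper and not the function in assembler_aarch64.hpp.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true when offset is non-negative and a multiple of
// (1 << size). This is the negation of the quoted condition
// `_offset < 0 || _offset & mask`, with the mask comparison spelled out.
inline bool offset_is_scaled_nonneg(std::int64_t offset, unsigned size) {
  const std::int64_t mask = (std::int64_t(1) << size) - 1;  // low bits that must be zero
  return offset >= 0 && (offset & mask) == 0;
}
```

Writing the comparison against zero explicitly avoids relying on the implicit integer-to-bool conversion, which is the point of the "Prefer(?)" remark.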
------------- PR: https://git.openjdk.java.net/jdk/pull/6212 From phedlin at openjdk.java.net Tue Nov 9 09:55:36 2021 From: phedlin at openjdk.java.net (Patric Hedlin) Date: Tue, 9 Nov 2021 09:55:36 GMT Subject: RFR: 8276108: Wrong instruction generation in aarch64 backend In-Reply-To: References: <9P964HSPi8geJ3GVJwbsGo_t_lASstJkxsrX76Xv8K8=.898bf726-d71b-4a90-a6b7-83895eec3494@github.com> <7t8vNmxySP5b9lfHjSoaSXMj-AnAISWkHXEk-PNoIJQ=.0d15faf4-7d8d-44f7-9a41-f324eaf078c7@github.com> Message-ID: <9GAQACs8Znf9sFMvkvzw7PK4KjxIXwveYQ-wIGCcuVM=.049eaa11-e21a-44f6-a4a6-a6830c710d79@github.com> On Tue, 9 Nov 2021 09:38:11 GMT, Andrew Haley wrote: >> Hardly baffling. The section was changed to conform with style and indentation in the rest of the function. > > OK, so there is no material change, it's just a cleanup. Right, but lsbs/mask check could be made more explicit, as in offset_ok_for_immed(). For conformance(?). ------------- PR: https://git.openjdk.java.net/jdk/pull/6212 From aph at openjdk.java.net Tue Nov 9 11:41:34 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Tue, 9 Nov 2021 11:41:34 GMT Subject: RFR: 8276563: Undefined Behaviour in class Assembler In-Reply-To: References: Message-ID: On Mon, 8 Nov 2021 14:59:18 GMT, Jorn Vernee wrote: > > In other words: in the current scheme we pass around an integer disguised as a pointer, and then have to cast it to an integer to use it. That seems silly. Let's just pass around the integer instead, wrapped in a value-object, which the compiler should treat the same as if it were just the integer. > > That's worth exploring, but I'm reluctant to start overloading `->`. I'll have a look. So I implemented this in a proof-of-concept way, and unfortunately GCC does a lot of forcing `Register` instances into memory, even though it's a value-only class with no virtual members. 
libjvm size:

    Before patch:            18667808
    Using array in memory:   18864416   101.05%
    Pass by value:           18930040   101.40%

This was definitely worthy of investigation, and I am grateful for the suggestion, but I don't think it's worth pursuing any further. The important thing is to get rid of the UB, and I intend to concentrate on cleaning up Plan A.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6280

From aph at openjdk.java.net Tue Nov 9 11:51:33 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Tue, 9 Nov 2021 11:51:33 GMT
Subject: RFR: 8276563: Undefined Behaviour in class Assembler
In-Reply-To:
References:
Message-ID:

On Tue, 9 Nov 2021 06:00:43 GMT, Thomas Stuefe wrote:

> I know this is probably not what you want, but I just mention it: how about making Register a plain type (e.g. int)? We could replace calls to `r->encoding` and `r->value()` with just `r`. Then, to add functionality and to group it nicely, make today's Register subclasses utility wrappers which wrap around such a Register. You'd have to create those things on the fly when you need them, but at least the compiler would optimize them away.
>
> I know that would mean a ton of changes. But it would be exactly what you want to express, would not be UB, would be portable and easy to understand.

Oh totally, but it's a pretty extreme case of spec inflation. My goal right now is to make the UB go away without causing extra work for anyone else. And I have some other ideas that would improve efficiency, probably more beneficial than this, that I want to get on with. (I hope - more later.)

-------------

PR: https://git.openjdk.java.net/jdk/pull/6280

From shade at openjdk.java.net Tue Nov 9 12:02:47 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Tue, 9 Nov 2021 12:02:47 GMT
Subject: RFR: 8276846: JDK-8273416 is incomplete for UseSSE=1
Message-ID:

When doing the fix for [JDK-8273416](https://bugs.openjdk.java.net/browse/JDK-8273416), I made a mistake of not running with `UseSSE=1`.
That one highlights a bug (see JIRA for reproducer): I have misjudged that `float` and `double` args are handled at the same SSE levels, which they are not. `doubles` uses FPU with SSE={0,1}, and `float` uses FPU with SSE={0}. See for example the blurbs in `MacroAssembler::{load|store}_{float|double}`. So the new predicate for `regFPR` argument mishandles UseSSE=1 case: we should not allow `castFF_PR` to match then. Additional testing: - [x] Linux x86_32 `hotspot:tier1` with `-XX:UseAVX=0 -XX:UseSSE=0` - [x] Linux x86_32 `hotspot:tier1` with `-XX:UseAVX=0 -XX:UseSSE=1` - [x] Linux x86_32 `hotspot:tier1` with `-XX:UseAVX=0 -XX:UseSSE=2` - [x] Linux x86_32 `hotspot:tier1` with `-XX:UseAVX=0 -XX:UseSSE=3` - [x] Linux x86_32 `hotspot:tier1` with `-XX:UseAVX=0 -XX:UseSSE=4` - [x] Linux x86_32 `hotspot:tier1` with `-XX:UseAVX=1 -XX:UseSSE=4` - [x] Linux x86_32 `hotspot:tier1` with `-XX:UseAVX=2 -XX:UseSSE=4` - [ ] Linux x86_32 `jdk:tier1` with `-XX:UseAVX=0 -XX:UseSSE=0` - [ ] Linux x86_32 `jdk:tier1` with `-XX:UseAVX=0 -XX:UseSSE=1` - [ ] Linux x86_32 `jdk:tier1` with `-XX:UseAVX=0 -XX:UseSSE=2` - [ ] Linux x86_32 `jdk:tier1` with `-XX:UseAVX=0 -XX:UseSSE=3` - [ ] Linux x86_32 `jdk:tier1` with `-XX:UseAVX=0 -XX:UseSSE=4` - [ ] Linux x86_32 `jdk:tier1` with `-XX:UseAVX=1 -XX:UseSSE=4` - [ ] Linux x86_32 `jdk:tier1` with `-XX:UseAVX=2 -XX:UseSSE=4` - [ ] Linux x86_32 `langtools:tier1` with `-XX:UseAVX=0 -XX:UseSSE=0` - [ ] Linux x86_32 `langtools:tier1` with `-XX:UseAVX=0 -XX:UseSSE=1` - [ ] Linux x86_32 `langtools:tier1` with `-XX:UseAVX=0 -XX:UseSSE=2` - [ ] Linux x86_32 `langtools:tier1` with `-XX:UseAVX=0 -XX:UseSSE=3` - [ ] Linux x86_32 `langtools:tier1` with `-XX:UseAVX=0 -XX:UseSSE=4` - [ ] Linux x86_32 `langtools:tier1` with `-XX:UseAVX=1 -XX:UseSSE=4` - [ ] Linux x86_32 `langtools:tier1` with `-XX:UseAVX=2 -XX:UseSSE=4` ------------- Commit messages: - Fix Changes: https://git.openjdk.java.net/jdk/pull/6308/files Webrev: 
https://webrevs.openjdk.java.net/?repo=jdk&pr=6308&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8276846 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/6308.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6308/head:pull/6308 PR: https://git.openjdk.java.net/jdk/pull/6308 From neliasso at openjdk.java.net Tue Nov 9 13:27:36 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Tue, 9 Nov 2021 13:27:36 GMT Subject: RFR: 8276846: JDK-8273416 is incomplete for UseSSE=1 In-Reply-To: References: Message-ID: On Tue, 9 Nov 2021 11:52:50 GMT, Aleksey Shipilev wrote: > When doing the fix for [JDK-8273416](https://bugs.openjdk.java.net/browse/JDK-8273416), I made a mistake of not running with `UseSSE=1`. That one highlights a bug (see JIRA for reproducer): I have misjudged that `float` and `double` args are handled at the same SSE levels, which they are not. `doubles` uses FPU with SSE={0,1}, and `float` uses FPU with SSE={0}. See for example the blurbs in `MacroAssembler::{load|store}_{float|double}`. So the new predicate for `regFPR` argument mishandles UseSSE=1 case: we should not allow `castFF_PR` to match then. 
> > Additional testing: > - [x] Linux x86_32 `hotspot:tier1` with `-XX:UseAVX=0 -XX:UseSSE=0` > - [x] Linux x86_32 `hotspot:tier1` with `-XX:UseAVX=0 -XX:UseSSE=1` > - [x] Linux x86_32 `hotspot:tier1` with `-XX:UseAVX=0 -XX:UseSSE=2` > - [x] Linux x86_32 `hotspot:tier1` with `-XX:UseAVX=0 -XX:UseSSE=3` > - [x] Linux x86_32 `hotspot:tier1` with `-XX:UseAVX=0 -XX:UseSSE=4` > - [x] Linux x86_32 `hotspot:tier1` with `-XX:UseAVX=1 -XX:UseSSE=4` > - [x] Linux x86_32 `hotspot:tier1` with `-XX:UseAVX=2 -XX:UseSSE=4` > - [x] Linux x86_32 `jdk:tier1` with `-XX:UseAVX=0 -XX:UseSSE=0` > - [x] Linux x86_32 `jdk:tier1` with `-XX:UseAVX=0 -XX:UseSSE=1` > - [x] Linux x86_32 `jdk:tier1` with `-XX:UseAVX=0 -XX:UseSSE=2` > - [x] Linux x86_32 `jdk:tier1` with `-XX:UseAVX=0 -XX:UseSSE=3` > - [x] Linux x86_32 `jdk:tier1` with `-XX:UseAVX=0 -XX:UseSSE=4` > - [x] Linux x86_32 `jdk:tier1` with `-XX:UseAVX=1 -XX:UseSSE=4` > - [x] Linux x86_32 `jdk:tier1` with `-XX:UseAVX=2 -XX:UseSSE=4` > - [ ] Linux x86_32 `langtools:tier1` with `-XX:UseAVX=0 -XX:UseSSE=0` > - [ ] Linux x86_32 `langtools:tier1` with `-XX:UseAVX=0 -XX:UseSSE=1` > - [ ] Linux x86_32 `langtools:tier1` with `-XX:UseAVX=0 -XX:UseSSE=2` > - [ ] Linux x86_32 `langtools:tier1` with `-XX:UseAVX=0 -XX:UseSSE=3` > - [ ] Linux x86_32 `langtools:tier1` with `-XX:UseAVX=0 -XX:UseSSE=4` > - [ ] Linux x86_32 `langtools:tier1` with `-XX:UseAVX=1 -XX:UseSSE=4` > - [ ] Linux x86_32 `langtools:tier1` with `-XX:UseAVX=2 -XX:UseSSE=4` Looks good! ------------- Marked as reviewed by neliasso (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6308 From chagedorn at openjdk.java.net Tue Nov 9 13:29:58 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Tue, 9 Nov 2021 13:29:58 GMT Subject: RFR: 8276546: [IR Framework] Whitelist and ignore CompileThreshold Message-ID: This patch whitelists `CompileThreshold` and ignores it if passed as JTreg VM/Java option flag to a test. 
The reason to do this is that our CI executes `-XX:-TieredCompilation` in combination with `CompileThreshold` and therefore IR matching will not be performed (because `CompileThreshold` is not whitelisted). This patch changes this. Setting `CompileThreshold` with `TestFramework::addFlags/runWithFlags()` will normally apply the flag. Thanks, Christian ------------- Commit messages: - 8276546: [IR Framework] Whitelist and ignore CompileThreshold Changes: https://git.openjdk.java.net/jdk/pull/6312/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6312&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8276546 Stats: 109 lines in 3 files changed: 106 ins; 0 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/6312.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6312/head:pull/6312 PR: https://git.openjdk.java.net/jdk/pull/6312 From jvernee at openjdk.java.net Tue Nov 9 13:34:39 2021 From: jvernee at openjdk.java.net (Jorn Vernee) Date: Tue, 9 Nov 2021 13:34:39 GMT Subject: RFR: 8276563: Undefined Behaviour in class Assembler In-Reply-To: References: Message-ID: On Fri, 5 Nov 2021 17:20:14 GMT, Andrew Haley wrote: > The HotSpot code base contains a number of instances of Undefined Behavior, which can cause all manner of unpleasant surprises. > The UB to which this patch relates is in class `Assembler`, in which instances are pointers to (nonexistent) objects defined as, for example, > > > typedef RegisterImpl *Register; > const Register r10 = ((Register)10); > > > Registers have accessors, e.g.: > > ` int RegisterImpl::encoding() const { return (intptr_t)this; }` > > This works by an accident of implementation: it is not legal C++. 
>
> The most obvious way to fix this UB bug is to make instances of `Register` point to something, and to use pointer subtraction to find the encoding: (simplified for clarity)
>
>     extern RegisterImpl all_Registers[num_Registers];
>     int RegisterImpl::encoding() const { return this - all_Registers; }
>
> After this patch there is slightly more work to be done when assembling code but it's merely the subtraction of a constant in `encoding()` and the difference in execution time is so small (and the startup variance so large) that I have been unable to measure it, even after averaging 100 runs. It does lead to an increase of about 1% in the size of the stripped libjvm.so, but I think that can be recovered by a subsequent patch.
>
> An alternative way to implement this would be to make the encoding a byte-wide field in `RegisterImpl` and define encoding() this way:
>
> `  int RegisterImpl::encoding() const { return _encoding; }`
>
> This would result in smaller code, but I suspect slower.
>
> If this change is accepted, I propose that all instances of this pattern in HotSpot be treated similarly.

LGTM. I noticed this is only fixed on AArch64, but AFAICS we could have the same problem on x86.

src/hotspot/cpu/aarch64/register_aarch64.hpp line 44:

> 42:   extern name all_ ## type ## s[name::number_of_declared_registers] INTERNAL_VISIBILITY; \
> 43:   constexpr type first_ ## type = all_ ## type ## s;                    \
> 44:   inline constexpr type name::first() { return all_ ## type ## s; }

Same here:
Suggestion:

    // Macros to help define all kinds of registers

    #define REGISTER_IMPL_DECLARATION(type, impl_type)                                   \
      inline const type as_ ## type(int encoding) {                                      \
        assert(encoding <= impl_type::number_of_declared_registers, "invalid register"); \
        return encoding == -1 ? impl_type::invalid() : impl_type::first() + encoding;    \
      }                                                                                  \
      extern impl_type all_ ## type ## s[impl_type::number_of_declared_registers] INTERNAL_VISIBILITY; \
      constexpr type first_ ## type = all_ ## type ## s;                                 \
      inline constexpr type impl_type::first() { return all_ ## type ## s; }

src/hotspot/cpu/aarch64/register_aarch64.hpp line 47:

> 45:
> 46: #define REGISTER_IMPL_DEFINITION(type, name) \
> 47:   name all_ ## type ## s[name::number_of_declared_registers];

The use of the macro parameters `type` and `name` here is a bit confusing, since they mean something else in the `CONSTANT_REGISTER_DECLARATION` macro below. I'd suggest changing the parameter names to `type` and `impl_type` instead, to reflect that they are `<type>` and `<type>Impl`.
Suggestion:

    #define REGISTER_IMPL_DEFINITION(type, impl_type) \
      impl_type all_ ## type ## s[impl_type::number_of_declared_registers];

src/hotspot/cpu/aarch64/register_aarch64.hpp line 179:

> 177:   // accessors
> 178:   bool is_valid() const          { return this < invalid(); }
> 179:   bool has_byte_register() const { return is_valid(); }

Is `has_byte_register` needed here? I see it in the previous code for `Register` but not for `FloatRegister`.
Suggestion:

src/hotspot/cpu/aarch64/register_aarch64.hpp line 286:

> 284:   // accessors
> 285:   bool is_valid() const          { return this < invalid(); }
> 286:   bool has_byte_register() const { return is_valid(); }

Same here, seems to be added but not used.
Suggestion:

-------------

Marked as reviewed by jvernee (Reviewer).
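The pointer-subtraction scheme from the PR description quoted in this review can be shown in a self-contained form. This is a minimal sketch following the simplified snippet in that description, assuming a 32-entry table; `RegImpl`, `Reg`, `all_regs`, and `as_reg` are hypothetical stand-ins and not the names or macros used in register_aarch64.hpp.

```cpp
#include <cassert>

class RegImpl;                       // hypothetical stand-in for RegisterImpl
typedef const RegImpl* Reg;          // a register is a pointer into a real table

class RegImpl {
public:
  enum { number_of_registers = 32 }; // assumed table size
  int encoding() const;              // index of this object within the table
};

// Real storage, so every Reg points at an actual object and the
// pointer arithmetic below stays within one array.
static RegImpl all_regs[RegImpl::number_of_registers];

// Encoding is recovered by subtracting the table base from `this`.
inline int RegImpl::encoding() const {
  return static_cast<int>(this - all_regs);
}

// Maps an encoding back to its register; the caller must pass a valid index.
inline Reg as_reg(int encoding) {
  return &all_regs[encoding];
}
```

The trade-off mentioned in the description is visible here: `encoding()` costs one subtraction of a constant, in exchange for the pointers referring to objects that actually exist.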
PR: https://git.openjdk.java.net/jdk/pull/6280 From kvn at openjdk.java.net Tue Nov 9 17:32:33 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 9 Nov 2021 17:32:33 GMT Subject: RFR: 8275326: C2: assert(no_dead_loop) failed: dead loop detected [v2] In-Reply-To: References: Message-ID: On Mon, 8 Nov 2021 09:18:14 GMT, Christian Hagedorn wrote: >> In the test case, we apply the following optimization in `PhiNode::Ideal()` for the memory phi 989 that is on a dead path but still has both its inputs set to non-top nodes: >> https://github.com/openjdk/jdk/blob/3c0faa73522bd004b66cb9e477f43e15a29842e6/src/hotspot/share/opto/cfgnode.cpp#L2269-L2270 >> ![Screenshot from 2021-11-05 11-57-49](https://user-images.githubusercontent.com/17833009/140502849-9f00fd62-9714-4f54-8f98-f22f74d11430.png) >> >> In this process, we create `11853 Phi` for the new `11850 MergeMem` which is going to replace `989 Phi` (`this`). We then transform `11853 Phi` before returning: >> https://github.com/openjdk/jdk/blob/3c0faa73522bd004b66cb9e477f43e15a29842e6/src/hotspot/share/opto/cfgnode.cpp#L2314 >> >> During `Ideal()` for `11853 Phi`, we transform `11769 MergeMem` into top (because the base memory is top) and use this as new input instead: >> https://github.com/openjdk/jdk/blob/3c0faa73522bd004b66cb9e477f43e15a29842e6/src/hotspot/share/opto/cfgnode.cpp#L2230-L2240 >> >> But even if the `MergeMem` node would not be transformed into top, the slice itself could be top (L2237) and we would still replace the phi input with top. This replacement by top will fold the `11853 Phi` and we will build a cycle `11850 MergeMem` <-> `1064 StoreB` because `989 Phi` will be replaced by `11850 MergeMem`. This results in the assertion failure. >> >> I tried some approaches by marking `11853 Phi` and/or `989 Phi` to specially treat them during the optimizations in `Ideal()` (e.g. skipping `989 Phi` during the dead loop detection etc.) 
or to improve the dead loop detection before applying the `MergeMem` optimization in `Ideal()`. But that seemed rather complicated/fragile. >> >> I therefore propose to simply not transform the newly created phi nodes directly but wait instead for IGVN to revisit them again. This allows the `this` phi to be replaced with the new `MergeMem` node and the dead loop detection will work correctly when processing the new phis again later in IGVN. >> >> I could only reproduce this bug with the replay file for the attached test case in the JBS issue. The test case itself did not trigger with repeated runs with `StressIGVN` + `RepeatCompilation`. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > handle GVN Update is good. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6276 From kvn at openjdk.java.net Tue Nov 9 18:24:40 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 9 Nov 2021 18:24:40 GMT Subject: RFR: 8276453: Undefined behavior in C1 LIR_OprDesc causes SEGV in fastdebug build [v5] In-Reply-To: References: Message-ID: <7acpNQm2KgMpVMXkJR8-xcQHaVEhHBreos40RF9egVI=.27594add-9e53-4c9f-b61d-e8003808da08@github.com> On Wed, 3 Nov 2021 20:01:37 GMT, Man Cao wrote: >> Hi all, >> >> Could anyone provide some feedback on this bug fix and refactoring change? See https://bugs.openjdk.java.net/browse/JDK-8276453 for more details. >> If the direction of this change looks good, we can proceed removing the "UGLY HACK" in c1_LIR.hpp and refactor occurrences of "opr->fn()" to "opr.fn()". > > Man Cao has updated the pull request incrementally with one additional commit since the last revision: > > Fix aarch64 and arm builds Changes looks fine for me (it is easy to see if exclude LIR_OprDesc:: -> LIR_Opr::). It used the same approach as 8229258. It is not very performance critical since it is C1 code. 
But I would like to see some numbers how it affect C1 compilation time (may be using -XX:+LogCompilation). ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6221 From kvn at openjdk.java.net Tue Nov 9 19:16:34 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 9 Nov 2021 19:16:34 GMT Subject: RFR: 8276846: JDK-8273416 is incomplete for UseSSE=1 In-Reply-To: References: Message-ID: On Tue, 9 Nov 2021 11:52:50 GMT, Aleksey Shipilev wrote: > When doing the fix for [JDK-8273416](https://bugs.openjdk.java.net/browse/JDK-8273416), I made a mistake of not running with `UseSSE=1`. That one highlights a bug (see JIRA for reproducer): I have misjudged that `float` and `double` args are handled at the same SSE levels, which they are not. `doubles` uses FPU with SSE={0,1}, and `float` uses FPU with SSE={0}. See for example the blurbs in `MacroAssembler::{load|store}_{float|double}`. So the new predicate for `regFPR` argument mishandles UseSSE=1 case: we should not allow `castFF_PR` to match then. > > Additional testing: > - [x] Linux x86_32 `tier1` with `-XX:UseAVX=0 -XX:UseSSE=0` > - [x] Linux x86_32 `tier1` with `-XX:UseAVX=0 -XX:UseSSE=1` > - [x] Linux x86_32 `tier1` with `-XX:UseAVX=0 -XX:UseSSE=2` > - [x] Linux x86_32 `tier1` with `-XX:UseAVX=0 -XX:UseSSE=3` > - [x] Linux x86_32 `tier1` with `-XX:UseAVX=0 -XX:UseSSE=4` > - [x] Linux x86_32 `tier1` with `-XX:UseAVX=1 -XX:UseSSE=4` > - [x] Linux x86_32 `tier1` with `-XX:UseAVX=2 -XX:UseSSE=4` Good. ------------- Marked as reviewed by kvn (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/6308

From kvn at openjdk.java.net Tue Nov 9 20:05:42 2021
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Tue, 9 Nov 2021 20:05:42 GMT
Subject: RFR: 8276546: [IR Framework] Whitelist and ignore CompileThreshold
In-Reply-To:
References:
Message-ID:

On Tue, 9 Nov 2021 13:21:56 GMT, Christian Hagedorn wrote:

> This patch whitelists `CompileThreshold` and ignores it if passed as JTreg VM/Java option flag to a test. The reason to do this is that our CI executes `-XX:-TieredCompilation` in combination with `CompileThreshold` and therefore IR matching will not be performed (because `CompileThreshold` is not whitelisted). This patch changes this.
>
> Setting `CompileThreshold` with `TestFramework::addFlags/runWithFlags()` will normally apply the flag.
>
> Thanks,
> Christian

Okay.

-------------

Marked as reviewed by kvn (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6312

From neliasso at openjdk.java.net Tue Nov 9 20:20:10 2021
From: neliasso at openjdk.java.net (Nils Eliasson)
Date: Tue, 9 Nov 2021 20:20:10 GMT
Subject: RFR: 8273277: C2: Move conditional negation into rc_predicate [v3]
In-Reply-To:
References:
Message-ID:

> Hi,
>
> I need some feedback on this patch. This was reported by Tencent and found in internal testing at about the same time. This patch is based on a patch provided by Tencent.
>
> In some very specific circumstances we need to negate the range checks that we create in PhaseIdealLoop::loop_predication_impl_helper. This is done in three places, but that method also calls insert_initial_skeleton_predicate where this isn't taken into account.
>
> To simplify things I have moved the negation logic into rc_predicate. This should prevent us from missing this check again.
> > I do have a concern that negating the condition of the rangecheck in the skeleton predicate, since the skeleton predicate will be used as a clone template, and some rangechecks optimizations seems to assume that range checks always have LT as the condidtion. On the other hand - it is a really uncommon scenario since we haven't failed here before. > > Feedback appreciated. > > Best regards, > Nils Nils Eliasson has updated the pull request incrementally with one additional commit since the last revision: Add test ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5987/files - new: https://git.openjdk.java.net/jdk/pull/5987/files/21697bac..97e46ba9 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5987&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5987&range=01-02 Stats: 66 lines in 1 file changed: 66 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/5987.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5987/head:pull/5987 PR: https://git.openjdk.java.net/jdk/pull/5987 From neliasso at openjdk.java.net Tue Nov 9 20:30:35 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Tue, 9 Nov 2021 20:30:35 GMT Subject: RFR: 8273277: C2: Move conditional negation into rc_predicate [v3] In-Reply-To: References: Message-ID: On Tue, 9 Nov 2021 20:20:10 GMT, Nils Eliasson wrote: >> Hi, >> >> I need some feedback on this patch. This was reported from Tencent and found in internal testing about the same time. This patch is based on a a patch provided by Tencent. >> >> In some very specific circumstances we need to negate the range checks that we create in PhaseIdealLoop::loop_predication_impl_helper. This is done in three places, but that method also calls insert_initial_skeleton_predicate where this isn't taken into account. >> >> To simplify things I have moved the negation logic into rc_predicate. This should prevent us from missing this check again. 
>> >> I do have a concern that negating the condition of the rangecheck in the skeleton predicate, since the skeleton predicate will be used as a clone template, and some rangechecks optimizations seems to assume that range checks always have LT as the condidtion. On the other hand - it is a really uncommon scenario since we haven't failed here before. >> >> Feedback appreciated. >> >> Best regards, >> Nils > > Nils Eliasson has updated the pull request incrementally with one additional commit since the last revision: > > Add test > > I do have a concern that negating the condition of the rangecheck in the skeleton predicate, since the skeleton predicate will be used as a clone template, and some rangechecks optimizations seems to assume that range checks always have LT as the condition. On the other hand - it is a really uncommon scenario since we haven't failed here before. > > Is there any specific code that you worry about? I have seen checks for LT, but I see none that specifically would be affecting this change. > > I think it should be fine because the purpose of copying and instantiating skeleton range check predicates is to guarantee that control/data paths die consistently when the main loop induction variable falls outside of the allowed range of an array access. But @rwestrel and @chhagedorn looked more into this recently. > > Can we add the test that Tencent found as well? Yes - I've added it. > > Please update the affects versions in the bug report. Done. 
-------------

PR: https://git.openjdk.java.net/jdk/pull/5987

From rbackman at openjdk.java.net Tue Nov 9 21:41:41 2021
From: rbackman at openjdk.java.net (Rickard Bäckman)
Date: Tue, 9 Nov 2021 21:41:41 GMT
Subject: Integrated: 8268882: C2: assert(n->outcnt() != 0 || C->top() == n || n->is_Proj()) failed: No dead instructions after post-alloc
In-Reply-To:
References:
Message-ID: <53iU0HRZL3UEoj9rwNG9aZHhivMp_7Lh4IU-DETPxK4=.c11c1926-2835-4339-9183-395015e185de@github.com>

On Fri, 5 Nov 2021 06:19:25 GMT, Rickard Bäckman wrote:

> Also delete Phi nodes with no uses.

This pull request has now been integrated.

Changeset: 06992208
Author:    Rickard Bäckman
URL:       https://git.openjdk.java.net/jdk/commit/0699220830a457959b784b35af125b70f43fa3b0
Stats:     1 line in 1 file changed: 0 ins; 0 del; 1 mod

8268882: C2: assert(n->outcnt() != 0 || C->top() == n || n->is_Proj()) failed: No dead instructions after post-alloc

Reviewed-by: neliasso, chagedorn, kvn

-------------

PR: https://git.openjdk.java.net/jdk/pull/6270

From neliasso at openjdk.java.net Tue Nov 9 21:42:37 2021
From: neliasso at openjdk.java.net (Nils Eliasson)
Date: Tue, 9 Nov 2021 21:42:37 GMT
Subject: RFR: 8276546: [IR Framework] Whitelist and ignore CompileThreshold
In-Reply-To:
References:
Message-ID:

On Tue, 9 Nov 2021 13:21:56 GMT, Christian Hagedorn wrote:

> This patch whitelists `CompileThreshold` and ignores it if passed as JTreg VM/Java option flag to a test. The reason to do this is that our CI executes `-XX:-TieredCompilation` in combination with `CompileThreshold` and therefore IR matching will not be performed (because `CompileThreshold` is not whitelisted). This patch changes this.
>
> Setting `CompileThreshold` with `TestFramework::addFlags/runWithFlags()` will normally apply the flag.
>
> Thanks,
> Christian

Looks good.

-------------

Marked as reviewed by neliasso (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/6312 From dlong at openjdk.java.net Wed Nov 10 01:12:45 2021 From: dlong at openjdk.java.net (Dean Long) Date: Wed, 10 Nov 2021 01:12:45 GMT Subject: RFR: 8276453: Undefined behavior in C1 LIR_OprDesc causes SEGV in fastdebug build [v5] In-Reply-To: References: Message-ID: On Wed, 3 Nov 2021 20:01:37 GMT, Man Cao wrote: >> Hi all, >> >> Could anyone provide some feedback on this bug fix and refactoring change? See https://bugs.openjdk.java.net/browse/JDK-8276453 for more details. >> If the direction of this change looks good, we can proceed removing the "UGLY HACK" in c1_LIR.hpp and refactor occurrences of "opr->fn()" to "opr.fn()". > > Man Cao has updated the pull request incrementally with one additional commit since the last revision: > > Fix aarch64 and arm builds I'm concerned with the number of lines and files changed, and how that will affect backports, etc. Wouldn't adding something like: typedef LIR_Opr LIR_OprDesc; make most of the renames unnecessary? I also wonder if LIR_Opr() is the best choice to replace NULL. In some places LIR_Opr::illegalOpr() is used instead, which seems inconsistent. Maybe we need LIR_Opr::nullOpr()? ------------- PR: https://git.openjdk.java.net/jdk/pull/6221 From manc at openjdk.java.net Wed Nov 10 04:07:36 2021 From: manc at openjdk.java.net (Man Cao) Date: Wed, 10 Nov 2021 04:07:36 GMT Subject: RFR: 8276453: Undefined behavior in C1 LIR_OprDesc causes SEGV in fastdebug build [v5] In-Reply-To: References: Message-ID: <5x4O16UffEL5ebrgdh2LH_g-G6CJ54M5ZN0rPSZOlJ8=.31b049d4-b7a7-4326-8ed1-66e0540eaf6e@github.com> On Wed, 3 Nov 2021 20:01:37 GMT, Man Cao wrote: >> Hi all, >> >> Could anyone provide some feedback on this bug fix and refactoring change? See https://bugs.openjdk.java.net/browse/JDK-8276453 for more details. >> If the direction of this change looks good, we can proceed removing the "UGLY HACK" in c1_LIR.hpp and refactor occurrences of "opr->fn()" to "opr.fn()". 
> > Man Cao has updated the pull request incrementally with one additional commit since the last revision: > > Fix aarch64 and arm builds I'm running DaCapo benchmarks to check C1 performance, and will post results after it's done. > Wouldn't adding something like: > typedef LIR_Opr LIR_OprDesc; > make most of the renames unnecessary? That could work. Do you think it is better to split this into two or three RFEs: (1) Mainly change c1_LIR.hpp/cpp, with the "typedef LIR_Opr LIR_OprDesc;" workaround. This minimizes the patch size and makes it easier to backport to JDK 11 and JDK 17. (2) Rename all remaining LIR_OprDesc to LIR_Opr. (3) Replace "->" with "." and remove the hack "LIR_Opr* operator->() { return this; }" in c1_LIR.hpp. (~rasbold has made a patch for x86 Linux in Google internal JDK 11 already.) > I also wonder if LIR_Opr() is the best choice to replace NULL. In some places LIR_Opr::illegalOpr() is used instead, which seems inconsistent. Maybe we need LIR_Opr::nullOpr()? Do you think it is feasible to replace all the NULL and 0 with LIR_Opr::illegalOpr()? ------------- PR: https://git.openjdk.java.net/jdk/pull/6221 From manc at openjdk.java.net Wed Nov 10 04:15:37 2021 From: manc at openjdk.java.net (Man Cao) Date: Wed, 10 Nov 2021 04:15:37 GMT Subject: RFR: 8276453: Undefined behavior in C1 LIR_OprDesc causes SEGV in fastdebug build [v5] In-Reply-To: References: Message-ID: On Wed, 3 Nov 2021 20:01:37 GMT, Man Cao wrote: >> Hi all, >> >> Could anyone provide some feedback on this bug fix and refactoring change? See https://bugs.openjdk.java.net/browse/JDK-8276453 for more details. >> If the direction of this change looks good, we can proceed removing the "UGLY HACK" in c1_LIR.hpp and refactor occurrences of "opr->fn()" to "opr.fn()". 
> > Man Cao has updated the pull request incrementally with one additional commit since the last revision: > > Fix aarch64 and arm builds In fact, the first version of this patch in Google internal JDK 11 uses the "typedef LIR_Opr LIR_OprDesc;" hack in c1_LIR.hpp, with this comment: // UGLY HACK: add a type alias. `LIR_Opr` is not actually equivalent to the // previous `LIR_OprDesc` (`LIR_Opr` is like more similar to previous // `LIR_OprDesc*`). The only purpose of this typedef is so that the various // `LIR_OprDesc::enum_value` scattered everywhere don't need to be // modified. This should be removed, and a textual replacement of // `LIR_OprDesc::` to `LIR_Opr::` done throughout the code. typedef LIR_Opr LIR_OprDesc; ------------- PR: https://git.openjdk.java.net/jdk/pull/6221 From duke at openjdk.java.net Wed Nov 10 05:15:46 2021 From: duke at openjdk.java.net (TatWai Chong) Date: Wed, 10 Nov 2021 05:15:46 GMT Subject: RFR: 8275448: [REDO] AArch64: Implement string_compare intrinsic in SVE [v3] In-Reply-To: <3O9BeTBG4Z4q3up1VlKGb096qxN2dxSXxTG_FWrXNVE=.ab464e81-34ad-4e8f-aba4-543489015f64@github.com> References: <3lLU_Cnsh439U5TjNx9ADxXkvMDnfot4TxHbh7paUQw=.5c7ec35a-8456-4fe6-8b4f-1d7627026d8d@github.com> <3O9BeTBG4Z4q3up1VlKGb096qxN2dxSXxTG_FWrXNVE=.ab464e81-34ad-4e8f-aba4-543489015f64@github.com> Message-ID: On Tue, 2 Nov 2021 00:16:44 GMT, TatWai Chong wrote: >> After JDK-8269559 was integrated there are failures in tier1 testing >> across Mac OS X 11.4 (aarch64) machines. Please see JDK-8275263. >> >> This patch is NOT functional; rather, this tends to verify potential >> toolchain issues as the original patch pass testing on other >> platforms. >> >> In this patch, we remove new SVE-related matching rules and register >> class introduced in the original patch to minimally affect the >> non-SVE part. 
>
> TatWai Chong has updated the pull request incrementally with one additional commit since the last revision:
>
>   Add the matching rule in td file, enable control path in the code stub.

Sounds good. Please let me know when the results come out; then I will merge this PR.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6072

From dlong at openjdk.java.net Wed Nov 10 06:17:39 2021
From: dlong at openjdk.java.net (Dean Long)
Date: Wed, 10 Nov 2021 06:17:39 GMT
Subject: RFR: 8276453: Undefined behavior in C1 LIR_OprDesc causes SEGV in fastdebug build [v5]
In-Reply-To: <5x4O16UffEL5ebrgdh2LH_g-G6CJ54M5ZN0rPSZOlJ8=.31b049d4-b7a7-4326-8ed1-66e0540eaf6e@github.com>
References: <5x4O16UffEL5ebrgdh2LH_g-G6CJ54M5ZN0rPSZOlJ8=.31b049d4-b7a7-4326-8ed1-66e0540eaf6e@github.com>
Message-ID:

On Wed, 10 Nov 2021 04:04:59 GMT, Man Cao wrote:

> That could work. Do you think it is better to split this into two or three RFEs:

I would be in favor of that, if it's ok with @vnkozlov

> Do you think it is feasible to replace all the NULL and 0 with LIR_Opr::illegalOpr()?

I don't think so, if we want to preserve existing behavior. NULL and illegalOpr() were two different values before, and now LIR_Opr() introduces a new value that is different from both NULL and illegalOpr(), and with the value 0 I believe it becomes a valid pointer LIR_Opr. To preserve existing behavior, we should crash if any attempt is made to use a NULL LIR_Opr. illegalOpr() doesn't do that. We could change existing behavior rather than preserve it, but then each use of NULL would need to be examined on a case-by-case basis.
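
To make the distinction between the three values concrete, here is a minimal C++ sketch. It is not HotSpot code: the class `Opr`, its bit encoding, the enum `OprBits`, and the `nullOpr()` factory are all hypothetical stand-ins (the real `LIR_Opr` layout differs). It only illustrates one way a value-type handle could keep "null" distinct from "illegal", fail loudly when a null handle is used, and still support the transitional `opr->fn()` and `typedef` spellings discussed in this thread.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical value-type operand handle. This is NOT the real LIR_Opr;
// the encoding and all member names are invented for illustration.
class Opr {
 public:
  enum OprBits : std::uintptr_t {
    null_bits    = 0,  // assumed: what a default-constructed handle holds
    illegal_bits = 1   // assumed: a distinct, non-zero "illegal" sentinel
  };

  Opr() : _bits(null_bits) {}

  static Opr nullOpr()    { return Opr(); }
  static Opr illegalOpr() { return Opr(illegal_bits); }

  // Safe to call on any handle, including the null one.
  bool is_null() const { return _bits == null_bits; }

  // Every other query checks for null first, so that using a null handle
  // trips an assert in debug builds instead of being treated as valid.
  bool is_illegal() const {
    assert(!is_null() && "use of null operand");
    return _bits == illegal_bits;
  }

  bool operator==(const Opr& other) const { return _bits == other._bits; }

  // Transitional shim: lets existing `opr->fn()` call sites keep compiling
  // although the handle is now a value type rather than a pointer.
  Opr* operator->() { return this; }
  const Opr* operator->() const { return this; }

 private:
  explicit Opr(std::uintptr_t bits) : _bits(bits) {}
  std::uintptr_t _bits;
};

// Transitional alias so old `OprDesc::some_enum` spellings still resolve.
typedef Opr OprDesc;
```

With this shape, `Opr::illegalOpr()->is_illegal()` is a legal query, whereas `Opr::nullOpr().is_illegal()` would hit the assert in a debug build, which is roughly the "crash on any use of NULL" behavior argued for above. Whether such a check is affordable in product builds is a separate question that this sketch does not address.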
------------- PR: https://git.openjdk.java.net/jdk/pull/6221 From chagedorn at openjdk.java.net Wed Nov 10 07:58:34 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Wed, 10 Nov 2021 07:58:34 GMT Subject: RFR: 8276546: [IR Framework] Whitelist and ignore CompileThreshold In-Reply-To: References: Message-ID: On Tue, 9 Nov 2021 13:21:56 GMT, Christian Hagedorn wrote: > This patch whitelists `CompileThreshold` and ignores it if passed as JTreg VM/Java option flag to a test. The reason to do this is that our CI executes `-XX:-TieredCompilation` in combination with `CompileThreshold` and therefore IR matching will not be performed (because `CompileThreshold` is not whitelisted). This patch changes this. > > Setting `CompileThreshold` with `TestFramework::addFlags/runWithFlags()` will normally apply the flag. > > Thanks, > Christian Thanks Vladimir and Nils for your reviews! ------------- PR: https://git.openjdk.java.net/jdk/pull/6312 From chagedorn at openjdk.java.net Wed Nov 10 08:12:41 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Wed, 10 Nov 2021 08:12:41 GMT Subject: RFR: 8275326: C2: assert(no_dead_loop) failed: dead loop detected [v2] In-Reply-To: References: Message-ID: On Mon, 8 Nov 2021 09:18:14 GMT, Christian Hagedorn wrote: >> In the test case, we apply the following optimization in `PhiNode::Ideal()` for the memory phi 989 that is on a dead path but still has both its inputs set to non-top nodes: >> https://github.com/openjdk/jdk/blob/3c0faa73522bd004b66cb9e477f43e15a29842e6/src/hotspot/share/opto/cfgnode.cpp#L2269-L2270 >> ![Screenshot from 2021-11-05 11-57-49](https://user-images.githubusercontent.com/17833009/140502849-9f00fd62-9714-4f54-8f98-f22f74d11430.png) >> >> In this process, we create `11853 Phi` for the new `11850 MergeMem` which is going to replace `989 Phi` (`this`). 
We then transform `11853 Phi` before returning: >> https://github.com/openjdk/jdk/blob/3c0faa73522bd004b66cb9e477f43e15a29842e6/src/hotspot/share/opto/cfgnode.cpp#L2314 >> >> During `Ideal()` for `11853 Phi`, we transform `11769 MergeMem` into top (because the base memory is top) and use this as new input instead: >> https://github.com/openjdk/jdk/blob/3c0faa73522bd004b66cb9e477f43e15a29842e6/src/hotspot/share/opto/cfgnode.cpp#L2230-L2240 >> >> But even if the `MergeMem` node would not be transformed into top, the slice itself could be top (L2237) and we would still replace the phi input with top. This replacement by top will fold the `11853 Phi` and we will build a cycle `11850 MergeMem` <-> `1064 StoreB` because `989 Phi` will be replaced by `11850 MergeMem`. This results in the assertion failure. >> >> I tried some approaches by marking `11853 Phi` and/or `989 Phi` to specially treat them during the optimizations in `Ideal()` (e.g. skipping `989 Phi` during the dead loop detection etc.) or to improve the dead loop detection before applying the `MergeMem` optimization in `Ideal()`. But that seemed rather complicated/fragile. >> >> I therefore propose to simply not transform the newly created phi nodes directly but wait instead for IGVN to revisit them again. This allows the `this` phi to be replaced with the new `MergeMem` node and the dead loop detection will work correctly when processing the new phis again later in IGVN. >> >> I could only reproduce this bug with the replay file for the attached test case in the JBS issue. The test case itself did not trigger with repeated runs with `StressIGVN` + `RepeatCompilation`. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > handle GVN Thanks Vladimir for reviewing it again. Unfortunately, I hit some repeatable performance regressions in some micro crypto benchmarks. I will have another look to find a better solution. 
It looks like it is a problem when the new phis are not directly transformed. I could imagine that other node transformations might interfere, possibly resulting in a different IR. Maybe another fix could be to somehow force the new phis to be transformed as immediate next nodes in IGVN right after the old phi node is completely processed and subsumed by the new `MergeMem` node.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6276

From duke at openjdk.java.net Wed Nov 10 09:24:40 2021
From: duke at openjdk.java.net (Mai Đặng Quân Anh)
Date: Wed, 10 Nov 2021 09:24:40 GMT
Subject: RFR: 8276162: Optimise unsigned comparison pattern [v2]
In-Reply-To:
References:
Message-ID:

On Tue, 9 Nov 2021 02:05:01 GMT, Mai Đặng Quân Anh wrote:

>> This patch changes operations in the form `x +- Integer.MIN_VALUE <=> y +- Integer.MIN_VALUE`, which is a pattern used to do unsigned comparisons, into `x u<=> y`.
>>
>> In addition to being basic operations, they may be utilised to implement range checks such as the methods in `jdk.internal.util.Preconditions`, or in places where the compiler cannot deduce the non-negativeness of the bound as in `java.util.ArrayList`.
>>
>> Thank you very much.
>
> Mai Đặng Quân Anh has updated the pull request incrementally with one additional commit since the last revision:
>
>   fix copyright declaration

Hi, could someone take a look at this PR, please? Thank you very much.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6101

From shade at openjdk.java.net Wed Nov 10 11:29:43 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Wed, 10 Nov 2021 11:29:43 GMT
Subject: RFR: 8276846: JDK-8273416 is incomplete for UseSSE=1
In-Reply-To:
References:
Message-ID:

On Tue, 9 Nov 2021 11:52:50 GMT, Aleksey Shipilev wrote:

> When doing the fix for [JDK-8273416](https://bugs.openjdk.java.net/browse/JDK-8273416), I made a mistake of not running with `UseSSE=1`.
That one highlights a bug (see JIRA for reproducer): I have misjudged that `float` and `double` args are handled at the same SSE levels, which they are not. `doubles` uses FPU with SSE={0,1}, and `float` uses FPU with SSE={0}. See for example the blurbs in `MacroAssembler::{load|store}_{float|double}`. So the new predicate for `regFPR` argument mishandles UseSSE=1 case: we should not allow `castFF_PR` to match then. > > Additional testing: > - [x] Linux x86_32 `tier1` with `-XX:UseAVX=0 -XX:UseSSE=0` > - [x] Linux x86_32 `tier1` with `-XX:UseAVX=0 -XX:UseSSE=1` > - [x] Linux x86_32 `tier1` with `-XX:UseAVX=0 -XX:UseSSE=2` > - [x] Linux x86_32 `tier1` with `-XX:UseAVX=0 -XX:UseSSE=3` > - [x] Linux x86_32 `tier1` with `-XX:UseAVX=0 -XX:UseSSE=4` > - [x] Linux x86_32 `tier1` with `-XX:UseAVX=1 -XX:UseSSE=4` > - [x] Linux x86_32 `tier1` with `-XX:UseAVX=2 -XX:UseSSE=4` Thanks, folks! ------------- PR: https://git.openjdk.java.net/jdk/pull/6308 From shade at openjdk.java.net Wed Nov 10 11:29:44 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 10 Nov 2021 11:29:44 GMT Subject: Integrated: 8276846: JDK-8273416 is incomplete for UseSSE=1 In-Reply-To: References: Message-ID: On Tue, 9 Nov 2021 11:52:50 GMT, Aleksey Shipilev wrote: > When doing the fix for [JDK-8273416](https://bugs.openjdk.java.net/browse/JDK-8273416), I made a mistake of not running with `UseSSE=1`. That one highlights a bug (see JIRA for reproducer): I have misjudged that `float` and `double` args are handled at the same SSE levels, which they are not. `doubles` uses FPU with SSE={0,1}, and `float` uses FPU with SSE={0}. See for example the blurbs in `MacroAssembler::{load|store}_{float|double}`. So the new predicate for `regFPR` argument mishandles UseSSE=1 case: we should not allow `castFF_PR` to match then. 
> > Additional testing: > - [x] Linux x86_32 `tier1` with `-XX:UseAVX=0 -XX:UseSSE=0` > - [x] Linux x86_32 `tier1` with `-XX:UseAVX=0 -XX:UseSSE=1` > - [x] Linux x86_32 `tier1` with `-XX:UseAVX=0 -XX:UseSSE=2` > - [x] Linux x86_32 `tier1` with `-XX:UseAVX=0 -XX:UseSSE=3` > - [x] Linux x86_32 `tier1` with `-XX:UseAVX=0 -XX:UseSSE=4` > - [x] Linux x86_32 `tier1` with `-XX:UseAVX=1 -XX:UseSSE=4` > - [x] Linux x86_32 `tier1` with `-XX:UseAVX=2 -XX:UseSSE=4` This pull request has now been integrated. Changeset: a0b84453 Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/a0b84453b087ff368a32b93729c5b30fda22ed48 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod 8276846: JDK-8273416 is incomplete for UseSSE=1 Reviewed-by: neliasso, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/6308 From thartmann at openjdk.java.net Wed Nov 10 12:20:36 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Wed, 10 Nov 2021 12:20:36 GMT Subject: RFR: 8275448: [REDO] AArch64: Implement string_compare intrinsic in SVE [v3] In-Reply-To: <3O9BeTBG4Z4q3up1VlKGb096qxN2dxSXxTG_FWrXNVE=.ab464e81-34ad-4e8f-aba4-543489015f64@github.com> References: <3lLU_Cnsh439U5TjNx9ADxXkvMDnfot4TxHbh7paUQw=.5c7ec35a-8456-4fe6-8b4f-1d7627026d8d@github.com> <3O9BeTBG4Z4q3up1VlKGb096qxN2dxSXxTG_FWrXNVE=.ab464e81-34ad-4e8f-aba4-543489015f64@github.com> Message-ID: On Tue, 2 Nov 2021 00:16:44 GMT, TatWai Chong wrote: >> After JDK-8269559 was integrated there are failures in tier1 testing >> across Mac OS X 11.4 (aarch64) machines. Please see JDK-8275263. >> >> This patch is NOT functional; rather, this tends to verify potential >> toolchain issues as the original patch pass testing on other >> platforms. >> >> In this patch, we remove new SVE-related matching rules and register >> class introduced in the original patch to minimally affect the >> non-SVE part. 
>
> TatWai Chong has updated the pull request incrementally with one additional commit since the last revision:
>
>   Add the matching rule in td file, enable control path in the code stub.

All tests passed. Is there any difference between this version of the patch and the version that previously triggered failures?

-------------

PR: https://git.openjdk.java.net/jdk/pull/6072

From thartmann at openjdk.java.net Wed Nov 10 12:27:40 2021
From: thartmann at openjdk.java.net (Tobias Hartmann)
Date: Wed, 10 Nov 2021 12:27:40 GMT
Subject: RFR: 8273277: C2: Move conditional negation into rc_predicate [v3]
In-Reply-To:
References:
Message-ID: <0UmfhLfdRDnPQOf4MVIkl6cVuJUcwr-79TbdULivxOs=.6d1577bb-a888-44e7-9ae3-7f1fa0f846c8@github.com>

On Tue, 9 Nov 2021 20:20:10 GMT, Nils Eliasson wrote:

>> Hi,
>>
>> I need some feedback on this patch. This was reported by Tencent and found in internal testing at about the same time. This patch is based on a patch provided by Tencent.
>>
>> In some very specific circumstances we need to negate the range checks that we create in PhaseIdealLoop::loop_predication_impl_helper. This is done in three places, but that method also calls insert_initial_skeleton_predicate where this isn't taken into account.
>>
>> To simplify things I have moved the negation logic into rc_predicate. This should prevent us from missing this check again.
>>
>> I do have a concern about negating the condition of the range check in the skeleton predicate, since the skeleton predicate will be used as a clone template, and some range check optimizations seem to assume that range checks always have LT as the condition. On the other hand, it is a really uncommon scenario since we haven't failed here before.
>>
>> Feedback appreciated.
>>
>> Best regards,
>> Nils
>
> Nils Eliasson has updated the pull request incrementally with one additional commit since the last revision:
>
>   Add test

Looks good to me.
test/hotspot/jtreg/compiler/loopopts/TestSkeletonPredicateNegation.java line 29: > 27: * @bug 8273277 > 28: * @summary Skeleton predicates sometimes need to be negated > 29: * @run main compiler.loopopts.TestSkeletonPredicateNegation I think this should be changed to `@run driver`. test/hotspot/jtreg/compiler/loopopts/TestSkeletonPredicateNegation.java line 51: > 49: } > 50: > 51: public void mainTest (String[] args){ Code style should be `mainTest(String[] args) {` ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5987 From thartmann at openjdk.java.net Wed Nov 10 12:32:05 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Wed, 10 Nov 2021 12:32:05 GMT Subject: RFR: 8276112: Inconsistent scalar replacement debug info at safepoints Message-ID: [JDK-8261137](https://bugs.openjdk.java.net/browse/JDK-8261137) introduced aggressive scalar replacement of primitive boxes during incremental inlining if the box is only referenced by safepoint debug info: https://github.com/openjdk/jdk/blob/e01d6d00bc4ab5ca0d38f8894a78e6d911e0fe93/src/hotspot/share/opto/callGenerator.cpp#L678-L683 It works by replacing safepoint usages by `SafePointScalarObject` nodes and adjusting the JVMState accordingly. For example, in `TestSafepointDebugInfo::test1` the `helper` method in line 56 is inlined and the box result of `Integer.valueOf` is scalar replaced in the safepoint debug info: 315 SafePointScalarObject === 0 [[ 40 316 319 337 ]] # fields@[0..0] Oop:java/lang/Integer:NotNull:exact * !jvms: 316 SafePoint === 5 6 317 8 1 1 1 315 10 1 10 [[]] SafePoint !jvms: TestSafepointDebugInfo::test1 @ bci:-1 (line 56) JVMS depth=1 loc=6 stk=8 arg=8 mon=10 scalar=10 end=11 mondepth=0 sp=0 bci=6 reexecute=false method=static jint compiler.eliminateAutobox.TestSafepointDebugInfo.test1(jint) The `scalar` offset in the JVMState points to the integer field in the debug info. 
The problem is now that additional inlining can happen afterwards "on top of" this JVMState. In this case, the call to `Integer.valueOf` in line 57 is inlined, leading to the following JVMState for the inlined callee:

    315 SafePointScalarObject === 0 [[ 40 316 319 337 ]] # fields@[0..0] Oop:java/lang/Integer:NotNull:exact * !jvms:
    316 SafePoint === 5 6 317 8 1 1 1 315 10 1 10 [[]] SafePoint !jvms: TestSafepointDebugInfo::test1 @ bci:-1 (line 56)
    JVMS depth=1 loc=6 stk=8 arg=8 mon=10 scalar=10 end=11 mondepth=0 sp=0 bci=6 reexecute=false method=static jint compiler.eliminateAutobox.TestSafepointDebugInfo.test1(jint)
    JVMS depth=2 loc=5 stk=6 arg=8 mon=10 scalar=10 end=10 mondepth=0 sp=2 bci=3 reexecute=false method=static jobject java.lang.Integer.valueOf(jint)

In this simple case, both caller and (inlined) callee state share the same `SafePointNode`. However, the `scalar` offset in the JVMState of the callee (depth 2) is not correct anymore and out of bounds. Parsing then emits an `unstable_if` trap and `GraphKit::add_safepoint_edges` merges above states to:

    315 SafePointScalarObject === 0 [[ 40 316 319 337 ]] # fields@[0..0] Oop:java/lang/Integer:NotNull:exact * !jvms:
    337 CallStaticJava === 332 1 7 8 1 ( 336 1 315 10 10 10 328 ) [[]] # Static uncommon_trap(reason='unstable_if' action='reinterpret' debug_id='0') void ( int ) Integer::valueOf @ bci:3 (line 1075) reexecute TestSafepointDebugInfo::test1 @ bci:6 (line 57) !jvms: Integer::valueOf @ bci:3 (line 1075) TestSafepointDebugInfo::test1 @ bci:6 (line 57)
    JVMS depth=1 loc=6 stk=8 arg=8 mon=8 scalar=8 end=9 mondepth=0 sp=0 bci=6 reexecute=false method=static jint compiler.eliminateAutobox.TestSafepointDebugInfo.test1(jint)
    JVMS depth=2 loc=9 stk=10 arg=12 mon=12 scalar=12 end=12 mondepth=0 sp=2 bci=3 reexecute=true method=static jobject java.lang.Integer.valueOf(jint)

We then crash in `PhaseOutput::FillLocArray` when processing the `SafePointScalarObject` and trying to access the corresponding field in the
debug info at (out of bounds) offset 12: https://github.com/openjdk/jdk/blob/e01d6d00bc4ab5ca0d38f8894a78e6d911e0fe93/src/hotspot/share/opto/output.cpp#L833-L836 Even worse, in some scenarios we don't crash/assert but emit incorrect debug info leading to wrong results after deoptimization. For example, `TestSafepointDebugInfo::test2` fails because the `SafePointScalarObject` for `box1` and `box2` point to the same field in the debug info. This can happen if scalar replacement happens again "on top of" an already inconsistent JVMState. Afterwards, the out of bounds offset accidentally points to the field of the newly scalarized object. Originally, this issue only reproduced intermittently with a long running internal stress test but I was able to extract a set of simple regression tests that trigger different failure modes in the compiler or wrong execution. I think fixing this is complicated and I therefore propose to disable [JDK-8261137](https://bugs.openjdk.java.net/browse/JDK-8261137) for now and file an enhancement to fix and re-enable it later. We should then also add proper verification code and more complete tests. 
Thanks,
Tobias

-------------

Commit messages:
 - Removed assert
 - Whitespace fix
 - 8276112: Inconsistent scalar replacement debug info at safepoints

Changes: https://git.openjdk.java.net/jdk/pull/6333/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6333&range=00
Issue: https://bugs.openjdk.java.net/browse/JDK-8276112
Stats: 192 lines in 4 files changed: 176 ins; 1 del; 15 mod
Patch: https://git.openjdk.java.net/jdk/pull/6333.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/6333/head:pull/6333

PR: https://git.openjdk.java.net/jdk/pull/6333

From thartmann at openjdk.java.net Wed Nov 10 13:51:33 2021
From: thartmann at openjdk.java.net (Tobias Hartmann)
Date: Wed, 10 Nov 2021 13:51:33 GMT
Subject: RFR: JDK-8275854: C2: assert(stride_con != 0) failed: missed some peephole opt [v6]
In-Reply-To:
References:
Message-ID:

On Tue, 2 Nov 2021 02:54:39 GMT, ?? wrote:

>> The `If subsume` optimization will eliminate a `LongCountedLoopEndNode` node by mistake, which will lead to a crash in the `PhaseIdealLoop` optimization.
>>
>> For example, the test of node 538 and node 553 will become the same after the first `PhaseIdealLoop` optimization. Node 555 is the back edge to the loop, and node 553 will be replaced by a `LongCountedLoopEndNode` node.
>>
>> In the next `PhaseIdealLoop` optimization, node 538 finds that node 553 is redundant, and will subsume node 553. Then the `PhaseIdealLoop` optimization will crash, because there is no loop end node.
>>
>> There are two ways to fix the crash. The first, as in this PR, is to simply exit the `IFNode subsume` optimization when the node is a `LongCountedLoopEndNode` node.
The second possible fix is to exchange the dominating `IF` node with the `LongCountedLoopEndNode` node:
>>
>> diff --git a/src/hotspot/share/opto/ifnode.cpp b/src/hotspot/share/opto/ifnode.cpp
>> index 38b40a6..31ff172 100644
>> --- a/src/hotspot/share/opto/ifnode.cpp
>> +++ b/src/hotspot/share/opto/ifnode.cpp
>> @@ -1674,6 +1674,21 @@ Node* IfNode::simple_subsuming(PhaseIterGVN* igvn) {
>>      }
>>    }
>>
>> +  if (is_LongCountedLoopEnd()) {
>> +    set_req(0, dom->in(0));
>> +    set_req(1, dom->in(1));
>> +    dom->set_req(0, pre);
>> +    dom->set_req(1, igvn->intcon(is_always_true ? 1 : 0));
>> +    Node* proj0 = raw_out(0);
>> +    Node* proj1 = raw_out(1);
>> +    Node* dom_proj0 = dom->raw_out(0);
>> +    Node* dom_proj1 = dom->raw_out(1);
>> +    dom_proj0->set_req(0, this);
>> +    dom_proj1->set_req(0, this);
>> +    proj0->set_req(0, dom);
>> +    proj1->set_req(0, dom);
>> +  }
>> +
>>    if (bol->outcnt() == 0) {
>>      igvn->remove_dead_node(bol); // Kill the BoolNode.
>>    }
>> diff --git a/src/hotspot/share/opto/loopnode.cpp b/src/hotspot/share/opto/loopnode.cpp
>> index 6f7e34d..7955722 100644
>> --- a/src/hotspot/share/opto/loopnode.cpp
>> +++ b/src/hotspot/share/opto/loopnode.cpp
>> @@ -802,7 +802,7 @@ bool PhaseIdealLoop::transform_long_counted_loop(IdealLoopTree* loop, Node_List
>>    Node* back_control = head->in(LoopNode::LoopBackControl);
>>
>>    // data nodes on back branch not supported
>> -  if (back_control->outcnt() > 1) {
>> +  if (back_control->outcnt() > 1 || back_control->Opcode() != Op_IfTrue) {
>>      return false;
>>    }
>
> ?? has updated the pull request incrementally with one additional commit since the last revision:
>
>   Specify vm option needs option 'othervm'

That looks reasonable to me but please add a comment to the new code. I'll run some testing in the meantime.

-------------

Marked as reviewed by thartmann (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/6099 From thartmann at openjdk.java.net Wed Nov 10 14:24:35 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Wed, 10 Nov 2021 14:24:35 GMT Subject: RFR: 8275317: AArch64: Support some type conversion vectorization in SLP In-Reply-To: References: Message-ID: On Thu, 28 Oct 2021 03:39:42 GMT, Fei Gao wrote: > Current SLP vectorizer in C2 compiler doesn't support type conversion > operations. But AArch64 has vector type conversion instructions in > both NEON and SVE. > > The type conversion involves two kinds of scenarios, conversion between > the same data sizes and conversion between different data sizes. If we > want to support casts between different data sizes, we need to amend > the code part for identifying adjacent memory references and the code > part for justifying if the combination is profitable. I suppose it > would be easier to review if we split the whole task to support type > conversion into two separate patches, one for the same data sizes and > the other one for different data sizes. The goal of this patch is just > to support conversions within the same data size, including: > int -> float > float -> int > long -> double > double -> long > > A typical test case: > > for (int i = start; i < limit; i++) { > b[i] = (float) a[i]; > } > > To implement it, the patch completed the necessary instructions and > matching rules in the backend and added implementation for SLP in > the middle end. > > The percentage of performance uplift on aarch64 system: > Mode: avgt > Cnt: 15 > Metric: (ns/op) > > benchmark percentage change [(After-Before)/Before] > VectorLoop.convertD2L -48.46% > VectorLoop.convertF2I -55.67% > VectorLoop.convertI2F -55.27% > VectorLoop.convertL2D -48.75% That looks good to me but x86 supports vector instructions for these operations as well, right? Or am I missing something? 
https://github.com/openjdk/jdk/blob/55b36c6f3bb7eb066daaf41f9eba46633afedf08/src/hotspot/cpu/x86/x86.ad#L6701

Do you have perf numbers for x86?

-------------

PR: https://git.openjdk.java.net/jdk/pull/6145

From rkennke at openjdk.java.net Wed Nov 10 16:55:54 2021
From: rkennke at openjdk.java.net (Roman Kennke)
Date: Wed, 10 Nov 2021 16:55:54 GMT
Subject: RFR: 8276901: Implement UseHeavyMonitors consistently
Message-ID:

The flag UseHeavyMonitors seems to imply that it makes Hotspot always use inflated monitors, rather than stack locks. However, it is only implemented in the interpreter that way. When it calls into runtime, it would still happily stack-lock. Even worse, C1 uses another flag UseFastLocking to achieve something similar (with the same caveat that runtime would stack-lock anyway). C2 doesn't have any such mechanism at all.

I would like to experiment with disabling stack-locking, and thus, having this flag work as expected would seem very useful.

The change removes the C1 flag UseFastLocking, and replaces its uses with equivalent (i.e. inverted) UseHeavyMonitors instead. I think it makes sense to make UseHeavyMonitors a develop flag (I wouldn't want anybody to use this in production, not currently without this change, and not with this change). I also added a flag VerifyHeavyMonitors to be able to verify that stack-locking is really disabled. We can't currently verify this unconditionally (e.g. in debug builds) because all non-x86_64 platforms would need work.
Testing: - [x] tier1 - [x] tier2 - [ ] tier3 - [ ] tier4 ------------- Commit messages: - Change VerifyHeavyMonitors flag to diagnostic - 8276901: Implement UseHeavyMonitors consistently Changes: https://git.openjdk.java.net/jdk/pull/6320/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6320&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8276901 Stats: 190 lines in 12 files changed: 54 ins; 18 del; 118 mod Patch: https://git.openjdk.java.net/jdk/pull/6320.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6320/head:pull/6320 PR: https://git.openjdk.java.net/jdk/pull/6320 From mdoerr at openjdk.java.net Wed Nov 10 17:12:44 2021 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Wed, 10 Nov 2021 17:12:44 GMT Subject: RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow [v6] In-Reply-To: <0II1AGxTmud7vWbpWeKGm_vPr_yqFuVaauWEzBN_pMw=.f09fc7be-1833-431b-8353-161b9dad3cf4@github.com> References: <0II1AGxTmud7vWbpWeKGm_vPr_yqFuVaauWEzBN_pMw=.f09fc7be-1833-431b-8353-161b9dad3cf4@github.com> Message-ID: On Thu, 4 Nov 2021 16:28:52 GMT, Volker Simonis wrote: >> Currently, if running with `-XX:-OmitStackTraceInFastThrow`, C2 has no possibility to create implicit exceptions like AIOOBE, NullPointerExceptions, etc. in compiled code. This means that such methods will always be deoptimized and re-executed in the interpreter if such exceptions are happening. >> >> If implicit exceptions are used for normal control flow, that can have a dramatic impact on performance. 
A prominent example for such code is [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274):
>>
>>     public static boolean isAlpha(int c) {
>>         try {
>>             return IS_ALPHA[c];
>>         } catch (ArrayIndexOutOfBoundsException ex) {
>>             return false;
>>         }
>>     }
>>
>>
>> ### Solution
>>
>> Instead of deoptimizing and resorting to the interpreter, we can generate code which allocates and initializes the corresponding exceptions right in compiled code. This results in a ten-times performance improvement for the above code:
>>
>> -XX:-OmitStackTraceInFastThrow -XX:-OptimizeImplicitExceptions
>> Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>> ImplicitExceptions.bench                     0.0  avgt    5      1.430 ±    0.353  ns/op
>> ImplicitExceptions.bench                    0.33  avgt    5   3563.038 ±   77.358  ns/op
>> ImplicitExceptions.bench                    0.66  avgt    5   8609.693 ± 1205.104  ns/op
>> ImplicitExceptions.bench                    1.00  avgt    5  12842.401 ± 1022.728  ns/op
>>
>> -XX:-OmitStackTraceInFastThrow -XX:+OptimizeImplicitExceptions
>> Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>> ImplicitExceptions.bench                     0.0  avgt    5      1.432 ±    0.352  ns/op
>> ImplicitExceptions.bench                    0.33  avgt    5    355.723 ±   16.641  ns/op
>> ImplicitExceptions.bench                    0.66  avgt    5    887.068 ±  166.728  ns/op
>> ImplicitExceptions.bench                    1.00  avgt    5   1274.418 ±   88.235  ns/op
>>
>>
>> ### Implementation details
>>
>> - The new optimization is guarded by the option `OptimizeImplicitExceptions` which is on by default.
>> - In `GraphKit::builtin_throw()` we can't simply use `CallGenerator::for_direct_call()` to create a `DirectCallGenerator` for the call to the exception's `<init>` function because `DirectCallGenerator` assumes in various places that calls are only issued at `invoke*` bytecodes. This is not true in general for bytecode which can cause an implicit exception.
>> - Instead, we manually wire up the call based on the code in `DirectCallGenerator::generate()`. >> - We use a similar trick like for method handle intrinsics where the callee from the bytecode is replaced by a direct call and this fact is recorded in the call's `_override_symbolic_info` field. For calling constructors of implicit exceptions I've introduced the new field `_implicit_exception_init`. This field is also used in various assertions to prevent queries for the bytecode's symbolic method information which doesn't exist because we're not at an `invoke*` bytecode at the place where we generate the call. >> - The PR contains a micro-benchmark which compares the old and the new implementation for [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274). Except for the trivial case where the exception probability is 0 (i.e. no exceptions are happening at all) the new implementation is about 10 times faster. > > Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: > > Add new WhiteBox functionality to sun/hotspot/WhiteBox.java as well to avoid warnings in the tests which are still using it. Thanks for adding a test. Your new additions look basically good, but I have a few remarks and questions. src/hotspot/share/prims/whitebox.cpp line 987: > 985: bool overflow = false; > 986: for (uint reason = 0; reason < mdo->trap_reason_limit(); reason++) { > 987: if (reason_str != NULL && !strcmp(reason_str, Deoptimization::trap_reason_name(reason))) { Maybe the code would be better readable when checking `reason_str != NULL` first and then use 2 loops? Just a minor suggestion. Should only be done if readability is better. src/hotspot/share/prims/whitebox.cpp line 1016: > 1014: } > 1015: ResourceMark rm(THREAD); > 1016: char *reason_str = (reason_obj == NULL) ? 
I think we should use `const char*` as far as possible.

src/hotspot/share/runtime/deoptimization.cpp line 2695:

> 2693: return 0;
> 2694: }
> 2695:

Why do we need this? Is it a placeholder for a future enhancement? If so, a comment would at least be helpful.

test/hotspot/jtreg/compiler/exceptions/OptimizeImplicitExceptions.java line 78:

> 76: private static final WhiteBox WB = WhiteBox.getWhiteBox();
> 77: // Until JDK-8275908 is not fixed, null-pointer traps for invokes and array-store traps are not profiled in the interpreter.
> 78: private static final boolean JDK8275908_fixed = false;

I don't know if that one should get fixed first, but I'm ok with your workaround. Would it make sense to add that bug id to this test's header?

-------------

PR: https://git.openjdk.java.net/jdk/pull/5488

From chagedorn at openjdk.java.net Wed Nov 10 17:17:34 2021
From: chagedorn at openjdk.java.net (Christian Hagedorn)
Date: Wed, 10 Nov 2021 17:17:34 GMT
Subject: RFR: 8273277: C2: Move conditional negation into rc_predicate [v3]
In-Reply-To:
References:
Message-ID:

On Tue, 9 Nov 2021 20:20:10 GMT, Nils Eliasson wrote:

>> Hi,
>>
>> I need some feedback on this patch. This was reported by Tencent and found in internal testing at about the same time. This patch is based on a patch provided by Tencent.
>>
>> In some very specific circumstances we need to negate the range checks that we create in PhaseIdealLoop::loop_predication_impl_helper. This is done in three places, but that method also calls insert_initial_skeleton_predicate where this isn't taken into account.
>>
>> To simplify things I have moved the negation logic into rc_predicate. This should prevent us from missing this check again.
>>
>> I do have a concern about negating the condition of the range check in the skeleton predicate, since the skeleton predicate will be used as a clone template, and some range check optimizations seem to assume that range checks always have LT as the condition.
On the other hand - it is a really uncommon scenario since we haven't failed here before. >> >> Feedback appreciated. >> >> Best regards, >> Nils > > Nils Eliasson has updated the pull request incrementally with one additional commit since the last revision: > > Add test That looks reasonable to me. ------------- Marked as reviewed by chagedorn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5987 From kvn at openjdk.java.net Wed Nov 10 18:13:34 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 10 Nov 2021 18:13:34 GMT Subject: RFR: 8276112: Inconsistent scalar replacement debug info at safepoints In-Reply-To: References: Message-ID: On Wed, 10 Nov 2021 12:12:39 GMT, Tobias Hartmann wrote: > [JDK-8261137](https://bugs.openjdk.java.net/browse/JDK-8261137) introduced aggressive scalar replacement of primitive boxes during incremental inlining if the box is only referenced by safepoint debug info: > https://github.com/openjdk/jdk/blob/e01d6d00bc4ab5ca0d38f8894a78e6d911e0fe93/src/hotspot/share/opto/callGenerator.cpp#L678-L683 > > It works by replacing safepoint usages by `SafePointScalarObject` nodes and adjusting the JVMState accordingly. For example, in `TestSafepointDebugInfo::test1` the `helper` method in line 56 is inlined and the box result of `Integer.valueOf` is scalar replaced in the safepoint debug info: > > > 315 SafePointScalarObject === 0 [[ 40 316 319 337 ]] # fields@[0..0] Oop:java/lang/Integer:NotNull:exact * !jvms: > 316 SafePoint === 5 6 317 8 1 1 1 315 10 1 10 [[]] SafePoint !jvms: TestSafepointDebugInfo::test1 @ bci:-1 (line 56) > JVMS depth=1 loc=6 stk=8 arg=8 mon=10 scalar=10 end=11 mondepth=0 sp=0 bci=6 reexecute=false method=static jint compiler.eliminateAutobox.TestSafepointDebugInfo.test1(jint) > > > The `scalar` offset in the JVMState points to the integer field in the debug info. The problem is now that additional inlining can happen afterwards "on top of" this JVMState. 
In this case, the call to `Integer.valueOf` in line 57 is inlined, leading to the following JVMState for the inlined callee: > > > 315 SafePointScalarObject === 0 [[ 40 316 319 337 ]] # fields@[0..0] Oop:java/lang/Integer:NotNull:exact * !jvms: > 316 SafePoint === 5 6 317 8 1 1 1 315 10 1 10 [[]] SafePoint !jvms: TestSafepointDebugInfo::test1 @ bci:-1 (line 56) > JVMS depth=1 loc=6 stk=8 arg=8 mon=10 scalar=10 end=11 mondepth=0 sp=0 bci=6 reexecute=false method=static jint compiler.eliminateAutobox.TestSafepointDebugInfo.test1(jint) > JVMS depth=2 loc=5 stk=6 arg=8 mon=10 scalar=10 end=10 mondepth=0 sp=2 bci=3 reexecute=false method=static jobject java.lang.Integer.valueOf(jint) > ``` > > In this simple case, both caller and (inlined) callee state share the same `SafePointNode`. However, the `scalar` offset in the JVMState of the callee (depth 2) is not correct anymore and out of bounds. > > Parsing then emits an `unstable_if` trap and `GraphKit::add_safepoint_edges` merges above states to: > > 315 SafePointScalarObject === 0 [[ 40 316 319 337 ]] # fields@[0..0] Oop:java/lang/Integer:NotNull:exact * !jvms: > 337 CallStaticJava === 332 1 7 8 1 ( 336 1 315 10 10 10 328 ) [[]] # Static uncommon_trap(reason='unstable_if' action='reinterpret' debug_id='0') void ( int ) Integer::valueOf @ bci:3 (line 1075) reexecute TestSafepointDebugInfo::test1 @ bci:6 (line 57) !jvms: Integer::valueOf @ bci:3 (line 1075) TestSafepointDebugInfo::test1 @ bci:6 (line 57) > JVMS depth=1 loc=6 stk=8 arg=8 mon=8 scalar=8 end=9 mondepth=0 sp=0 bci=6 reexecute=false method=static jint compiler.eliminateAutobox.TestSafepointDebugInfo.test1(jint) > JVMS depth=2 loc=9 stk=10 arg=12 mon=12 scalar=12 end=12 mondepth=0 sp=2 bci=3 reexecute=true method=static jobject java.lang.Integer.valueOf(jint) > > > We then crash in `PhaseOutput::FillLocArray` when processing the `SafePointScalarObject` and trying to access the corresponding field in the debug info at (out of bounds) offset 12: > 
https://github.com/openjdk/jdk/blob/e01d6d00bc4ab5ca0d38f8894a78e6d911e0fe93/src/hotspot/share/opto/output.cpp#L833-L836 > > Even worse, in some scenarios we don't crash/assert but emit incorrect debug info leading to wrong results after deoptimization. For example, `TestSafepointDebugInfo::test2` fails because the `SafePointScalarObject` for `box1` and `box2` point to the same field in the debug info. This can happen if scalar replacement happens again "on top of" an already inconsistent JVMState. Afterwards, the out of bounds offset accidentally points to the field of the newly scalarized object. > > Originally, this issue only reproduced intermittently with a long running internal stress test but I was able to extract a set of simple regression tests that trigger different failure modes in the compiler or wrong execution. > > I think fixing this is complicated and I therefore propose to disable [JDK-8261137](https://bugs.openjdk.java.net/browse/JDK-8261137) for now and file an enhancement to fix and re-enable it later. We should then also add proper verification code and more complete tests. > > Thanks, > Tobias Good. Yes, currently scalarization assumes that debug info is finalized and does not change. We have to move this optimization after all inlining is done. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6333 From rkennke at openjdk.java.net Wed Nov 10 19:19:13 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Wed, 10 Nov 2021 19:19:13 GMT Subject: RFR: 8276901: Implement UseHeavyMonitors consistently [v2] In-Reply-To: References: Message-ID: > The flag UseHeavyMonitors seems to imply that it makes Hotspot always use inflated monitors, rather than stack locks. However, it is only implemented in the interpreter that way. When it calls into runtime, it would still happily stack-lock. 
Even worse, C1 uses another flag UseFastLocking to achieve something similar (with the same caveat that runtime would stack-lock anyway). C2 doesn't have any such mechanism at all.
> I would like to experiment with disabling stack-locking, and thus, having this flag work as expected would seem very useful.
>
> The change removes the C1 flag UseFastLocking, and replaces its uses with the equivalent (i.e. inverted) UseHeavyMonitors instead. I think it makes sense to make UseHeavyMonitors a develop flag (I wouldn't want anybody to use this in production, not currently without this change, and not with this change). I also added a flag VerifyHeavyMonitors to be able to verify that stack-locking is really disabled. We can't currently verify this unconditionally (e.g. in debug builds) because all non-x86_64 platforms would need work.
>
> Testing:
> - [x] tier1
> - [x] tier2
> - [x] tier3
> - [ ] tier4

Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:

  Verify monitors even in non-debug builds

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6320/files
  - new: https://git.openjdk.java.net/jdk/pull/6320/files/f7b4c179..49dbc146

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6320&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6320&range=00-01

  Stats: 5 lines in 1 file changed: 0 ins; 2 del; 3 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6320.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6320/head:pull/6320

PR: https://git.openjdk.java.net/jdk/pull/6320

From kvn at openjdk.java.net Wed Nov 10 20:09:37 2021
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Wed, 10 Nov 2021 20:09:37 GMT
Subject: RFR: 8273277: C2: Move conditional negation into rc_predicate [v3]
In-Reply-To:
References:
Message-ID: <4T-ST7vEyQWmUKN_GOcxNj8gTZE8zfH_ifYtUAI03bM=.159131c6-e525-4623-bd61-d8733a444ab1@github.com>

On Tue, 9 Nov 2021 20:20:10 GMT, Nils Eliasson wrote:

>> Hi,
>>
>> I
need some feedback on this patch. This was reported by Tencent and found in internal testing at about the same time. This patch is based on a patch provided by Tencent.
>>
>> In some very specific circumstances we need to negate the range checks that we create in PhaseIdealLoop::loop_predication_impl_helper. This is done in three places, but that method also calls insert_initial_skeleton_predicate where this isn't taken into account.
>>
>> To simplify things I have moved the negation logic into rc_predicate. This should prevent us from missing this check again.
>>
>> I do have a concern about negating the condition of the range check in the skeleton predicate, since the skeleton predicate will be used as a clone template, and some range check optimizations seem to assume that range checks always have LT as the condition. On the other hand - it is a really uncommon scenario since we haven't failed here before.
>>
>> Feedback appreciated.
>>
>> Best regards,
>> Nils
>
> Nils Eliasson has updated the pull request incrementally with one additional commit since the last revision:
>
> Add test

I agree.

-------------

Marked as reviewed by kvn (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/5987

From kvn at openjdk.java.net Wed Nov 10 20:14:35 2021
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Wed, 10 Nov 2021 20:14:35 GMT
Subject: RFR: JDK-8275854: C2: assert(stride_con != 0) failed: missed some peephole opt [v6]
In-Reply-To:
References:
Message-ID:

On Tue, 2 Nov 2021 02:54:39 GMT, 王超 wrote:

>> `If subsume` optimization will eliminate `LongCountedLoopEndNode` node by mistake, which will lead to `PhaseIdealLoop` optimization crash.
>>
>> For example, the test of node 538 and node 553 will become the same after the first `PhaseIdealLoop` optimization. Node 555 is the back edge to the loop, and node 553 will be replaced by a `LongCountedLoopEndNode` node.
>> image >> >> >> In the next `PhaseIdealLoop` optimization, node 538 find node 553 is redundant, and will subsume node 553. Then the `PhaseIdealLoop` optimization will crash, because there is no loop end node. >> image >> >> There are two way to fix the crash, the first is like the way in this pr, just exit `IFNode subsume` optimization when it's a `LongCountedLoopEndNode` node. The second possible fix is that exchange the dominating `IF` node with the `LongCountedLoopEndNode` node: >> >> diff --git a/src/hotspot/share/opto/ifnode.cpp b/src/hotspot/share/opto/ifnode.cpp >> index 38b40a6..31ff172 100644 >> --- a/src/hotspot/share/opto/ifnode.cpp >> +++ b/src/hotspot/share/opto/ifnode.cpp >> @@ -1674,6 +1674,21 @@ Node* IfNode::simple_subsuming(PhaseIterGVN* igvn) { >> } >> } >> >> + if (is_LongCountedLoopEnd()) { >> + set_req(0, dom->in(0)); >> + set_req(1, dom->in(1)); >> + dom->set_req(0, pre); >> + dom->set_req(1, igvn->intcon(is_always_true ? 1 : 0)); >> + Node* proj0 = raw_out(0); >> + Node* proj1 = raw_out(1); >> + Node* dom_proj0 = dom->raw_out(0); >> + Node* dom_proj1 = dom->raw_out(1); >> + dom_proj0->set_req(0, this); >> + dom_proj1->set_req(0, this); >> + proj0->set_req(0, dom); >> + proj1->set_req(0, dom); >> + } >> + >> if (bol->outcnt() == 0) { >> igvn->remove_dead_node(bol); // Kill the BoolNode. >> } >> diff --git a/src/hotspot/share/opto/loopnode.cpp b/src/hotspot/share/opto/loopnode.cpp >> index 6f7e34d..7955722 100644 >> --- a/src/hotspot/share/opto/loopnode.cpp >> +++ b/src/hotspot/share/opto/loopnode.cpp >> @@ -802,7 +802,7 @@ bool PhaseIdealLoop::transform_long_counted_loop(IdealLoopTree* loop, Node_List >> Node* back_control = head->in(LoopNode::LoopBackControl); >> >> // data nodes on back branch not supported >> - if (back_control->outcnt() > 1) { >> + if (back_control->outcnt() > 1 || back_control->Opcode() != Op_IfTrue) { >> return false; >> } > > ?? 
has updated the pull request incrementally with one additional commit since the last revision: > > Specify vm option needs option 'othervm' Looks good. But please wait tests results from Tobias. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6099 From kvn at openjdk.java.net Wed Nov 10 20:40:34 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 10 Nov 2021 20:40:34 GMT Subject: RFR: 8276162: Optimise unsigned comparison pattern [v2] In-Reply-To: References: Message-ID: On Tue, 9 Nov 2021 02:05:01 GMT, Mai ??ng Qu?n Anh wrote: >> This patch changes operations in the form `x +- Integer.MIN_VALUE <=> y +- Integer.MIN_VALUE`, which is a pattern used to do unsigned comparisons, into `x u<=> y`. >> >> In addition to being basic operations, they may be utilised to implement range checks such as the methods in `jdk.internal.util.Preconditions`, or in places where the compiler cannot deduce the non-negativeness of the bound as in `java.util.ArrayList`. >> >> Thank you very much. > > Mai ??ng Qu?n Anh has updated the pull request incrementally with one additional commit since the last revision: > > fix copyright declaration src/hotspot/share/opto/subnode.cpp line 1533: > 1531: > 1532: const int cmp1_op = cmp1->Opcode(); > 1533: const int cmp2_op = cmp2->Opcode(); Use these in previous code too by move them to line `#1485` and replacing `op2`. src/hotspot/share/opto/subnode.cpp line 1535: > 1533: const int cmp2_op = cmp2->Opcode(); > 1534: > 1535: // Change x +- Integer.MIN_VALUE <=> y +- Integer.MIN_VALUE into x u<=> y The comment should include `cmp2_op == Op_ConI` case. 
Also it is not clear from comment and code if different operations are allowed on both side: `x - MIN_INT <= y + MIN_INT` src/hotspot/share/opto/subnode.cpp line 1537: > 1535: // Change x +- Integer.MIN_VALUE <=> y +- Integer.MIN_VALUE into x u<=> y > 1536: if ((_test._test == BoolTest::lt || _test._test == BoolTest::le || > 1537: _test._test == BoolTest::gt || _test._test == BoolTest::ge) && You can use `(_test. is_less() || _test.is_greater())` here. src/hotspot/share/opto/subnode.cpp line 1538: > 1536: if ((_test._test == BoolTest::lt || _test._test == BoolTest::le || > 1537: _test._test == BoolTest::gt || _test._test == BoolTest::ge) && > 1538: cop == Op_CmpI && `cop == Op_CmpI` should be first check before `_test` check. src/hotspot/share/opto/subnode.cpp line 1542: > 1540: phase->type(cmp1->in(2)) == TypeInt::MIN) { > 1541: if (cmp2_op == Op_ConI) { > 1542: Node *ncmp2 = phase->intcon(java_add(cmp2->get_int(), min_jint)); What if `cmp1_op == Op_SubI` ? src/hotspot/share/opto/subnode.cpp line 1545: > 1543: Node *ncmp = phase->transform(new CmpUNode(cmp1->in(1), ncmp2)); > 1544: return new BoolNode(ncmp, _test._test); > 1545: } else if ((cmp2_op == Op_AddI || cmp2_op == Op_SubI) && Again the question about mismatching cmp1_op and cmp2_op. src/hotspot/share/opto/subnode.cpp line 1555: > 1553: if ((_test._test == BoolTest::lt || _test._test == BoolTest::le || > 1554: _test._test == BoolTest::gt || _test._test == BoolTest::ge) && > 1555: cop == Op_CmpL && Same suggestions as for `Op_CmpI` test/micro/org/openjdk/bench/vm/compiler/UnsignedComparison.java line 43: > 41: public long compareLong(long arg0, long arg1) { > 42: return arg0 + Long.MIN_VALUE < arg1 + Long.MIN_VALUE ? 1 : 0; > 43: } Add tests for subtraction. 
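For readers following the questions above about mixed operators, here is a minimal Java sketch of the source-level idiom only. It makes no claim about what the C2 patch in this PR matches or how it handles the mixed case. The class and method names (`UnsignedCompareSketch`, `biasedLess`, `mixedLess`, `unsignedLess`, `allAgree`) are hypothetical and chosen for illustration. It relies only on wrapping `int` arithmetic, where `x - Integer.MIN_VALUE` and `x + Integer.MIN_VALUE` produce the same bit pattern, and on `Integer.compareUnsigned` as the reference.

```java
// Sketch only: plain Java arithmetic, independent of the C2 change under review.
final class UnsignedCompareSketch {

    // The idiom from the PR description: bias both sides, then compare signed.
    static boolean biasedLess(int x, int y) {
        return x + Integer.MIN_VALUE < y + Integer.MIN_VALUE;
    }

    // Mixed operators: subtraction on the left, addition on the right.
    // In wrapping int arithmetic, x - MIN_VALUE equals x + MIN_VALUE.
    static boolean mixedLess(int x, int y) {
        return x - Integer.MIN_VALUE < y + Integer.MIN_VALUE;
    }

    // Reference result from the standard library.
    static boolean unsignedLess(int x, int y) {
        return Integer.compareUnsigned(x, y) < 0;
    }

    // True when all three formulations agree for every pair of samples.
    static boolean allAgree(int[] samples) {
        for (int x : samples) {
            for (int y : samples) {
                boolean expected = unsignedLess(x, y);
                if (biasedLess(x, y) != expected || mixedLess(x, y) != expected) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 42, -42, Integer.MIN_VALUE, Integer.MAX_VALUE};
        System.out.println("all forms agree: " + allAgree(samples));
    }
}
```

At the Java level the mixed form is therefore equivalent to the matching forms; whether the pattern matcher in `subnode.cpp` should accept it is a separate question for the patch itself.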
------------- PR: https://git.openjdk.java.net/jdk/pull/6101 From kvn at openjdk.java.net Wed Nov 10 20:44:36 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 10 Nov 2021 20:44:36 GMT Subject: RFR: 8267928: Loop predicate gets inexact loop limit before PhaseIdealLoop::rc_predicate In-Reply-To: References: Message-ID: On Fri, 28 May 2021 13:29:36 GMT, Yi Yang wrote: > Loop predicate gets inexact loop limit(LoopLimitNode) from exact_limit(even if the limit is statically known) and does unnecessary overflow checking when generating lower bound test(rc_predicate). The reason is rather straightforward: exact_limit fails to see a HasExactTripCount flag since it would be set after performing loop predicate(iteration_split). Looks good. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/4247 From kvn at openjdk.java.net Wed Nov 10 20:49:37 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 10 Nov 2021 20:49:37 GMT Subject: RFR: 8274982: Add a test for 8269574. In-Reply-To: References: Message-ID: On Mon, 11 Oct 2021 09:55:28 GMT, Evgeny Nikitin wrote: > This PR contains a relatively simple test which verifies that JVMTI-agents are correctly informed about exceptions caught in C2-compiled code. The 8269574 introduces pre-allocated exceptions in some paths, so the test tries to produce a number of various exceptions and check that provided small JVMTI agent got notified about all of them. Someone from serviceability group have to look on this test too. ------------- PR: https://git.openjdk.java.net/jdk/pull/5889 From kvn at openjdk.java.net Wed Nov 10 20:52:45 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 10 Nov 2021 20:52:45 GMT Subject: RFR: 8274982: Add a test for 8269574. 
In-Reply-To:
References:
Message-ID:

On Mon, 11 Oct 2021 09:55:28 GMT, Evgeny Nikitin wrote:

> This PR contains a relatively simple test which verifies that JVMTI-agents are correctly informed about exceptions caught in C2-compiled code. The 8269574 introduces pre-allocated exceptions in some paths, so the test tries to produce a number of various exceptions and check that provided small JVMTI agent got notified about all of them.

I am fine with changes. What testing was done? What testing tiers were run?

-------------

PR: https://git.openjdk.java.net/jdk/pull/5889

From sspitsyn at openjdk.java.net Wed Nov 10 22:35:34 2021
From: sspitsyn at openjdk.java.net (Serguei Spitsyn)
Date: Wed, 10 Nov 2021 22:35:34 GMT
Subject: RFR: 8274982: Add a test for 8269574.
In-Reply-To:
References:
Message-ID:

On Mon, 11 Oct 2021 09:55:28 GMT, Evgeny Nikitin wrote:

> This PR contains a relatively simple test which verifies that JVMTI-agents are correctly informed about exceptions caught in C2-compiled code. The 8269574 introduces pre-allocated exceptions in some paths, so the test tries to produce a number of various exceptions and check that provided small JVMTI agent got notified about all of them.

test/hotspot/jtreg/compiler/jvmti/TriggerBuiltinExceptionsTest.java line 128:

> 126:
> 127: Asserts.assertEQ(
> 128: TriggerBuiltinExceptionsTest.caughtByJVMTIAgent(), caughtByJavaTest,

What is the reason to use the class name prefix for methods?:
  TriggerBuiltinExceptionsTest.compileMethodOrThrow
  TriggerBuiltinExceptionsTest.methodToCompile
  TriggerBuiltinExceptionsTest.caughtByJVMTIAgent

It is not really needed, right?

-------------

PR: https://git.openjdk.java.net/jdk/pull/5889

From sspitsyn at openjdk.java.net Wed Nov 10 22:42:33 2021
From: sspitsyn at openjdk.java.net (Serguei Spitsyn)
Date: Wed, 10 Nov 2021 22:42:33 GMT
Subject: RFR: 8274982: Add a test for 8269574.
In-Reply-To: References: Message-ID: On Mon, 11 Oct 2021 09:55:28 GMT, Evgeny Nikitin wrote: > This PR contains a relatively simple test which verifies that JVMTI-agents are correctly informed about exceptions caught in C2-compiled code. The 8269574 introduces pre-allocated exceptions in some paths, so the test tries to produce a number of various exceptions and check that provided small JVMTI agent got notified about all of them. Hi Evgeny, New test looks good to me. I've inlined a couple of minor comments/suggestions. Thanks, Serguei test/hotspot/jtreg/compiler/jvmti/libTriggerBuiltinExceptions.cpp line 77: > 75: } > 76: > 77: } while (false); I'm not sure why the while (false) loop is needed. You can always return JNI_ERR instead of break in all places where the result != JVMTI_ERROR_NONE is detected and return JNI_OK at the end. Is it to for one-return style? ------------- Marked as reviewed by sspitsyn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5889 From manc at openjdk.java.net Wed Nov 10 23:50:31 2021 From: manc at openjdk.java.net (Man Cao) Date: Wed, 10 Nov 2021 23:50:31 GMT Subject: RFR: 8276453: Undefined behavior in C1 LIR_OprDesc causes SEGV in fastdebug build [v5] In-Reply-To: References: Message-ID: On Wed, 3 Nov 2021 20:01:37 GMT, Man Cao wrote: >> Hi all, >> >> Could anyone provide some feedback on this bug fix and refactoring change? See https://bugs.openjdk.java.net/browse/JDK-8276453 for more details. >> If the direction of this change looks good, we can proceed removing the "UGLY HACK" in c1_LIR.hpp and refactor occurrences of "opr->fn()" to "opr.fn()". > > Man Cao has updated the pull request incrementally with one additional commit since the last revision: > > Fix aarch64 and arm builds I uploaded benchmark results to https://bugs.openjdk.java.net/browse/JDK-8276453. It is over 20 trials. There is no observable difference. 
The most relevant metric is "Total Time in Compilation", which is the value of the hsperfdata counter java.ci.totalTime. It includes time spent in both C1 and C2. Let me know if you also need the logs from +LogCompilation. I have them but they are extremely large (900MiB after 7zip for 20 trials). I'll work on changing LIR_Opr() to LIR_Opr::nullOpr() and splitting it up, as it also makes sense that the smaller patch should be backported to JDK 11 and 17. ------------- PR: https://git.openjdk.java.net/jdk/pull/6221 From manc at openjdk.java.net Thu Nov 11 01:57:58 2021 From: manc at openjdk.java.net (Man Cao) Date: Thu, 11 Nov 2021 01:57:58 GMT Subject: RFR: 8276453: Undefined behavior in C1 LIR_OprDesc causes SEGV in fastdebug build [v6] In-Reply-To: References: Message-ID: > Hi all, > > Could anyone provide some feedback on this bug fix and refactoring change? See https://bugs.openjdk.java.net/browse/JDK-8276453 for more details. > If the direction of this change looks good, we can proceed removing the "UGLY HACK" in c1_LIR.hpp and refactor occurrences of "opr->fn()" to "opr.fn()". 
Man Cao has updated the pull request incrementally with two additional commits since the last revision:

 - Use nullOpr() or {} instead of LIR_Opr()
 - Revert the renaming from LIR_OprDesc to LIR_Opr to minimize patch size

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6221/files
  - new: https://git.openjdk.java.net/jdk/pull/6221/files/d881f81d..adaf6d4e

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6221&range=05
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6221&range=04-05

  Stats: 256 lines in 21 files changed: 14 ins; 0 del; 242 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6221.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6221/head:pull/6221

PR: https://git.openjdk.java.net/jdk/pull/6221

From duke at openjdk.java.net Thu Nov 11 02:00:33 2021
From: duke at openjdk.java.net (Fei Gao)
Date: Thu, 11 Nov 2021 02:00:33 GMT
Subject: RFR: 8275317: AArch64: Support some type conversion vectorization in SLP
In-Reply-To:
References:
Message-ID: <5IIKpJ9G6sYmfETcZWisZad_rocl87qyOrY0LXL6HPU=.840b239b-6ccc-4db2-ab2e-9dda96fb1b09@github.com>

On Wed, 10 Nov 2021 14:21:13 GMT, Tobias Hartmann wrote:

> That looks good to me but x86 supports vector instructions for these operations as well, right? Or am I missing something?
>
> https://github.com/openjdk/jdk/blob/55b36c6f3bb7eb066daaf41f9eba46633afedf08/src/hotspot/cpu/x86/x86.ad#L6701
>
> Do you have perf numbers for x86?

Yes, I got perf data on x86 as shown below:

Before the patch:

    Benchmark              (length)  Mode  Cnt    Score   Error  Units
    VectorLoop.convertD2L       523  avgt   15  527.330 ± 6.159  ns/op
    VectorLoop.convertF2I       523  avgt   15  545.808 ± 4.677  ns/op
    VectorLoop.convertI2F       523  avgt   15  373.227 ± 1.259  ns/op
    VectorLoop.convertL2D       523  avgt   15  869.646 ± 0.183  ns/op

After the patch:

    Benchmark              (length)  Mode  Cnt    Score   Error  Units
    VectorLoop.convertD2L       523  avgt   15  530.785 ± 4.767  ns/op
    VectorLoop.convertF2I       523  avgt   15  545.831 ± 7.576  ns/op
    VectorLoop.convertI2F       523  avgt   15   66.562 ± 2.270  ns/op
    VectorLoop.convertL2D       523  avgt   15  869.510 ± 0.075  ns/op

x86 supports int to FP only, and got a performance uplift on convertI2F. But it has implementation limitations on both FP to integer types and double to long, and got no obvious positive effect in these scenarios.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6145

From duke at openjdk.java.net Thu Nov 11 02:22:40 2021
From: duke at openjdk.java.net (TatWai Chong)
Date: Thu, 11 Nov 2021 02:22:40 GMT
Subject: RFR: 8275448: [REDO] AArch64: Implement string_compare intrinsic in SVE [v3]
In-Reply-To: <3O9BeTBG4Z4q3up1VlKGb096qxN2dxSXxTG_FWrXNVE=.ab464e81-34ad-4e8f-aba4-543489015f64@github.com>
References: <3lLU_Cnsh439U5TjNx9ADxXkvMDnfot4TxHbh7paUQw=.5c7ec35a-8456-4fe6-8b4f-1d7627026d8d@github.com> <3O9BeTBG4Z4q3up1VlKGb096qxN2dxSXxTG_FWrXNVE=.ab464e81-34ad-4e8f-aba4-543489015f64@github.com>
Message-ID:

On Tue, 2 Nov 2021 00:16:44 GMT, TatWai Chong wrote:

>> After JDK-8269559 was integrated there are failures in tier1 testing
>> across Mac OS X 11.4 (aarch64) machines. Please see JDK-8275263.
>>
>> This patch is NOT functional; rather, this tends to verify potential
>> toolchain issues as the original patch pass testing on other
>> platforms.
>>
>> In this patch, we remove new SVE-related matching rules and register
>> class introduced in the original patch to minimally affect the
>> non-SVE part.
>
> TatWai Chong has updated the pull request incrementally with one additional commit since the last revision:
>
> Add the matching rule in td file, enable control path in the code stub.

They are almost identical except that the comparison in the stub code has been changed from "if (!UseSVE)" to "if (UseSVE == 0)". It doesn't mean that this is the cause of these failures. The code generated by the toolchains differs from patch to patch. The two forms should be equivalent, but as I'm not familiar with the macOS toolchain, this change avoids an unnecessary cast from `uint` to `bool`.
Please see [UseSVE](https://github.com/openjdk/jdk/blob/673ce7efa56e7eb54266af6fe795d46d57f51bdc/src/hotspot/cpu/aarch64/globals_aarch64.hpp#L101)

-------------

PR: https://git.openjdk.java.net/jdk/pull/6072

From kvn at openjdk.java.net Thu Nov 11 02:22:37 2021
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Thu, 11 Nov 2021 02:22:37 GMT
Subject: RFR: 8276453: Undefined behavior in C1 LIR_OprDesc causes SEGV in fastdebug build [v5]
In-Reply-To:
References: <5x4O16UffEL5ebrgdh2LH_g-G6CJ54M5ZN0rPSZOlJ8=.31b049d4-b7a7-4326-8ed1-66e0540eaf6e@github.com>
Message-ID:

On Wed, 10 Nov 2021 06:14:18 GMT, Dean Long wrote:

> > That could work. Do you think it is better to split this into two or three RFEs:
>
> I would be in favor of that, if it's ok with @vnkozlov

Yes, I agree with that. Make a small patch so it can be backported. And do the refactoring separately, only in the latest JDK.

> > Do you think it is feasible to replace all the NULL and 0 with LIR_Opr::illegalOpr()?
>
> I don't think so, if we want to preserve existing behavior. NULL and illegalOpr() were two different values before, and now LIR_Opr() introduces a new value that is different from both NULL and illegalOpr(), and with the value 0 I believe it becomes a valid pointer LIR_Opr. To preserve existing behavior, we should crash if any attempt is made to use a NULL LIR_Opr. illegalOpr() doesn't do that. We could change existing behavior rather than preserve it, but then each use of NULL would need to be examined on a case-by-case basis.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6221

From duke at openjdk.java.net Thu Nov 11 02:48:28 2021
From: duke at openjdk.java.net (王超)
Date: Thu, 11 Nov 2021 02:48:28 GMT
Subject: RFR: JDK-8275854: C2: assert(stride_con != 0) failed: missed some peephole opt [v7]
In-Reply-To:
References:
Message-ID:

> `If subsume` optimization will eliminate `LongCountedLoopEndNode` node by mistake, which will lead to `PhaseIdealLoop` optimization crash.
> > For example, the test of node 538 and node 553 will become the same after the first `PhaseIdealLoop` optimization. Node 555 is the back edge to the loop, and node 553 will be replaced by a `LongCountedLoopEndNode` node. > image > > > In the next `PhaseIdealLoop` optimization, node 538 find node 553 is redundant, and will subsume node 553. Then the `PhaseIdealLoop` optimization will crash, because there is no loop end node. > image > > There are two way to fix the crash, the first is like the way in this pr, just exit `IFNode subsume` optimization when it's a `LongCountedLoopEndNode` node. The second possible fix is that exchange the dominating `IF` node with the `LongCountedLoopEndNode` node: > > diff --git a/src/hotspot/share/opto/ifnode.cpp b/src/hotspot/share/opto/ifnode.cpp > index 38b40a6..31ff172 100644 > --- a/src/hotspot/share/opto/ifnode.cpp > +++ b/src/hotspot/share/opto/ifnode.cpp > @@ -1674,6 +1674,21 @@ Node* IfNode::simple_subsuming(PhaseIterGVN* igvn) { > } > } > > + if (is_LongCountedLoopEnd()) { > + set_req(0, dom->in(0)); > + set_req(1, dom->in(1)); > + dom->set_req(0, pre); > + dom->set_req(1, igvn->intcon(is_always_true ? 1 : 0)); > + Node* proj0 = raw_out(0); > + Node* proj1 = raw_out(1); > + Node* dom_proj0 = dom->raw_out(0); > + Node* dom_proj1 = dom->raw_out(1); > + dom_proj0->set_req(0, this); > + dom_proj1->set_req(0, this); > + proj0->set_req(0, dom); > + proj1->set_req(0, dom); > + } > + > if (bol->outcnt() == 0) { > igvn->remove_dead_node(bol); // Kill the BoolNode. 
> } > diff --git a/src/hotspot/share/opto/loopnode.cpp b/src/hotspot/share/opto/loopnode.cpp > index 6f7e34d..7955722 100644 > --- a/src/hotspot/share/opto/loopnode.cpp > +++ b/src/hotspot/share/opto/loopnode.cpp > @@ -802,7 +802,7 @@ bool PhaseIdealLoop::transform_long_counted_loop(IdealLoopTree* loop, Node_List > Node* back_control = head->in(LoopNode::LoopBackControl); > > // data nodes on back branch not supported > - if (back_control->outcnt() > 1) { > + if (back_control->outcnt() > 1 || back_control->Opcode() != Op_IfTrue) { > return false; > } ?? has updated the pull request incrementally with one additional commit since the last revision: Add some comment ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6099/files - new: https://git.openjdk.java.net/jdk/pull/6099/files/ccfa6f10..89d00e14 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6099&range=06 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6099&range=05-06 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/6099.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6099/head:pull/6099 PR: https://git.openjdk.java.net/jdk/pull/6099 From duke at openjdk.java.net Thu Nov 11 02:48:29 2021 From: duke at openjdk.java.net (=?UTF-8?B?546L6LaF?=) Date: Thu, 11 Nov 2021 02:48:29 GMT Subject: RFR: JDK-8275854: C2: assert(stride_con != 0) failed: missed some peephole opt [v6] In-Reply-To: References: Message-ID: On Wed, 10 Nov 2021 13:48:25 GMT, Tobias Hartmann wrote: > That looks reasonable to me but please add a comment to the new code. > > I'll run some testing in the meantime. Thank you very much for your review and testing. I have added some comment to the newly added code. 

-------------

PR: https://git.openjdk.java.net/jdk/pull/6099

From duke at openjdk.java.net Thu Nov 11 02:53:32 2021
From: duke at openjdk.java.net (王超)
Date: Thu, 11 Nov 2021 02:53:32 GMT
Subject: RFR: JDK-8275854: C2: assert(stride_con != 0) failed: missed some peephole opt [v6]
In-Reply-To:
References:
Message-ID:

On Wed, 10 Nov 2021 20:11:42 GMT, Vladimir Kozlov wrote:

> Looks good. But please wait tests results from Tobias.

Thank you very much for your review and work. The change passed jtreg and some test suites on my server, but it's always a good thing to run more tests.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6099

From kvn at openjdk.java.net Thu Nov 11 02:53:33 2021
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Thu, 11 Nov 2021 02:53:33 GMT
Subject: RFR: 8276453: Undefined behavior in C1 LIR_OprDesc causes SEGV in fastdebug build [v5]
In-Reply-To:
References:
Message-ID:

On Wed, 10 Nov 2021 23:47:14 GMT, Man Cao wrote:

> I uploaded benchmark results to https://bugs.openjdk.java.net/browse/JDK-8276453. It is over 20 trials. There is no observable difference. The most relevant metric is "Total Time in Compilation", which is the value of the hsperfdata counter java.ci.totalTime. It includes time spent in both C1 and C2. Let me know if you also need the logs from +LogCompilation. I have them but they are extremely large (900MiB after 7zip for 20 trials).
> 
> I'll work on changing LIR_Opr() to LIR_Opr::nullOpr() and splitting it up, as it also makes sense that the smaller patch should be backported to JDK 11 and 17.

Thank you for collecting perf data. Yes, `java.ci.totalTime` is a good indicator. I don't know how you ran the benchmarks, but to get C1-only times that are more or less accurate you need to run with the `-XX:TieredStopAtLevel=3 -XX:CICompilerCount=1` flags.
I was thinking about using the LogCompilation tool to get C1-only times from the logs (https://github.com/openjdk/jdk/tree/master/src/utils/LogCompilation):

$ java -XX:+LogCompilation -XX:TieredStopAtLevel=3 -XX:CICompilerCount=1 ...
$ java -jar logc.jar -S hotspot_pid609766.log | tail -15
NMethods: 1230 created 1230 live 4175704 bytes (4175704 peak) in the code cache
Phase times:
  setup                  0.0020  0
  parse_hir              0.4560  0
  optimize_blocks        0.2910  0
  gvn                    0.0170  0
  rangeCheckElimination  0.0820  0
  optimize_null_checks   0.0060  0
  buildIR                0.9750  0
  lirGeneration          0.0360  0
  linearScan             0.3110  0
  emit_lir               0.3620  0
  codeemit               0.0810  0
  codeinstall            0.0530  0
  total                  1.8720

But it is not as accurate as `java.ci.totalTime`.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6221

From dlong at openjdk.java.net Thu Nov 11 04:27:39 2021
From: dlong at openjdk.java.net (Dean Long)
Date: Thu, 11 Nov 2021 04:27:39 GMT
Subject: RFR: 8276453: Undefined behavior in C1 LIR_OprDesc causes SEGV in fastdebug build [v6]
In-Reply-To:
References:
Message-ID:

On Thu, 11 Nov 2021 01:57:58 GMT, Man Cao wrote:

>> Hi all,
>> 
>> Could anyone provide some feedback on this bug fix and refactoring change? See https://bugs.openjdk.java.net/browse/JDK-8276453 for more details.
>> If the direction of this change looks good, we can proceed removing the "UGLY HACK" in c1_LIR.hpp and refactor occurrences of "opr->fn()" to "opr.fn()".
> 
> Man Cao has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - Use nullOpr() or {} instead of LIR_Opr()
>  - Revert the renaming from LIR_OprDesc to LIR_Opr to minimize patch size

I think we want any operation on LIR_Opr() or nullOpr() to fail, except ==, !=, and is_equal(), so I think 0 is a poor choice:

    LIR_Opr() : _value(0) {}

because it would seem to allow calls like opr->nullOpr()->pointer() to succeed. Your suggestion to use illegalOpr() instead is probably going to end up being the best choice after all.
------------- PR: https://git.openjdk.java.net/jdk/pull/6221 From dlong at openjdk.java.net Thu Nov 11 04:36:47 2021 From: dlong at openjdk.java.net (Dean Long) Date: Thu, 11 Nov 2021 04:36:47 GMT Subject: RFR: 8276453: Undefined behavior in C1 LIR_OprDesc causes SEGV in fastdebug build [v6] In-Reply-To: References: Message-ID: On Thu, 11 Nov 2021 01:57:58 GMT, Man Cao wrote: >> Hi all, >> >> Could anyone provide some feedback on this bug fix and refactoring change? See https://bugs.openjdk.java.net/browse/JDK-8276453 for more details. >> If the direction of this change looks good, we can proceed removing the "UGLY HACK" in c1_LIR.hpp and refactor occurrences of "opr->fn()" to "opr.fn()". > > Man Cao has updated the pull request incrementally with two additional commits since the last revision: > > - Use nullOpr() or {} instead of LIR_Opr() > - Revert the renaming from LIR_OprDesc to LIR_Opr to minimize patch size src/hotspot/cpu/arm/c1_LIRGenerator_arm.cpp line 367: > 365: > 366: void LIRGenerator::CardTableBarrierSet_post_barrier_helper(LIR_Opr addr, LIR_Const* card_table_base) { > 367: assert(addr->is_register(), "must be a register at this point"); The above rename seems unnecessary. ------------- PR: https://git.openjdk.java.net/jdk/pull/6221 From dlong at openjdk.java.net Thu Nov 11 04:47:44 2021 From: dlong at openjdk.java.net (Dean Long) Date: Thu, 11 Nov 2021 04:47:44 GMT Subject: RFR: 8276453: Undefined behavior in C1 LIR_OprDesc causes SEGV in fastdebug build [v6] In-Reply-To: References: Message-ID: On Thu, 11 Nov 2021 01:57:58 GMT, Man Cao wrote: >> Hi all, >> >> Could anyone provide some feedback on this bug fix and refactoring change? See https://bugs.openjdk.java.net/browse/JDK-8276453 for more details. >> If the direction of this change looks good, we can proceed removing the "UGLY HACK" in c1_LIR.hpp and refactor occurrences of "opr->fn()" to "opr.fn()". 
> > Man Cao has updated the pull request incrementally with two additional commits since the last revision: > > - Use nullOpr() or {} instead of LIR_Opr() > - Revert the renaming from LIR_OprDesc to LIR_Opr to minimize patch size Correction. The existing NULL value had me confused, thinking it would cause a crash if we ever tried to use it. I see now that LIR_Opr() : _value(0) {} just preserves the existing behavior, and we will never get a crash because we never dereference the pointer. But it does allow strange things like pointer() to return invalid memory, so again, I think it's best not to allow that. Do you agree, @caoman and @vnkozlov? ------------- PR: https://git.openjdk.java.net/jdk/pull/6221 From duke at openjdk.java.net Thu Nov 11 05:12:09 2021 From: duke at openjdk.java.net (Mai =?UTF-8?B?xJDhurduZw==?= =?UTF-8?B?IA==?= =?UTF-8?B?UXXDom4=?= Anh) Date: Thu, 11 Nov 2021 05:12:09 GMT Subject: RFR: 8276162: Optimise unsigned comparison pattern [v3] In-Reply-To: References: Message-ID: > This patch changes operations in the form `x +- Integer.MIN_VALUE <=> y +- Integer.MIN_VALUE`, which is a pattern used to do unsigned comparisons, into `x u<=> y`. > > In addition to being basic operations, they may be utilised to implement range checks such as the methods in `jdk.internal.util.Preconditions`, or in places where the compiler cannot deduce the non-negativeness of the bound as in `java.util.ArrayList`. > > Thank you very much. 

Mai Đặng Quân Anh has updated the pull request incrementally with two additional commits since the last revision:

 - replace cmpx->Opcode() with cmpx_op
 - address reviews, remove checks for subtraction operatios

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6101/files
  - new: https://git.openjdk.java.net/jdk/pull/6101/files/69842dfe..4dada5fc

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6101&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6101&range=01-02

  Stats: 24 lines in 1 file changed: 3 ins; 4 del; 17 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6101.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6101/head:pull/6101

PR: https://git.openjdk.java.net/jdk/pull/6101

From duke at openjdk.java.net Thu Nov 11 05:12:13 2021
From: duke at openjdk.java.net (Mai Đặng Quân Anh)
Date: Thu, 11 Nov 2021 05:12:13 GMT
Subject: RFR: 8276162: Optimise unsigned comparison pattern [v2]
In-Reply-To:
References:
Message-ID:

On Wed, 10 Nov 2021 20:19:52 GMT, Vladimir Kozlov wrote:

>> Mai Đặng Quân Anh has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   fix copyright declaration
> 
> src/hotspot/share/opto/subnode.cpp line 1537:
> 
>> 1535:   // Change x +- Integer.MIN_VALUE <=> y +- Integer.MIN_VALUE into x u<=> y
>> 1536:   if ((_test._test == BoolTest::lt || _test._test == BoolTest::le ||
>> 1537:        _test._test == BoolTest::gt || _test._test == BoolTest::ge) &&
> 
> You can use `(_test.is_less() || _test.is_greater())` here.

Thank you very much for the review. I have addressed this and the following comment.
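
For readers following the thread, the idiom that the quoted comment `x +- Integer.MIN_VALUE <=> y +- Integer.MIN_VALUE` refers to can be illustrated with a minimal Java sketch. This is not code from the PR: the class and method names below are hypothetical, and the sketch only shows why a signed comparison of the biased operands agrees with `Integer.compareUnsigned`, and why adding or subtracting `Integer.MIN_VALUE` gives the same value under two's-complement wraparound.

```java
public class UnsignedCompareSketch {
    // Hypothetical helper: adding MIN_VALUE flips the sign bit of each operand,
    // so a signed "<" on the biased values orders the inputs as unsigned ints.
    static boolean lessThanBiased(int x, int y) {
        return x + Integer.MIN_VALUE < y + Integer.MIN_VALUE;
    }

    // Reference result from the standard library.
    static boolean lessThanUnsigned(int x, int y) {
        return Integer.compareUnsigned(x, y) < 0;
    }

    // Addition and subtraction of MIN_VALUE wrap to the same int value.
    static boolean addAndSubAgree(int x) {
        return x + Integer.MIN_VALUE == x - Integer.MIN_VALUE;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 42, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int x : samples) {
            if (!addAndSubAgree(x)) {
                throw new AssertionError("add/sub mismatch for " + x);
            }
            for (int y : samples) {
                if (lessThanBiased(x, y) != lessThanUnsigned(x, y)) {
                    throw new AssertionError(x + " vs " + y);
                }
            }
        }
        System.out.println("biased and unsigned comparisons agree");
    }
}
```

The sketch says nothing about what C2 emits for either form; it only checks that the two formulations are semantically interchangeable on a few sample values.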

-------------

PR: https://git.openjdk.java.net/jdk/pull/6101

From duke at openjdk.java.net Thu Nov 11 05:16:34 2021
From: duke at openjdk.java.net (Mai Đặng Quân Anh)
Date: Thu, 11 Nov 2021 05:16:34 GMT
Subject: RFR: 8276162: Optimise unsigned comparison pattern [v2]
In-Reply-To:
References:
Message-ID:

On Wed, 10 Nov 2021 20:36:55 GMT, Vladimir Kozlov wrote:

>> Mai Đặng Quân Anh has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   fix copyright declaration
> 
> src/hotspot/share/opto/subnode.cpp line 1542:
> 
>> 1540:         phase->type(cmp1->in(2)) == TypeInt::MIN) {
>> 1541:       if (cmp2_op == Op_ConI) {
>> 1542:         Node *ncmp2 = phase->intcon(java_add(cmp2->get_int(), min_jint));
> 
> What if `cmp1_op == Op_SubI` ?

Since `x + MIN_VALUE == x - MIN_VALUE`, the result is the same for addition and subtraction. However, I realised that `x - MIN_VALUE` is idealised into `x + MIN_VALUE` beforehand, so I have removed the check for `Op_SubI` here.

> test/micro/org/openjdk/bench/vm/compiler/UnsignedComparison.java line 43:
> 
>> 41:     public long compareLong(long arg0, long arg1) {
>> 42:         return arg0 + Long.MIN_VALUE < arg1 + Long.MIN_VALUE ? 1 : 0;
>> 43:     }
> 
> Add tests for subtraction.

Since I have removed the subtraction cases, do we still need tests for them here?

Thank you very much.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6101

From dholmes at openjdk.java.net Thu Nov 11 05:57:41 2021
From: dholmes at openjdk.java.net (David Holmes)
Date: Thu, 11 Nov 2021 05:57:41 GMT
Subject: RFR: 8274982: Add a test for 8269574.
In-Reply-To:
References:
Message-ID: <4jqAK0IYWkBy1sCEKvKWmGOqXB4OjhIPhybhTfE7Xxw=.1c9eb5e8-37e2-4bad-91f9-d5b72f6cdda6@github.com>

On Mon, 11 Oct 2021 09:55:28 GMT, Evgeny Nikitin wrote:

> This PR contains a relatively simple test which verifies that JVMTI-agents are correctly informed about exceptions caught in C2-compiled code.
The 8269574 introduces pre-allocated exceptions in some paths, so the test tries to produce a number of various exceptions and check that provided small JVMTI agent got notified about all of them. test/hotspot/jtreg/compiler/jvmti/TriggerBuiltinExceptionsTest.java line 32: > 30: * > 31: * @build sun.hotspot.WhiteBox > 32: * @build compiler.jvmti.TriggerBuiltinExceptionsTest Explicit build directive should not be needed. ------------- PR: https://git.openjdk.java.net/jdk/pull/5889 From ngasson at openjdk.java.net Thu Nov 11 06:02:33 2021 From: ngasson at openjdk.java.net (Nick Gasson) Date: Thu, 11 Nov 2021 06:02:33 GMT Subject: RFR: 8275448: [REDO] AArch64: Implement string_compare intrinsic in SVE [v3] In-Reply-To: <3O9BeTBG4Z4q3up1VlKGb096qxN2dxSXxTG_FWrXNVE=.ab464e81-34ad-4e8f-aba4-543489015f64@github.com> References: <3lLU_Cnsh439U5TjNx9ADxXkvMDnfot4TxHbh7paUQw=.5c7ec35a-8456-4fe6-8b4f-1d7627026d8d@github.com> <3O9BeTBG4Z4q3up1VlKGb096qxN2dxSXxTG_FWrXNVE=.ab464e81-34ad-4e8f-aba4-543489015f64@github.com> Message-ID: On Tue, 2 Nov 2021 00:16:44 GMT, TatWai Chong wrote: >> After JDK-8269559 was integrated there are failures in tier1 testing >> across Mac OS X 11.4 (aarch64) machines. Please see JDK-8275263. >> >> This patch is NOT functional; rather, this tends to verify potential >> toolchain issues as the original patch pass testing on other >> platforms. >> >> In this patch, we remove new SVE-related matching rules and register >> class introduced in the original patch to minimally affect the >> non-SVE part. > > TatWai Chong has updated the pull request incrementally with one additional commit since the last revision: > > Add the matching rule in td file, enable control path in the code stub. Marked as reviewed by ngasson (Reviewer). 
------------- PR: https://git.openjdk.java.net/jdk/pull/6072 From dholmes at openjdk.java.net Thu Nov 11 06:09:36 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Thu, 11 Nov 2021 06:09:36 GMT Subject: RFR: 8274982: Add a test for 8269574. In-Reply-To: References: Message-ID: On Mon, 11 Oct 2021 09:55:28 GMT, Evgeny Nikitin wrote: > This PR contains a relatively simple test which verifies that JVMTI-agents are correctly informed about exceptions caught in C2-compiled code. The 8269574 introduces pre-allocated exceptions in some paths, so the test tries to produce a number of various exceptions and check that provided small JVMTI agent got notified about all of them. Just a couple of minor issues, not a review of functionality. test/hotspot/jtreg/compiler/jvmti/TriggerBuiltinExceptionsTest.java line 28: > 26: * @bug 8269574 > 27: * @summary Verifies that exceptions are reported correctly to JVMTI in the compiled code > 28: * @requires vm.jvmti You also require the JIT test/hotspot/jtreg/compiler/jvmti/TriggerBuiltinExceptionsTest.java line 59: > 57: public class TriggerBuiltinExceptionsTest { > 58: private static final WhiteBox WB = WhiteBox.getWhiteBox(); > 59: private static final int ITERATIONS = 30; //Arbitrary value, feel free to change Style nit: space after // ------------- PR: https://git.openjdk.java.net/jdk/pull/5889 From thartmann at openjdk.java.net Thu Nov 11 06:59:31 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 11 Nov 2021 06:59:31 GMT Subject: RFR: 8276112: Inconsistent scalar replacement debug info at safepoints In-Reply-To: References: Message-ID: On Wed, 10 Nov 2021 12:12:39 GMT, Tobias Hartmann wrote: > [JDK-8261137](https://bugs.openjdk.java.net/browse/JDK-8261137) introduced aggressive scalar replacement of primitive boxes during incremental inlining if the box is only referenced by safepoint debug info: > 
https://github.com/openjdk/jdk/blob/e01d6d00bc4ab5ca0d38f8894a78e6d911e0fe93/src/hotspot/share/opto/callGenerator.cpp#L678-L683 > > It works by replacing safepoint usages by `SafePointScalarObject` nodes and adjusting the JVMState accordingly. For example, in `TestSafepointDebugInfo::test1` the `helper` method in line 56 is inlined and the box result of `Integer.valueOf` is scalar replaced in the safepoint debug info: > > > 315 SafePointScalarObject === 0 [[ 40 316 319 337 ]] # fields@[0..0] Oop:java/lang/Integer:NotNull:exact * !jvms: > 316 SafePoint === 5 6 317 8 1 1 1 315 10 1 10 [[]] SafePoint !jvms: TestSafepointDebugInfo::test1 @ bci:-1 (line 56) > JVMS depth=1 loc=6 stk=8 arg=8 mon=10 scalar=10 end=11 mondepth=0 sp=0 bci=6 reexecute=false method=static jint compiler.eliminateAutobox.TestSafepointDebugInfo.test1(jint) > > > The `scalar` offset in the JVMState points to the integer field in the debug info. The problem is now that additional inlining can happen afterwards "on top of" this JVMState. In this case, the call to `Integer.valueOf` in line 57 is inlined, leading to the following JVMState for the inlined callee: > > > 315 SafePointScalarObject === 0 [[ 40 316 319 337 ]] # fields@[0..0] Oop:java/lang/Integer:NotNull:exact * !jvms: > 316 SafePoint === 5 6 317 8 1 1 1 315 10 1 10 [[]] SafePoint !jvms: TestSafepointDebugInfo::test1 @ bci:-1 (line 56) > JVMS depth=1 loc=6 stk=8 arg=8 mon=10 scalar=10 end=11 mondepth=0 sp=0 bci=6 reexecute=false method=static jint compiler.eliminateAutobox.TestSafepointDebugInfo.test1(jint) > JVMS depth=2 loc=5 stk=6 arg=8 mon=10 scalar=10 end=10 mondepth=0 sp=2 bci=3 reexecute=false method=static jobject java.lang.Integer.valueOf(jint) > ``` > > In this simple case, both caller and (inlined) callee state share the same `SafePointNode`. However, the `scalar` offset in the JVMState of the callee (depth 2) is not correct anymore and out of bounds. 
> > Parsing then emits an `unstable_if` trap and `GraphKit::add_safepoint_edges` merges above states to: > > 315 SafePointScalarObject === 0 [[ 40 316 319 337 ]] # fields@[0..0] Oop:java/lang/Integer:NotNull:exact * !jvms: > 337 CallStaticJava === 332 1 7 8 1 ( 336 1 315 10 10 10 328 ) [[]] # Static uncommon_trap(reason='unstable_if' action='reinterpret' debug_id='0') void ( int ) Integer::valueOf @ bci:3 (line 1075) reexecute TestSafepointDebugInfo::test1 @ bci:6 (line 57) !jvms: Integer::valueOf @ bci:3 (line 1075) TestSafepointDebugInfo::test1 @ bci:6 (line 57) > JVMS depth=1 loc=6 stk=8 arg=8 mon=8 scalar=8 end=9 mondepth=0 sp=0 bci=6 reexecute=false method=static jint compiler.eliminateAutobox.TestSafepointDebugInfo.test1(jint) > JVMS depth=2 loc=9 stk=10 arg=12 mon=12 scalar=12 end=12 mondepth=0 sp=2 bci=3 reexecute=true method=static jobject java.lang.Integer.valueOf(jint) > > > We then crash in `PhaseOutput::FillLocArray` when processing the `SafePointScalarObject` and trying to access the corresponding field in the debug info at (out of bounds) offset 12: > https://github.com/openjdk/jdk/blob/e01d6d00bc4ab5ca0d38f8894a78e6d911e0fe93/src/hotspot/share/opto/output.cpp#L833-L836 > > Even worse, in some scenarios we don't crash/assert but emit incorrect debug info leading to wrong results after deoptimization. For example, `TestSafepointDebugInfo::test2` fails because the `SafePointScalarObject` for `box1` and `box2` point to the same field in the debug info. This can happen if scalar replacement happens again "on top of" an already inconsistent JVMState. Afterwards, the out of bounds offset accidentally points to the field of the newly scalarized object. > > Originally, this issue only reproduced intermittently with a long running internal stress test but I was able to extract a set of simple regression tests that trigger different failure modes in the compiler or wrong execution. 
> > I think fixing this is complicated and I therefore propose to disable [JDK-8261137](https://bugs.openjdk.java.net/browse/JDK-8261137) for now and file an enhancement to fix and re-enable it later. We should then also add proper verification code and more complete tests. > > Thanks, > Tobias Thanks for the review, Vladimir. Yes, we did the same in Valhalla where we aggressively scalarize during IGVN. ------------- PR: https://git.openjdk.java.net/jdk/pull/6333 From thartmann at openjdk.java.net Thu Nov 11 07:18:32 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 11 Nov 2021 07:18:32 GMT Subject: RFR: JDK-8275854: C2: assert(stride_con != 0) failed: missed some peephole opt [v7] In-Reply-To: References: Message-ID: <9BPOQ1JaFIvdliZ57njoaKH3iIedsIn1vut3LpfmvfY=.8ed66e4c-e9ad-4792-9e2b-93a218a4a448@github.com> On Thu, 11 Nov 2021 02:48:28 GMT, ?? wrote: >> `If subsume` optimization will eliminate `LongCountedLoopEndNode` node by mistake, which will lead to `PhaseIdealLoop` optimization crash. >> >> For example, the test of node 538 and node 553 will become the same after the first `PhaseIdealLoop` optimization. Node 555 is the back edge to the loop, and node 553 will be replaced by a `LongCountedLoopEndNode` node. >> image >> >> >> In the next `PhaseIdealLoop` optimization, node 538 find node 553 is redundant, and will subsume node 553. Then the `PhaseIdealLoop` optimization will crash, because there is no loop end node. >> image >> >> There are two way to fix the crash, the first is like the way in this pr, just exit `IFNode subsume` optimization when it's a `LongCountedLoopEndNode` node. 
The second possible fix is that exchange the dominating `IF` node with the `LongCountedLoopEndNode` node: >> >> diff --git a/src/hotspot/share/opto/ifnode.cpp b/src/hotspot/share/opto/ifnode.cpp >> index 38b40a6..31ff172 100644 >> --- a/src/hotspot/share/opto/ifnode.cpp >> +++ b/src/hotspot/share/opto/ifnode.cpp >> @@ -1674,6 +1674,21 @@ Node* IfNode::simple_subsuming(PhaseIterGVN* igvn) { >> } >> } >> >> + if (is_LongCountedLoopEnd()) { >> + set_req(0, dom->in(0)); >> + set_req(1, dom->in(1)); >> + dom->set_req(0, pre); >> + dom->set_req(1, igvn->intcon(is_always_true ? 1 : 0)); >> + Node* proj0 = raw_out(0); >> + Node* proj1 = raw_out(1); >> + Node* dom_proj0 = dom->raw_out(0); >> + Node* dom_proj1 = dom->raw_out(1); >> + dom_proj0->set_req(0, this); >> + dom_proj1->set_req(0, this); >> + proj0->set_req(0, dom); >> + proj1->set_req(0, dom); >> + } >> + >> if (bol->outcnt() == 0) { >> igvn->remove_dead_node(bol); // Kill the BoolNode. >> } >> diff --git a/src/hotspot/share/opto/loopnode.cpp b/src/hotspot/share/opto/loopnode.cpp >> index 6f7e34d..7955722 100644 >> --- a/src/hotspot/share/opto/loopnode.cpp >> +++ b/src/hotspot/share/opto/loopnode.cpp >> @@ -802,7 +802,7 @@ bool PhaseIdealLoop::transform_long_counted_loop(IdealLoopTree* loop, Node_List >> Node* back_control = head->in(LoopNode::LoopBackControl); >> >> // data nodes on back branch not supported >> - if (back_control->outcnt() > 1) { >> + if (back_control->outcnt() > 1 || back_control->Opcode() != Op_IfTrue) { >> return false; >> } > > ?? has updated the pull request incrementally with one additional commit since the last revision: > > Add some comment The comment looks good and all tests passed. ------------- Marked as reviewed by thartmann (Reviewer). 

PR: https://git.openjdk.java.net/jdk/pull/6099

From dholmes at openjdk.java.net Thu Nov 11 07:27:35 2021
From: dholmes at openjdk.java.net (David Holmes)
Date: Thu, 11 Nov 2021 07:27:35 GMT
Subject: RFR: 8276901: Implement UseHeavyMonitors consistently [v2]
In-Reply-To:
References:
Message-ID:

On Wed, 10 Nov 2021 19:19:13 GMT, Roman Kennke wrote:

>> The flag UseHeavyMonitors seems to imply that it makes Hotspot always use inflated monitors, rather than stack locks. However, it is only implemented in the interpreter that way. When it calls into runtime, it would still happily stack-lock. Even worse, C1 uses another flag UseFastLocking to achieve something similar (with the same caveat that runtime would stack-lock anyway). C2 doesn't have any such mechanism at all.
>> I would like to experiment with disabling stack-locking, and thus, having this flag work as expected would seem very useful.
>> 
>> The change removes the C1 flag UseFastLocking, and replaces its uses with equivalent (i.e. inverted) UseHeavyMonitors instead. I think it makes sense to make UseHeavyMonitors a develop flag (I wouldn't want anybody to use this in production, not currently without this change, and not with this change). I also added a flag VerifyHeavyMonitors to be able to verify that stack-locking is really disabled. We can't currently verify this unconditionally (e.g. in debug builds) because all non-x86_64 platforms would need work.
>> 
>> Testing:
>>  - [x] tier1
>>  - [x] tier2
>>  - [x] tier3
>>  - [ ] tier4
> 
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Verify monitors even in non-debug builds

How was this tested, and how will it be tested going forward? There are no tests using UseHeavyMonitors. And a real test would be to run a bunch of other tests with the flag applied.
------------- PR: https://git.openjdk.java.net/jdk/pull/6320 From thartmann at openjdk.java.net Thu Nov 11 07:29:32 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 11 Nov 2021 07:29:32 GMT Subject: RFR: 8275317: AArch64: Support some type conversion vectorization in SLP In-Reply-To: References: Message-ID: On Thu, 28 Oct 2021 03:39:42 GMT, Fei Gao wrote: > Current SLP vectorizer in C2 compiler doesn't support type conversion > operations. But AArch64 has vector type conversion instructions in > both NEON and SVE. > > The type conversion involves two kinds of scenarios, conversion between > the same data sizes and conversion between different data sizes. If we > want to support casts between different data sizes, we need to amend > the code part for identifying adjacent memory references and the code > part for justifying if the combination is profitable. I suppose it > would be easier to review if we split the whole task to support type > conversion into two separate patches, one for the same data sizes and > the other one for different data sizes. The goal of this patch is just > to support conversions within the same data size, including: > int -> float > float -> int > long -> double > double -> long > > A typical test case: > > for (int i = start; i < limit; i++) { > b[i] = (float) a[i]; > } > > To implement it, the patch completed the necessary instructions and > matching rules in the backend and added implementation for SLP in > the middle end. > > The percentage of performance uplift on aarch64 system: > Mode: avgt > Cnt: 15 > Metric: (ns/op) > > benchmark percentage change [(After-Before)/Before] > VectorLoop.convertD2L -48.46% > VectorLoop.convertF2I -55.67% > VectorLoop.convertI2F -55.27% > VectorLoop.convertL2D -48.75% Thanks for sharing these numbers. Your changes look good to me and our internal testing all passed. A second review would be good. ------------- Marked as reviewed by thartmann (Reviewer). 
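
As an illustration of the four same-size conversions listed in the quoted description (int -> float, float -> int, long -> double, double -> long), here is a minimal Java sketch of the loop shapes involved. The class and method names are hypothetical and are not taken from the patch or its benchmark; whether C2 actually vectorizes any of these loops depends on the platform, the JDK build, and the flags in use, so the sketch only pins down the scalar semantics that a vectorized version has to preserve.

```java
public class SameSizeConversionLoops {
    // int -> float: source and destination elements are both 4 bytes wide.
    static void intToFloat(int[] a, float[] b, int start, int limit) {
        for (int i = start; i < limit; i++) {
            b[i] = (float) a[i];
        }
    }

    // float -> int: Java semantics truncate toward zero and map NaN to 0.
    static void floatToInt(float[] a, int[] b, int start, int limit) {
        for (int i = start; i < limit; i++) {
            b[i] = (int) a[i];
        }
    }

    // long -> double: source and destination elements are both 8 bytes wide.
    static void longToDouble(long[] a, double[] b, int start, int limit) {
        for (int i = start; i < limit; i++) {
            b[i] = (double) a[i];
        }
    }

    // double -> long: truncation toward zero, NaN maps to 0.
    static void doubleToLong(double[] a, long[] b, int start, int limit) {
        for (int i = start; i < limit; i++) {
            b[i] = (long) a[i];
        }
    }

    public static void main(String[] args) {
        int n = 1024;
        int[] ints = new int[n];
        float[] floats = new float[n];
        for (int i = 0; i < n; i++) {
            ints[i] = i - n / 2;
        }
        intToFloat(ints, floats, 0, n);
        int[] roundTrip = new int[n];
        floatToInt(floats, roundTrip, 0, n);
        // Small ints are exactly representable as float, so the round trip is lossless here.
        for (int i = 0; i < n; i++) {
            if (roundTrip[i] != ints[i]) {
                throw new AssertionError("round trip failed at index " + i);
            }
        }
        System.out.println("int -> float -> int round trip is exact for small values");
    }
}
```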

PR: https://git.openjdk.java.net/jdk/pull/6145

From chagedorn at openjdk.java.net Thu Nov 11 08:06:46 2021
From: chagedorn at openjdk.java.net (Christian Hagedorn)
Date: Thu, 11 Nov 2021 08:06:46 GMT
Subject: Integrated: 8276546: [IR Framework] Whitelist and ignore CompileThreshold
In-Reply-To:
References:
Message-ID:

On Tue, 9 Nov 2021 13:21:56 GMT, Christian Hagedorn wrote:

> This patch whitelists `CompileThreshold` and ignores it if passed as JTreg VM/Java option flag to a test. The reason to do this is that our CI executes `-XX:-TieredCompilation` in combination with `CompileThreshold` and therefore IR matching will not be performed (because `CompileThreshold` is not whitelisted). This patch changes this.
> 
> Setting `CompileThreshold` with `TestFramework::addFlags/runWithFlags()` will normally apply the flag.
> 
> Thanks,
> Christian

This pull request has now been integrated.

Changeset: 7a140af2
Author:    Christian Hagedorn
URL:       https://git.openjdk.java.net/jdk/commit/7a140af25362556ebe86147dcd74413e0044edc0
Stats:     109 lines in 3 files changed: 106 ins; 0 del; 3 mod

8276546: [IR Framework] Whitelist and ignore CompileThreshold

Reviewed-by: kvn, neliasso

-------------

PR: https://git.openjdk.java.net/jdk/pull/6312

From thartmann at openjdk.java.net Thu Nov 11 08:36:37 2021
From: thartmann at openjdk.java.net (Tobias Hartmann)
Date: Thu, 11 Nov 2021 08:36:37 GMT
Subject: RFR: 8276162: Optimise unsigned comparison pattern [v3]
In-Reply-To:
References:
Message-ID:

On Thu, 11 Nov 2021 05:12:09 GMT, Mai Đặng Quân Anh wrote:

>> This patch changes operations in the form `x +- Integer.MIN_VALUE <=> y +- Integer.MIN_VALUE`, which is a pattern used to do unsigned comparisons, into `x u<=> y`.
>> 
>> In addition to being basic operations, they may be utilised to implement range checks such as the methods in `jdk.internal.util.Preconditions`, or in places where the compiler cannot deduce the non-negativeness of the bound as in `java.util.ArrayList`.
>> 
>> Thank you very much.
> 
> Mai Đặng Quân Anh has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - replace cmpx->Opcode() with cmpx_op
>  - address reviews, remove checks for subtraction operatios

Given that there is `Integer/Long.compareUnsigned` using this idiom, it seems reasonable to optimize.

Some general comments:
- Your benchmark does not cover all the cases you are optimizing. Maybe you should also add the `Integer.compareUnsigned` variants.
- You need a correctness test as well, ideally using the IR verification framework to also verify that the optimizations are actually performed.

src/hotspot/share/opto/subnode.cpp line 1534:

> 1532:   }
> 1533: 
> 1534:   // Change x + Integer.MIN_VALUE <=> y + Integer.MIN_VALUE into x u<=> y

The `<=>` in the comment is confusing because it usually denotes logical equality. Also, you are only handling `<` and `>` below. What about the other variants? Shouldn't they be canonicalized in `idealize_test` (see `ifnode.cpp`)?

I would recommend making it explicit in the comment and using brackets for readability:

// Change (x + Integer.MIN_VALUE < y + Integer.MIN_VALUE) into (x u< y) and
// (x + Integer.MIN_VALUE > y + Integer.MIN_VALUE) into (x u> y).

src/hotspot/share/opto/subnode.cpp line 1546:

> 1544:       } else if (cmp2_op == Op_AddI &&
> 1545:                  phase->type(cmp2->in(2)) == TypeInt::MIN) {
> 1546:         Node *ncmp = phase->transform(new CmpUNode(cmp1->in(1), cmp2->in(1)));

`Node *ncmp` -> `Node* ncmp`

-------------

Changes requested by thartmann (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/6101 From manc at openjdk.java.net Thu Nov 11 08:38:37 2021 From: manc at openjdk.java.net (Man Cao) Date: Thu, 11 Nov 2021 08:38:37 GMT Subject: RFR: 8276453: Undefined behavior in C1 LIR_OprDesc causes SEGV in fastdebug build [v5] In-Reply-To: References: Message-ID: On Thu, 11 Nov 2021 02:50:52 GMT, Vladimir Kozlov wrote: > but to get only C1 times and more or less accurate time you need to run with -XX:TieredStopAtLevel=3 -XX:CICompilerCount=1 flags. I'm rerunning the benchmarks with this flag to only run C1. Will upload result after it finishes. > But it does allow strange things like pointer() to return invalid memory, so again, I think it's best not to allow that. Would it be sufficient if we add a null check in the assertion like this? LIR_OprPtr* pointer() const { assert(_value != 0 && is_pointer(), "nullness and type check"); return (LIR_OprPtr*)_value; } Another note that we need `LIR_Opr() : _value(0) {}` is that we need a default constructor for the ease of array initialization like `LIR_Opr FrameMap::_caller_save_cpu_regs[] = {};`. We probably don't want to use `LIR_Opr() : _value(-1) {}` as with the illegalOpr() approach, because it is a behavior change. 
------------- PR: https://git.openjdk.java.net/jdk/pull/6221 From manc at openjdk.java.net Thu Nov 11 08:38:38 2021 From: manc at openjdk.java.net (Man Cao) Date: Thu, 11 Nov 2021 08:38:38 GMT Subject: RFR: 8276453: Undefined behavior in C1 LIR_OprDesc causes SEGV in fastdebug build [v6] In-Reply-To: References: Message-ID: On Thu, 11 Nov 2021 04:33:57 GMT, Dean Long wrote: >> Man Cao has updated the pull request incrementally with two additional commits since the last revision: >> >> - Use nullOpr() or {} instead of LIR_Opr() >> - Revert the renaming from LIR_OprDesc to LIR_Opr to minimize patch size > > src/hotspot/cpu/arm/c1_LIRGenerator_arm.cpp line 367: > >> 365: >> 366: void LIRGenerator::CardTableBarrierSet_post_barrier_helper(LIR_Opr addr, LIR_Const* card_table_base) { >> 367: assert(addr->is_register(), "must be a register at this point"); > > The above rename seems unnecessary. It is actually necessary. Due to `typedef LIR_Opr LIR_OprDesc;`, `LIR_OprDesc*` is equivalent to `LIR_Opr*`, which breaks build. ------------- PR: https://git.openjdk.java.net/jdk/pull/6221 From thartmann at openjdk.java.net Thu Nov 11 09:23:31 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 11 Nov 2021 09:23:31 GMT Subject: RFR: 8275448: [REDO] AArch64: Implement string_compare intrinsic in SVE [v3] In-Reply-To: References: <3lLU_Cnsh439U5TjNx9ADxXkvMDnfot4TxHbh7paUQw=.5c7ec35a-8456-4fe6-8b4f-1d7627026d8d@github.com> <3O9BeTBG4Z4q3up1VlKGb096qxN2dxSXxTG_FWrXNVE=.ab464e81-34ad-4e8f-aba4-543489015f64@github.com> Message-ID: On Thu, 11 Nov 2021 02:19:41 GMT, TatWai Chong wrote: > They are almost identical except the comparison in the stub code has been changed from "if (!UseSVE)" to "if (UseSVE == 0)". It doesn't mean that is the cause of these failures. The generated codes from Toolchains are different from patch to patch. They should be equivalent but as I'm not familiar macOS toolchain, this change can avoid unnecessary casting from `uint` to `bool`. 
Please see [UseSVE](https://github.com/openjdk/jdk/blob/673ce7efa56e7eb54266af6fe795d46d57f51bdc/src/hotspot/cpu/aarch64/globals_aarch64.hpp#L101) I undid that change and re-executed testing. No failures. Integrating this again without knowing the root cause of the previous failures is a bit concerning. ------------- PR: https://git.openjdk.java.net/jdk/pull/6072 From aph at openjdk.java.net Thu Nov 11 09:23:31 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Thu, 11 Nov 2021 09:23:31 GMT Subject: RFR: 8275448: [REDO] AArch64: Implement string_compare intrinsic in SVE [v3] In-Reply-To: References: <3lLU_Cnsh439U5TjNx9ADxXkvMDnfot4TxHbh7paUQw=.5c7ec35a-8456-4fe6-8b4f-1d7627026d8d@github.com> <3O9BeTBG4Z4q3up1VlKGb096qxN2dxSXxTG_FWrXNVE=.ab464e81-34ad-4e8f-aba4-543489015f64@github.com> Message-ID: On Thu, 11 Nov 2021 09:18:04 GMT, Tobias Hartmann wrote: > > They are almost identical except the comparison in the stub code has been changed from "if (!UseSVE)" to "if (UseSVE == 0)". It doesn't mean that is the cause of these failures. The generated codes from Toolchains are different from patch to patch. They should be equivalent but as I'm not familiar macOS toolchain, this change can avoid unnecessary casting from `uint` to `bool`. Please see [UseSVE](https://github.com/openjdk/jdk/blob/673ce7efa56e7eb54266af6fe795d46d57f51bdc/src/hotspot/cpu/aarch64/globals_aarch64.hpp#L101) > > I undid that change and re-executed testing. No failures. Integrating this again without knowing the root cause of the previous failures is a bit concerning. True, but there never was much chance that this change directly caused the failures. My money is still on a toolchain problem. 
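The two spellings discussed in this thread can be compared in isolation. This is a minimal sketch that assumes only what the quoted text states, namely that the flag is an unsigned integer; `sve_off_negation` and `sve_off_compare` are hypothetical helper names, not HotSpot code.

```cpp
#include <cassert>

// Spelling 1: logical negation, which first converts the integer to bool.
inline bool sve_off_negation(unsigned use_sve) { return !use_sve; }

// Spelling 2: explicit comparison against zero, with no integer-to-bool conversion.
inline bool sve_off_compare(unsigned use_sve) { return use_sve == 0; }
```

Both helpers return the same result for every input, which is consistent with the view that the source change by itself should not alter behavior.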
------------- PR: https://git.openjdk.java.net/jdk/pull/6072 From simonis at openjdk.java.net Thu Nov 11 09:48:35 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Thu, 11 Nov 2021 09:48:35 GMT Subject: RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow [v6] In-Reply-To: References: <0II1AGxTmud7vWbpWeKGm_vPr_yqFuVaauWEzBN_pMw=.f09fc7be-1833-431b-8353-161b9dad3cf4@github.com> Message-ID: On Wed, 10 Nov 2021 16:56:07 GMT, Martin Doerr wrote: >> Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: >> >> Add new WhiteBox functionality to sun/hotspot/WhiteBox.java as well to avoid warnings in the tests which are still using it. > > src/hotspot/share/prims/whitebox.cpp line 987: > >> 985: bool overflow = false; >> 986: for (uint reason = 0; reason < mdo->trap_reason_limit(); reason++) { >> 987: if (reason_str != NULL && !strcmp(reason_str, Deoptimization::trap_reason_name(reason))) { > > Maybe the code would be better readable when checking `reason_str != NULL` first and then use 2 loops? Just a minor suggestion. Should only be done if readability is better. 
I've tried it but the resulting version is slightly longer and in my opinion not really more readable:

    WB_ENTRY(jint, WB_GetMethodTrapCount(JNIEnv* env, jobject o, jobject method, jstring reason_obj))
      jmethodID jmid = reflected_method_to_jmid(thread, env, method);
      CHECK_JNI_EXCEPTION_(env, 0);
      methodHandle mh(THREAD, Method::checked_resolve_jmethod_id(jmid));
      uint cnt = 0;
      MethodData* mdo = mh->method_data();
      if (mdo != NULL) {
        ResourceMark rm(THREAD);
        if (reason_obj != NULL) {
          char* reason_str = java_lang_String::as_utf8_string(JNIHandles::resolve_non_null(reason_obj));
          for (uint reason = 0; reason < mdo->trap_reason_limit(); reason++) {
            if (!strcmp(reason_str, Deoptimization::trap_reason_name(reason))) {
              cnt = mdo->trap_count(reason);
              // Count in the overflow trap count on overflow
              if (cnt == (uint)-1) {
                cnt = mdo->trap_count_limit() + mdo->overflow_trap_count();
              }
              break;
            }
          }
        } else {
          bool overflow = false;
          for (uint reason = 0; reason < mdo->trap_reason_limit(); reason++) {
            uint c = mdo->trap_count(reason);
            if (c == (uint)-1) {
              c = mdo->trap_count_limit();
              if (!overflow) {
                // Count overflow trap count just once
                overflow = true;
                c += mdo->overflow_trap_count();
              }
            }
            cnt += c;
          }
        }
      }
      return cnt;
    WB_END

But for me it's actually no difference. Please just let me know if you'd still prefer the alternative version.

PS: I've updated the documentation of the method which was inaccurate for `reason==NULL`.

> src/hotspot/share/prims/whitebox.cpp line 1016:
>
>> 1014:   }
>> 1015:   ResourceMark rm(THREAD);
>> 1016:   char *reason_str = (reason_obj == NULL) ?
>
> I think we should use `const char*` as far as possible.

Done.
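The counting rule in the two-loop variant can be exercised outside the VM. The following is a minimal sketch: `TrapProfileSketch` and `trap_count_sketch` are hypothetical stand-ins for the `MethodData` queries used in the mail, and the only behavior carried over is what the quoted code shows (a counter equal to `(uint)-1` is treated as saturated, and the overflow count is added once).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical model of the per-method trap profile.
struct TrapProfileSketch {
  std::vector<std::string> names;  // trap reason names
  std::vector<unsigned> counts;    // (unsigned)-1 marks a saturated counter
  unsigned count_limit;            // value a saturated counter stands for
  unsigned overflow_count;         // traps recorded after saturation
};

// Returns the count for one reason, or the sum over all reasons if reason is null.
unsigned trap_count_sketch(const TrapProfileSketch& p, const char* reason) {
  unsigned cnt = 0;
  if (reason != nullptr) {
    for (std::size_t i = 0; i < p.counts.size(); i++) {
      if (std::strcmp(reason, p.names[i].c_str()) == 0) {
        cnt = p.counts[i];
        if (cnt == (unsigned)-1) {
          // Saturated: report the limit plus the overflow count.
          cnt = p.count_limit + p.overflow_count;
        }
        break;
      }
    }
  } else {
    bool overflow = false;
    for (std::size_t i = 0; i < p.counts.size(); i++) {
      unsigned c = p.counts[i];
      if (c == (unsigned)-1) {
        c = p.count_limit;
        if (!overflow) {
          // Add the shared overflow count only for the first saturated counter.
          overflow = true;
          c += p.overflow_count;
        }
      }
      cnt += c;
    }
  }
  return cnt;
}
```

With counters {3, saturated, saturated}, a limit of 255 and an overflow count of 10, a named saturated reason reports 265, while the sum over all reasons is 3 + 265 + 255 = 523, because the overflow count enters the total only once.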
-------------

PR: https://git.openjdk.java.net/jdk/pull/5488

From simonis at openjdk.java.net Thu Nov 11 09:54:38 2021
From: simonis at openjdk.java.net (Volker Simonis)
Date: Thu, 11 Nov 2021 09:54:38 GMT
Subject: RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow [v6]
In-Reply-To:
References: <0II1AGxTmud7vWbpWeKGm_vPr_yqFuVaauWEzBN_pMw=.f09fc7be-1833-431b-8353-161b9dad3cf4@github.com>
Message-ID:

On Wed, 10 Nov 2021 16:57:14 GMT, Martin Doerr wrote:

>> Volker Simonis has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Add new WhiteBox functionality to sun/hotspot/WhiteBox.java as well to avoid warnings in the tests which are still using it.
>
> src/hotspot/share/runtime/deoptimization.cpp line 2695:
>
>> 2693:   return 0;
>> 2694: }
>> 2695:
>
> Why do we need this? Is it a placeholder for a future enhancement? If so, a comment would at least be helpful.

That's a tricky one :) It's needed to fix the Minimal/Zero builds. It's inside the `#else` branch of an `#ifdef COMPILER2_OR_JVMCI` condition, together with a bunch of other methods which have an empty body in the case we have no C2 or JVMCI.
Could certainly be implemented more elegantly, but I decided to adhere to the current coding style in `deoptimization.cpp` :)

-------------

PR: https://git.openjdk.java.net/jdk/pull/5488

From simonis at openjdk.java.net Thu Nov 11 10:00:39 2021
From: simonis at openjdk.java.net (Volker Simonis)
Date: Thu, 11 Nov 2021 10:00:39 GMT
Subject: RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow [v6]
In-Reply-To:
References: <0II1AGxTmud7vWbpWeKGm_vPr_yqFuVaauWEzBN_pMw=.f09fc7be-1833-431b-8353-161b9dad3cf4@github.com>
Message-ID:

On Wed, 10 Nov 2021 17:06:06 GMT, Martin Doerr wrote:

>> Volker Simonis has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Add new WhiteBox functionality to sun/hotspot/WhiteBox.java as well to avoid warnings in the tests which are still using it.
>
> test/hotspot/jtreg/compiler/exceptions/OptimizeImplicitExceptions.java line 78:
>
>> 76:     private static final WhiteBox WB = WhiteBox.getWhiteBox();
>> 77:     // Until JDK-8275908 is not fixed, null-pointer traps for invokes and array-store traps are not profiled in the interpreter.
>> 78:     private static final boolean JDK8275908_fixed = false;
>
> I don't know if that one should get fixed first, but I'm ok with your workaround. Would it make sense to add that bug id to this test's header?

This PR has now been open for such a long time that I'd like to complete it without depending on another fix. But adding the bug id to the test is a good idea.
Done ------------- PR: https://git.openjdk.java.net/jdk/pull/5488 From ngasson at openjdk.java.net Thu Nov 11 10:06:34 2021 From: ngasson at openjdk.java.net (Nick Gasson) Date: Thu, 11 Nov 2021 10:06:34 GMT Subject: RFR: 8275448: [REDO] AArch64: Implement string_compare intrinsic in SVE [v3] In-Reply-To: References: <3lLU_Cnsh439U5TjNx9ADxXkvMDnfot4TxHbh7paUQw=.5c7ec35a-8456-4fe6-8b4f-1d7627026d8d@github.com> <3O9BeTBG4Z4q3up1VlKGb096qxN2dxSXxTG_FWrXNVE=.ab464e81-34ad-4e8f-aba4-543489015f64@github.com> Message-ID: <3i8MMjyxigQIcZO2irH8ai-nierm7os41AAi0auvVBo=.237b6f15-63cd-4473-abf9-5e2e2876e989@github.com> On Thu, 11 Nov 2021 09:20:05 GMT, Andrew Haley wrote: > > I undid that change and re-executed testing. No failures. Integrating this again without knowing the root cause of the previous failures is a bit concerning. Is it possible to re-test the exact commit that caused the problem? 8b1b6f9fb ------------- PR: https://git.openjdk.java.net/jdk/pull/6072 From simonis at openjdk.java.net Thu Nov 11 10:28:04 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Thu, 11 Nov 2021 10:28:04 GMT Subject: RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow [v7] In-Reply-To: References: Message-ID: > Currently, if running with `-XX:-OmitStackTraceInFastThrow`, C2 has no possibility to create implicit exceptions like AIOOBE, NullPointerExceptions, etc. in compiled code. This means that such methods will always be deoptimized and re-executed in the interpreter if such exceptions are happening. > > If implicit exceptions are used for normal control flow, that can have a dramatic impact on performance. 
A prominent example of such code is [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274):
>
>     public static boolean isAlpha(int c) {
>         try {
>             return IS_ALPHA[c];
>         } catch (ArrayIndexOutOfBoundsException ex) {
>             return false;
>         }
>     }
>
>
> ### Solution
>
> Instead of deoptimizing and resorting to the interpreter, we can generate code which allocates and initializes the corresponding exceptions right in compiled code. This results in a tenfold performance improvement for the above code:
>
>     -XX:-OmitStackTraceInFastThrow -XX:-OptimizeImplicitExceptions
>     Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>     ImplicitExceptions.bench                     0.0  avgt    5      1.430 ±    0.353  ns/op
>     ImplicitExceptions.bench                    0.33  avgt    5   3563.038 ±   77.358  ns/op
>     ImplicitExceptions.bench                    0.66  avgt    5   8609.693 ± 1205.104  ns/op
>     ImplicitExceptions.bench                    1.00  avgt    5  12842.401 ± 1022.728  ns/op
>
>     -XX:-OmitStackTraceInFastThrow -XX:+OptimizeImplicitExceptions
>     Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>     ImplicitExceptions.bench                     0.0  avgt    5      1.432 ±    0.352  ns/op
>     ImplicitExceptions.bench                    0.33  avgt    5    355.723 ±   16.641  ns/op
>     ImplicitExceptions.bench                    0.66  avgt    5    887.068 ±  166.728  ns/op
>     ImplicitExceptions.bench                    1.00  avgt    5   1274.418 ±   88.235  ns/op
>
>
> ### Implementation details
>
> - The new optimization is guarded by the option `OptimizeImplicitExceptions` which is on by default.
> - In `GraphKit::builtin_throw()` we can't simply use `CallGenerator::for_direct_call()` to create a `DirectCallGenerator` for the call to the exception's `<init>` function because `DirectCallGenerator` assumes in various places that calls are only issued at `invoke*` bytecodes. This is not true in general for bytecodes which can cause an implicit exception.
> - Instead, we manually wire up the call based on the code in `DirectCallGenerator::generate()`.
> - We use a similar trick like for method handle intrinsics where the callee from the bytecode is replaced by a direct call and this fact is recorded in the call's `_override_symbolic_info` field. For calling constructors of implicit exceptions I've introduced the new field `_implicit_exception_init`. This field is also used in various assertions to prevent queries for the bytecode's symbolic method information which doesn't exist because we're not at an `invoke*` bytecode at the place where we generate the call. > - The PR contains a micro-benchmark which compares the old and the new implementation for [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274). Except for the trivial case where the exception probability is 0 (i.e. no exceptions are happening at all) the new implementation is about 10 times faster. Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: Minor enhancements and fixes requested by Martin ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5488/files - new: https://git.openjdk.java.net/jdk/pull/5488/files/99db7e54..625da2f9 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5488&range=06 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5488&range=05-06 Stats: 6 lines in 4 files changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.java.net/jdk/pull/5488.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5488/head:pull/5488 PR: https://git.openjdk.java.net/jdk/pull/5488 From simonis at openjdk.java.net Thu Nov 11 10:28:08 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Thu, 11 Nov 2021 10:28:08 GMT Subject: RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow [v6] In-Reply-To: 
<0II1AGxTmud7vWbpWeKGm_vPr_yqFuVaauWEzBN_pMw=.f09fc7be-1833-431b-8353-161b9dad3cf4@github.com> References: <0II1AGxTmud7vWbpWeKGm_vPr_yqFuVaauWEzBN_pMw=.f09fc7be-1833-431b-8353-161b9dad3cf4@github.com> Message-ID: On Thu, 4 Nov 2021 16:28:52 GMT, Volker Simonis wrote: >> Currently, if running with `-XX:-OmitStackTraceInFastThrow`, C2 has no possibility to create implicit exceptions like AIOOBE, NullPointerExceptions, etc. in compiled code. This means that such methods will always be deoptimized and re-executed in the interpreter if such exceptions are happening. >> >> If implicit exceptions are used for normal control flow, that can have a dramatic impact on performance. A prominent example for such code is [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274): >> >> public static boolean isAlpha(int c) { >> try { >> return IS_ALPHA[c]; >> } catch (ArrayIndexOutOfBoundsException ex) { >> return false; >> } >> } >> >> >> ### Solution >> >> Instead of deoptimizing and resorting to the interpreter, we can generate code which allocates and initializes the corresponding exceptions right in compiled code. This results in a ten-times performance improvement for the above code: >> >> -XX:-OmitStackTraceInFastThrow -XX:-OptimizeImplicitExceptions >> Benchmark (exceptionProbability) Mode Cnt Score Error Units >> ImplicitExceptions.bench 0.0 avgt 5 1.430 ? 0.353 ns/op >> ImplicitExceptions.bench 0.33 avgt 5 3563.038 ? 77.358 ns/op >> ImplicitExceptions.bench 0.66 avgt 5 8609.693 ? 1205.104 ns/op >> ImplicitExceptions.bench 1.00 avgt 5 12842.401 ? 1022.728 ns/op >> >> -XX:-OmitStackTraceInFastThrow -XX:+OptimizeImplicitExceptions >> Benchmark (exceptionProbability) Mode Cnt Score Error Units >> ImplicitExceptions.bench 0.0 avgt 5 1.432 ? 0.352 ns/op >> ImplicitExceptions.bench 0.33 avgt 5 355.723 ? 
16.641 ns/op >> ImplicitExceptions.bench 0.66 avgt 5 887.068 ? 166.728 ns/op >> ImplicitExceptions.bench 1.00 avgt 5 1274.418 ? 88.235 ns/op >> >> >> ### Implementation details >> >> - The new optimization is guarded by the option `OptimizeImplicitExceptions` which is on by default. >> - In `GraphKit::builtin_throw()` we can't simply use `CallGenerator::for_direct_call()` to create a `DirectCallGenerator` for the call to the exception's `` function because `DirectCallGenerator` assumes in various places that calls are only issued at `invoke*` bytecodes. This is is not true in genral for bytecode which can cause an implicit exception. >> - Instead, we manually wire up the call based on the code in `DirectCallGenerator::generate()`. >> - We use a similar trick like for method handle intrinsics where the callee from the bytecode is replaced by a direct call and this fact is recorded in the call's `_override_symbolic_info` field. For calling constructors of implicit exceptions I've introduced the new field `_implicit_exception_init`. This field is also used in various assertions to prevent queries for the bytecode's symbolic method information which doesn't exist because we're not at an `invoke*` bytecode at the place where we generate the call. >> - The PR contains a micro-benchmark which compares the old and the new implementation for [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274). Except for the trivial case where the exception probability is 0 (i.e. no exceptions are happening at all) the new implementation is about 10 times faster. > > Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: > > Add new WhiteBox functionality to sun/hotspot/WhiteBox.java as well to avoid warnings in the tests which are still using it. 
Hi Martin, thanks a lot for looking at my PR one more time. I've just pushed an updated version which should address all your points. Still anything missing? Best regards, Volker ------------- PR: https://git.openjdk.java.net/jdk/pull/5488 From thartmann at openjdk.java.net Thu Nov 11 10:39:38 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 11 Nov 2021 10:39:38 GMT Subject: RFR: 8275448: [REDO] AArch64: Implement string_compare intrinsic in SVE [v3] In-Reply-To: <3O9BeTBG4Z4q3up1VlKGb096qxN2dxSXxTG_FWrXNVE=.ab464e81-34ad-4e8f-aba4-543489015f64@github.com> References: <3lLU_Cnsh439U5TjNx9ADxXkvMDnfot4TxHbh7paUQw=.5c7ec35a-8456-4fe6-8b4f-1d7627026d8d@github.com> <3O9BeTBG4Z4q3up1VlKGb096qxN2dxSXxTG_FWrXNVE=.ab464e81-34ad-4e8f-aba4-543489015f64@github.com> Message-ID: On Tue, 2 Nov 2021 00:16:44 GMT, TatWai Chong wrote: >> After JDK-8269559 was integrated there are failures in tier1 testing >> across Mac OS X 11.4 (aarch64) machines. Please see JDK-8275263. >> >> This patch is NOT functional; rather, this tends to verify potential >> toolchain issues as the original patch pass testing on other >> platforms. >> >> In this patch, we remove new SVE-related matching rules and register >> class introduced in the original patch to minimally affect the >> non-SVE part. > > TatWai Chong has updated the pull request incrementally with one additional commit since the last revision: > > Add the matching rule in td file, enable control path in the code stub. > Is it possible to re-test the exact commit that caused the problem? [8b1b6f9](https://github.com/openjdk/jdk/commit/8b1b6f9fb375bbc2de339ad8f526ca4d5f83dc70) Yes, I just did that and the failures reproduce immediately. 
-------------

PR: https://git.openjdk.java.net/jdk/pull/6072

From duke at openjdk.java.net Thu Nov 11 10:41:41 2021
From: duke at openjdk.java.net (王超)
Date: Thu, 11 Nov 2021 10:41:41 GMT
Subject: Integrated: JDK-8275854: C2: assert(stride_con != 0) failed: missed some peephole opt
In-Reply-To:
References:
Message-ID: <1LxR0wJawc6JtSnbgOKq8LADYftiLRuWM4VxR5uRKE0=.862bf844-b8bf-4fb7-bf1e-916db323d286@github.com>

On Mon, 25 Oct 2021 08:08:48 GMT, 王超 wrote:

> The `If subsume` optimization can eliminate a `LongCountedLoopEndNode` node by mistake, which makes the `PhaseIdealLoop` optimization crash.
>
> For example, the tests of node 538 and node 553 become the same after the first `PhaseIdealLoop` optimization. Node 555 is the back edge to the loop, and node 553 will be replaced by a `LongCountedLoopEndNode` node.
>
> In the next `PhaseIdealLoop` optimization, node 538 finds that node 553 is redundant and subsumes it. The `PhaseIdealLoop` optimization then crashes because there is no loop end node.
>
> There are two ways to fix the crash. The first, taken in this PR, is to exit the `IFNode subsume` optimization when the node is a `LongCountedLoopEndNode` node. The second possible fix is to exchange the dominating `IF` node with the `LongCountedLoopEndNode` node:
>
>     diff --git a/src/hotspot/share/opto/ifnode.cpp b/src/hotspot/share/opto/ifnode.cpp
>     index 38b40a6..31ff172 100644
>     --- a/src/hotspot/share/opto/ifnode.cpp
>     +++ b/src/hotspot/share/opto/ifnode.cpp
>     @@ -1674,6 +1674,21 @@ Node* IfNode::simple_subsuming(PhaseIterGVN* igvn) {
>          }
>        }
>
>     +  if (is_LongCountedLoopEnd()) {
>     +    set_req(0, dom->in(0));
>     +    set_req(1, dom->in(1));
>     +    dom->set_req(0, pre);
>     +    dom->set_req(1, igvn->intcon(is_always_true ? 1 : 0));
>     +    Node* proj0 = raw_out(0);
>     +    Node* proj1 = raw_out(1);
>     +    Node* dom_proj0 = dom->raw_out(0);
>     +    Node* dom_proj1 = dom->raw_out(1);
>     +    dom_proj0->set_req(0, this);
>     +    dom_proj1->set_req(0, this);
>     +    proj0->set_req(0, dom);
>     +    proj1->set_req(0, dom);
>     +  }
>     +
>        if (bol->outcnt() == 0) {
>          igvn->remove_dead_node(bol); // Kill the BoolNode.
>        }
>     diff --git a/src/hotspot/share/opto/loopnode.cpp b/src/hotspot/share/opto/loopnode.cpp
>     index 6f7e34d..7955722 100644
>     --- a/src/hotspot/share/opto/loopnode.cpp
>     +++ b/src/hotspot/share/opto/loopnode.cpp
>     @@ -802,7 +802,7 @@ bool PhaseIdealLoop::transform_long_counted_loop(IdealLoopTree* loop, Node_List
>        Node* back_control = head->in(LoopNode::LoopBackControl);
>
>        // data nodes on back branch not supported
>     -  if (back_control->outcnt() > 1) {
>     +  if (back_control->outcnt() > 1 || back_control->Opcode() != Op_IfTrue) {
>          return false;
>        }

This pull request has now been integrated.

Changeset: aea09677
Author:    casparcwang
Committer: Hui Shi
URL:       https://git.openjdk.java.net/jdk/commit/aea096770e74b9c0e1556467705ffdd6cf843d9d
Stats:     103 lines in 2 files changed: 103 ins; 0 del; 0 mod

8275854: C2: assert(stride_con != 0) failed: missed some peephole opt

Co-authored-by: Roland Westrelin
Reviewed-by: thartmann, roland, kvn

-------------

PR: https://git.openjdk.java.net/jdk/pull/6099

From mdoerr at openjdk.java.net Thu Nov 11 10:54:38 2021
From: mdoerr at openjdk.java.net (Martin Doerr)
Date: Thu, 11 Nov 2021 10:54:38 GMT
Subject: RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow [v7]
In-Reply-To:
References:
Message-ID:

On Thu, 11 Nov 2021 10:28:04 GMT, Volker Simonis wrote:

>> Currently, if running with `-XX:-OmitStackTraceInFastThrow`, C2 has no possibility to create implicit exceptions like AIOOBE, NullPointerExceptions, etc. in compiled code.
This means that such methods will always be deoptimized and re-executed in the interpreter if such exceptions are happening. >> >> If implicit exceptions are used for normal control flow, that can have a dramatic impact on performance. A prominent example for such code is [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274): >> >> public static boolean isAlpha(int c) { >> try { >> return IS_ALPHA[c]; >> } catch (ArrayIndexOutOfBoundsException ex) { >> return false; >> } >> } >> >> >> ### Solution >> >> Instead of deoptimizing and resorting to the interpreter, we can generate code which allocates and initializes the corresponding exceptions right in compiled code. This results in a ten-times performance improvement for the above code: >> >> -XX:-OmitStackTraceInFastThrow -XX:-OptimizeImplicitExceptions >> Benchmark (exceptionProbability) Mode Cnt Score Error Units >> ImplicitExceptions.bench 0.0 avgt 5 1.430 ? 0.353 ns/op >> ImplicitExceptions.bench 0.33 avgt 5 3563.038 ? 77.358 ns/op >> ImplicitExceptions.bench 0.66 avgt 5 8609.693 ? 1205.104 ns/op >> ImplicitExceptions.bench 1.00 avgt 5 12842.401 ? 1022.728 ns/op >> >> -XX:-OmitStackTraceInFastThrow -XX:+OptimizeImplicitExceptions >> Benchmark (exceptionProbability) Mode Cnt Score Error Units >> ImplicitExceptions.bench 0.0 avgt 5 1.432 ? 0.352 ns/op >> ImplicitExceptions.bench 0.33 avgt 5 355.723 ? 16.641 ns/op >> ImplicitExceptions.bench 0.66 avgt 5 887.068 ? 166.728 ns/op >> ImplicitExceptions.bench 1.00 avgt 5 1274.418 ? 88.235 ns/op >> >> >> ### Implementation details >> >> - The new optimization is guarded by the option `OptimizeImplicitExceptions` which is on by default. 
>> - In `GraphKit::builtin_throw()` we can't simply use `CallGenerator::for_direct_call()` to create a `DirectCallGenerator` for the call to the exception's `` function because `DirectCallGenerator` assumes in various places that calls are only issued at `invoke*` bytecodes. This is is not true in genral for bytecode which can cause an implicit exception. >> - Instead, we manually wire up the call based on the code in `DirectCallGenerator::generate()`. >> - We use a similar trick like for method handle intrinsics where the callee from the bytecode is replaced by a direct call and this fact is recorded in the call's `_override_symbolic_info` field. For calling constructors of implicit exceptions I've introduced the new field `_implicit_exception_init`. This field is also used in various assertions to prevent queries for the bytecode's symbolic method information which doesn't exist because we're not at an `invoke*` bytecode at the place where we generate the call. >> - The PR contains a micro-benchmark which compares the old and the new implementation for [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274). Except for the trivial case where the exception probability is 0 (i.e. no exceptions are happening at all) the new implementation is about 10 times faster. > > Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: > > Minor enhancements and fixes requested by Martin Thanks for the updates. LGTM. ------------- Marked as reviewed by mdoerr (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/5488 From mdoerr at openjdk.java.net Thu Nov 11 10:54:38 2021 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Thu, 11 Nov 2021 10:54:38 GMT Subject: RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow [v6] In-Reply-To: References: <0II1AGxTmud7vWbpWeKGm_vPr_yqFuVaauWEzBN_pMw=.f09fc7be-1833-431b-8353-161b9dad3cf4@github.com> Message-ID: <1EDY97O7mQZB96nsPoxILTsIaRRoiVmKWkOzq-2ANd8=.2498976c-b5bf-4391-b055-2acb46d99a79@github.com> On Thu, 11 Nov 2021 09:40:44 GMT, Volker Simonis wrote: >> src/hotspot/share/prims/whitebox.cpp line 987: >> >>> 985: bool overflow = false; >>> 986: for (uint reason = 0; reason < mdo->trap_reason_limit(); reason++) { >>> 987: if (reason_str != NULL && !strcmp(reason_str, Deoptimization::trap_reason_name(reason))) { >> >> Maybe the code would be better readable when checking `reason_str != NULL` first and then use 2 loops? Just a minor suggestion. Should only be done if readability is better. 
> > I've tried it but the resulting version is slightly longer and in my opinion not really more readable: > > WB_ENTRY(jint, WB_GetMethodTrapCount(JNIEnv* env, jobject o, jobject method, jstring reason_obj)) > jmethodID jmid = reflected_method_to_jmid(thread, env, method); > CHECK_JNI_EXCEPTION_(env, 0); > methodHandle mh(THREAD, Method::checked_resolve_jmethod_id(jmid)); > uint cnt = 0; > MethodData* mdo = mh->method_data(); > if (mdo != NULL) { > ResourceMark rm(THREAD); > if (reason_obj != NULL) { > char* reason_str = java_lang_String::as_utf8_string(JNIHandles::resolve_non_null(reason_obj)); > for (uint reason = 0; reason < mdo->trap_reason_limit(); reason++) { > if (!strcmp(reason_str, Deoptimization::trap_reason_name(reason))) { > cnt = mdo->trap_count(reason); > // Count in the overflow trap count on overflow > if (cnt == (uint)-1) { > cnt = mdo->trap_count_limit() + mdo->overflow_trap_count(); > } > break; > } > } > } else { > bool overflow = false; > for (uint reason = 0; reason < mdo->trap_reason_limit(); reason++) { > uint c = mdo->trap_count(reason); > if (c == (uint)-1) { > c = mdo->trap_count_limit(); > if (!overflow) { > // Count overflow trap count just once > overflow = true; > c += mdo->overflow_trap_count(); > } > } > cnt += c; > } > } > } > return cnt; > WB_END > > > But for me it's actually no difference. Please just let me know if you'd still prefer the alternative version. > > PS: I've updated the documentation of the method which was inaccurate for `reason==NULL`. Your two loop version looks a bit easier to read for me, but that may be a matter of taste. I leave you free to decide. 
------------- PR: https://git.openjdk.java.net/jdk/pull/5488 From chagedorn at openjdk.java.net Thu Nov 11 10:56:41 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Thu, 11 Nov 2021 10:56:41 GMT Subject: RFR: 8276112: Inconsistent scalar replacement debug info at safepoints In-Reply-To: References: Message-ID: On Wed, 10 Nov 2021 12:12:39 GMT, Tobias Hartmann wrote: > [JDK-8261137](https://bugs.openjdk.java.net/browse/JDK-8261137) introduced aggressive scalar replacement of primitive boxes during incremental inlining if the box is only referenced by safepoint debug info: > https://github.com/openjdk/jdk/blob/e01d6d00bc4ab5ca0d38f8894a78e6d911e0fe93/src/hotspot/share/opto/callGenerator.cpp#L678-L683 > > It works by replacing safepoint usages by `SafePointScalarObject` nodes and adjusting the JVMState accordingly. For example, in `TestSafepointDebugInfo::test1` the `helper` method in line 56 is inlined and the box result of `Integer.valueOf` is scalar replaced in the safepoint debug info: > > > 315 SafePointScalarObject === 0 [[ 40 316 319 337 ]] # fields@[0..0] Oop:java/lang/Integer:NotNull:exact * !jvms: > 316 SafePoint === 5 6 317 8 1 1 1 315 10 1 10 [[]] SafePoint !jvms: TestSafepointDebugInfo::test1 @ bci:-1 (line 56) > JVMS depth=1 loc=6 stk=8 arg=8 mon=10 scalar=10 end=11 mondepth=0 sp=0 bci=6 reexecute=false method=static jint compiler.eliminateAutobox.TestSafepointDebugInfo.test1(jint) > > > The `scalar` offset in the JVMState points to the integer field in the debug info. The problem is now that additional inlining can happen afterwards "on top of" this JVMState. 
In this case, the call to `Integer.valueOf` in line 57 is inlined, leading to the following JVMState for the inlined callee: > > > 315 SafePointScalarObject === 0 [[ 40 316 319 337 ]] # fields@[0..0] Oop:java/lang/Integer:NotNull:exact * !jvms: > 316 SafePoint === 5 6 317 8 1 1 1 315 10 1 10 [[]] SafePoint !jvms: TestSafepointDebugInfo::test1 @ bci:-1 (line 56) > JVMS depth=1 loc=6 stk=8 arg=8 mon=10 scalar=10 end=11 mondepth=0 sp=0 bci=6 reexecute=false method=static jint compiler.eliminateAutobox.TestSafepointDebugInfo.test1(jint) > JVMS depth=2 loc=5 stk=6 arg=8 mon=10 scalar=10 end=10 mondepth=0 sp=2 bci=3 reexecute=false method=static jobject java.lang.Integer.valueOf(jint) > ``` > > In this simple case, both caller and (inlined) callee state share the same `SafePointNode`. However, the `scalar` offset in the JVMState of the callee (depth 2) is not correct anymore and out of bounds. > > Parsing then emits an `unstable_if` trap and `GraphKit::add_safepoint_edges` merges above states to: > > 315 SafePointScalarObject === 0 [[ 40 316 319 337 ]] # fields@[0..0] Oop:java/lang/Integer:NotNull:exact * !jvms: > 337 CallStaticJava === 332 1 7 8 1 ( 336 1 315 10 10 10 328 ) [[]] # Static uncommon_trap(reason='unstable_if' action='reinterpret' debug_id='0') void ( int ) Integer::valueOf @ bci:3 (line 1075) reexecute TestSafepointDebugInfo::test1 @ bci:6 (line 57) !jvms: Integer::valueOf @ bci:3 (line 1075) TestSafepointDebugInfo::test1 @ bci:6 (line 57) > JVMS depth=1 loc=6 stk=8 arg=8 mon=8 scalar=8 end=9 mondepth=0 sp=0 bci=6 reexecute=false method=static jint compiler.eliminateAutobox.TestSafepointDebugInfo.test1(jint) > JVMS depth=2 loc=9 stk=10 arg=12 mon=12 scalar=12 end=12 mondepth=0 sp=2 bci=3 reexecute=true method=static jobject java.lang.Integer.valueOf(jint) > > > We then crash in `PhaseOutput::FillLocArray` when processing the `SafePointScalarObject` and trying to access the corresponding field in the debug info at (out of bounds) offset 12: > 
https://github.com/openjdk/jdk/blob/e01d6d00bc4ab5ca0d38f8894a78e6d911e0fe93/src/hotspot/share/opto/output.cpp#L833-L836 > > Even worse, in some scenarios we don't crash/assert but emit incorrect debug info leading to wrong results after deoptimization. For example, `TestSafepointDebugInfo::test2` fails because the `SafePointScalarObject` for `box1` and `box2` point to the same field in the debug info. This can happen if scalar replacement happens again "on top of" an already inconsistent JVMState. Afterwards, the out of bounds offset accidentally points to the field of the newly scalarized object. > > Originally, this issue only reproduced intermittently with a long running internal stress test but I was able to extract a set of simple regression tests that trigger different failure modes in the compiler or wrong execution. > > I think fixing this is complicated and I therefore propose to disable [JDK-8261137](https://bugs.openjdk.java.net/browse/JDK-8261137) for now and file an enhancement to fix and re-enable it later. We should then also add proper verification code and more complete tests. > > Thanks, > Tobias Nice analysis and test cases! Otherwise, the fix to disable JDK-8261137 and redo it in an RFE looks reasonable! test/hotspot/jtreg/compiler/eliminateAutobox/TestSafepointDebugInfo.java line 32: > 30: * @library /test/lib > 31: * @run main/othervm -Xbatch -XX:+IgnoreUnrecognizedVMOptions -XX:+AlwaysIncrementalInline > 32: * -XX:CompileCommand=inline,java.lang.Integer.valueOf::valueOf Should probably be `-XX:CompileCommand=inline,java.lang.Integer::valueOf` instead. ------------- Changes requested by chagedorn (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/6333 From aph at openjdk.java.net Thu Nov 11 10:59:02 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Thu, 11 Nov 2021 10:59:02 GMT Subject: RFR: 8276563: Undefined Behaviour in class Assembler [v2] In-Reply-To: References: Message-ID: > The HotSpot code base contains a number of instances of Undefined Behavior, which can cause all manner of unpleasant surprises. > The UB to which this patch relates is in class `Assembler`, in which instances are pointers to (nonexistent) objects defined as, for example, > > > typedef RegisterImpl *Register; > const Register r10 = ((Register)10); > > > Registers have accessors, e.g.: > > ` int RegisterImpl::encoding() const { return (intptr_t)this; }` > > This works by an accident of implementation: it is not legal C++. > > The most obvious way to fix this UB bug is to make instances of `Register` point to something, and to use pointer subtraction to find the encoding: (simplified for clarity) > > > extern RegisterImpl all_Registers[num_Registers]; > int RegisterImpl::encoding() const { return this - all_Registers; } > > > After this patch there is slightly more work to be done when assembling code but it's merely the subtraction of a constant in `encoding()` and the difference in execution time is so small (and the startup variance so large) that I have been unable to measure it, even after averaging 100 runs. It does lead to an increase of about 1% in the size of the stripped libjvm.so, but I think that can be recovered by a subsequent patch. > > An alternative way to implement this would be to make the encoding a byte-wide field in `RegisterImpl` and define encoding() this way: > > ` int RegisterImpl::encoding() const { return _encoding; }` > > This would result in smaller code, but I suspect slower. > > If this change is accepted, I propose that all instances of this pattern in HotSpot be treated similarly.
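The pointer-subtraction scheme described in the message can be tried out in isolation. The following is a minimal standalone sketch, not the patch itself: `RegisterImpl`, `Register`, `all_Registers`, `encoding()` and `r10` follow the names in the description, while the register count of 32 and the simplified `as_Register` helper are assumptions made only for illustration.

```cpp
#include <cassert>
#include <cstddef>

class RegisterImpl {
 public:
  // Assumption for this sketch: 32 registers. The real count is platform specific.
  static constexpr int number_of_registers = 32;
  int encoding() const;
};

typedef const RegisterImpl* Register;

// Every Register points at a real element of this array, so subtracting the
// array base from `this` is well-defined pointer arithmetic.
RegisterImpl all_Registers[RegisterImpl::number_of_registers];

inline int RegisterImpl::encoding() const {
  std::ptrdiff_t index = this - all_Registers;
  assert(index >= 0 && index < number_of_registers);
  return static_cast<int>(index);
}

// Simplified, hypothetical helper: maps an encoding back to a Register.
inline Register as_Register(int encoding) {
  assert(encoding >= 0 && encoding < RegisterImpl::number_of_registers);
  return &all_Registers[encoding];
}

const Register r10 = as_Register(10);
```

With these definitions, `r10->encoding()` yields 10 without ever converting a small integer into a pointer, which is the property the message is after.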
Andrew Haley has updated the pull request incrementally with two additional commits since the last revision: - Cleanup, constify. - Cleanup, constify. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6280/files - new: https://git.openjdk.java.net/jdk/pull/6280/files/99f1a065..66f11b87 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6280&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6280&range=00-01 Stats: 91 lines in 4 files changed: 35 ins; 27 del; 29 mod Patch: https://git.openjdk.java.net/jdk/pull/6280.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6280/head:pull/6280 PR: https://git.openjdk.java.net/jdk/pull/6280 From thartmann at openjdk.java.net Thu Nov 11 11:02:20 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 11 Nov 2021 11:02:20 GMT Subject: RFR: 8276112: Inconsistent scalar replacement debug info at safepoints [v2] In-Reply-To: References: Message-ID: > [JDK-8261137](https://bugs.openjdk.java.net/browse/JDK-8261137) introduced aggressive scalar replacement of primitive boxes during incremental inlining if the box is only referenced by safepoint debug info: > https://github.com/openjdk/jdk/blob/e01d6d00bc4ab5ca0d38f8894a78e6d911e0fe93/src/hotspot/share/opto/callGenerator.cpp#L678-L683 > > It works by replacing safepoint usages by `SafePointScalarObject` nodes and adjusting the JVMState accordingly. 
For example, in `TestSafepointDebugInfo::test1` the `helper` method in line 56 is inlined and the box result of `Integer.valueOf` is scalar replaced in the safepoint debug info: > > > 315 SafePointScalarObject === 0 [[ 40 316 319 337 ]] # fields@[0..0] Oop:java/lang/Integer:NotNull:exact * !jvms: > 316 SafePoint === 5 6 317 8 1 1 1 315 10 1 10 [[]] SafePoint !jvms: TestSafepointDebugInfo::test1 @ bci:-1 (line 56) > JVMS depth=1 loc=6 stk=8 arg=8 mon=10 scalar=10 end=11 mondepth=0 sp=0 bci=6 reexecute=false method=static jint compiler.eliminateAutobox.TestSafepointDebugInfo.test1(jint) > > > The `scalar` offset in the JVMState points to the integer field in the debug info. The problem is now that additional inlining can happen afterwards "on top of" this JVMState. In this case, the call to `Integer.valueOf` in line 57 is inlined, leading to the following JVMState for the inlined callee: > > > 315 SafePointScalarObject === 0 [[ 40 316 319 337 ]] # fields@[0..0] Oop:java/lang/Integer:NotNull:exact * !jvms: > 316 SafePoint === 5 6 317 8 1 1 1 315 10 1 10 [[]] SafePoint !jvms: TestSafepointDebugInfo::test1 @ bci:-1 (line 56) > JVMS depth=1 loc=6 stk=8 arg=8 mon=10 scalar=10 end=11 mondepth=0 sp=0 bci=6 reexecute=false method=static jint compiler.eliminateAutobox.TestSafepointDebugInfo.test1(jint) > JVMS depth=2 loc=5 stk=6 arg=8 mon=10 scalar=10 end=10 mondepth=0 sp=2 bci=3 reexecute=false method=static jobject java.lang.Integer.valueOf(jint) > ``` > > In this simple case, both caller and (inlined) callee state share the same `SafePointNode`. However, the `scalar` offset in the JVMState of the callee (depth 2) is not correct anymore and out of bounds. 
> > Parsing then emits an `unstable_if` trap and `GraphKit::add_safepoint_edges` merges above states to: > > 315 SafePointScalarObject === 0 [[ 40 316 319 337 ]] # fields@[0..0] Oop:java/lang/Integer:NotNull:exact * !jvms: > 337 CallStaticJava === 332 1 7 8 1 ( 336 1 315 10 10 10 328 ) [[]] # Static uncommon_trap(reason='unstable_if' action='reinterpret' debug_id='0') void ( int ) Integer::valueOf @ bci:3 (line 1075) reexecute TestSafepointDebugInfo::test1 @ bci:6 (line 57) !jvms: Integer::valueOf @ bci:3 (line 1075) TestSafepointDebugInfo::test1 @ bci:6 (line 57) > JVMS depth=1 loc=6 stk=8 arg=8 mon=8 scalar=8 end=9 mondepth=0 sp=0 bci=6 reexecute=false method=static jint compiler.eliminateAutobox.TestSafepointDebugInfo.test1(jint) > JVMS depth=2 loc=9 stk=10 arg=12 mon=12 scalar=12 end=12 mondepth=0 sp=2 bci=3 reexecute=true method=static jobject java.lang.Integer.valueOf(jint) > > > We then crash in `PhaseOutput::FillLocArray` when processing the `SafePointScalarObject` and trying to access the corresponding field in the debug info at (out of bounds) offset 12: > https://github.com/openjdk/jdk/blob/e01d6d00bc4ab5ca0d38f8894a78e6d911e0fe93/src/hotspot/share/opto/output.cpp#L833-L836 > > Even worse, in some scenarios we don't crash/assert but emit incorrect debug info leading to wrong results after deoptimization. For example, `TestSafepointDebugInfo::test2` fails because the `SafePointScalarObject` for `box1` and `box2` point to the same field in the debug info. This can happen if scalar replacement happens again "on top of" an already inconsistent JVMState. Afterwards, the out of bounds offset accidentally points to the field of the newly scalarized object. > > Originally, this issue only reproduced intermittently with a long running internal stress test but I was able to extract a set of simple regression tests that trigger different failure modes in the compiler or wrong execution. 
> > I think fixing this is complicated and I therefore propose to disable [JDK-8261137](https://bugs.openjdk.java.net/browse/JDK-8261137) for now and file an enhancement to fix and re-enable it later. We should then also add proper verification code and more complete tests. > > Thanks, > Tobias Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: Removed unnecessary inline compile command ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6333/files - new: https://git.openjdk.java.net/jdk/pull/6333/files/abec0b43..45a1374e Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6333&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6333&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/6333.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6333/head:pull/6333 PR: https://git.openjdk.java.net/jdk/pull/6333 From chagedorn at openjdk.java.net Thu Nov 11 11:02:23 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Thu, 11 Nov 2021 11:02:23 GMT Subject: RFR: 8276112: Inconsistent scalar replacement debug info at safepoints [v2] In-Reply-To: References: Message-ID: <09uNjGrxn1RamKbfG3CiU-_--EQZBulQ9vHmBpP31fU=.355ff28d-6b4b-40e7-a78f-e54dfee8347e@github.com> On Thu, 11 Nov 2021 10:59:06 GMT, Tobias Hartmann wrote: >> [JDK-8261137](https://bugs.openjdk.java.net/browse/JDK-8261137) introduced aggressive scalar replacement of primitive boxes during incremental inlining if the box is only referenced by safepoint debug info: >> https://github.com/openjdk/jdk/blob/e01d6d00bc4ab5ca0d38f8894a78e6d911e0fe93/src/hotspot/share/opto/callGenerator.cpp#L678-L683 >> >> It works by replacing safepoint usages by `SafePointScalarObject` nodes and adjusting the JVMState accordingly. 
>> >> [...]
>> >> I think fixing this is complicated and I therefore propose to disable [JDK-8261137](https://bugs.openjdk.java.net/browse/JDK-8261137) for now and file an enhancement to fix and re-enable it later. We should then also add proper verification code and more complete tests. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Removed unnecessary inline compile command That looks good to me! ------------- Marked as reviewed by chagedorn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6333 From thartmann at openjdk.java.net Thu Nov 11 11:02:25 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 11 Nov 2021 11:02:25 GMT Subject: RFR: 8276112: Inconsistent scalar replacement debug info at safepoints In-Reply-To: References: Message-ID: On Wed, 10 Nov 2021 12:12:39 GMT, Tobias Hartmann wrote: > [JDK-8261137](https://bugs.openjdk.java.net/browse/JDK-8261137) introduced aggressive scalar replacement of primitive boxes during incremental inlining if the box is only referenced by safepoint debug info: > https://github.com/openjdk/jdk/blob/e01d6d00bc4ab5ca0d38f8894a78e6d911e0fe93/src/hotspot/share/opto/callGenerator.cpp#L678-L683 > > It works by replacing safepoint usages by `SafePointScalarObject` nodes and adjusting the JVMState accordingly. 
> > [...]
> > I think fixing this is complicated and I therefore propose to disable [JDK-8261137](https://bugs.openjdk.java.net/browse/JDK-8261137) for now and file an enhancement to fix and re-enable it later. We should then also add proper verification code and more complete tests. > > Thanks, > Tobias Thanks for the review, Christian! Good catch, I removed the inline compile command because it is not required. ------------- PR: https://git.openjdk.java.net/jdk/pull/6333 From aph at openjdk.java.net Thu Nov 11 11:44:38 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Thu, 11 Nov 2021 11:44:38 GMT Subject: RFR: 8276563: Undefined Behaviour in class Assembler [v2] In-Reply-To: References: Message-ID: On Tue, 9 Nov 2021 12:52:20 GMT, Jorn Vernee wrote: >> Andrew Haley has updated the pull request incrementally with two additional commits since the last revision: >> >> - Cleanup, constify. >> - Cleanup, constify. > > src/hotspot/cpu/aarch64/register_aarch64.hpp line 44: > >> 42: extern name all_ ## type ## s[name::number_of_declared_registers] INTERNAL_VISIBILITY; \ >> 43: constexpr type first_ ## type = all_ ## type ## s; \ >> 44: inline constexpr type name::first() { return all_ ## type ## s; } > > Same here: > > Suggestion: > > // Macros to help define all kinds of registers > > #define REGISTER_IMPL_DECLARATION(type, impl_type) \ > inline const type as_ ## type(int encoding) { \ > assert(encoding <= impl_type::number_of_declared_registers, "invalid register"); \ > return encoding == -1 ? impl_type::invalid() : impl_type::first() + encoding; \ > } \ > extern impl_type all_ ## type ## s[impl_type::number_of_declared_registers] INTERNAL_VISIBILITY; \ > constexpr type first_ ## type = all_ ## type ## s; \ > inline constexpr type impl_type::first() { return all_ ## type ## s; } Done. 
> src/hotspot/cpu/aarch64/register_aarch64.hpp line 47: > >> 45: >> 46: #define REGISTER_IMPL_DEFINITION(type, name) \ >> 47: name all_ ## type ## s[name::number_of_declared_registers]; > > The use of the macro parameters `type` and `name` here, is a bit confusing since they mean something else in the `CONSTANT_REGISTER_DECLARATION` macro below. > > I'd suggest changing the parameter names to `type` and `impl_type` instead, to reflect that they are `` and `Impl` > > Suggestion: > > #define REGISTER_IMPL_DEFINITION(type, impl_type) \ > impl_type all_ ## type ## s[impl_type::number_of_declared_registers]; Done. ------------- PR: https://git.openjdk.java.net/jdk/pull/6280 From aph at openjdk.java.net Thu Nov 11 11:56:38 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Thu, 11 Nov 2021 11:56:38 GMT Subject: RFR: 8276563: Undefined Behaviour in class Assembler [v2] In-Reply-To: References: Message-ID: <48rR2UX-q6PpAXWr8Ip5SRROTHArRc_aChBA3jtn19Y=.0270e872-05d6-4f48-8a24-62867b1b6eb7@github.com> On Tue, 9 Nov 2021 13:26:26 GMT, Jorn Vernee wrote: >> Andrew Haley has updated the pull request incrementally with two additional commits since the last revision: >> >> - Cleanup, constify. >> - Cleanup, constify. > > src/hotspot/cpu/aarch64/register_aarch64.hpp line 179: > >> 177: // accessors >> 178: bool is_valid() const { return this < invalid(); } >> 179: bool has_byte_register() const { return is_valid(); } > > Is `has_byte_register` needed here? I see it in the previous code for `Register` but not for `FloatRegister`. > Suggestion: Done. 
------------- PR: https://git.openjdk.java.net/jdk/pull/6280 From thartmann at openjdk.java.net Thu Nov 11 13:11:41 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 11 Nov 2021 13:11:41 GMT Subject: Integrated: 8276112: Inconsistent scalar replacement debug info at safepoints In-Reply-To: References: Message-ID: On Wed, 10 Nov 2021 12:12:39 GMT, Tobias Hartmann wrote: > [JDK-8261137](https://bugs.openjdk.java.net/browse/JDK-8261137) introduced aggressive scalar replacement of primitive boxes during incremental inlining if the box is only referenced by safepoint debug info: > https://github.com/openjdk/jdk/blob/e01d6d00bc4ab5ca0d38f8894a78e6d911e0fe93/src/hotspot/share/opto/callGenerator.cpp#L678-L683 > > It works by replacing safepoint usages by `SafePointScalarObject` nodes and adjusting the JVMState accordingly. For example, in `TestSafepointDebugInfo::test1` the `helper` method in line 56 is inlined and the box result of `Integer.valueOf` is scalar replaced in the safepoint debug info: > > > 315 SafePointScalarObject === 0 [[ 40 316 319 337 ]] # fields@[0..0] Oop:java/lang/Integer:NotNull:exact * !jvms: > 316 SafePoint === 5 6 317 8 1 1 1 315 10 1 10 [[]] SafePoint !jvms: TestSafepointDebugInfo::test1 @ bci:-1 (line 56) > JVMS depth=1 loc=6 stk=8 arg=8 mon=10 scalar=10 end=11 mondepth=0 sp=0 bci=6 reexecute=false method=static jint compiler.eliminateAutobox.TestSafepointDebugInfo.test1(jint) > > > The `scalar` offset in the JVMState points to the integer field in the debug info. The problem is now that additional inlining can happen afterwards "on top of" this JVMState. 
> > [...] > > Thanks, > Tobias This pull request has now been integrated.
Changeset: c29cab8a Author: Tobias Hartmann URL: https://git.openjdk.java.net/jdk/commit/c29cab8ab475055e02e4300f212907ff2db955ab Stats: 191 lines in 4 files changed: 175 ins; 1 del; 15 mod 8276112: Inconsistent scalar replacement debug info at safepoints Reviewed-by: kvn, chagedorn ------------- PR: https://git.openjdk.java.net/jdk/pull/6333 From thartmann at openjdk.java.net Thu Nov 11 13:11:39 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 11 Nov 2021 13:11:39 GMT Subject: RFR: 8276112: Inconsistent scalar replacement debug info at safepoints [v2] In-Reply-To: References: Message-ID: On Thu, 11 Nov 2021 11:02:20 GMT, Tobias Hartmann wrote: >> [JDK-8261137](https://bugs.openjdk.java.net/browse/JDK-8261137) introduced aggressive scalar replacement of primitive boxes during incremental inlining if the box is only referenced by safepoint debug info: >> https://github.com/openjdk/jdk/blob/e01d6d00bc4ab5ca0d38f8894a78e6d911e0fe93/src/hotspot/share/opto/callGenerator.cpp#L678-L683 >> >> It works by replacing safepoint usages by `SafePointScalarObject` nodes and adjusting the JVMState accordingly. For example, in `TestSafepointDebugInfo::test1` the `helper` method in line 56 is inlined and the box result of `Integer.valueOf` is scalar replaced in the safepoint debug info: >> >> >> 315 SafePointScalarObject === 0 [[ 40 316 319 337 ]] # fields@[0..0] Oop:java/lang/Integer:NotNull:exact * !jvms: >> 316 SafePoint === 5 6 317 8 1 1 1 315 10 1 10 [[]] SafePoint !jvms: TestSafepointDebugInfo::test1 @ bci:-1 (line 56) >> JVMS depth=1 loc=6 stk=8 arg=8 mon=10 scalar=10 end=11 mondepth=0 sp=0 bci=6 reexecute=false method=static jint compiler.eliminateAutobox.TestSafepointDebugInfo.test1(jint) >> >> >> The `scalar` offset in the JVMState points to the integer field in the debug info. The problem is now that additional inlining can happen afterwards "on top of" this JVMState. 
>> >> [...] >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Removed unnecessary inline compile command I filed [JDK-8276998](https://bugs.openjdk.java.net/browse/JDK-8276998) to re-implement this.
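As a rough illustration of the kind of verification the description asks for, the offsets printed in the JVMS dumps can be checked with a small model. This is only a sketch under stated assumptions: `FrameOffsets`, `frames_chain_correctly` and `scalar_slot_in_bounds` are hypothetical names, not HotSpot code, and the rule that a callee's `loc` equals its caller's `end` is read off the merged dump rather than taken from the HotSpot sources.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one JVMS line; the fields mirror the
// loc/stk/mon/scalar/end columns of the dumps and nothing else.
struct FrameOffsets {
  unsigned loc, stk, mon, scalar, end;
};

// Assumed rules: offsets inside a frame never decrease, each callee frame
// starts where its caller ends, and the innermost frame ends within the
// inputs of the node that carries the debug info.
bool frames_chain_correctly(const std::vector<FrameOffsets>& frames,
                            unsigned num_inputs) {
  for (std::size_t i = 0; i < frames.size(); ++i) {
    const FrameOffsets& f = frames[i];
    bool ordered = f.loc <= f.stk && f.stk <= f.mon &&
                   f.mon <= f.scalar && f.scalar <= f.end;
    if (!ordered) return false;
    if (i > 0 && f.loc != frames[i - 1].end) return false;
  }
  return frames.empty() || frames.back().end <= num_inputs;
}

// A scalarized object's field slots must lie inside [scalar, end) of the frame.
bool scalar_slot_in_bounds(const FrameOffsets& f, unsigned slot,
                           unsigned n_fields) {
  return f.scalar <= slot && slot + n_fields <= f.end;
}
```

Fed with the numbers from the dumps, the stacked state (depth 2 with `loc=5` on top of a caller with `end=11`) fails the chaining check, and a one-field slot at offset 12 in a frame with `scalar=12 end=12` is reported as out of bounds, matching the crash described for `PhaseOutput::FillLocArray`.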
------------- PR: https://git.openjdk.java.net/jdk/pull/6333 From thartmann at openjdk.java.net Thu Nov 11 13:23:35 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 11 Nov 2021 13:23:35 GMT Subject: RFR: 8275448: [REDO] AArch64: Implement string_compare intrinsic in SVE [v3] In-Reply-To: <3O9BeTBG4Z4q3up1VlKGb096qxN2dxSXxTG_FWrXNVE=.ab464e81-34ad-4e8f-aba4-543489015f64@github.com> References: <3lLU_Cnsh439U5TjNx9ADxXkvMDnfot4TxHbh7paUQw=.5c7ec35a-8456-4fe6-8b4f-1d7627026d8d@github.com> <3O9BeTBG4Z4q3up1VlKGb096qxN2dxSXxTG_FWrXNVE=.ab464e81-34ad-4e8f-aba4-543489015f64@github.com> Message-ID: On Tue, 2 Nov 2021 00:16:44 GMT, TatWai Chong wrote: >> After JDK-8269559 was integrated there are failures in tier1 testing >> across Mac OS X 11.4 (aarch64) machines. Please see JDK-8275263. >> >> This patch is NOT functional; rather, this tends to verify potential >> toolchain issues as the original patch pass testing on other >> platforms. >> >> In this patch, we remove new SVE-related matching rules and register >> class introduced in the original patch to minimally affect the >> non-SVE part. > > TatWai Chong has updated the pull request incrementally with one additional commit since the last revision: > > Add the matching rule in td file, enable control path in the code stub. Okay, it seems to be different failures from what has been reported in [JDK-8275263](https://bugs.openjdk.java.net/browse/JDK-8275263). Now, only the `gtest/GTestWrapper.java` fails in `AssemblerAArch64::validate_vm`. Either the failures reported by JDK-8275263 are spurious or they have been fixed by another change. 
------------- PR: https://git.openjdk.java.net/jdk/pull/6072 From simonis at openjdk.java.net Thu Nov 11 16:43:14 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Thu, 11 Nov 2021 16:43:14 GMT Subject: RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow [v8] In-Reply-To: References: Message-ID: > Currently, if running with `-XX:-OmitStackTraceInFastThrow`, C2 has no possibility to create implicit exceptions like AIOOBE, NullPointerExceptions, etc. in compiled code. This means that such methods will always be deoptimized and re-executed in the interpreter if such exceptions are happening. > > If implicit exceptions are used for normal control flow, that can have a dramatic impact on performance. A prominent example for such code is [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274): > > public static boolean isAlpha(int c) { > try { > return IS_ALPHA[c]; > } catch (ArrayIndexOutOfBoundsException ex) { > return false; > } > } > > > ### Solution > > Instead of deoptimizing and resorting to the interpreter, we can generate code which allocates and initializes the corresponding exceptions right in compiled code. This results in a ten-times performance improvement for the above code: > > -XX:-OmitStackTraceInFastThrow -XX:-OptimizeImplicitExceptions > Benchmark (exceptionProbability) Mode Cnt Score Error Units > ImplicitExceptions.bench 0.0 avgt 5 1.430 ± 0.353 ns/op > ImplicitExceptions.bench 0.33 avgt 5 3563.038 ± 77.358 ns/op > ImplicitExceptions.bench 0.66 avgt 5 8609.693 ± 1205.104 ns/op > ImplicitExceptions.bench 1.00 avgt 5 12842.401 ± 1022.728 ns/op > > -XX:-OmitStackTraceInFastThrow -XX:+OptimizeImplicitExceptions > Benchmark (exceptionProbability) Mode Cnt Score Error Units > ImplicitExceptions.bench 0.0 avgt 5 1.432 ±
0.352 ns/op > ImplicitExceptions.bench 0.33 avgt 5 355.723 ± 16.641 ns/op > ImplicitExceptions.bench 0.66 avgt 5 887.068 ± 166.728 ns/op > ImplicitExceptions.bench 1.00 avgt 5 1274.418 ± 88.235 ns/op > > > ### Implementation details > > - The new optimization is guarded by the option `OptimizeImplicitExceptions` which is on by default. > - In `GraphKit::builtin_throw()` we can't simply use `CallGenerator::for_direct_call()` to create a `DirectCallGenerator` for the call to the exception's `<init>` function because `DirectCallGenerator` assumes in various places that calls are only issued at `invoke*` bytecodes. This is not true in general for bytecode which can cause an implicit exception. > - Instead, we manually wire up the call based on the code in `DirectCallGenerator::generate()`. > - We use a trick similar to the one for method handle intrinsics, where the callee from the bytecode is replaced by a direct call and this fact is recorded in the call's `_override_symbolic_info` field. For calling constructors of implicit exceptions I've introduced the new field `_implicit_exception_init`. This field is also used in various assertions to prevent queries for the bytecode's symbolic method information which doesn't exist because we're not at an `invoke*` bytecode at the place where we generate the call. > - The PR contains a micro-benchmark which compares the old and the new implementation for [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274). Except for the trivial case where the exception probability is 0 (i.e. no exceptions are happening at all) the new implementation is about 10 times faster.
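
As a rough illustration of the pattern discussed in this description, here is a minimal sketch. It is not code from the PR or from Tomcat: the class name `IsAlphaSketch`, the table size and contents, and the `isAlphaChecked` variant are assumptions made for the example. Both methods return the same results; the first relies on an implicit `ArrayIndexOutOfBoundsException` for out-of-range input, which is the case the change targets, while the second never creates an exception.

```java
// Hypothetical stand-in for the method quoted above; table size, contents
// and all names are assumptions made purely for illustration.
final class IsAlphaSketch {
    static final int ARRAY_SIZE = 128;
    private static final boolean[] IS_ALPHA = new boolean[ARRAY_SIZE];

    static {
        for (int c = 'a'; c <= 'z'; c++) IS_ALPHA[c] = true;
        for (int c = 'A'; c <= 'Z'; c++) IS_ALPHA[c] = true;
    }

    // Exception-driven variant: an out-of-range index raises an implicit
    // ArrayIndexOutOfBoundsException that is used as normal control flow.
    static boolean isAlphaImplicit(int c) {
        try {
            return IS_ALPHA[c];
        } catch (ArrayIndexOutOfBoundsException ex) {
            return false;
        }
    }

    // Same result with an explicit range check, so no exception is created.
    static boolean isAlphaChecked(int c) {
        return c >= 0 && c < ARRAY_SIZE && IS_ALPHA[c];
    }

    public static void main(String[] args) {
        int[] samples = {'a', 'Z', '1', -1, ARRAY_SIZE, 100_000};
        for (int c : samples) {
            System.out.println(c + " -> implicit=" + isAlphaImplicit(c)
                    + ", checked=" + isAlphaChecked(c));
        }
    }
}
```

The larger the share of out-of-range inputs, the more often the first variant takes the exception path, which corresponds to the `exceptionProbability` parameter in the benchmark tables.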
Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: Fix build issue for minimal/zero build one more time ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5488/files - new: https://git.openjdk.java.net/jdk/pull/5488/files/625da2f9..b3c130c8 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5488&range=07 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5488&range=06-07 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5488.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5488/head:pull/5488 PR: https://git.openjdk.java.net/jdk/pull/5488 From aph at openjdk.java.net Thu Nov 11 16:51:06 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Thu, 11 Nov 2021 16:51:06 GMT Subject: RFR: 8276563: Undefined Behaviour in class Assembler [v3] In-Reply-To: References: Message-ID: > The HotSpot code base contains a number of instances of Undefined Behavior, which can cause all manner of unpleasant surprises. > The UB to which this patch relates is in class `Assembler`, in which instances are pointers to (nonexistent) objects defined as, for example, > > > typedef RegisterImpl *Register; > const Register r10 = ((Register)10); > > > Registers have accessors, e.g.: > > ` int RegisterImpl::encoding() const { return (intptr_t)this; }` > > This works by an accident of implementation: it is not legal C++. 
> > The most obvious way to fix this UB bug is to make instances of `Register` point to something, and to use pointer subtraction to find the encoding: (simplified for clarity) > > > extern RegisterImpl all_Registers[num_Registers]; > int RegisterImpl::encoding() const { return this - all_Registers; } > > > After this patch there is slightly more work to be done when assembling code but it's merely the subtraction of a constant in `encoding()` and the difference in execution time is so small (and the startup variance so large) that I have been unable to measure it, even after averaging 100 runs. It does lead to an increase of about 1% in the size of the stripped libjvm.so, but I think that can be recovered by a subsequent patch. > > An alternative way to implement this would be to make the encoding a byte-wide field in `RegisterImpl` and define encoding() this way: > > ` int RegisterImpl::encoding() const { return _encoding; }` > > This would result in smaller code, but I suspect slower. > > If this change is accepted, I propose that all instances of this pattern in HotSpot be treated similarly. Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: Re-establish the FloatRegister::successor() hack.
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6280/files - new: https://git.openjdk.java.net/jdk/pull/6280/files/66f11b87..958f4a25 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6280&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6280&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/6280.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6280/head:pull/6280 PR: https://git.openjdk.java.net/jdk/pull/6280 From kvn at openjdk.java.net Thu Nov 11 17:11:37 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 11 Nov 2021 17:11:37 GMT Subject: RFR: 8276453: Undefined behavior in C1 LIR_OprDesc causes SEGV in fastdebug build [v6] In-Reply-To: References: Message-ID: On Thu, 11 Nov 2021 04:44:11 GMT, Dean Long wrote: >> Man Cao has updated the pull request incrementally with two additional commits since the last revision: >> >> - Use nullOpr() or {} instead of LIR_Opr() >> - Revert the renaming from LIR_OprDesc to LIR_Opr to minimize patch size > > Correction. The existing NULL value had me confused, thinking it would cause a crash if we ever tried to use it. > I see now that > > LIR_Opr() : _value(0) {} > > just preserves the existing behavior, and we will never get a crash because we never dereference the pointer. > But it does allow strange things like pointer() to return invalid memory, so again, I think it's best not to allow that. > Do you agree, @caoman and @vnkozlov? 
I am leaving final approval to @dean-long ------------- PR: https://git.openjdk.java.net/jdk/pull/6221 From kvn at openjdk.java.net Thu Nov 11 17:19:37 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 11 Nov 2021 17:19:37 GMT Subject: RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow [v8] In-Reply-To: References: Message-ID: <5Q-g54nyWmdaykYA01MSSN5yGS2qoAqnXPtxhyD12fU=.103ffc2e-dc83-4da4-876a-f05735feda1b@github.com> On Thu, 11 Nov 2021 16:43:14 GMT, Volker Simonis wrote: >> Currently, if running with `-XX:-OmitStackTraceInFastThrow`, C2 has no possibility to create implicit exceptions like AIOOBE, NullPointerExceptions, etc. in compiled code. This means that such methods will always be deoptimized and re-executed in the interpreter if such exceptions are happening. >> >> If implicit exceptions are used for normal control flow, that can have a dramatic impact on performance. A prominent example for such code is [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274): >> >> public static boolean isAlpha(int c) { >> try { >> return IS_ALPHA[c]; >> } catch (ArrayIndexOutOfBoundsException ex) { >> return false; >> } >> } >> >> >> ### Solution >> >> Instead of deoptimizing and resorting to the interpreter, we can generate code which allocates and initializes the corresponding exceptions right in compiled code. This results in a ten-times performance improvement for the above code: >> >> -XX:-OmitStackTraceInFastThrow -XX:-OptimizeImplicitExceptions >> Benchmark (exceptionProbability) Mode Cnt Score Error Units >> ImplicitExceptions.bench 0.0 avgt 5 1.430 ± 0.353 ns/op >> ImplicitExceptions.bench 0.33 avgt 5 3563.038 ± 77.358 ns/op >> ImplicitExceptions.bench 0.66 avgt 5 8609.693 ± 1205.104 ns/op >> ImplicitExceptions.bench 1.00 avgt 5 12842.401 ±
1022.728 ns/op >> >> -XX:-OmitStackTraceInFastThrow -XX:+OptimizeImplicitExceptions >> Benchmark (exceptionProbability) Mode Cnt Score Error Units >> ImplicitExceptions.bench 0.0 avgt 5 1.432 ± 0.352 ns/op >> ImplicitExceptions.bench 0.33 avgt 5 355.723 ± 16.641 ns/op >> ImplicitExceptions.bench 0.66 avgt 5 887.068 ± 166.728 ns/op >> ImplicitExceptions.bench 1.00 avgt 5 1274.418 ± 88.235 ns/op >> >> >> ### Implementation details >> >> - The new optimization is guarded by the option `OptimizeImplicitExceptions` which is on by default. >> - In `GraphKit::builtin_throw()` we can't simply use `CallGenerator::for_direct_call()` to create a `DirectCallGenerator` for the call to the exception's `<init>` function because `DirectCallGenerator` assumes in various places that calls are only issued at `invoke*` bytecodes. This is not true in general for bytecode which can cause an implicit exception. >> - Instead, we manually wire up the call based on the code in `DirectCallGenerator::generate()`. >> - We use a trick similar to the one for method handle intrinsics, where the callee from the bytecode is replaced by a direct call and this fact is recorded in the call's `_override_symbolic_info` field. For calling constructors of implicit exceptions I've introduced the new field `_implicit_exception_init`. This field is also used in various assertions to prevent queries for the bytecode's symbolic method information which doesn't exist because we're not at an `invoke*` bytecode at the place where we generate the call. >> - The PR contains a micro-benchmark which compares the old and the new implementation for [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274). Except for the trivial case where the exception probability is 0 (i.e. no exceptions are happening at all) the new implementation is about 10 times faster.
> > Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: > > Fix build issue for minimal/zero build one more time I suggest not rushing it and waiting for JDK 19, because 18 is almost done. I wanted to look at this too, but I am on vacation. ------------- PR: https://git.openjdk.java.net/jdk/pull/5488 From rkennke at openjdk.java.net Thu Nov 11 17:30:35 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Thu, 11 Nov 2021 17:30:35 GMT Subject: RFR: 8276901: Implement UseHeavyMonitors consistently [v2] In-Reply-To: References: Message-ID: <2HDgnP2467MayPGZ36EhSOTSwLVzz15SljIhp3QHMFg=.50891284-4526-4225-9918-7576fcd6f143@github.com> On Wed, 10 Nov 2021 19:19:13 GMT, Roman Kennke wrote: >> The flag UseHeavyMonitors seems to imply that it makes Hotspot always use inflated monitors, rather than stack locks. However, it is only implemented in the interpreter that way. When it calls into runtime, it would still happily stack-lock. Even worse, C1 uses another flag UseFastLocking to achieve something similar (with the same caveat that runtime would stack-lock anyway). C2 doesn't have any such mechanism at all. >> I would like to experiment with disabling stack-locking, and thus, having this flag work as expected would seem very useful. >> >> The change removes the C1 flag UseFastLocking, and replaces its uses with equivalent (i.e. inverted) UseHeavyMonitors instead. I think it makes sense to make UseHeavyMonitors develop (I wouldn't want anybody to use this in production, not currently without this change, and not with this change). I also added a flag VerifyHeavyMonitors to be able to verify that stack-locking is really disabled. We can't currently verify this unconditionally (e.g. in debug builds) because all non-x86_64 platforms would need work.
>> >> Testing: >> - [x] tier1 >> - [x] tier2 >> - [x] tier3 >> - [ ] tier4 > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Verify monitors even in non-debug builds > /label remove hotspot /csr needed > > @rkennke A CSR request is needed here for the change to the UseHeavyMonitors flag, which for historical reasons was a product flag. I wonder if anyone using interpreter-only Zero builds uses this flag? I filed a CSR here: https://bugs.openjdk.java.net/browse/JDK-8277025 ------------- PR: https://git.openjdk.java.net/jdk/pull/6320 From rkennke at openjdk.java.net Thu Nov 11 17:34:34 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Thu, 11 Nov 2021 17:34:34 GMT Subject: RFR: 8276901: Implement UseHeavyMonitors consistently [v2] In-Reply-To: References: Message-ID: On Thu, 11 Nov 2021 07:24:24 GMT, David Holmes wrote: > How was, and going forward will, this be tested? There are no tests using UseHeavyMonitors. And a real test would be to run a bunch of other tests with the flag applied. I have run tier1-4 with the flag explicitly enabled. However, I agree, we should include the flag in a run config for some relevant tests. Do you have any suggestions? Maybe we have a stress test for synchronized already, and could add a run config there? jcstress comes to mind, but I am not sure if the included jcstress tests even exercise synchronized (rather than j.u.c. stuff)?
------------- PR: https://git.openjdk.java.net/jdk/pull/6320 From simonis at openjdk.java.net Thu Nov 11 17:35:47 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Thu, 11 Nov 2021 17:35:47 GMT Subject: RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow [v8] In-Reply-To: References: Message-ID: On Thu, 11 Nov 2021 16:43:14 GMT, Volker Simonis wrote: >> Currently, if running with `-XX:-OmitStackTraceInFastThrow`, C2 has no possibility to create implicit exceptions like AIOOBE, NullPointerExceptions, etc. in compiled code. This means that such methods will always be deoptimized and re-executed in the interpreter if such exceptions are happening. >> >> If implicit exceptions are used for normal control flow, that can have a dramatic impact on performance. A prominent example for such code is [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274): >> >> public static boolean isAlpha(int c) { >> try { >> return IS_ALPHA[c]; >> } catch (ArrayIndexOutOfBoundsException ex) { >> return false; >> } >> } >> >> >> ### Solution >> >> Instead of deoptimizing and resorting to the interpreter, we can generate code which allocates and initializes the corresponding exceptions right in compiled code. This results in a ten-times performance improvement for the above code: >> >> -XX:-OmitStackTraceInFastThrow -XX:-OptimizeImplicitExceptions >> Benchmark (exceptionProbability) Mode Cnt Score Error Units >> ImplicitExceptions.bench 0.0 avgt 5 1.430 ± 0.353 ns/op >> ImplicitExceptions.bench 0.33 avgt 5 3563.038 ± 77.358 ns/op >> ImplicitExceptions.bench 0.66 avgt 5 8609.693 ± 1205.104 ns/op >> ImplicitExceptions.bench 1.00 avgt 5 12842.401 ±
1022.728 ns/op >> >> -XX:-OmitStackTraceInFastThrow -XX:+OptimizeImplicitExceptions >> Benchmark (exceptionProbability) Mode Cnt Score Error Units >> ImplicitExceptions.bench 0.0 avgt 5 1.432 ± 0.352 ns/op >> ImplicitExceptions.bench 0.33 avgt 5 355.723 ± 16.641 ns/op >> ImplicitExceptions.bench 0.66 avgt 5 887.068 ± 166.728 ns/op >> ImplicitExceptions.bench 1.00 avgt 5 1274.418 ± 88.235 ns/op >> >> >> ### Implementation details >> >> - The new optimization is guarded by the option `OptimizeImplicitExceptions` which is on by default. >> - In `GraphKit::builtin_throw()` we can't simply use `CallGenerator::for_direct_call()` to create a `DirectCallGenerator` for the call to the exception's `<init>` function because `DirectCallGenerator` assumes in various places that calls are only issued at `invoke*` bytecodes. This is not true in general for bytecode which can cause an implicit exception. >> - Instead, we manually wire up the call based on the code in `DirectCallGenerator::generate()`. >> - We use a trick similar to the one for method handle intrinsics, where the callee from the bytecode is replaced by a direct call and this fact is recorded in the call's `_override_symbolic_info` field. For calling constructors of implicit exceptions I've introduced the new field `_implicit_exception_init`. This field is also used in various assertions to prevent queries for the bytecode's symbolic method information which doesn't exist because we're not at an `invoke*` bytecode at the place where we generate the call. >> - The PR contains a micro-benchmark which compares the old and the new implementation for [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274). Except for the trivial case where the exception probability is 0 (i.e. no exceptions are happening at all) the new implementation is about 10 times faster.
> > Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: > > Fix build issue for minimal/zero build one more time Hi Vladimir, I'd be really happy if you could take a look at this PR. On the other hand, I did intend to bring this to JDK 18. There's still a month until RDP 1 starts and this PR has already been discussed for two months. If you say "don't hurry", does that mean that you won't have time to review it within the next month? Best regards and a pleasant vacation, Volker ------------- PR: https://git.openjdk.java.net/jdk/pull/5488 From kvn at openjdk.java.net Thu Nov 11 17:50:36 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 11 Nov 2021 17:50:36 GMT Subject: RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow [v8] In-Reply-To: References: Message-ID: <-qMqxY4R-0pRP7L0xxagLIrk-wW0XeLo7g13c3GQ8uk=.1d4c438a-e0dc-4a5d-a447-d4d2130bc9fc@github.com> On Thu, 11 Nov 2021 16:43:14 GMT, Volker Simonis wrote: >> Currently, if running with `-XX:-OmitStackTraceInFastThrow`, C2 has no possibility to create implicit exceptions like AIOOBE, NullPointerExceptions, etc. in compiled code. This means that such methods will always be deoptimized and re-executed in the interpreter if such exceptions are happening. >> >> If implicit exceptions are used for normal control flow, that can have a dramatic impact on performance.
A prominent example for such code is [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274): >> >> public static boolean isAlpha(int c) { >> try { >> return IS_ALPHA[c]; >> } catch (ArrayIndexOutOfBoundsException ex) { >> return false; >> } >> } >> >> >> ### Solution >> >> Instead of deoptimizing and resorting to the interpreter, we can generate code which allocates and initializes the corresponding exceptions right in compiled code. This results in a ten-times performance improvement for the above code: >> >> -XX:-OmitStackTraceInFastThrow -XX:-OptimizeImplicitExceptions >> Benchmark (exceptionProbability) Mode Cnt Score Error Units >> ImplicitExceptions.bench 0.0 avgt 5 1.430 ± 0.353 ns/op >> ImplicitExceptions.bench 0.33 avgt 5 3563.038 ± 77.358 ns/op >> ImplicitExceptions.bench 0.66 avgt 5 8609.693 ± 1205.104 ns/op >> ImplicitExceptions.bench 1.00 avgt 5 12842.401 ± 1022.728 ns/op >> >> -XX:-OmitStackTraceInFastThrow -XX:+OptimizeImplicitExceptions >> Benchmark (exceptionProbability) Mode Cnt Score Error Units >> ImplicitExceptions.bench 0.0 avgt 5 1.432 ± 0.352 ns/op >> ImplicitExceptions.bench 0.33 avgt 5 355.723 ± 16.641 ns/op >> ImplicitExceptions.bench 0.66 avgt 5 887.068 ± 166.728 ns/op >> ImplicitExceptions.bench 1.00 avgt 5 1274.418 ± 88.235 ns/op >> >> >> ### Implementation details >> >> - The new optimization is guarded by the option `OptimizeImplicitExceptions` which is on by default. >> - In `GraphKit::builtin_throw()` we can't simply use `CallGenerator::for_direct_call()` to create a `DirectCallGenerator` for the call to the exception's `<init>` function because `DirectCallGenerator` assumes in various places that calls are only issued at `invoke*` bytecodes. This is not true in general for bytecode which can cause an implicit exception.
>> - Instead, we manually wire up the call based on the code in `DirectCallGenerator::generate()`. >> - We use a trick similar to the one for method handle intrinsics, where the callee from the bytecode is replaced by a direct call and this fact is recorded in the call's `_override_symbolic_info` field. For calling constructors of implicit exceptions I've introduced the new field `_implicit_exception_init`. This field is also used in various assertions to prevent queries for the bytecode's symbolic method information which doesn't exist because we're not at an `invoke*` bytecode at the place where we generate the call. >> - The PR contains a micro-benchmark which compares the old and the new implementation for [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274). Except for the trivial case where the exception probability is 0 (i.e. no exceptions are happening at all) the new implementation is about 10 times faster. > > Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: > > Fix build issue for minimal/zero build one more time My vacation has just started, and I will have just a week before RDP1 to do the review. ------------- PR: https://git.openjdk.java.net/jdk/pull/5488 From simonis at openjdk.java.net Thu Nov 11 17:50:37 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Thu, 11 Nov 2021 17:50:37 GMT Subject: RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow [v8] In-Reply-To: References: Message-ID: On Thu, 11 Nov 2021 16:43:14 GMT, Volker Simonis wrote: >> Currently, if running with `-XX:-OmitStackTraceInFastThrow`, C2 has no possibility to create implicit exceptions like AIOOBE, NullPointerExceptions, etc. in compiled code.
This means that such methods will always be deoptimized and re-executed in the interpreter if such exceptions are happening. >> >> If implicit exceptions are used for normal control flow, that can have a dramatic impact on performance. A prominent example for such code is [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274): >> >> public static boolean isAlpha(int c) { >> try { >> return IS_ALPHA[c]; >> } catch (ArrayIndexOutOfBoundsException ex) { >> return false; >> } >> } >> >> >> ### Solution >> >> Instead of deoptimizing and resorting to the interpreter, we can generate code which allocates and initializes the corresponding exceptions right in compiled code. This results in a ten-times performance improvement for the above code: >> >> -XX:-OmitStackTraceInFastThrow -XX:-OptimizeImplicitExceptions >> Benchmark (exceptionProbability) Mode Cnt Score Error Units >> ImplicitExceptions.bench 0.0 avgt 5 1.430 ± 0.353 ns/op >> ImplicitExceptions.bench 0.33 avgt 5 3563.038 ± 77.358 ns/op >> ImplicitExceptions.bench 0.66 avgt 5 8609.693 ± 1205.104 ns/op >> ImplicitExceptions.bench 1.00 avgt 5 12842.401 ± 1022.728 ns/op >> >> -XX:-OmitStackTraceInFastThrow -XX:+OptimizeImplicitExceptions >> Benchmark (exceptionProbability) Mode Cnt Score Error Units >> ImplicitExceptions.bench 0.0 avgt 5 1.432 ± 0.352 ns/op >> ImplicitExceptions.bench 0.33 avgt 5 355.723 ± 16.641 ns/op >> ImplicitExceptions.bench 0.66 avgt 5 887.068 ± 166.728 ns/op >> ImplicitExceptions.bench 1.00 avgt 5 1274.418 ± 88.235 ns/op >> >> >> ### Implementation details >> >> - The new optimization is guarded by the option `OptimizeImplicitExceptions` which is on by default.
>> - In `GraphKit::builtin_throw()` we can't simply use `CallGenerator::for_direct_call()` to create a `DirectCallGenerator` for the call to the exception's `<init>` function because `DirectCallGenerator` assumes in various places that calls are only issued at `invoke*` bytecodes. This is not true in general for bytecode which can cause an implicit exception. >> - Instead, we manually wire up the call based on the code in `DirectCallGenerator::generate()`. >> - We use a trick similar to the one for method handle intrinsics, where the callee from the bytecode is replaced by a direct call and this fact is recorded in the call's `_override_symbolic_info` field. For calling constructors of implicit exceptions I've introduced the new field `_implicit_exception_init`. This field is also used in various assertions to prevent queries for the bytecode's symbolic method information which doesn't exist because we're not at an `invoke*` bytecode at the place where we generate the call. >> - The PR contains a micro-benchmark which compares the old and the new implementation for [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274). Except for the trivial case where the exception probability is 0 (i.e. no exceptions are happening at all) the new implementation is about 10 times faster. > > Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: > > Fix build issue for minimal/zero build one more time OK, enjoy your vacation then...
------------- PR: https://git.openjdk.java.net/jdk/pull/5488 From manc at openjdk.java.net Thu Nov 11 23:56:57 2021 From: manc at openjdk.java.net (Man Cao) Date: Thu, 11 Nov 2021 23:56:57 GMT Subject: RFR: 8276453: Undefined behavior in C1 LIR_OprDesc causes SEGV in fastdebug build [v7] In-Reply-To: References: Message-ID: > Hi all, > > Could anyone provide some feedback on this bug fix and refactoring change? See https://bugs.openjdk.java.net/browse/JDK-8276453 for more details. > If the direction of this change looks good, we can proceed removing the "UGLY HACK" in c1_LIR.hpp and refactor occurrences of "opr->fn()" to "opr.fn()". Man Cao has updated the pull request incrementally with one additional commit since the last revision: Add an assertion to forbid pointer() returning NULL ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6221/files - new: https://git.openjdk.java.net/jdk/pull/6221/files/adaf6d4e..a082f36d Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6221&range=06 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6221&range=05-06 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/6221.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6221/head:pull/6221 PR: https://git.openjdk.java.net/jdk/pull/6221 From manc at openjdk.java.net Thu Nov 11 23:56:58 2021 From: manc at openjdk.java.net (Man Cao) Date: Thu, 11 Nov 2021 23:56:58 GMT Subject: RFR: 8276453: Undefined behavior in C1 LIR_OprDesc causes SEGV in fastdebug build [v6] In-Reply-To: References: Message-ID: On Thu, 11 Nov 2021 01:57:58 GMT, Man Cao wrote: >> Hi all, >> >> Could anyone provide some feedback on this bug fix and refactoring change? See https://bugs.openjdk.java.net/browse/JDK-8276453 for more details. >> If the direction of this change looks good, we can proceed removing the "UGLY HACK" in c1_LIR.hpp and refactor occurrences of "opr->fn()" to "opr.fn()". 
> > Man Cao has updated the pull request incrementally with two additional commits since the last revision: > > - Use nullOpr() or {} instead of LIR_Opr() > - Revert the renaming from LIR_OprDesc to LIR_Opr to minimize patch size I uploaded new result with `-XX:TieredStopAtLevel=3 -XX:CICompilerCount=1` to https://bugs.openjdk.java.net/browse/JDK-8276453. `java.ci.totalTime` is still unchanged. Also added `_value != 0` in assertion. ------------- PR: https://git.openjdk.java.net/jdk/pull/6221 From ngasson at openjdk.java.net Fri Nov 12 04:06:35 2021 From: ngasson at openjdk.java.net (Nick Gasson) Date: Fri, 12 Nov 2021 04:06:35 GMT Subject: RFR: 8275448: [REDO] AArch64: Implement string_compare intrinsic in SVE [v3] In-Reply-To: References: <3lLU_Cnsh439U5TjNx9ADxXkvMDnfot4TxHbh7paUQw=.5c7ec35a-8456-4fe6-8b4f-1d7627026d8d@github.com> <3O9BeTBG4Z4q3up1VlKGb096qxN2dxSXxTG_FWrXNVE=.ab464e81-34ad-4e8f-aba4-543489015f64@github.com> Message-ID: On Thu, 11 Nov 2021 13:20:55 GMT, Tobias Hartmann wrote: > Now, only the `gtest/GTestWrapper.java` fails in `AssemblerAArch64::validate_vm`. That is still quite strange. I just tried `gtest/GTestWrapper.java` on that commit on an M1 Mac and it passed (both fastdebug and release build). Could you post the log of the failing test? 
------------- PR: https://git.openjdk.java.net/jdk/pull/6072 From duke at openjdk.java.net Fri Nov 12 04:21:34 2021 From: duke at openjdk.java.net (TatWai Chong) Date: Fri, 12 Nov 2021 04:21:34 GMT Subject: RFR: 8275448: [REDO] AArch64: Implement string_compare intrinsic in SVE [v3] In-Reply-To: <3O9BeTBG4Z4q3up1VlKGb096qxN2dxSXxTG_FWrXNVE=.ab464e81-34ad-4e8f-aba4-543489015f64@github.com> References: <3lLU_Cnsh439U5TjNx9ADxXkvMDnfot4TxHbh7paUQw=.5c7ec35a-8456-4fe6-8b4f-1d7627026d8d@github.com> <3O9BeTBG4Z4q3up1VlKGb096qxN2dxSXxTG_FWrXNVE=.ab464e81-34ad-4e8f-aba4-543489015f64@github.com> Message-ID: On Tue, 2 Nov 2021 00:16:44 GMT, TatWai Chong wrote: >> After JDK-8269559 was integrated there are failures in tier1 testing >> across Mac OS X 11.4 (aarch64) machines. Please see JDK-8275263. >> >> This patch is NOT functional; rather, this tends to verify potential >> toolchain issues as the original patch pass testing on other >> platforms. >> >> In this patch, we remove new SVE-related matching rules and register >> class introduced in the original patch to minimally affect the >> non-SVE part. > > TatWai Chong has updated the pull request incrementally with one additional commit since the last revision: > > Add the matching rule in td file, enable control path in the code stub. I haven't caught up. I'm confused that above mentioned all tests passed on the latest patch. Is gtest/GTestWrapper.java not included? 
------------- PR: https://git.openjdk.java.net/jdk/pull/6072 From ngasson at openjdk.java.net Fri Nov 12 04:24:32 2021 From: ngasson at openjdk.java.net (Nick Gasson) Date: Fri, 12 Nov 2021 04:24:32 GMT Subject: RFR: 8275448: [REDO] AArch64: Implement string_compare intrinsic in SVE [v3] In-Reply-To: References: <3lLU_Cnsh439U5TjNx9ADxXkvMDnfot4TxHbh7paUQw=.5c7ec35a-8456-4fe6-8b4f-1d7627026d8d@github.com> <3O9BeTBG4Z4q3up1VlKGb096qxN2dxSXxTG_FWrXNVE=.ab464e81-34ad-4e8f-aba4-543489015f64@github.com> Message-ID: <4ZahJ8S4B-c3hTrfxf1B-S_AxQircRCZPHeMD5YWbC0=.038cb5dd-2cb0-4786-b39a-46f994d54cc4@github.com> On Fri, 12 Nov 2021 04:18:26 GMT, TatWai Chong wrote: >> TatWai Chong has updated the pull request incrementally with one additional commit since the last revision: >> >> Add the matching rule in td file, enable control path in the code stub. > > I haven't caught up. I'm confused that above mentioned all tests passed on the latest patch. Is gtest/GTestWrapper.java not included? @tatwaichong the `gtest/GTestWrapper.java` failure was on a re-test of your original commit, 8b1b6f9. This latest PR seems fine. ------------- PR: https://git.openjdk.java.net/jdk/pull/6072 From dholmes at openjdk.java.net Fri Nov 12 05:11:34 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Fri, 12 Nov 2021 05:11:34 GMT Subject: RFR: 8276901: Implement UseHeavyMonitors consistently [v2] In-Reply-To: References: Message-ID: On Wed, 10 Nov 2021 19:19:13 GMT, Roman Kennke wrote: >> The flag UseHeavyMonitors seems to imply that it makes Hotspot always use inflated monitors, rather than stack locks. However, it is only implemented in the interpreter that way. When it calls into runtime, it would still happily stack-lock. Even worse, C1 uses another flag UseFastLocking to achieve something similar (with the same caveat that runtime would stack-lock anyway). C2 doesn't have any such mechanism at all. 
>> I would like to experiment with disabling stack-locking, and thus, having this flag work as expected would seem very useful. >> >> The change removes the C1 flag UseFastLocking, and replaces its uses with equivalent (i.e. inverted) UseHeavyMonitors instead. I think it makes sense to make UseHeavyMonitors develop (I wouldn't want anybody to use this in production, not currently without this change, and not with this change). I also added a flag VerifyHeavyMonitors to be able to verify that stack-locking is really disabled. We can't currently verify this unconditionally (e.g. in debug builds) because all non-x86_64 platforms would need work. >> >> Testing: >> - [x] tier1 >> - [x] tier2 >> - [x] tier3 >> - [ ] tier4 > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Verify monitors even in non-debug builds I know nothing about jcstress but I skimmed through it and it seems synchronized is covered in "CHAPTER 2.c". I don't know how we normally run jcstress, but I don't think there is any file that would allow you to set the flag. It would really be up to the person running the tests to add a configuration that includes testing UseHeavyMonitors. We have test configuration files that would do that, but I don't think we can add it to our testing due to resource constraints. So I guess we need at least some minimal tier1 test that does a bit of synchronization with UseHeavyMonitors enabled so that we can avoid breaking it. 
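A minimal sketch of what such a synchronization smoke test could look like. The class name `HeavyMonitorSmokeTest`, the thread count, and the iteration count are all hypothetical and are not taken from the PR; the only thing assumed from the thread is that the test would be launched with `-XX:+UseHeavyMonitors`, and whether a given build accepts that flag depends on how the flag ends up being declared.

```java
// Hypothetical smoke test: several threads contend on one monitor and
// increment a shared counter. If locking is broken, the final count is
// likely to be wrong or the VM is likely to crash or hang.
final class HeavyMonitorSmokeTest {
    private static final Object LOCK = new Object();
    private static int counter;

    // Runs `threads` workers that each enter the monitor `iterations` times.
    static int run(int threads, int iterations) throws InterruptedException {
        counter = 0;
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < iterations; i++) {
                    synchronized (LOCK) {
                        counter++;
                    }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            w.join(); // join() makes the workers' updates visible here
        }
        return counter;
    }

    public static void main(String[] args) throws Exception {
        int expected = 4 * 100_000;
        int actual = run(4, 100_000);
        if (actual != expected) {
            throw new RuntimeException("expected " + expected + " but got " + actual);
        }
        System.out.println("synchronized smoke test passed");
    }
}
```

Such a class would be run as `java -XX:+UseHeavyMonitors HeavyMonitorSmokeTest`. It only exercises contended `synchronized` blocks; it does not by itself prove that stack-locking is disabled.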
------------- PR: https://git.openjdk.java.net/jdk/pull/6320 From duke at openjdk.java.net Fri Nov 12 06:33:31 2021 From: duke at openjdk.java.net (Fei Gao) Date: Fri, 12 Nov 2021 06:33:31 GMT Subject: RFR: 8275317: AArch64: Support some type conversion vectorization in SLP In-Reply-To: References: Message-ID: <7zvKixLj2bA_EtSsE8NMgjftELL5uHMyJ_HpNdOesFU=.fcb48013-a487-4bf1-ad37-c66d43f2fe27@github.com> On Thu, 28 Oct 2021 03:39:42 GMT, Fei Gao wrote: > Current SLP vectorizer in C2 compiler doesn't support type conversion > operations. But AArch64 has vector type conversion instructions in > both NEON and SVE. > > The type conversion involves two kinds of scenarios, conversion between > the same data sizes and conversion between different data sizes. If we > want to support casts between different data sizes, we need to amend > the code part for identifying adjacent memory references and the code > part for justifying if the combination is profitable. I suppose it > would be easier to review if we split the whole task to support type > conversion into two separate patches, one for the same data sizes and > the other one for different data sizes. The goal of this patch is just > to support conversions within the same data size, including: > int -> float > float -> int > long -> double > double -> long > > A typical test case: > > for (int i = start; i < limit; i++) { > b[i] = (float) a[i]; > } > > To implement it, the patch completed the necessary instructions and > matching rules in the backend and added implementation for SLP in > the middle end. > > The percentage of performance uplift on aarch64 system: > Mode: avgt > Cnt: 15 > Metric: (ns/op) > > benchmark percentage change [(After-Before)/Before] > VectorLoop.convertD2L -48.46% > VectorLoop.convertF2I -55.67% > VectorLoop.convertI2F -55.27% > VectorLoop.convertL2D -48.75% > A second review would be good. > > /reviewers 2 Thanks for your review, @TobiHartmann . Can I get a second review please? 
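For readers following along, the four same-size conversions listed in the description can be written as plain Java loops in the shape of the quoted test case. This is only an illustrative sketch: the class and method names are hypothetical (they are not the `VectorLoop` benchmark sources, which are not shown in this thread), and whether C2 actually vectorizes any of these loops depends on the platform and the build.

```java
// Hypothetical examples of the same-data-size casts named in the patch
// description: int <-> float (32-bit) and long <-> double (64-bit).
final class SameSizeCasts {
    static void intToFloat(int[] a, float[] b, int start, int limit) {
        for (int i = start; i < limit; i++) {
            b[i] = (float) a[i];
        }
    }

    static void floatToInt(float[] a, int[] b, int start, int limit) {
        for (int i = start; i < limit; i++) {
            b[i] = (int) a[i]; // Java semantics: rounds toward zero
        }
    }

    static void longToDouble(long[] a, double[] b, int start, int limit) {
        for (int i = start; i < limit; i++) {
            b[i] = (double) a[i];
        }
    }

    static void doubleToLong(double[] a, long[] b, int start, int limit) {
        for (int i = start; i < limit; i++) {
            b[i] = (long) a[i]; // Java semantics: rounds toward zero
        }
    }

    public static void main(String[] args) {
        int[] ints = {1, 2, 3, 4};
        float[] floats = new float[ints.length];
        intToFloat(ints, floats, 0, ints.length);
        System.out.println(java.util.Arrays.toString(floats));
    }
}
```

Each loop reads and writes elements of the same width, which is the case this patch targets; casts between different widths (for example int to long) are the part deferred to the follow-up patch.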
------------- PR: https://git.openjdk.java.net/jdk/pull/6145 From thartmann at openjdk.java.net Fri Nov 12 07:03:35 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Fri, 12 Nov 2021 07:03:35 GMT Subject: RFR: 8275448: [REDO] AArch64: Implement string_compare intrinsic in SVE [v3] In-Reply-To: References: <3lLU_Cnsh439U5TjNx9ADxXkvMDnfot4TxHbh7paUQw=.5c7ec35a-8456-4fe6-8b4f-1d7627026d8d@github.com> <3O9BeTBG4Z4q3up1VlKGb096qxN2dxSXxTG_FWrXNVE=.ab464e81-34ad-4e8f-aba4-543489015f64@github.com> Message-ID: On Fri, 12 Nov 2021 04:03:24 GMT, Nick Gasson wrote: > Could you post the log of the failing test? The log file is not too helpful, it just contains: java.lang.AssertionError: gtest execution failed; exit code = 2. the failed tests: [AssemblerAArch64::validate_vm, ... AssemblerAArch64::validate_vm, AssemblerAArch64::validate_vm] at GTestWrapper.main(GTestWrapper.java:98) at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) at java.base/java.lang.reflect.Method.invoke(Method.java:577) at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:312) at java.base/java.lang.Thread.run(Thread.java:833) Where the `AssemblerAArch64::validate_vm` is repeated many times. > @tatwaichong the `gtest/GTestWrapper.java` failure was on a re-test of your original commit, [8b1b6f9](https://github.com/openjdk/jdk/commit/8b1b6f9fb375bbc2de339ad8f526ca4d5f83dc70). This latest PR seems fine. Yes, exactly. As this change seems stable, I'm fine with integrating it. 
------------- PR: https://git.openjdk.java.net/jdk/pull/6072 From dlong at openjdk.java.net Fri Nov 12 08:26:40 2021 From: dlong at openjdk.java.net (Dean Long) Date: Fri, 12 Nov 2021 08:26:40 GMT Subject: RFR: 8276453: Undefined behavior in C1 LIR_OprDesc causes SEGV in fastdebug build [v7] In-Reply-To: References: Message-ID: On Thu, 11 Nov 2021 23:56:57 GMT, Man Cao wrote: >> Hi all, >> >> Could anyone provide some feedback on this bug fix and refactoring change? See https://bugs.openjdk.java.net/browse/JDK-8276453 for more details. >> If the direction of this change looks good, we can proceed removing the "UGLY HACK" in c1_LIR.hpp and refactor occurrences of "opr->fn()" to "opr.fn()". > > Man Cao has updated the pull request incrementally with one additional commit since the last revision: > > Add an assertion to forbid pointer() returning NULL Marked as reviewed by dlong (Reviewer). Let me test it before you integrate. ------------- PR: https://git.openjdk.java.net/jdk/pull/6221 From dlong at openjdk.java.net Fri Nov 12 08:26:40 2021 From: dlong at openjdk.java.net (Dean Long) Date: Fri, 12 Nov 2021 08:26:40 GMT Subject: RFR: 8276453: Undefined behavior in C1 LIR_OprDesc causes SEGV in fastdebug build [v6] In-Reply-To: References: Message-ID: <5f8BcGBKuvTOSrWgC5apnoQj47VaTmcsoRlwcoPUo5s=.f0a6f32c-dc97-46ce-8386-3064236e21dc@github.com> On Thu, 11 Nov 2021 23:53:48 GMT, Man Cao wrote: >> Man Cao has updated the pull request incrementally with two additional commits since the last revision: >> >> - Use nullOpr() or {} instead of LIR_Opr() >> - Revert the renaming from LIR_OprDesc to LIR_Opr to minimize patch size > > I uploaded new result with `-XX:TieredStopAtLevel=3 -XX:CICompilerCount=1` to https://bugs.openjdk.java.net/browse/JDK-8276453. `java.ci.totalTime` is still unchanged. > > Also added `_value != 0` in assertion. @caoman This looks good. 
------------- PR: https://git.openjdk.java.net/jdk/pull/6221 From duke at openjdk.java.net Fri Nov 12 09:17:57 2021 From: duke at openjdk.java.net (Takuya Kiriyama) Date: Fri, 12 Nov 2021 09:17:57 GMT Subject: RFR: 8277042: add test for 8276036 to compiler/codecache Message-ID: Could you please review the 8277042 code? This is an enhancement for 8276036. I added a new test to verify the value of full_count in the message of insufficient codecache. ------------- Commit messages: - 8277042: add test for 8276036 to compiler/codecache Changes: https://git.openjdk.java.net/jdk/pull/6364/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6364&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8277042 Stats: 125 lines in 1 file changed: 125 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/6364.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6364/head:pull/6364 PR: https://git.openjdk.java.net/jdk/pull/6364 From neliasso at openjdk.java.net Fri Nov 12 10:11:40 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Fri, 12 Nov 2021 10:11:40 GMT Subject: Integrated: 8273277: C2: Move conditional negation into rc_predicate In-Reply-To: References: Message-ID: On Mon, 18 Oct 2021 13:24:19 GMT, Nils Eliasson wrote: > Hi, > > I need some feedback on this patch. This was reported by Tencent and found in internal testing at about the same time. This patch is based on a patch provided by Tencent. > > In some very specific circumstances we need to negate the range checks that we create in PhaseIdealLoop::loop_predication_impl_helper. This is done in three places, but that method also calls insert_initial_skeleton_predicate where this isn't taken into account. > > To simplify things I have moved the negation logic into rc_predicate. This should prevent us from missing this check again. 
> > I do have a concern about negating the condition of the range check in the skeleton predicate, since the skeleton predicate will be used as a clone template, and some range check optimizations seem to assume that range checks always have LT as the condition. On the other hand, it is a really uncommon scenario since we haven't failed here before. > > Feedback appreciated. > > Best regards, > Nils This pull request has now been integrated. Changeset: 710f4964 Author: Nils Eliasson URL: https://git.openjdk.java.net/jdk/commit/710f496456d642c3e98d230270598f0b2dc75aba Stats: 96 lines in 5 files changed: 71 ins; 12 del; 13 mod 8273277: C2: Move conditional negation into rc_predicate Reviewed-by: thartmann, chagedorn, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/5987 From chagedorn at openjdk.java.net Fri Nov 12 13:59:36 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Fri, 12 Nov 2021 13:59:36 GMT Subject: RFR: 8277042: add test for 8276036 to compiler/codecache In-Reply-To: References: Message-ID: On Fri, 12 Nov 2021 08:54:57 GMT, Takuya Kiriyama wrote: > Could you please review the 8277042 code? > This is an enhancement for 8276036. > I added a new test to verify the value of full_count in the message of insufficient codecache. Otherwise, looks good! Thanks for adding this test. test/hotspot/jtreg/compiler/codecache/CodeCacheFullCountTest.java line 56: > 54: } > 55: > 56: public static void runtest() throws Throwable { Should be CamelCase: `runTest()`. test/hotspot/jtreg/compiler/codecache/CodeCacheFullCountTest.java line 71: > 69: } > 70: } else { > 71: throw new RuntimeException("codecache shortage was not occured."); `was not occured` -> `did not occur` test/hotspot/jtreg/compiler/codecache/CodeCacheFullCountTest.java line 97: > 95: abstract class Foo { > 96: public abstract int foo(); > 97: } Add some spaces between the classes to improve the readability. ------------- Marked as reviewed by chagedorn (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/6364 From jbhateja at openjdk.java.net Fri Nov 12 14:54:34 2021 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Fri, 12 Nov 2021 14:54:34 GMT Subject: RFR: 8275317: AArch64: Support some type conversion vectorization in SLP In-Reply-To: <7zvKixLj2bA_EtSsE8NMgjftELL5uHMyJ_HpNdOesFU=.fcb48013-a487-4bf1-ad37-c66d43f2fe27@github.com> References: <7zvKixLj2bA_EtSsE8NMgjftELL5uHMyJ_HpNdOesFU=.fcb48013-a487-4bf1-ad37-c66d43f2fe27@github.com> Message-ID: On Fri, 12 Nov 2021 06:30:42 GMT, Fei Gao wrote: >> Current SLP vectorizer in C2 compiler doesn't support type conversion >> operations. But AArch64 has vector type conversion instructions in >> both NEON and SVE. >> >> The type conversion involves two kinds of scenarios, conversion between >> the same data sizes and conversion between different data sizes. If we >> want to support casts between different data sizes, we need to amend >> the code part for identifying adjacent memory references and the code >> part for justifying if the combination is profitable. I suppose it >> would be easier to review if we split the whole task to support type >> conversion into two separate patches, one for the same data sizes and >> the other one for different data sizes. The goal of this patch is just >> to support conversions within the same data size, including: >> int -> float >> float -> int >> long -> double >> double -> long >> >> A typical test case: >> >> for (int i = start; i < limit; i++) { >> b[i] = (float) a[i]; >> } >> >> To implement it, the patch completed the necessary instructions and >> matching rules in the backend and added implementation for SLP in >> the middle end. 
>> >> The percentage of performance uplift on aarch64 system: >> Mode: avgt >> Cnt: 15 >> Metric: (ns/op) >> >> benchmark percentage change [(After-Before)/Before] >> VectorLoop.convertD2L -48.46% >> VectorLoop.convertF2I -55.67% >> VectorLoop.convertI2F -55.27% >> VectorLoop.convertL2D -48.75% > >> A second review would be good. >> >> /reviewers 2 > > Thanks for your review, @TobiHartmann . Can I get a second review please? Hi @fg1417 , Thanks for fixing this, I have X86 backend changes for missing same sized vector cast operations. Will push for review after this PR gets integrated. ------------- PR: https://git.openjdk.java.net/jdk/pull/6145 From dcubed at openjdk.java.net Fri Nov 12 17:08:36 2021 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Fri, 12 Nov 2021 17:08:36 GMT Subject: RFR: 8276901: Implement UseHeavyMonitors consistently [v2] In-Reply-To: References: Message-ID: On Wed, 10 Nov 2021 19:19:13 GMT, Roman Kennke wrote: >> The flag UseHeavyMonitors seems to imply that it makes Hotspot always use inflated monitors, rather than stack locks. However, it is only implemented in the interpreter that way. When it calls into runtime, it would still happily stack-lock. Even worse, C1 uses another flag UseFastLocking to achieve something similar (with the same caveat that runtime would stack-lock anyway). C2 doesn't have any such mechanism at all. >> I would like to experiment with disabling stack-locking, and thus, having this flag work as expected would seem very useful. >> >> The change removes the C1 flag UseFastLocking, and replaces its uses with equivalent (i.e. inverted) UseHeavyMonitors instead. I think it makes sense to make UseHeavyMonitors develop (I wouldn't want anybody to use this in production, not currently without this change, and not with this change). I also added a flag VerifyHeavyMonitors to be able to verify that stack-locking is really disabled. We can't currently verify this unconditionally (e.g. 
in debug builds) because all non-x86_64 platforms would need work. >> >> Testing: >> - [x] tier1 >> - [x] tier2 >> - [x] tier3 >> - [ ] tier4 > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Verify monitors even in non-debug builds @lmesnik - Can you chime in on the testing of this cmd line option? ------------- PR: https://git.openjdk.java.net/jdk/pull/6320 From kvn at openjdk.java.net Fri Nov 12 17:33:40 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 12 Nov 2021 17:33:40 GMT Subject: RFR: 8276162: Optimise unsigned comparison pattern [v3] In-Reply-To: References: Message-ID: On Thu, 11 Nov 2021 05:12:09 GMT, Mai Đặng Quân Anh wrote: >> This patch changes operations in the form `x +- Integer.MIN_VALUE <=> y +- Integer.MIN_VALUE`, which is a pattern used to do unsigned comparisons, into `x u<=> y`. >> >> In addition to being basic operations, they may be utilised to implement range checks such as the methods in `jdk.internal.util.Preconditions`, or in places where the compiler cannot deduce the non-negativeness of the bound as in `java.util.ArrayList`. >> >> Thank you very much. > > Mai Đặng Quân Anh has updated the pull request incrementally with two additional commits since the last revision: > > - replace cmpx->Opcode() with cmpx_op > - address reviews, remove checks for subtraction operations Thank you for addressing my comments. I tentatively approve these changes, leaving final approval and testing to @TobiHartmann ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6101 From kvn at openjdk.java.net Fri Nov 12 17:35:40 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 12 Nov 2021 17:35:40 GMT Subject: RFR: 8274982: Add a test for 8269574. 
In-Reply-To: References: Message-ID: On Mon, 11 Oct 2021 09:55:28 GMT, Evgeny Nikitin wrote: > This PR contains a relatively simple test which verifies that JVMTI-agents are correctly informed about exceptions caught in C2-compiled code. The 8269574 introduces pre-allocated exceptions in some paths, so the test tries to produce a number of various exceptions and checks that the provided small JVMTI agent got notified about all of them. I will leave approval to @sspitsyn and @dholmes-ora ------------- PR: https://git.openjdk.java.net/jdk/pull/5889 From lmesnik at openjdk.java.net Fri Nov 12 19:27:34 2021 From: lmesnik at openjdk.java.net (Leonid Mesnik) Date: Fri, 12 Nov 2021 19:27:34 GMT Subject: RFR: 8276901: Implement UseHeavyMonitors consistently [v2] In-Reply-To: References: Message-ID: On Wed, 10 Nov 2021 19:19:13 GMT, Roman Kennke wrote: >> The flag UseHeavyMonitors seems to imply that it makes Hotspot always use inflated monitors, rather than stack locks. However, it is only implemented in the interpreter that way. When it calls into runtime, it would still happily stack-lock. Even worse, C1 uses another flag UseFastLocking to achieve something similar (with the same caveat that runtime would stack-lock anyway). C2 doesn't have any such mechanism at all. >> I would like to experiment with disabling stack-locking, and thus, having this flag work as expected would seem very useful. >> >> The change removes the C1 flag UseFastLocking, and replaces its uses with equivalent (i.e. inverted) UseHeavyMonitors instead. I think it makes sense to make UseHeavyMonitors develop (I wouldn't want anybody to use this in production, not currently without this change, and not with this change). I also added a flag VerifyHeavyMonitors to be able to verify that stack-locking is really disabled. We can't currently verify this unconditionally (e.g. in debug builds) because all non-x86_64 platforms would need work. 
>> >> Testing: >> - [x] tier1 >> - [x] tier2 >> - [x] tier3 >> - [ ] tier4 > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Verify monitors even in non-debug builds The testing VM options are set in task definitions. I'll file a separate issue to adjust configs. ------------- PR: https://git.openjdk.java.net/jdk/pull/6320 From manc at openjdk.java.net Fri Nov 12 20:06:25 2021 From: manc at openjdk.java.net (Man Cao) Date: Fri, 12 Nov 2021 20:06:25 GMT Subject: RFR: 8276453: Undefined behavior in C1 LIR_OprDesc causes SEGV in fastdebug build [v8] In-Reply-To: References: Message-ID: <6aE-rcMdvdCls1oS3wwmlX-qE9Oz6-jMkW1ggnFvkOs=.0008b112-af5a-4f38-9562-6b6c1fe5b5d3@github.com> > Hi all, > > Could anyone provide some feedback on this bug fix and refactoring change? See https://bugs.openjdk.java.net/browse/JDK-8276453 for more details. > If the direction of this change looks good, we can proceed removing the "UGLY HACK" in c1_LIR.hpp and refactor occurrences of "opr->fn()" to "opr.fn()". Man Cao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains ten additional commits since the last revision: - Merge branch 'master' into JDK8276453 - Add an assertion to forbid pointer() returning NULL - Use nullOpr() or {} instead of LIR_Opr() - Revert the renaming from LIR_OprDesc to LIR_Opr to minimize patch size - Fix aarch64 and arm builds - Fix build errors on non-x86 or non-Linux environments - Remove constructor that takes int to fix build error - Fix errors related NULL value without --disable-warnings-as-errors - Add _value field and rename LIR_OprDesc to LIR_Opr ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6221/files - new: https://git.openjdk.java.net/jdk/pull/6221/files/a082f36d..eb6998b9 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6221&range=07 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6221&range=06-07 Stats: 410615 lines in 833 files changed: 203892 ins; 197453 del; 9270 mod Patch: https://git.openjdk.java.net/jdk/pull/6221.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6221/head:pull/6221 PR: https://git.openjdk.java.net/jdk/pull/6221 From manc at openjdk.java.net Fri Nov 12 20:06:29 2021 From: manc at openjdk.java.net (Man Cao) Date: Fri, 12 Nov 2021 20:06:29 GMT Subject: RFR: 8276453: Undefined behavior in C1 LIR_OprDesc causes SEGV in fastdebug build [v7] In-Reply-To: References: Message-ID: On Thu, 11 Nov 2021 23:56:57 GMT, Man Cao wrote: >> Hi all, >> >> Could anyone provide some feedback on this bug fix and refactoring change? See https://bugs.openjdk.java.net/browse/JDK-8276453 for more details. >> If the direction of this change looks good, we can proceed removing the "UGLY HACK" in c1_LIR.hpp and refactor occurrences of "opr->fn()" to "opr.fn()". > > Man Cao has updated the pull request incrementally with one additional commit since the last revision: > > Add an assertion to forbid pointer() returning NULL Thanks. There was a timeout failure for runtime/LoadClass/TestResize.java on macOS x64. 
I did a merge and let it rerun the pre-submit tests. ------------- PR: https://git.openjdk.java.net/jdk/pull/6221 From dlong at openjdk.java.net Fri Nov 12 20:49:48 2021 From: dlong at openjdk.java.net (Dean Long) Date: Fri, 12 Nov 2021 20:49:48 GMT Subject: RFR: 8276453: Undefined behavior in C1 LIR_OprDesc causes SEGV in fastdebug build [v8] In-Reply-To: <6aE-rcMdvdCls1oS3wwmlX-qE9Oz6-jMkW1ggnFvkOs=.0008b112-af5a-4f38-9562-6b6c1fe5b5d3@github.com> References: <6aE-rcMdvdCls1oS3wwmlX-qE9Oz6-jMkW1ggnFvkOs=.0008b112-af5a-4f38-9562-6b6c1fe5b5d3@github.com> Message-ID: On Fri, 12 Nov 2021 20:06:25 GMT, Man Cao wrote: >> Hi all, >> >> Could anyone provide some feedback on this bug fix and refactoring change? See https://bugs.openjdk.java.net/browse/JDK-8276453 for more details. >> If the direction of this change looks good, we can proceed removing the "UGLY HACK" in c1_LIR.hpp and refactor occurrences of "opr->fn()" to "opr.fn()". > > Man Cao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains ten additional commits since the last revision: > > - Merge branch 'master' into JDK8276453 > - Add an assertion to forbid pointer() returning NULL > - Use nullOpr() or {} instead of LIR_Opr() > - Revert the renaming from LIR_OprDesc to LIR_Opr to minimize patch size > - Fix aarch64 and arm builds > - Fix build errors on non-x86 or non-Linux environments > - Remove constructor that takes int to fix build error > - Fix errors related NULL value without --disable-warnings-as-errors > - Add _value field and rename LIR_OprDesc to LIR_Opr I'm also seeing timeouts in my testing, especially with -Xcomp. Did you do any performance testing with -Xcomp? 
------------- PR: https://git.openjdk.java.net/jdk/pull/6221 From rkennke at openjdk.java.net Fri Nov 12 21:41:33 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Fri, 12 Nov 2021 21:41:33 GMT Subject: RFR: 8276901: Implement UseHeavyMonitors consistently [v2] In-Reply-To: References: Message-ID: On Fri, 12 Nov 2021 19:24:37 GMT, Leonid Mesnik wrote: > The testing VM options are set in task definitions. I'll file a separate issue to adjust configs. This is not necessary. I only need one or a few tests that exercise synchronize statement, and run this with -XX:+UseHeavyMonitors. If no such test exists, then I'll make one. ------------- PR: https://git.openjdk.java.net/jdk/pull/6320 From manc at openjdk.java.net Fri Nov 12 22:09:45 2021 From: manc at openjdk.java.net (Man Cao) Date: Fri, 12 Nov 2021 22:09:45 GMT Subject: RFR: 8276453: Undefined behavior in C1 LIR_OprDesc causes SEGV in fastdebug build [v8] In-Reply-To: <6aE-rcMdvdCls1oS3wwmlX-qE9Oz6-jMkW1ggnFvkOs=.0008b112-af5a-4f38-9562-6b6c1fe5b5d3@github.com> References: <6aE-rcMdvdCls1oS3wwmlX-qE9Oz6-jMkW1ggnFvkOs=.0008b112-af5a-4f38-9562-6b6c1fe5b5d3@github.com> Message-ID: On Fri, 12 Nov 2021 20:06:25 GMT, Man Cao wrote: >> Hi all, >> >> Could anyone provide some feedback on this bug fix and refactoring change? See https://bugs.openjdk.java.net/browse/JDK-8276453 for more details. >> If the direction of this change looks good, we can proceed removing the "UGLY HACK" in c1_LIR.hpp and refactor occurrences of "opr->fn()" to "opr.fn()". > > Man Cao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains ten additional commits since the last revision: > > - Merge branch 'master' into JDK8276453 > - Add an assertion to forbid pointer() returning NULL > - Use nullOpr() or {} instead of LIR_Opr() > - Revert the renaming from LIR_OprDesc to LIR_Opr to minimize patch size > - Fix aarch64 and arm builds > - Fix build errors on non-x86 or non-Linux environments > - Remove constructor that takes int to fix build error > - Fix errors related NULL value without --disable-warnings-as-errors > - Add _value field and rename LIR_OprDesc to LIR_Opr I didn't run performance testing with -Xcomp. I'm running it now. Could you share more details on which test timed out? Are they reproducible on Linux x64? ------------- PR: https://git.openjdk.java.net/jdk/pull/6221 From dlong at openjdk.java.net Fri Nov 12 22:23:46 2021 From: dlong at openjdk.java.net (Dean Long) Date: Fri, 12 Nov 2021 22:23:46 GMT Subject: RFR: 8276453: Undefined behavior in C1 LIR_OprDesc causes SEGV in fastdebug build [v8] In-Reply-To: <6aE-rcMdvdCls1oS3wwmlX-qE9Oz6-jMkW1ggnFvkOs=.0008b112-af5a-4f38-9562-6b6c1fe5b5d3@github.com> References: <6aE-rcMdvdCls1oS3wwmlX-qE9Oz6-jMkW1ggnFvkOs=.0008b112-af5a-4f38-9562-6b6c1fe5b5d3@github.com> Message-ID: On Fri, 12 Nov 2021 20:06:25 GMT, Man Cao wrote: >> Hi all, >> >> Could anyone provide some feedback on this bug fix and refactoring change? See https://bugs.openjdk.java.net/browse/JDK-8276453 for more details. >> If the direction of this change looks good, we can proceed removing the "UGLY HACK" in c1_LIR.hpp and refactor occurrences of "opr->fn()" to "opr.fn()". > > Man Cao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains ten additional commits since the last revision: > > - Merge branch 'master' into JDK8276453 > - Add an assertion to forbid pointer() returning NULL > - Use nullOpr() or {} instead of LIR_Opr() > - Revert the renaming from LIR_OprDesc to LIR_Opr to minimize patch size > - Fix aarch64 and arm builds > - Fix build errors on non-x86 or non-Linux environments > - Remove constructor that takes int to fix build error > - Fix errors related NULL value without --disable-warnings-as-errors > - Add _value field and rename LIR_OprDesc to LIR_Opr I'm not able to notice any significant difference locally with your change, so I suspect the timeouts are unrelated. ------------- PR: https://git.openjdk.java.net/jdk/pull/6221 From lmesnik at openjdk.java.net Fri Nov 12 22:28:38 2021 From: lmesnik at openjdk.java.net (Leonid Mesnik) Date: Fri, 12 Nov 2021 22:28:38 GMT Subject: RFR: 8276901: Implement UseHeavyMonitors consistently [v2] In-Reply-To: References: Message-ID: On Fri, 12 Nov 2021 21:38:24 GMT, Roman Kennke wrote: >> The testing VM options are set in task definitions. I'll file a separate issue to adjust configs. > >> The testing VM options are set in task definitions. I'll file a separate issue to adjust configs. > > This is not necessary. I only need one or a few tests that exercise synchronize statement, and run this with -XX:+UseHeavyMonitors. If no such test exists, then I'll make one. @rkennke thanks, I don't see any good specific tests to run openjdk regression suite. There are a bunch of tests in JCK, jcstress. You could also just run tier1 with UseHeavyMonitors as a sanity check. There are also jvmti tests that might be used to verify that JVMTI is not broken with UseHeavyMonitors. 
------------- PR: https://git.openjdk.java.net/jdk/pull/6320 From manc at openjdk.java.net Fri Nov 12 22:28:49 2021 From: manc at openjdk.java.net (Man Cao) Date: Fri, 12 Nov 2021 22:28:49 GMT Subject: RFR: 8276453: Undefined behavior in C1 LIR_OprDesc causes SEGV in fastdebug build [v8] In-Reply-To: <6aE-rcMdvdCls1oS3wwmlX-qE9Oz6-jMkW1ggnFvkOs=.0008b112-af5a-4f38-9562-6b6c1fe5b5d3@github.com> References: <6aE-rcMdvdCls1oS3wwmlX-qE9Oz6-jMkW1ggnFvkOs=.0008b112-af5a-4f38-9562-6b6c1fe5b5d3@github.com> Message-ID: <1WF-6T6G7PzBsPqF0ybc2vqLOX80rsHNAIVLGnnTzz0=.27e65ad4-38b6-4c42-b739-16ce3b7b45b9@github.com> On Fri, 12 Nov 2021 20:06:25 GMT, Man Cao wrote: >> Hi all, >> >> Could anyone provide some feedback on this bug fix and refactoring change? See https://bugs.openjdk.java.net/browse/JDK-8276453 for more details. >> If the direction of this change looks good, we can proceed removing the "UGLY HACK" in c1_LIR.hpp and refactor occurrences of "opr->fn()" to "opr.fn()". > > Man Cao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains ten additional commits since the last revision: > > - Merge branch 'master' into JDK8276453 > - Add an assertion to forbid pointer() returning NULL > - Use nullOpr() or {} instead of LIR_Opr() > - Revert the renaming from LIR_OprDesc to LIR_Opr to minimize patch size > - Fix aarch64 and arm builds > - Fix build errors on non-x86 or non-Linux environments > - Remove constructor that takes int to fix build error > - Fix errors related NULL value without --disable-warnings-as-errors > - Add _value field and rename LIR_OprDesc to LIR_Opr Thanks. I'll integrate now. 
------------- PR: https://git.openjdk.java.net/jdk/pull/6221 From manc at openjdk.java.net Fri Nov 12 22:37:40 2021 From: manc at openjdk.java.net (Man Cao) Date: Fri, 12 Nov 2021 22:37:40 GMT Subject: Integrated: 8276453: Undefined behavior in C1 LIR_OprDesc causes SEGV in fastdebug build In-Reply-To: References: Message-ID: On Wed, 3 Nov 2021 00:59:36 GMT, Man Cao wrote: > Hi all, > > Could anyone provide some feedback on this bug fix and refactoring change? See https://bugs.openjdk.java.net/browse/JDK-8276453 for more details. > If the direction of this change looks good, we can proceed removing the "UGLY HACK" in c1_LIR.hpp and refactor occurrences of "opr->fn()" to "opr.fn()". This pull request has now been integrated. Changeset: 8c5f0304 Author: Man Cao URL: https://git.openjdk.java.net/jdk/commit/8c5f03049196e66a4f8411bdd853b287134e7ce5 Stats: 96 lines in 30 files changed: 35 ins; 16 del; 45 mod 8276453: Undefined behavior in C1 LIR_OprDesc causes SEGV in fastdebug build Co-authored-by: Chuck Rasbold Co-authored-by: James Y Knight Reviewed-by: kvn, dlong ------------- PR: https://git.openjdk.java.net/jdk/pull/6221 From manc at openjdk.java.net Fri Nov 12 23:34:50 2021 From: manc at openjdk.java.net (Man Cao) Date: Fri, 12 Nov 2021 23:34:50 GMT Subject: RFR: 8276976: Rename LIR_OprDesc to LIR_Opr Message-ID: Hi all, Can I have reviews for this mechanical renaming change as a follow up to https://bugs.openjdk.java.net/browse/JDK-8276453? 
-------------

Commit messages:
 - Rename LIR_OprDesc to LIR_Opr

Changes: https://git.openjdk.java.net/jdk/pull/6377/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6377&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8276976
  Stats: 238 lines in 14 files changed: 0 ins; 9 del; 229 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6377.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6377/head:pull/6377

PR: https://git.openjdk.java.net/jdk/pull/6377

From simonis at openjdk.java.net Sat Nov 13 00:32:13 2021
From: simonis at openjdk.java.net (Volker Simonis)
Date: Sat, 13 Nov 2021 00:32:13 GMT
Subject: RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow [v9]
In-Reply-To:
References:
Message-ID:

> Currently, if running with `-XX:-OmitStackTraceInFastThrow`, C2 has no possibility to create implicit exceptions like AIOOBE, NullPointerExceptions, etc. in compiled code. This means that such methods will always be deoptimized and re-executed in the interpreter if such exceptions are happening.
>
> If implicit exceptions are used for normal control flow, that can have a dramatic impact on performance. A prominent example for such code is [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274):
>
>     public static boolean isAlpha(int c) {
>         try {
>             return IS_ALPHA[c];
>         } catch (ArrayIndexOutOfBoundsException ex) {
>             return false;
>         }
>     }
>
>
> ### Solution
>
> Instead of deoptimizing and resorting to the interpreter, we can generate code which allocates and initializes the corresponding exceptions right in compiled code. This results in a ten-times performance improvement for the above code:
>
>     -XX:-OmitStackTraceInFastThrow -XX:-OptimizeImplicitExceptions
>     Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>     ImplicitExceptions.bench                     0.0  avgt    5      1.430 ±    0.353  ns/op
>     ImplicitExceptions.bench                    0.33  avgt    5   3563.038 ±   77.358  ns/op
>     ImplicitExceptions.bench                    0.66  avgt    5   8609.693 ± 1205.104  ns/op
>     ImplicitExceptions.bench                    1.00  avgt    5  12842.401 ± 1022.728  ns/op
>
>     -XX:-OmitStackTraceInFastThrow -XX:+OptimizeImplicitExceptions
>     Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>     ImplicitExceptions.bench                     0.0  avgt    5      1.432 ±    0.352  ns/op
>     ImplicitExceptions.bench                    0.33  avgt    5    355.723 ±   16.641  ns/op
>     ImplicitExceptions.bench                    0.66  avgt    5    887.068 ±  166.728  ns/op
>     ImplicitExceptions.bench                    1.00  avgt    5   1274.418 ±   88.235  ns/op
>
>
> ### Implementation details
>
> - The new optimization is guarded by the option `OptimizeImplicitExceptions` which is on by default.
> - In `GraphKit::builtin_throw()` we can't simply use `CallGenerator::for_direct_call()` to create a `DirectCallGenerator` for the call to the exception's `<init>` function because `DirectCallGenerator` assumes in various places that calls are only issued at `invoke*` bytecodes. This is not true in general for bytecode which can cause an implicit exception.
> - Instead, we manually wire up the call based on the code in `DirectCallGenerator::generate()`.
> - We use a trick similar to the one for method handle intrinsics, where the callee from the bytecode is replaced by a direct call and this fact is recorded in the call's `_override_symbolic_info` field. For calling constructors of implicit exceptions I've introduced the new field `_implicit_exception_init`. This field is also used in various assertions to prevent queries for the bytecode's symbolic method information which doesn't exist because we're not at an `invoke*` bytecode at the place where we generate the call.
> - The PR contains a micro-benchmark which compares the old and the new implementation for [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274).
Except for the trivial case where the exception probability is 0 (i.e. no exceptions are happening at all) the new implementation is about 10 times faster. Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: Fix IR Framework test Traps::classCheck() which now behaves differently with -XX:+OptimizeImplicitExceptions ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5488/files - new: https://git.openjdk.java.net/jdk/pull/5488/files/b3c130c8..536f5398 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5488&range=08 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5488&range=07-08 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/5488.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5488/head:pull/5488 PR: https://git.openjdk.java.net/jdk/pull/5488 From duke at openjdk.java.net Sat Nov 13 05:22:07 2021 From: duke at openjdk.java.net (Mai =?UTF-8?B?xJDhurduZw==?= =?UTF-8?B?IA==?= =?UTF-8?B?UXXDom4=?= Anh) Date: Sat, 13 Nov 2021 05:22:07 GMT Subject: RFR: 8276162: Optimise unsigned comparison pattern [v4] In-Reply-To: References: Message-ID: > This patch changes operations in the form `x +- Integer.MIN_VALUE <=> y +- Integer.MIN_VALUE`, which is a pattern used to do unsigned comparisons, into `x u<=> y`. > > In addition to being basic operations, they may be utilised to implement range checks such as the methods in `jdk.internal.util.Preconditions`, or in places where the compiler cannot deduce the non-negativeness of the bound as in `java.util.ArrayList`. > > Thank you very much. 
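For readers unfamiliar with the idiom described above, here is a minimal sketch (not part of the patch) of why the two forms agree. Adding `Integer.MIN_VALUE` wraps around in Java and flips only the sign bit, so a signed comparison of the offset values orders them the same way an unsigned comparison orders the originals. The class and method names are made up for illustration; `Integer.compareUnsigned` is the library method mentioned in the review.

```java
public class UnsignedCompareIdiom {
    // The pattern from the PR description: a signed comparison after
    // offsetting both operands by Integer.MIN_VALUE (wrapping addition).
    static boolean lessThanViaOffset(int x, int y) {
        return x + Integer.MIN_VALUE < y + Integer.MIN_VALUE;
    }

    // Reference result from the library's unsigned comparison.
    static boolean lessThanUnsigned(int x, int y) {
        return Integer.compareUnsigned(x, y) < 0;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 42, -42, Integer.MIN_VALUE, Integer.MAX_VALUE};
        // Check every ordered pair of samples, including the boundary values.
        for (int x : samples) {
            for (int y : samples) {
                if (lessThanViaOffset(x, y) != lessThanUnsigned(x, y)) {
                    throw new AssertionError("mismatch for " + x + ", " + y);
                }
            }
        }
        System.out.println("offset idiom agrees with Integer.compareUnsigned on all sample pairs");
    }
}
```

This only illustrates the source-level equivalence; how C2 matches and rewrites the pattern is defined by the patch itself.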
Mai Đặng Quân Anh has updated the pull request incrementally with two additional commits since the last revision:

 - add tests cover constant comparison and calling library
 - add eq/ne, add correction test, refine micro

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6101/files
  - new: https://git.openjdk.java.net/jdk/pull/6101/files/4dada5fc..92e92cfe

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6101&range=03
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6101&range=02-03

  Stats: 452 lines in 3 files changed: 427 ins; 2 del; 23 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6101.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6101/head:pull/6101

PR: https://git.openjdk.java.net/jdk/pull/6101

From duke at openjdk.java.net Sat Nov 13 05:22:13 2021
From: duke at openjdk.java.net (Mai =?UTF-8?B?xJDhurduZw==?= =?UTF-8?B?IA==?= =?UTF-8?B?UXXDom4=?= Anh)
Date: Sat, 13 Nov 2021 05:22:13 GMT
Subject: RFR: 8276162: Optimise unsigned comparison pattern [v3]
In-Reply-To:
References:
Message-ID:

On Thu, 11 Nov 2021 07:41:28 GMT, Tobias Hartmann wrote:

>> Mai Đặng Quân Anh has updated the pull request incrementally with two additional commits since the last revision:
>>
>>  - replace cmpx->Opcode() with cmpx_op
>>  - address reviews, remove checks for subtraction operatios
>
> src/hotspot/share/opto/subnode.cpp line 1534:
>
>> 1532:   }
>> 1533:
>> 1534:   // Change x + Integer.MIN_VALUE <=> y + Integer.MIN_VALUE into x u<=> y
>
> The `<=>` in the comment is confusing because it usually denotes logical equality. Also, you are only handling `<` and `>` below. What about the other variants? Shouldn't they be canonicalized in `idealize_test` (see `ifnode.cpp`)?
>
> I would recommend making it explicit in the comment and use brackets for readability:
>
>     // Change (x + Integer.MIN_VALUE < y + Integer.MIN_VALUE) into (x u< y) and
>     // (x + Integer.MIN_VALUE > y + Integer.MIN_VALUE) into (x u> y).
I have added `eq` and `ne` to the transformation, leaving the comment simply as `cmp (add X min_jint) (add Y min_jint)`.

> src/hotspot/share/opto/subnode.cpp line 1546:
>
>> 1544:   } else if (cmp2_op == Op_AddI &&
>> 1545:              phase->type(cmp2->in(2)) == TypeInt::MIN) {
>> 1546:     Node *ncmp = phase->transform(new CmpUNode(cmp1->in(1), cmp2->in(1)));
>
> `Node *ncmp` -> `Node* ncmp`

Done, sir.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6101

From duke at openjdk.java.net Sat Nov 13 05:25:33 2021
From: duke at openjdk.java.net (Mai =?UTF-8?B?xJDhurduZw==?= =?UTF-8?B?IA==?= =?UTF-8?B?UXXDom4=?= Anh)
Date: Sat, 13 Nov 2021 05:25:33 GMT
Subject: RFR: 8276162: Optimise unsigned comparison pattern [v3]
In-Reply-To:
References:
Message-ID:

On Thu, 11 Nov 2021 08:33:25 GMT, Tobias Hartmann wrote:

>> Mai Đặng Quân Anh has updated the pull request incrementally with two additional commits since the last revision:
>>
>>  - replace cmpx->Opcode() with cmpx_op
>>  - address reviews, remove checks for subtraction operatios
>
> Given that there is `Integer/Long.compareUnsigned` using this idiom, it seems reasonable to optimize. Some general comments:
> - Your benchmark does not cover all the cases you are optimizing. Maybe you should also add the `Integer.compareUnsigned` variants.
> - You need a correctness test as well, ideally using the IR verification framework to also verify that the optimizations are actually performed.

@TobiHartmann Thank you very much for the review. I have added the correctness test and revised the microbenchmark to cover all the situations, including calls to `[Integer/Long].compareUnsigned`. The results of the benchmark are as follows:

Before:

    Benchmark                         Mode  Cnt     Score    Error  Units
    UnsignedComparison.intConDirect   avgt   10   927.540 ± 19.142  ns/op
    UnsignedComparison.intConLibGT    avgt   10   916.753 ±  4.502  ns/op
    UnsignedComparison.intConLibLT    avgt   10   927.911 ± 15.027  ns/op
    UnsignedComparison.intVarDirect   avgt   10  1005.895 ± 14.030  ns/op
    UnsignedComparison.intVarLibGT    avgt   10   999.216 ±  1.528  ns/op
    UnsignedComparison.intVarLibLT    avgt   10  1000.501 ±  4.117  ns/op
    UnsignedComparison.longConDirect  avgt   10  1082.950 ±  8.166  ns/op
    UnsignedComparison.longConLibGT   avgt   10  1081.340 ±  6.883  ns/op
    UnsignedComparison.longConLibLT   avgt   10  1079.599 ±  4.229  ns/op
    UnsignedComparison.longVarDirect  avgt   10  1131.605 ± 76.268  ns/op
    UnsignedComparison.longVarLibGT   avgt   10  1180.006 ±  7.018  ns/op
    UnsignedComparison.longVarLibLT   avgt   10  1178.463 ±  0.809  ns/op

After:

    Benchmark                         Mode  Cnt     Score    Error  Units
    UnsignedComparison.intConDirect   avgt   10   740.951 ±  5.020  ns/op
    UnsignedComparison.intConLibGT    avgt   10   808.425 ±  2.989  ns/op
    UnsignedComparison.intConLibLT    avgt   10   740.029 ±  1.332  ns/op
    UnsignedComparison.intVarDirect   avgt   10   911.489 ±  4.700  ns/op
    UnsignedComparison.intVarLibGT    avgt   10   979.338 ±  8.130  ns/op
    UnsignedComparison.intVarLibLT    avgt   10   910.429 ±  3.452  ns/op
    UnsignedComparison.longConDirect  avgt   10   750.174 ±  5.915  ns/op
    UnsignedComparison.longConLibGT   avgt   10   828.144 ± 53.091  ns/op
    UnsignedComparison.longConLibLT   avgt   10   771.493 ± 52.902  ns/op
    UnsignedComparison.longVarDirect  avgt   10   882.139 ±  2.091  ns/op
    UnsignedComparison.longVarLibGT   avgt   10   952.862 ± 10.248  ns/op
    UnsignedComparison.longVarLibLT   avgt   10   881.623 ±  1.895  ns/op

-------------

PR: https://git.openjdk.java.net/jdk/pull/6101

From stuefe at openjdk.java.net Sat Nov 13 07:54:36 2021
From: stuefe at openjdk.java.net (Thomas Stuefe)
Date: Sat, 13 Nov 2021 07:54:36 GMT
Subject: RFR: 8276563: Undefined Behaviour in class Assembler [v3]
In-Reply-To:
References:
Message-ID:

On Thu, 11 Nov 2021 16:51:06 GMT, Andrew Haley wrote:

>> The HotSpot code base contains a number of instances of Undefined Behavior, which can cause all manner of unpleasant surprises.
>> The UB to which this patch relates is in class `Assembler`, in which instances are pointers to (nonexistent) objects defined as, for example,
>>
>>
>>     typedef RegisterImpl *Register;
>>     const Register r10 = ((Register)10);
>>
>>
>> Registers have accessors, e.g.:
>>
>> `  int RegisterImpl::encoding() const { return (intptr_t)this; }`
>>
>> This works by an accident of implementation: it is not legal C++.
>>
>> The most obvious way to fix this UB bug is to make instances of `Register` point to something, and to use pointer subtraction to find the encoding: (simplified for clarity)
>>
>>
>>     extern RegisterImpl all_Registers[num_Registers];
>>     int RegisterImpl::encoding() const { return this - all_Registers; }
>>
>>
>> After this patch there is slightly more work to be done when assembling code but it's merely the subtraction of a constant in `encoding()` and the difference in execution time is so small (and the startup variance so large) that I have been unable to measure it, even after averaging 100 runs. It does lead to an increase of about 1% in the size of the stripped libjvm.so, but I think that can be recovered by a subsequent patch.
>>
>> An alternative way to implement this would be to make the encoding a byte-wide field in `RegisterImpl` and define encoding() this way:
>>
>> `  int RegisterImpl::encoding() const { return _encoding; }`
>>
>> This would result in smaller code, but I suspect slower.
>>
>> If this change is accepted, I propose that all instances of this pattern in HotSpot be treated similarly.
>
> Andrew Haley has updated the pull request incrementally with one additional commit since the last revision:
>
>   Re-establish the FloatRegister::successor() hack.

Hi @theRealAph,

I had a look at the changes. Note that I don't know arm very well, but reading the code has been very interesting.
Cheers, Thomas src/hotspot/cpu/aarch64/register_aarch64.cpp line 56: > 54: > 55: const char* FloatRegisterImpl::name() const { > 56: static const char *const names[number_of_registers] = { While reading this code I noticed that this method is sensitive to changes to is_valid, therefore care has to be taken when changing its semantics. Or, maybe just extend the array to number_of_declared_registers and add strings for ZR and SP, just to be safe. Update: my comment is in the wrong place, I meant to comment RegisterImpl::name(). src/hotspot/cpu/aarch64/register_aarch64.hpp line 44: > 42: inline friend const Register as_Register(int encoding); > 43: > 44: private: nit: private not needed src/hotspot/cpu/aarch64/register_aarch64.hpp line 63: > 61: // accessors > 62: int encoding() const { assert(is_valid(), "invalid register"); return encoding_nocheck(); } > 63: bool is_valid() const { return this >= first() && this < invalid(); } is_valid() now returns true for ZR, SP. Was this intended? This affects other functions too, e.g. `RegisterImpl::name()`. src/hotspot/cpu/aarch64/register_aarch64.hpp line 64: > 62: int encoding() const { assert(is_valid(), "invalid register"); return encoding_nocheck(); } > 63: bool is_valid() const { return this >= first() && this < invalid(); } > 64: bool has_byte_register() const { return is_valid(); } Same here, semantics changed to include 32 and 33 src/hotspot/cpu/aarch64/register_aarch64.hpp line 255: > 253: enum { > 254: number_of_registers = 16, > 255: number_of_declared_registers = 16, Is there a semantic difference between number of registers and declared registers for float and P registers? Otherwise, having two constants for the same thing makes the code less clear and maybe more error prone. src/hotspot/share/asm/register.hpp line 62: > 60: #else // USE_POINTERS_TO_REGISTER_IMPL_ARRAY > 61: > 62: #define REGISTER_IMPL_DECLARATION(type, impl_type) \ nit: align backslashes? 
src/hotspot/share/asm/register.hpp line 64: > 62: #define REGISTER_IMPL_DECLARATION(type, impl_type) \ > 63: inline const type as_ ## type(int encoding) { \ > 64: assert(encoding <= impl_type::number_of_declared_registers, "invalid register"); \ assert for >= -1 too? src/hotspot/share/asm/register.hpp line 71: > 69: > 70: #define REGISTER_IMPL_DEFINITION(type, impl_type) \ > 71: impl_type all_ ## type ## s[impl_type::number_of_declared_registers]; Would this not need attribute visibility too? src/hotspot/share/asm/register.hpp line 87: > 85: // OS-specific way. > 86: #ifdef __GNUC__ > 87: #define INTERNAL_VISIBILITY __attribute__ ((visibility ("internal"))) I try to understand this, is this to allow other object files to see these symbols while preventing them from being exported from the libjvm? ------------- PR: https://git.openjdk.java.net/jdk/pull/6280 From aph at openjdk.java.net Sat Nov 13 09:46:36 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Sat, 13 Nov 2021 09:46:36 GMT Subject: RFR: 8276563: Undefined Behaviour in class Assembler [v3] In-Reply-To: References: Message-ID: On Sat, 13 Nov 2021 07:43:12 GMT, Thomas Stuefe wrote: >> Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: >> >> Re-establish the FloatRegister::successor() hack. > > src/hotspot/cpu/aarch64/register_aarch64.hpp line 255: > >> 253: enum { >> 254: number_of_registers = 16, >> 255: number_of_declared_registers = 16, > > Is there a semantic difference between number of registers and declared registers for float and P registers? Otherwise, having two constants for the same thing makes the code less clear and maybe more error prone. Perhaps, but I'm trying to do something really general. There is _sometimes_ a difference between the number of physical registers and the number of named registers, and that's what I'm trying to capture here. 
> src/hotspot/share/asm/register.hpp line 62: > >> 60: #else // USE_POINTERS_TO_REGISTER_IMPL_ARRAY >> 61: >> 62: #define REGISTER_IMPL_DECLARATION(type, impl_type) \ > > nit: align backslashes? You've got a point. I'll have another look. > src/hotspot/share/asm/register.hpp line 64: > >> 62: #define REGISTER_IMPL_DECLARATION(type, impl_type) \ >> 63: inline const type as_ ## type(int encoding) { \ >> 64: assert(encoding <= impl_type::number_of_declared_registers, "invalid register"); \ > > assert for >= -1 too? OK. > src/hotspot/share/asm/register.hpp line 87: > >> 85: // OS-specific way. >> 86: #ifdef __GNUC__ >> 87: #define INTERNAL_VISIBILITY __attribute__ ((visibility ("internal"))) > > I try to understand this, is this to allow other object files to see these symbols while preventing them from being exported from the libjvm? Preventing them from being _redefined_ by other shared libraries. Without this, every time you load the address of `all_Registers` you load from the Global Offset Table, rather than using an immediate operand. ------------- PR: https://git.openjdk.java.net/jdk/pull/6280 From aph at openjdk.java.net Sat Nov 13 09:54:35 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Sat, 13 Nov 2021 09:54:35 GMT Subject: RFR: 8276563: Undefined Behaviour in class Assembler [v3] In-Reply-To: References: Message-ID: On Sat, 13 Nov 2021 07:47:18 GMT, Thomas Stuefe wrote: >> Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: >> >> Re-establish the FloatRegister::successor() hack. > > src/hotspot/cpu/aarch64/register_aarch64.cpp line 56: > >> 54: >> 55: const char* FloatRegisterImpl::name() const { >> 56: static const char *const names[number_of_registers] = { > > While reading this code I noticed that this method is sensitive to changes to is_valid, therefore care has to be taken when changing its semantics. 
Or, maybe just extend the array to number_of_declared_registers and add strings for ZR and SP, just to be safe. > > Update: my comment is in the wrong place, I meant to comment RegisterImpl::name(). Great catch! I'll have a look at how this function is used. It's a little bit awkward that there is not a simple 1-to-1 mapping between register numbers and register names. > src/hotspot/cpu/aarch64/register_aarch64.hpp line 44: > >> 42: inline friend const Register as_Register(int encoding); >> 43: >> 44: private: > > nit: private not needed OK. > src/hotspot/cpu/aarch64/register_aarch64.hpp line 63: > >> 61: // accessors >> 62: int encoding() const { assert(is_valid(), "invalid register"); return encoding_nocheck(); } >> 63: bool is_valid() const { return this >= first() && this < invalid(); } > > is_valid() now returns true for ZR, SP. Was this intended? This affects other functions too, e.g. `RegisterImpl::name()`. I think so, yes. SP and ZR are valid registers. `is_valid()` should only return `false` for `noreg`. > src/hotspot/cpu/aarch64/register_aarch64.hpp line 64: > >> 62: int encoding() const { assert(is_valid(), "invalid register"); return encoding_nocheck(); } >> 63: bool is_valid() const { return this >= first() && this < invalid(); } >> 64: bool has_byte_register() const { return is_valid(); } > > Same here, semantics changed to include 32 and 33 I guess it would be "safer" to go back to excluding SP and ZR, but it was never intentional to exclude them. I'll go through all the uses of is_valid() to see which is best. ------------- PR: https://git.openjdk.java.net/jdk/pull/6280 From stuefe at openjdk.java.net Sat Nov 13 10:12:37 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Sat, 13 Nov 2021 10:12:37 GMT Subject: RFR: 8276563: Undefined Behaviour in class Assembler [v3] In-Reply-To: References: Message-ID: On Sat, 13 Nov 2021 09:40:51 GMT, Andrew Haley wrote: >> src/hotspot/share/asm/register.hpp line 87: >> >>> 85: // OS-specific way. 
>>> 86: #ifdef __GNUC__ >>> 87: #define INTERNAL_VISIBILITY __attribute__ ((visibility ("internal"))) >> >> I try to understand this, is this to allow other object files to see these symbols while preventing them from being exported from the libjvm? > > Preventing them from being _redefined_ by other shared libraries. Without this, every time you load the address of `all_Registers` you load from the Global Offset Table, rather than using an immediate operand. Ah, thanks for the explanation. ------------- PR: https://git.openjdk.java.net/jdk/pull/6280 From stuefe at openjdk.java.net Sat Nov 13 10:15:34 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Sat, 13 Nov 2021 10:15:34 GMT Subject: RFR: 8276563: Undefined Behaviour in class Assembler [v3] In-Reply-To: References: Message-ID: <53rT9tbTqrs-GQyk56a3guueNHRVQC03hBje-eDB6JY=.8aef4f99-e1dd-4836-a5f2-e59f86ded3ee@github.com> On Sat, 13 Nov 2021 09:50:04 GMT, Andrew Haley wrote: >> src/hotspot/cpu/aarch64/register_aarch64.cpp line 56: >> >>> 54: >>> 55: const char* FloatRegisterImpl::name() const { >>> 56: static const char *const names[number_of_registers] = { >> >> While reading this code I noticed that this method is sensitive to changes to is_valid, therefore care has to be taken when changing its semantics. Or, maybe just extend the array to number_of_declared_registers and add strings for ZR and SP, just to be safe. >> >> Update: my comment is in the wrong place, I meant to comment RegisterImpl::name(). > > Great catch! I'll have a look at how this function is used. It's a little bit awkward that there is not a simple 1-to-1 mapping between register numbers and register names. I understood the function as "return the functional name of the register, if there is one, otherwise return "r number". 
------------- PR: https://git.openjdk.java.net/jdk/pull/6280 From ysuenaga at openjdk.java.net Sat Nov 13 12:40:56 2021 From: ysuenaga at openjdk.java.net (Yasumasa Suenaga) Date: Sat, 13 Nov 2021 12:40:56 GMT Subject: RFR: 8277089: Use system binutils to build hsdis Message-ID: hsdis requires binutils source tree for building. Most of Linux distros provide binutils package. (e.g. binutils-devel from Fedora, binutils-dev from Ubuntu) It would be nice to be able to use them like zlib and lcms. Unfortunately bfdver.h would not be provided because it is not included install files (`make install`) in binutils. So I changed to use `SEC_ELF_OCTETS` macro to detect binutils version because it was introduced at the same time as `bfd_octets_per_byte()`. https://sourceware.org/git/?p=binutils-gdb.git;a=commit;f=bfd/bfd-in2.h;h=618265039f697eab9e72bb58b95fc2d32925df58 Please see [JDK-8244819](https://bugs.openjdk.java.net/browse/JDK-8244819) why we need version check. ------------- Commit messages: - 8277089: Use system binutils to build hsdis Changes: https://git.openjdk.java.net/jdk/pull/6378/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6378&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8277089 Stats: 24 lines in 3 files changed: 19 ins; 1 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/6378.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6378/head:pull/6378 PR: https://git.openjdk.java.net/jdk/pull/6378 From aph at openjdk.java.net Sat Nov 13 17:16:33 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Sat, 13 Nov 2021 17:16:33 GMT Subject: RFR: 8276563: Undefined Behaviour in class Assembler [v3] In-Reply-To: <53rT9tbTqrs-GQyk56a3guueNHRVQC03hBje-eDB6JY=.8aef4f99-e1dd-4836-a5f2-e59f86ded3ee@github.com> References: <53rT9tbTqrs-GQyk56a3guueNHRVQC03hBje-eDB6JY=.8aef4f99-e1dd-4836-a5f2-e59f86ded3ee@github.com> Message-ID: On Sat, 13 Nov 2021 10:12:43 GMT, Thomas Stuefe wrote: >> Great catch! 
>> I'll have a look at how this function is used. It's a little bit awkward that there is not a simple 1-to-1 mapping between register numbers and register names.
>
> I understood the function as "return the functional name of the register, if there is one, otherwise return "r number".

Yes, but the problem is that Register 31 has more than one name. I guess "r31" is the best we can do.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6280

From liusy58 at smail.nju.edu.cn Sun Nov 14 03:08:36 2021
From: liusy58 at smail.nju.edu.cn (=?UTF-8?B?5YiY5pav5a6H?=)
Date: Sun, 14 Nov 2021 11:08:36 +0800
Subject: About static call code emit
Message-ID: +82C8B573BAF575D2

I have noticed that during the code_emit phase, a static call will create a static stub, and doing some relocation here, you can find the logic in the function emit_call from file `src/hotspot/share/c1/c1_LIRAssembler.cpp`, I'm at a loss about why a stub should be created? any help?

From erik.osterlund at oracle.com Sun Nov 14 09:21:58 2021
From: erik.osterlund at oracle.com (Erik Osterlund)
Date: Sun, 14 Nov 2021 09:21:58 +0000
Subject: About static call code emit
In-Reply-To: +82C8B573BAF575D2
References: +82C8B573BAF575D2
Message-ID:

The static call can go either to the verified entry of an nmethod, or to the c2i adapter to convert the execution to interpreted. In the latter case, we need to set the target Method* in a particular register (rbx on x86_64) so the interpreter knows what to run. But you don't need to do that for compiled calls, as the destination implies that. The static call stubs are literally just a way to have the static call sites only conditionally set the method register when it's needed, to save a register set operation in the fast path. You might wonder if said optimization is really worthwhile. In my experiments, it is not. Therefore my new invoke bindings remove that.
It's a lot of complicated code, and no noticeable advantage, compared to just always setting the register in case it is needed.

/Erik

> On 14 Nov 2021, at 05:08, 刘斯宇 wrote:
>
> I have noticed that during the code_emit phase, a static call will create a static stub, and doing some relocation here, you can find the logic in the function emit_call from file `src/hotspot/share/c1/c1_LIRAssembler.cpp`, I'm at a loss about why a stub should be created? any help?
>
>
>

From duke at openjdk.java.net Mon Nov 15 09:51:18 2021
From: duke at openjdk.java.net (Takuya Kiriyama)
Date: Mon, 15 Nov 2021 09:51:18 GMT
Subject: RFR: 8277042: add test for 8276036 to compiler/codecache [v2]
In-Reply-To:
References:
Message-ID:

On Fri, 12 Nov 2021 13:56:31 GMT, Christian Hagedorn wrote:

>> Takuya Kiriyama has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   8277042: add test for 8276036 to compiler/codecache
>
> Otherwise, looks good! Thanks for adding this test.

Hi, @chhagedorn .
Thank you for your comment.
I fixed this test as per your instructions.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6364

From duke at openjdk.java.net Mon Nov 15 09:51:16 2021
From: duke at openjdk.java.net (Takuya Kiriyama)
Date: Mon, 15 Nov 2021 09:51:16 GMT
Subject: RFR: 8277042: add test for 8276036 to compiler/codecache [v2]
In-Reply-To:
References:
Message-ID:

> Could you please review the 8277042 code?
> This is the enhancement for 8276036.
> I add a new test to verify the value of full_count in the message of insufficient codecache.
Takuya Kiriyama has updated the pull request incrementally with one additional commit since the last revision: 8277042: add test for 8276036 to compiler/codecache ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6364/files - new: https://git.openjdk.java.net/jdk/pull/6364/files/4d29d81a..07aa7c86 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6364&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6364&range=00-01 Stats: 7 lines in 1 file changed: 4 ins; 0 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/6364.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6364/head:pull/6364 PR: https://git.openjdk.java.net/jdk/pull/6364 From ihse at openjdk.java.net Mon Nov 15 10:26:36 2021 From: ihse at openjdk.java.net (Magnus Ihse Bursie) Date: Mon, 15 Nov 2021 10:26:36 GMT Subject: RFR: 8277089: Use system binutils to build hsdis In-Reply-To: References: Message-ID: On Sat, 13 Nov 2021 08:08:53 GMT, Yasumasa Suenaga wrote: > hsdis requires binutils source tree for building. Most of Linux distros provide binutils package. (e.g. binutils-devel from Fedora, binutils-dev from Ubuntu) > It would be nice to be able to use them like zlib and lcms. > > Unfortunately bfdver.h would not be provided because it is not included install files (`make install`) in binutils. So I changed to use `SEC_ELF_OCTETS` macro to detect binutils version because it was introduced at the same time as `bfd_octets_per_byte()`. > > https://sourceware.org/git/?p=binutils-gdb.git;a=commit;f=bfd/bfd-in2.h;h=618265039f697eab9e72bb58b95fc2d32925df58 > > Please see [JDK-8244819](https://bugs.openjdk.java.net/browse/JDK-8244819) why we need version check. The basic idea is fine. I also think checking for `SEC_ELF_OCTETS` in the source code, instead of the version number, is actually an improvement. 
The one thing that itches me a bit is what happens when you specify `--with-binutils=system` and a dependent library is not found: AC_CHECK_LIB(iberty, xmalloc, [ HSDIS_LIBS="$HSDIS_LIBS -liberty" ], [ AC_MSG_ERROR([libiberty not found]) ]) Then the build will fail with no clear indication on why. Instead, I'd recommend that you restructure slightly. First check if with_binutils is system. If so, run your lib checks but like this: AC_CHECK_LIB(iberty, xmalloc, [ HSDIS_LIBS="$HSDIS_LIBS -liberty" ], [ bintils_system_error="libiberty not found" ]) Then you go check the value of with_binutils again in the "switch" statement. And you can replace `AC_MSG_CHECKING` outside the switch statement again. If it is system, you check if `bintils_system_error` is non-empty. If so, you fail and explain that this-and-this error prevented system from working. What distributions have you tested this on? ------------- PR: https://git.openjdk.java.net/jdk/pull/6378 From ysuenaga at openjdk.java.net Mon Nov 15 13:19:56 2021 From: ysuenaga at openjdk.java.net (Yasumasa Suenaga) Date: Mon, 15 Nov 2021 13:19:56 GMT Subject: RFR: 8277089: Use system binutils to build hsdis [v2] In-Reply-To: References: Message-ID: > hsdis requires binutils source tree for building. Most of Linux distros provide binutils package. (e.g. binutils-devel from Fedora, binutils-dev from Ubuntu) > It would be nice to be able to use them like zlib and lcms. > > Unfortunately bfdver.h would not be provided because it is not included install files (`make install`) in binutils. So I changed to use `SEC_ELF_OCTETS` macro to detect binutils version because it was introduced at the same time as `bfd_octets_per_byte()`. > > https://sourceware.org/git/?p=binutils-gdb.git;a=commit;f=bfd/bfd-in2.h;h=618265039f697eab9e72bb58b95fc2d32925df58 > > Please see [JDK-8244819](https://bugs.openjdk.java.net/browse/JDK-8244819) why we need version check. 
Yasumasa Suenaga has updated the pull request incrementally with one additional commit since the last revision: Refactoring ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6378/files - new: https://git.openjdk.java.net/jdk/pull/6378/files/fc99dc17..a43ce9aa Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6378&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6378&range=00-01 Stats: 57 lines in 1 file changed: 34 ins; 19 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/6378.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6378/head:pull/6378 PR: https://git.openjdk.java.net/jdk/pull/6378 From ysuenaga at openjdk.java.net Mon Nov 15 13:55:35 2021 From: ysuenaga at openjdk.java.net (Yasumasa Suenaga) Date: Mon, 15 Nov 2021 13:55:35 GMT Subject: RFR: 8277089: Use system binutils to build hsdis In-Reply-To: References: Message-ID: On Mon, 15 Nov 2021 10:23:48 GMT, Magnus Ihse Bursie wrote: >> hsdis requires binutils source tree for building. Most of Linux distros provide binutils package. (e.g. binutils-devel from Fedora, binutils-dev from Ubuntu) >> It would be nice to be able to use them like zlib and lcms. >> >> Unfortunately bfdver.h would not be provided because it is not included install files (`make install`) in binutils. So I changed to use `SEC_ELF_OCTETS` macro to detect binutils version because it was introduced at the same time as `bfd_octets_per_byte()`. >> >> https://sourceware.org/git/?p=binutils-gdb.git;a=commit;f=bfd/bfd-in2.h;h=618265039f697eab9e72bb58b95fc2d32925df58 >> >> Please see [JDK-8244819](https://bugs.openjdk.java.net/browse/JDK-8244819) why we need version check. > > The basic idea is fine. I also think checking for `SEC_ELF_OCTETS` in the source code, instead of the version number, is actually an improvement. 
> > The one thing that itches me a bit is what happens when you specify `--with-binutils=system` and a dependent library is not found: > > > AC_CHECK_LIB(iberty, xmalloc, [ HSDIS_LIBS="$HSDIS_LIBS -liberty" ], [ AC_MSG_ERROR([libiberty not found]) ]) > > > Then the build will fail with no clear indication on why. Instead, I'd recommend that you restructure slightly. > > First check if with_binutils is system. If so, run your lib checks but like this: > > AC_CHECK_LIB(iberty, xmalloc, [ HSDIS_LIBS="$HSDIS_LIBS -liberty" ], [ bintils_system_error="libiberty not found" ]) > > > Then you go check the value of with_binutils again in the "switch" statement. And you can replace `AC_MSG_CHECKING` outside the switch statement again. If it is system, you check if `bintils_system_error` is non-empty. If so, you fail and explain that this-and-this error prevented system from working. > > What distributions have you tested this on? @magicus Thanks for your comment! I refactored this change. Could you review again? > What distributions have you tested this on? I tested this change on Fedora 35, and WSL Ubuntu 20.04 for Windows. ------------- PR: https://git.openjdk.java.net/jdk/pull/6378 From ihse at openjdk.java.net Mon Nov 15 14:34:36 2021 From: ihse at openjdk.java.net (Magnus Ihse Bursie) Date: Mon, 15 Nov 2021 14:34:36 GMT Subject: RFR: 8277089: Use system binutils to build hsdis [v2] In-Reply-To: References: Message-ID: On Mon, 15 Nov 2021 13:19:56 GMT, Yasumasa Suenaga wrote: >> hsdis requires binutils source tree for building. Most of Linux distros provide binutils package. (e.g. binutils-devel from Fedora, binutils-dev from Ubuntu) >> It would be nice to be able to use them like zlib and lcms. >> >> Unfortunately bfdver.h would not be provided because it is not included install files (`make install`) in binutils. So I changed to use `SEC_ELF_OCTETS` macro to detect binutils version because it was introduced at the same time as `bfd_octets_per_byte()`. 
>> >> https://sourceware.org/git/?p=binutils-gdb.git;a=commit;f=bfd/bfd-in2.h;h=618265039f697eab9e72bb58b95fc2d32925df58 >> >> Please see [JDK-8244819](https://bugs.openjdk.java.net/browse/JDK-8244819) why we need version check. > > Yasumasa Suenaga has updated the pull request incrementally with one additional commit since the last revision: > > Refactoring Looks good to me now. ------------- Marked as reviewed by ihse (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6378 From aph at openjdk.java.net Mon Nov 15 16:10:05 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Mon, 15 Nov 2021 16:10:05 GMT Subject: RFR: 8276563: Undefined Behaviour in class Assembler [v4] In-Reply-To: References: Message-ID: > The HotSpot code base contains a number of instances of Undefined Behavior, which can cause all manner of unpleasant surprises. > The UB to which this patch relates is in class `Assembler`, in which instances are pointers to (nonexistent) objects defined as, for example, > > > typedef RegisterImpl *Register; > const Register r10 = ((Register)10); > > > Registers have accessors, e.g.: > > ` int RegisterImpl::encoding() const { return (intptr_t)this; }` > > This works by an accident of implementation: it is not legal C++. > > The most obvious way to fix this UB bug is to make instances of `Register` point to something, and to use pointer subtraction to find the encoding: (simplified for clarity) > > > extern RegisterImpl all_Registers[num_Registers]; > int RegisterImpl::encoding() const { return this - all_Registers; } > > > After this patch there is slightly more work to be done when assembling code but it's merely the subtraction of a constant in `encoding()` and the difference in execution time is so small (and the startup variance so large) that I have been unable to measure it, even after averaging 100 runs. It does lead to an increase of about 1% in the size of the stripped libjvm.so, but I think that can be recovered by a subsequent patch.
> > An alternative way to implement this would be to make the encoding a byte-wide field in `RegisterImpl` and define encoding() this way: > > ` int RegisterImpl::encoding() const { return _encoding; }` > > This would result in smaller code, but I suspect slower. > > If this change is accepted, I propose that all instances of this pattern in HotSpot be treated similarly. Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: More cleanups ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6280/files - new: https://git.openjdk.java.net/jdk/pull/6280/files/958f4a25..6e45c910 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6280&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6280&range=02-03 Stats: 21 lines in 3 files changed: 0 ins; 4 del; 17 mod Patch: https://git.openjdk.java.net/jdk/pull/6280.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6280/head:pull/6280 PR: https://git.openjdk.java.net/jdk/pull/6280 From aph at openjdk.java.net Mon Nov 15 16:19:05 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Mon, 15 Nov 2021 16:19:05 GMT Subject: RFR: 8276563: Undefined Behaviour in class Assembler [v5] In-Reply-To: References: Message-ID: > The HotSpot code base contains a number of instances of Undefined Behavior, which can cause all manner of unpleasant surprises. > The UB to which this patch relates is in class `Assembler`, in which instances are pointers to (nonexistent) objects defined as, for example, > > > typedef RegisterImpl *Register; > const Register r10 = ((Register)10); > > > Registers have accessors, e.g.: > > ` int RegisterImpl::encoding() const { return (intptr_t)this; }` > > This works by an accident of implementation: it is not legal C++. 
> > The most obvious way to fix this UB bug is to make instances of `Register` point to something, and to use pointer subtraction to find the encoding: (simplified for clarity) > > > extern RegisterImpl all_Registers[num_Registers]; > int RegisterImpl::encoding() const { return this - all_Registers; } > > > After this patch there is slightly more work to be done when assembling code but it's merely the subtraction of a constant in `encoding()` and the difference in execution time is so small (and the startup variance so large) that I have been unable to measure it, even after averaging 100 runs. It does lead to an increase of about 1% in the size of the stripped libjvm.so, but I think that can be recovered by a subsequent patch. > > An alternative way to implement this would be to make the encoding a byte-wide field in `RegisterImpl` and define encoding() this way: > > ` int RegisterImpl::encoding() const { return _encoding; }` > > This would result in smaller code, but I suspect slower. > > If this change is accepted, I propose that all instances of this pattern in HotSpot be treated similarly.
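The pointer-subtraction scheme described above can be sketched in a few lines. This is a simplified, self-contained illustration and not the HotSpot code: the register count of 32 and the helper name `as_Register` are assumptions made for the example.

```cpp
#include <cassert>

// Hypothetical, simplified model; not the actual HotSpot declarations.
class RegisterImpl {
 public:
  // The encoding is the index of this object within all_Registers.
  int encoding() const;
};

typedef const RegisterImpl* Register;

constexpr int num_Registers = 32;  // assumed register count for illustration

// Backing storage: every Register points at a real object in this array,
// so no pointer is ever forged from an integer.
static RegisterImpl all_Registers[num_Registers];

int RegisterImpl::encoding() const {
  // Subtracting two pointers into the same array is well defined.
  return static_cast<int>(this - all_Registers);
}

// Illustrative helper (assumed name): look up a register by its encoding.
Register as_Register(int n) {
  assert(n >= 0 && n < num_Registers && "encoding out of range");
  return &all_Registers[n];
}

// Replaces the old `((Register)10)` cast with a pointer to a real object.
static const Register r10 = as_Register(10);
```

With this layout, `r10->encoding()` recovers the index by subtracting the array base, which is the "subtraction of a constant" the description refers to.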
Andrew Haley has updated the pull request incrementally with two additional commits since the last revision: - More cleanups - More cleanups ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6280/files - new: https://git.openjdk.java.net/jdk/pull/6280/files/6e45c910..e87eb674 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6280&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6280&range=03-04 Stats: 3 lines in 1 file changed: 0 ins; 3 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/6280.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6280/head:pull/6280 PR: https://git.openjdk.java.net/jdk/pull/6280 From aph at openjdk.java.net Mon Nov 15 16:19:07 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Mon, 15 Nov 2021 16:19:07 GMT Subject: RFR: 8276563: Undefined Behaviour in class Assembler [v3] In-Reply-To: References: <53rT9tbTqrs-GQyk56a3guueNHRVQC03hBje-eDB6JY=.8aef4f99-e1dd-4836-a5f2-e59f86ded3ee@github.com> Message-ID: <4Yt-_TFWqstkNCHtZnIH6tt-1VZVEt_c8irXsKRJcJM=.1c7dde3f-8f95-420d-bbad-a1ea5d832d8d@github.com> On Sat, 13 Nov 2021 17:13:54 GMT, Andrew Haley wrote: >> I understood the function as "return the functional name of the register, if there is one, otherwise return "r number". > > Yes, but the problem is that Register 31 has more than one name. I guess "r31" is the best we can do. Done. 
------------- PR: https://git.openjdk.java.net/jdk/pull/6280 From aph at openjdk.java.net Mon Nov 15 16:19:10 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Mon, 15 Nov 2021 16:19:10 GMT Subject: RFR: 8276563: Undefined Behaviour in class Assembler [v3] In-Reply-To: References: Message-ID: On Sat, 13 Nov 2021 09:46:48 GMT, Andrew Haley wrote: >> src/hotspot/cpu/aarch64/register_aarch64.hpp line 63: >> >>> 61: // accessors >>> 62: int encoding() const { assert(is_valid(), "invalid register"); return encoding_nocheck(); } >>> 63: bool is_valid() const { return this >= first() && this < invalid(); } >> >> is_valid() now returns true for ZR, SP. Was this intended? This affects other functions too, e.g. `RegisterImpl::name()`. > > I think so, yes. SP and ZR are valid registers. `is_valid()` should only return `false` for `noreg`. Done. I decided to leave this as it is. `is_valid()` now returns true for only r0 ... r31. >> src/hotspot/cpu/aarch64/register_aarch64.hpp line 64: >> >>> 62: int encoding() const { assert(is_valid(), "invalid register"); return encoding_nocheck(); } >>> 63: bool is_valid() const { return this >= first() && this < invalid(); } >>> 64: bool has_byte_register() const { return is_valid(); } >> >> Same here, semantics changed to include 32 and 33 > > I guess it would be "safer" to go back to excluding SP and ZR, but it was never intentional to exclude them. I'll go through all the uses of is_valid() to see which is best. Done. 
------------- PR: https://git.openjdk.java.net/jdk/pull/6280 From aph at openjdk.java.net Mon Nov 15 16:54:10 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Mon, 15 Nov 2021 16:54:10 GMT Subject: RFR: 8276563: Undefined Behaviour in class Assembler [v6] In-Reply-To: References: Message-ID: <_Kq2AqIymFF5N42vkV67M-BrL6xGtKdEhSEKnP57E2E=.5c34beb4-7395-4de8-ba84-b1608f83c15f@github.com> > The HotSpot code base contains a number of instances of Undefined Behavior, which can cause all manner of unpleasant surprises. > The UB to which this patch relates is in class `Assembler`, in which instances are pointers to (nonexistent) objects defined as, for example, > > > typedef RegisterImpl *Register; > const Register r10 = ((Register)10); > > > Registers have accessors, e.g.: > > ` int RegisterImpl::encoding() const { return (intptr_t)this; }` > > This works by an accident of implementation: it is not legal C++. > > The most obvious way to fix this UB bug is to make instances of `Register` point to something, and to use pointer subtraction to find the encoding: (simplified for clarity) > > > extern RegisterImpl all_Registers[num_Registers]; > int RegisterImpl::encoding() const { return this - all_Registers; } > > > After this patch there is slightly more work to be done when assembling code but it's merely the subtraction of a constant in `encoding()` and the difference in execution time is so small (and the startup variance so large) that I have been unable to measure it, even after averaging 100 runs. It does lead to an increase of about 1% in the size of the stripped libjvm.so, but I think that can be recovered by a subsequent patch. > > An alternative way to implement this would be to make the encoding a byte-wide field in `RegisterImpl` and define encoding() this way: > > ` int RegisterImpl::encoding() const { return _encoding; }` > > This would result in smaller code, but I suspect slower.
> > If this change is accepted, I propose that all instances of this pattern in HotSpot be treated similarly. Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: More cleanups ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6280/files - new: https://git.openjdk.java.net/jdk/pull/6280/files/e87eb674..31347447 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6280&range=05 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6280&range=04-05 Stats: 4 lines in 2 files changed: 3 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/6280.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6280/head:pull/6280 PR: https://git.openjdk.java.net/jdk/pull/6280 From stuefe at openjdk.java.net Mon Nov 15 17:50:41 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Mon, 15 Nov 2021 17:50:41 GMT Subject: RFR: 8276563: Undefined Behaviour in class Assembler [v6] In-Reply-To: <_Kq2AqIymFF5N42vkV67M-BrL6xGtKdEhSEKnP57E2E=.5c34beb4-7395-4de8-ba84-b1608f83c15f@github.com> References: <_Kq2AqIymFF5N42vkV67M-BrL6xGtKdEhSEKnP57E2E=.5c34beb4-7395-4de8-ba84-b1608f83c15f@github.com> Message-ID: On Mon, 15 Nov 2021 16:54:10 GMT, Andrew Haley wrote: >> The HotSpot code base contains a number of instances of Undefined Behavior, which can cause all manner of unpleasant surprises. >> The UB to which this patch relates is in class `Assembler`, in which instances are pointers to (nonexistent) objects defined as, for example, >> >> >> typedef RegisterImpl *Register; >> const Register r10 = ((Register)10); >> >> >> Registers have accessors, e.g.: >> >> ` int RegisterImpl::encoding() const { return (intptr_t)this; }` >> >> This works by an accident of implementation: it is not legal C++. 
>> >> The most obvious way to fix this UB bug is to make instances of `Register` point to something, and to use pointer subtraction to find the encoding: (simplified for clarity) >> >> >> extern RegisterImpl all_Registers[num_Registers]; >> int RegisterImpl::encoding() const { return this - all_Registers; } >> >> >> After this patch there is slightly more work to be done when assembling code but it's merely the subtraction of a constant in `encoding()` and the difference in execution time is so small (and the startup variance so large) that I have been unable to measure it, even after averaging 100 runs. It does lead to an increase of about 1% in the size of the stripped libjvm.so, but I think that can be recovered by a subsequent patch. >> >> An alternative way to implement this would be to make the encoding a byte-wide field in `RegisterImpl` and define encoding() this way: >> >> ` int RegisterImpl::encoding() const { return _encoding; }` >> >> This would result in smaller code, but I suspect slower. >> >> If this change is accepted, I propose that all instances of this pattern in HotSpot be treated similarly. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > More cleanups Looks good to me now. Cheers, Thomas ------------- Marked as reviewed by stuefe (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6280 From duke at openjdk.java.net Mon Nov 15 19:24:00 2021 From: duke at openjdk.java.net (Ludvig Janiuk) Date: Mon, 15 Nov 2021 19:24:00 GMT Subject: RFR: JDK-8277139 Improve code readability in PredecessorValidator (c1_IR.cpp) Message-ID: Refactor PredecessorValidator, more or less applying the following: declare variables where used; redeclare instead of reuse variables; move assert to a more logical place; remove unused length variable; inline variables where sensible; split loops; extract methods. This is done in preparation for work on optimizing IR::verify. IR::verify calls PredecessorValidator.
If the work of PredecessorValidator is made clearer, it will be easier to reason about where IR::verify doesn't need to be called (or where a subset of it would suffice). ------------- Commit messages: - extract methods - inline variable - deduplicate block_do - extract method - Init _blocks with expected size - Trivial polishing in PredecessorValidator Changes: https://git.openjdk.java.net/jdk/pull/6394/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6394&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8277139 Stats: 75 lines in 1 file changed: 26 ins; 23 del; 26 mod Patch: https://git.openjdk.java.net/jdk/pull/6394.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6394/head:pull/6394 PR: https://git.openjdk.java.net/jdk/pull/6394 From duke at openjdk.java.net Mon Nov 15 19:24:01 2021 From: duke at openjdk.java.net (Ludvig Janiuk) Date: Mon, 15 Nov 2021 19:24:01 GMT Subject: RFR: JDK-8277139 Improve code readability in PredecessorValidator (c1_IR.cpp) In-Reply-To: References: Message-ID: <4m3Kyrdg5jFD5HU4Fz5PNbqSgSjOn7bFA9Eydrp66Ns=.9c7940a3-d734-45da-bad1-f5d3987e9a4a@github.com> On Mon, 15 Nov 2021 18:26:36 GMT, Ludvig Janiuk wrote: > Refactor PredecessorValidator, more or less applying the following: > > declare variables where used > redeclare instead of reuse variables > move assert to a more logical place > remove unused length variable > inline variables where senseful > split loops > extract methods > > this is done in preparation for work on optimizing IR::verify. IR::verify calls PredecessorValidator. If the work of PredecessorValidator is made clearer, it will be easier to reason about where IR::verify doesn't need to be called (or where a subset of it would suffice). 
Tests https://mach5.us.oracle.com/mdash/jobs/opjaniuk-jdk-20211115-1831-26239975 ------------- PR: https://git.openjdk.java.net/jdk/pull/6394 From dlong at openjdk.java.net Mon Nov 15 21:32:58 2021 From: dlong at openjdk.java.net (Dean Long) Date: Mon, 15 Nov 2021 21:32:58 GMT Subject: RFR: 8276231: ciReplay: SIGSEGV when replay compiling lambdas Message-ID: To fix the crash, we need to link the class before calling find_cached_constant_at(). I also fixed a confusing "line not properly terminated" error when we fail to load a class. ------------- Commit messages: - link class so that find_cached_constant_at does not crash Changes: https://git.openjdk.java.net/jdk/pull/6398/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6398&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8276231 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/6398.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6398/head:pull/6398 PR: https://git.openjdk.java.net/jdk/pull/6398 From iveresov at openjdk.java.net Mon Nov 15 22:12:37 2021 From: iveresov at openjdk.java.net (Igor Veresov) Date: Mon, 15 Nov 2021 22:12:37 GMT Subject: RFR: 8276231: ciReplay: SIGSEGV when replay compiling lambdas In-Reply-To: References: Message-ID: On Mon, 15 Nov 2021 21:24:43 GMT, Dean Long wrote: > To fix the crash, we need to link the class before calling find_cached_constant_at(). I also fixed a confusing "line not properly terminated" error when we fail to load a class. Seems good. ------------- Marked as reviewed by iveresov (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/6398 From dlong at openjdk.java.net Mon Nov 15 22:28:32 2021 From: dlong at openjdk.java.net (Dean Long) Date: Mon, 15 Nov 2021 22:28:32 GMT Subject: RFR: 8276231: ciReplay: SIGSEGV when replay compiling lambdas In-Reply-To: References: Message-ID: On Mon, 15 Nov 2021 22:09:21 GMT, Igor Veresov wrote: >> To fix the crash, we need to link the class before calling find_cached_constant_at(). I also fixed a confusing "line not properly terminated" error when we fail to load a class. > > Seems good. Thanks @veresov. ------------- PR: https://git.openjdk.java.net/jdk/pull/6398 From epavlova at openjdk.java.net Tue Nov 16 00:03:34 2021 From: epavlova at openjdk.java.net (Ekaterina Pavlova) Date: Tue, 16 Nov 2021 00:03:34 GMT Subject: RFR: 8276231: ciReplay: SIGSEGV when replay compiling lambdas In-Reply-To: References: Message-ID: <-PmEoAavjM7a8fNGwJVX7ZJcero0cfcTkFIeT27dGAw=.3bc47d43-9e1c-475e-9d6b-4d1c8eac747a@github.com> On Mon, 15 Nov 2021 21:24:43 GMT, Dean Long wrote: > To fix the crash, we need to link the class before calling find_cached_constant_at(). I also fixed a confusing "line not properly terminated" error when we fail to load a class. I tested test/jdk/java/lang/String with my ciReplay scripts and don't see these crashes anymore. Looks good from this testing point of view. 
------------- PR: https://git.openjdk.java.net/jdk/pull/6398 From yyang at openjdk.java.net Tue Nov 16 02:22:46 2021 From: yyang at openjdk.java.net (Yi Yang) Date: Tue, 16 Nov 2021 02:22:46 GMT Subject: RFR: 8277102: Dubious PrintCompilation output Message-ID: <4sIBYDkvLT3U7uu8rLoqqLGGAerrYy84Lu57Lx1uO-M=.10f22b0c-b67d-4a46-a503-5c1ffc409656@github.com> The output of PrintCompilation is ill-formed:

    22 1 3 java.lang.Object:: (1 bytes)
    25 2 3 java.lang.String::hashCode (60 bytes)
    25 3 3 java.lang.String::coder (15 bytes)
    27 4 3 Reduced::foo (12 bytes)
    27 5 3 java.lang.Boolean::valueOf (14 bytes)
    27 6 3 java.lang.Boolean::hashCode (8 bytes)
    27 8 4 Reduced::foo (12 bytes)
    27 7 2 java.lang.Boolean::hashCode (14 bytes)
    4 3 Reduced::foo (12 bytes) made not entrant
    29 9 % 3 Reduced::main @ 4 (33 bytes)
    29 10 3 Reduced::main (33 bytes)
    29 11 % 4 Reduced::main @ 4 (33 bytes)
    9 % 3 Reduced::main @ 4 (33 bytes) made not entrant
    11 % 4 Reduced::main @ 4 (33 bytes) made not entrant

This seems related to [JDK-8272586](https://bugs.openjdk.java.net/browse/JDK-8272586), which prints the timestamp optionally. As mentioned in #5446, printing the timestamp would break DisassembleCodeBlobTest.java, since it expects that disassembling a given nmethod twice produces the same result. Maybe we should fix DisassembleCodeBlobTest.java.
------------- Commit messages: - 8277102 Dubious PrintCompilation output Changes: https://git.openjdk.java.net/jdk/pull/6386/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6386&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8277102 Stats: 19 lines in 4 files changed: 9 ins; 2 del; 8 mod Patch: https://git.openjdk.java.net/jdk/pull/6386.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6386/head:pull/6386 PR: https://git.openjdk.java.net/jdk/pull/6386 From duke at openjdk.java.net Tue Nov 16 07:14:35 2021 From: duke at openjdk.java.net (Ludvig Janiuk) Date: Tue, 16 Nov 2021 07:14:35 GMT Subject: RFR: JDK-8277139 Improve code readability in PredecessorValidator (c1_IR.cpp) In-Reply-To: References: Message-ID: <-CbtsHm7edRM7QEF6OXtTpKvphFj5kQzfg6jtaTTP-k=.f66a8110-dc70-4d5f-b81c-cbd270f0497b@github.com> On Mon, 15 Nov 2021 18:26:36 GMT, Ludvig Janiuk wrote: > Refactor PredecessorValidator, more or less applying the following: > > declare variables where used > redeclare instead of reuse variables > move assert to a more logical place > remove unused length variable > inline variables where senseful > split loops > extract methods > > this is done in preparation for work on optimizing IR::verify. IR::verify calls PredecessorValidator. If the work of PredecessorValidator is made clearer, it will be easier to reason about where IR::verify doesn't need to be called (or where a subset of it would suffice). Tier1 and tier2 tests pass. ------------- PR: https://git.openjdk.java.net/jdk/pull/6394 From eosterlund at openjdk.java.net Tue Nov 16 09:06:56 2021 From: eosterlund at openjdk.java.net (Erik Österlund) Date: Tue, 16 Nov 2021 09:06:56 GMT Subject: RFR: 8277180: Intrinsify recursive ObjectMonitor locking for C2 x64 and A64 Message-ID: The C2 fast_lock and fast_unlock intrinsics don't support recursive ObjectMonitor locking. Some workloads can significantly benefit from this.
Recent ObjectMonitor work has changed heuristics such that ObjectMonitors are deflated less aggressively. Therefore we can expect to see more inflated monitors in workloads where we would usually see more stack locks. That in itself is fine, except that C2 doesn't intrinsify the recursive locking paths for object monitors. Enabling those cases in the C2 code removes a regression (~17%) we have seen with DaCapo h2 -t 1, and makes a few more benchmarks happy as well. ------------- Commit messages: - 8277180: Intrinsify recursive ObjectMonitor locking for C2 x64 and A64 Changes: https://git.openjdk.java.net/jdk/pull/6406/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6406&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8277180 Stats: 47 lines in 2 files changed: 34 ins; 6 del; 7 mod Patch: https://git.openjdk.java.net/jdk/pull/6406.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6406/head:pull/6406 PR: https://git.openjdk.java.net/jdk/pull/6406 From duke at openjdk.java.net Tue Nov 16 09:16:59 2021 From: duke at openjdk.java.net (SUN Guoyun) Date: Tue, 16 Nov 2021 09:16:59 GMT Subject: RFR: JDK-8277178: Reduce the priority of data dependent nodes when OptoScheduling enabled Message-ID: When doing GCM/LCM, we should consider not only the height (latency) of nodes but also whether there is a data dependency between them. When two nodes have a data dependency and the latency of the first node is large, another node without a data dependency can be considered for insertion between the two. For example, for this Java code:

    public static final double fval = 2.00;
    public static double[] A = new double[N];
    public static int[] B = new int[N];

    public static void testP(){
	for (int i=0; i
When `-XX:+OptoScheduling` is used on aarch64, the generated sequence is:

190     B15: #	out( B15 B16 ) <- in( B14 B15 ) Loop( B15-B15 inner main of N118 strip mined) Freq: 9.9999e+11
190     sxtw  R13, R15	# i2l
194 +   add R14, R17, R13, LShiftL #3	# ptr
198     ldrd  V16, [R14, #16]	# double
19c +   fmuld   V18, V16, V17
1a0 +   faddd   V16, V18, V16
1a4     strd  V16, [R14, #16]	# double
1a8 +   add R13, R0, R13, LShiftL #2	# ptr
1ac +   ldrw  R1, [R13, #16]	# int
1b0 +   addw  R14, R1, R1
1b4 +   addw R1, R14, #2
1b8 +   addw R15, R15, #1
1bc     strw  R1, [R13, #16]	# int
1c0 +   cmpw  R15, R12
1c4     blt B15 	// counted loop end  P=1.000000 C=40960.000000
A more efficient sequence would be:

190     B15: #	out( B15 B16 ) <- in( B14 B15 ) Loop( B15-B15 inner main of N118 strip mined) Freq: 9.9999e+11
190     sxtw  R13, R14	# i2l
194     add R15, R17, R13, LShiftL #3	# ptr
198     add R13, R0, R13, LShiftL #2	# ptr
19c     ldrd  V16, [R15, #16]	# double
1a0     ldrw  R2, [R13, #16]	# int
1a4     fmuld   V18, V16, V17
1a8     addw  R1, R2, R2
1ac     faddd   V16, V18, V16
1b0     strd  V16, [R15, #16]	# double
1b4     addw R1, R1, #2
1b8     strw  R1, [R13, #16]	# int
1bc     addw R14, R14, #1
1c0     cmpw  R14, R12
1c4     blt B15 	// counted loop end  P=1.000000 C=40960.000000
This problem also exists on the MIPS architecture. This is a patch to fix this problem. Please help review it. Thanks ------------- Commit messages: - 8277178: Reduce the priority of data dependent nodes when OptoScheduling enabled Changes: https://git.openjdk.java.net/jdk/pull/6407/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6407&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8277178 Stats: 41 lines in 2 files changed: 12 ins; 28 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/6407.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6407/head:pull/6407 PR: https://git.openjdk.java.net/jdk/pull/6407 From thartmann at openjdk.java.net Tue Nov 16 12:54:40 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 16 Nov 2021 12:54:40 GMT Subject: RFR: 8276162: Optimise unsigned comparison pattern [v4] In-Reply-To: References: Message-ID: On Sat, 13 Nov 2021 05:22:07 GMT, Mai Đặng Quân Anh wrote: >> This patch changes operations in the form `x +- Integer.MIN_VALUE <=> y +- Integer.MIN_VALUE`, which is a pattern used to do unsigned comparisons, into `x u<=> y`. >> >> In addition to being basic operations, they may be utilised to implement range checks such as the methods in `jdk.internal.util.Preconditions`, or in places where the compiler cannot deduce the non-negativeness of the bound as in `java.util.ArrayList`. >> >> Thank you very much. > > Mai Đặng Quân Anh has updated the pull request incrementally with two additional commits since the last revision: > > - add tests cover constant comparison and calling library > - add eq/ne, add correction test, refine micro Nice IR verification test. Your changes look good to me and all testing passed. ------------- Marked as reviewed by thartmann (Reviewer).
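The identity behind the 8276162 transformation quoted above can be checked outside the JVM. The sketch below is an illustration only, not the C2 implementation: `java_add` is a hypothetical helper that models Java's wrapping `int` addition, and the two predicates compare the biased signed form with a plain unsigned comparison.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical helper: models Java's wrapping 32-bit int addition.
// The sum is computed in uint32_t (well defined) and copied back bit for bit.
static int32_t java_add(int32_t a, int32_t b) {
  uint32_t sum = static_cast<uint32_t>(a) + static_cast<uint32_t>(b);
  int32_t result;
  std::memcpy(&result, &sum, sizeof result);
  return result;
}

// The source pattern: x + MIN_VALUE < y + MIN_VALUE, compared as signed.
static bool biased_signed_less(int32_t x, int32_t y) {
  const int32_t min = std::numeric_limits<int32_t>::min();
  return java_add(x, min) < java_add(y, min);
}

// The replacement: x u< y, compared as unsigned.
static bool unsigned_less(int32_t x, int32_t y) {
  return static_cast<uint32_t>(x) < static_cast<uint32_t>(y);
}

// True when both formulations give the same answer for the pair (x, y).
static bool agree(int32_t x, int32_t y) {
  return biased_signed_less(x, y) == unsigned_less(x, y);
}
```

Adding `MIN_VALUE` with wraparound flips only the sign bit, which maps unsigned order onto signed order; that is why the two predicates agree for every pair of inputs.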
PR: https://git.openjdk.java.net/jdk/pull/6101 From thartmann at openjdk.java.net Tue Nov 16 13:13:33 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 16 Nov 2021 13:13:33 GMT Subject: RFR: 8277042: add test for 8276036 to compiler/codecache [v2] In-Reply-To: References: Message-ID: On Mon, 15 Nov 2021 09:51:16 GMT, Takuya Kiriyama wrote: >> Could you please review the 8277042 code? >> This is the enhancement for 8276036. >> I add a new test to verify the value of full_count in the message of insufficient codecache. > > Takuya Kiriyama has updated the pull request incrementally with one additional commit since the last revision: > > 8277042: add test for 8276036 to compiler/codecache Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6364 From chagedorn at openjdk.java.net Tue Nov 16 13:19:37 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Tue, 16 Nov 2021 13:19:37 GMT Subject: RFR: 8277042: add test for 8276036 to compiler/codecache [v2] In-Reply-To: References: Message-ID: On Mon, 15 Nov 2021 09:51:16 GMT, Takuya Kiriyama wrote: >> Could you please review the 8277042 code? >> This is the enhancement for 8276036. >> I add a new test to verify the value of full_count in the message of insufficient codecache. 
> > Takuya Kiriyama has updated the pull request incrementally with one additional commit since the last revision: > > 8277042: add test for 8276036 to compiler/codecache I ran your test in our internal testing with latest JDK and found the following crash: # # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (/opt/mach5/mesos/work_dir/slaves/ff806ead-2cac-495d-9cbc-62116f99bf14-S14125/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/928f8391-f7c3-41f7-944d-47d10f56f83d/runs/8280836e-2a4d-4ec6-8bf2-5d87f2a87640/workspace/open/src/hotspot/share/runtime/mutex.cpp:444), pid=16467, tid=16482 # assert(false) failed: Attempting to acquire lock CompileTask_lock/safepoint out of order with lock MethodCompileQueue_lock/safepoint -- possible deadlock # # JRE version: Java(TM) SE Runtime Environment (18.0) (fastdebug build 18-internal+0-2021-11-16-1108284.christian.hagedorn.jdk) # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 18-internal+0-2021-11-16-1108284.christian.hagedorn.jdk, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64) # Problematic frame: # V [libjvm.so+0x1563e4d] Mutex::check_rank(Thread*)+0x29d I will have a closer look and get back to you again. It seems that your test revealed an existing bug. Please do not sponsor this PR, yet! ------------- PR: https://git.openjdk.java.net/jdk/pull/6364 From chagedorn at openjdk.java.net Tue Nov 16 13:58:34 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Tue, 16 Nov 2021 13:58:34 GMT Subject: RFR: 8277042: add test for 8276036 to compiler/codecache [v2] In-Reply-To: References: Message-ID: On Mon, 15 Nov 2021 09:51:16 GMT, Takuya Kiriyama wrote: >> Could you please review the 8277042 code? >> This is the enhancement for 8276036. >> I add a new test to verify the value of full_count in the message of insufficient codecache. 
> > Takuya Kiriyama has updated the pull request incrementally with one additional commit since the last revision: > > 8277042: add test for 8276036 to compiler/codecache I could also trigger this locally. I filed [JDK-8277213](https://bugs.openjdk.java.net/browse/JDK-8277213). This bug should be fixed before this test can be integrated to avoid noise in our testing. ------------- PR: https://git.openjdk.java.net/jdk/pull/6364 From duke at openjdk.java.net Tue Nov 16 14:00:40 2021 From: duke at openjdk.java.net (Mai Đặng Quân Anh) Date: Tue, 16 Nov 2021 14:00:40 GMT Subject: RFR: 8276162: Optimise unsigned comparison pattern [v3] In-Reply-To: References: Message-ID: <5ok7QJdEz-iYq9SX_e4skYpQCXfmFx7emDEXV0Z7pSA=.066a0c06-e1f6-49d6-9042-cb653adce0cd@github.com> On Fri, 12 Nov 2021 17:30:52 GMT, Vladimir Kozlov wrote: >> Mai Đặng Quân Anh has updated the pull request incrementally with two additional commits since the last revision: >> >> - replace cmpx->Opcode() with cmpx_op >> - address reviews, remove checks for subtraction operations > > Thank you for addressing my comments. > > I tentatively approve these changes, leaving final approval and testing to @TobiHartmann Thanks @vnkozlov and @TobiHartmann for your reviews and suggestions, and thanks @DamonFool for your initial support. May I have this PR sponsored, please? Thank you very much. ------------- PR: https://git.openjdk.java.net/jdk/pull/6101 From duke at openjdk.java.net Tue Nov 16 14:14:45 2021 From: duke at openjdk.java.net (Mai Đặng Quân Anh) Date: Tue, 16 Nov 2021 14:14:45 GMT Subject: Integrated: 8276162: Optimise unsigned comparison pattern In-Reply-To: References: Message-ID: On Mon, 25 Oct 2021 10:15:42 GMT, Mai Đặng Quân Anh wrote: > This patch changes operations in the form `x +- Integer.MIN_VALUE <=> y +- Integer.MIN_VALUE`, which is a pattern used to do unsigned comparisons, into `x u<=> y`.
> > In addition to being basic operations, they may be utilised to implement range checks such as the methods in `jdk.internal.util.Preconditions`, or in places where the compiler cannot deduce the non-negativeness of the bound as in `java.util.ArrayList`. > > Thank you very much. This pull request has now been integrated. Changeset: f3eb5014 Author: MeryKitty Committer: Tobias Hartmann URL: https://git.openjdk.java.net/jdk/commit/f3eb5014aa75af4463308f52f2bc6e9fcd2da36c Stats: 523 lines in 3 files changed: 518 ins; 2 del; 3 mod 8276162: Optimise unsigned comparison pattern Reviewed-by: thartmann, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/6101 From ysuenaga at openjdk.java.net Tue Nov 16 14:50:40 2021 From: ysuenaga at openjdk.java.net (Yasumasa Suenaga) Date: Tue, 16 Nov 2021 14:50:40 GMT Subject: Integrated: 8277089: Use system binutils to build hsdis In-Reply-To: References: Message-ID: <3KlOI75YzKulauF-zC1WsUCeT1nmgSBAJAqPHAWdP3U=.bcd684d3-2680-47ec-a6aa-3a85846f47a3@github.com> On Sat, 13 Nov 2021 08:08:53 GMT, Yasumasa Suenaga wrote: > hsdis requires binutils source tree for building. Most of Linux distros provide binutils package. (e.g. binutils-devel from Fedora, binutils-dev from Ubuntu) > It would be nice to be able to use them like zlib and lcms. > > Unfortunately bfdver.h would not be provided because it is not included install files (`make install`) in binutils. So I changed to use `SEC_ELF_OCTETS` macro to detect binutils version because it was introduced at the same time as `bfd_octets_per_byte()`. > > https://sourceware.org/git/?p=binutils-gdb.git;a=commit;f=bfd/bfd-in2.h;h=618265039f697eab9e72bb58b95fc2d32925df58 > > Please see [JDK-8244819](https://bugs.openjdk.java.net/browse/JDK-8244819) why we need version check. This pull request has now been integrated. 
Changeset: d5e47d6b Author: Yasumasa Suenaga URL: https://git.openjdk.java.net/jdk/commit/d5e47d6b84514edde23a8baff8c2274e5b3ca6bb Stats: 61 lines in 3 files changed: 45 ins; 12 del; 4 mod 8277089: Use system binutils to build hsdis Reviewed-by: ihse ------------- PR: https://git.openjdk.java.net/jdk/pull/6378 From chagedorn at openjdk.java.net Tue Nov 16 15:51:59 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Tue, 16 Nov 2021 15:51:59 GMT Subject: RFR: 8254108: ciReplay: Support incremental inlining Message-ID: This patch adds support to explicitly apply incremental inlining when replay compiling a method if the original compilation of the method was also incrementally inlined. We write a new value when dumping the inline tree to indicate if an inlinee was incrementally inlined (`= 1`) or not (`= 0`). To implement this, I updated the `REPLAY_VERSION` to 2 and additionally added a test to verify that old replay file versions are still working. I added some support to modify/remove version numbers of generated replay files in tests. I also refactored the test added by JDK-8275868 to reuse some of the methods. 
Thanks, Christian ------------- Commit messages: - adapt patch to use new replay file version number - 8254108: ciReplay: Support incremental inlining Changes: https://git.openjdk.java.net/jdk/pull/6413/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6413&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8254108 Stats: 626 lines in 8 files changed: 452 ins; 132 del; 42 mod Patch: https://git.openjdk.java.net/jdk/pull/6413.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6413/head:pull/6413 PR: https://git.openjdk.java.net/jdk/pull/6413 From chagedorn at openjdk.java.net Tue Nov 16 15:53:47 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Tue, 16 Nov 2021 15:53:47 GMT Subject: RFR: 8276231: ciReplay: SIGSEGV when replay compiling lambdas In-Reply-To: References: Message-ID: On Mon, 15 Nov 2021 21:24:43 GMT, Dean Long wrote: > To fix the crash, we need to link the class before calling find_cached_constant_at(). I also fixed a confusing "line not properly terminated" error when we fail to load a class. Looks good! ------------- Marked as reviewed by chagedorn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6398 From aph at openjdk.java.net Tue Nov 16 16:48:06 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Tue, 16 Nov 2021 16:48:06 GMT Subject: RFR: 8276563: Undefined Behaviour in class Assembler [v7] In-Reply-To: References: Message-ID: > The HotSpot code base contains a number of instances of Undefined Behavior, which can cause all manner of unpleasant surprises. > The UB to which this patch relates is in class `Assembler`, in which instances are pointers to (nonexistent) objects defined as, for example, > > > typedef RegisterImpl *Register; > const Register r10 = ((Register)10); > > > Registers have accessors, e.g.: > > ` int RegisterImpl::encoding() const { return (intptr_t)this; }` > > This works by an accident of implementation: it is not legal C++. 
> > The most obvious way to fix this UB bug is to make instances of `Register` point to something, and to use pointer subtraction to find the encoding: (simplified for clarity) > > > extern RegisterImpl all_Registers[num_Registers]; > int RegisterImpl::encoding() const { return this - all_Registers; } > > > After this patch there is slightly more work to be done when assembling code but it's merely the subtraction of a constant in `encoding()` and the difference in execution time is so small (and the startup variance so large) that I have been unable to measure it, even after averaging 100 runs. It does lead to an increase of about 1% in the size of the stripped libjvm.so, but I think that can be recovered by a subsequent patch. > > An alternative way to implement this would be to make the encoding a byte-wide field in `RegisterImpl` and define `encoding()` this way: > > ` int RegisterImpl::encoding() const { return _encoding; }` > > This would result in smaller code, but I suspect slower. > > If this change is accepted, I propose that all instances of this pattern in HotSpot be treated similarly. Andrew Haley has updated the pull request incrementally with two additional commits since the last revision: - Whitespace - Simplify and improve portability.
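The pointer-subtraction scheme in the description above can be sketched as follows. This is a minimal, self-contained illustration and not the code in the PR: the register count of 32, the `as_Register` helper as written here, and the range assertions are assumptions made for the example; only `all_Registers`, `num_Registers`, `RegisterImpl`, `Register` and `encoding()` come from the description.

```cpp
#include <cassert>
#include <cstddef>

// Simplified stand-in for HotSpot's RegisterImpl (illustrative only).
class RegisterImpl {
 public:
  static constexpr int num_Registers = 32;  // assumed register count
  int encoding() const;
};

typedef const RegisterImpl* Register;

// Backing storage: every Register now points at a real object, so the
// pointer arithmetic in encoding() stays inside one array.
RegisterImpl all_Registers[RegisterImpl::num_Registers];

// The encoding is the element index, recovered by pointer subtraction.
int RegisterImpl::encoding() const {
  std::ptrdiff_t index = this - all_Registers;
  assert(index >= 0 && index < num_Registers && "invalid register");
  return static_cast<int>(index);
}

// Hypothetical helper mapping an encoding back to a Register.
inline Register as_Register(int encoding) {
  assert(encoding >= 0 && encoding < RegisterImpl::num_Registers);
  return &all_Registers[encoding];
}

// Replaces the old `((Register)10)` cast with a pointer to a real object.
const Register r10 = as_Register(10);
```

With these definitions, `r10->encoding()` yields 10 without casting an integer to a pointer or `this` back to an integer.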
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6280/files - new: https://git.openjdk.java.net/jdk/pull/6280/files/31347447..25700a0b Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6280&range=06 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6280&range=05-06 Stats: 43 lines in 4 files changed: 13 ins; 14 del; 16 mod Patch: https://git.openjdk.java.net/jdk/pull/6280.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6280/head:pull/6280 PR: https://git.openjdk.java.net/jdk/pull/6280 From aph at openjdk.java.net Tue Nov 16 16:52:48 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Tue, 16 Nov 2021 16:52:48 GMT Subject: RFR: 8276563: Undefined Behaviour in class Assembler [v7] In-Reply-To: References: Message-ID: On Tue, 16 Nov 2021 16:48:06 GMT, Andrew Haley wrote: >> The HotSpot code base contains a number of instances of Undefined Behavior, which can cause all manner of unpleasant surprises. >> The UB to which this patch relates is in class `Assembler`, in which instances are pointers to (nonexistent) objects defined as, for example, >> >> >> typedef RegisterImpl *Register; >> const Register r10 = ((Register)10); >> >> >> Registers have accessors, e.g.: >> >> ` int RegisterImpl::encoding() const { return (intptr_t)this; }` >> >> This works by an accident of implementation: it is not legal C++. >> >> The most obvious way to this UB bug is to make instances of `Register` point to something, and to use pointer subtraction to find the encoding: (simplified for clarity) >> >> >> extern RegisterImpl all_Registers[num_Registers]; >> int RegisterImpl::encoding() const { return this - all_Registers; } >> >> >> After this patch there is slightly more work to be done when assembling code but it's merely the subtraction of a constant in `encoding()` and the difference in execution time is so small (and the startup variance so large) that I have been unable to measure it, even after averaging 100 runs. 
It does lead to an increase of about 1% in the size of the stripped libjvm.so, but I think that can be recovered by a subsequent patch. >> >> An alternative way to implement this would be to make the encoding a byte-wide field in `RegisterImpl` and define encoding() this way: >> >> ` int RegisterImpl::encoding() const { return _encoding; }` >> >> This would result in smaller code, but I suspect slower. >> >> If this change is accepted, I propose that all instances of this pattern in HotSpot be treated similarly. > > Andrew Haley has updated the pull request incrementally with two additional commits since the last revision: > > - Whitespace > - Simplify and improve portability. This has been a big cleanup, prompted by porting x86 to this scheme. There are fewer unnecessary changes, making this patch easier to review, and it's a bit more efficient too. Enormous thanks to my very patient reviewers. I think this is done now. ------------- PR: https://git.openjdk.java.net/jdk/pull/6280 From aph at openjdk.java.net Tue Nov 16 17:21:05 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Tue, 16 Nov 2021 17:21:05 GMT Subject: RFR: 8276563: Undefined Behaviour in class Assembler [v8] In-Reply-To: References: Message-ID: <82Q8vVWGc4pzGRoNY9uaPaOrZvIQb3OPnhC1rNhMYxY=.2b40eafc-90bb-43d5-ab2b-72617f88d851@github.com> > The HotSpot code base contains a number of instances of Undefined Behavior, which can cause all manner of unpleasant surprises. > The UB to which this patch relates is in class `Assembler`, in which instances are pointers to (nonexistent) objects defined as, for example, > > > typedef RegisterImpl *Register; > const Register r10 = ((Register)10); > > > Registers have accessors, e.g.: > > ` int RegisterImpl::encoding() const { return (intptr_t)this; }` > > This works by an accident of implementation: it is not legal C++. 
> > The most obvious way to fix this UB bug is to make instances of `Register` point to something, and to use pointer subtraction to find the encoding: (simplified for clarity) > > > extern RegisterImpl all_Registers[num_Registers]; > int RegisterImpl::encoding() const { return this - all_Registers; } > > > After this patch there is slightly more work to be done when assembling code but it's merely the subtraction of a constant in `encoding()` and the difference in execution time is so small (and the startup variance so large) that I have been unable to measure it, even after averaging 100 runs. It does lead to an increase of about 1% in the size of the stripped libjvm.so, but I think that can be recovered by a subsequent patch. > > An alternative way to implement this would be to make the encoding a byte-wide field in `RegisterImpl` and define `encoding()` this way: > > ` int RegisterImpl::encoding() const { return _encoding; }` > > This would result in smaller code, but I suspect slower. > > If this change is accepted, I propose that all instances of this pattern in HotSpot be treated similarly. Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: Back out incorrect change to x86.
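For comparison, the alternative mentioned in the description (a byte-wide `_encoding` field) might look roughly like this. It is a sketch under assumptions, not code from the PR: the constructor and the `r10_impl` backing object are hypothetical names introduced only for the example.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-in for HotSpot's RegisterImpl (illustrative only).
class RegisterImpl {
 public:
  explicit constexpr RegisterImpl(std::uint8_t encoding)
      : _encoding(encoding) {}

  // The encoding is read from the object rather than derived from
  // its address.
  int encoding() const { return _encoding; }

 private:
  std::uint8_t _encoding;  // byte-wide field, as in the description
};

typedef const RegisterImpl* Register;

// Hypothetical backing object; each register needs one real instance.
const RegisterImpl r10_impl(10);
const Register r10 = &r10_impl;
```

Here `encoding()` is a one-byte load from the object instead of a subtraction, which is the trade-off the description speculates about.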
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6280/files - new: https://git.openjdk.java.net/jdk/pull/6280/files/25700a0b..c7ec4ca9 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6280&range=07 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6280&range=06-07 Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/6280.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6280/head:pull/6280 PR: https://git.openjdk.java.net/jdk/pull/6280 From dlong at openjdk.java.net Tue Nov 16 17:28:41 2021 From: dlong at openjdk.java.net (Dean Long) Date: Tue, 16 Nov 2021 17:28:41 GMT Subject: RFR: 8276231: ciReplay: SIGSEGV when replay compiling lambdas In-Reply-To: References: Message-ID: On Mon, 15 Nov 2021 21:24:43 GMT, Dean Long wrote: > To fix the crash, we need to link the class before calling find_cached_constant_at(). I also fixed a confusing "line not properly terminated" error when we fail to load a class. Thanks Christian and Katya. ------------- PR: https://git.openjdk.java.net/jdk/pull/6398 From dlong at openjdk.java.net Tue Nov 16 17:28:41 2021 From: dlong at openjdk.java.net (Dean Long) Date: Tue, 16 Nov 2021 17:28:41 GMT Subject: Integrated: 8276231: ciReplay: SIGSEGV when replay compiling lambdas In-Reply-To: References: Message-ID: On Mon, 15 Nov 2021 21:24:43 GMT, Dean Long wrote: > To fix the crash, we need to link the class before calling find_cached_constant_at(). I also fixed a confusing "line not properly terminated" error when we fail to load a class. This pull request has now been integrated. 
Changeset: e5ffdf91 Author: Dean Long URL: https://git.openjdk.java.net/jdk/commit/e5ffdf9120c14b38e4c8794888d2002e2686ebfc Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod 8276231: ciReplay: SIGSEGV when replay compiling lambdas Reviewed-by: iveresov, chagedorn ------------- PR: https://git.openjdk.java.net/jdk/pull/6398 From duke at openjdk.java.net Tue Nov 16 18:01:41 2021 From: duke at openjdk.java.net (Mai =?UTF-8?B?xJDhurduZw==?= =?UTF-8?B?IA==?= =?UTF-8?B?UXXDom4=?= Anh) Date: Tue, 16 Nov 2021 18:01:41 GMT Subject: RFR: 8276563: Undefined Behaviour in class Assembler [v7] In-Reply-To: References: Message-ID: On Tue, 16 Nov 2021 16:48:06 GMT, Andrew Haley wrote: >> The HotSpot code base contains a number of instances of Undefined Behavior, which can cause all manner of unpleasant surprises. >> The UB to which this patch relates is in class `Assembler`, in which instances are pointers to (nonexistent) objects defined as, for example, >> >> >> typedef RegisterImpl *Register; >> const Register r10 = ((Register)10); >> >> >> Registers have accessors, e.g.: >> >> ` int RegisterImpl::encoding() const { return (intptr_t)this; }` >> >> This works by an accident of implementation: it is not legal C++. >> >> The most obvious way to this UB bug is to make instances of `Register` point to something, and to use pointer subtraction to find the encoding: (simplified for clarity) >> >> >> extern RegisterImpl all_Registers[num_Registers]; >> int RegisterImpl::encoding() const { return this - all_Registers; } >> >> >> After this patch there is slightly more work to be done when assembling code but it's merely the subtraction of a constant in `encoding()` and the difference in execution time is so small (and the startup variance so large) that I have been unable to measure it, even after averaging 100 runs. It does lead to an increase of about 1% in the size of the stripped libjvm.so, but I think that can be recovered by a subsequent patch. 
>> >> An alternative way to implement this would be to make the encoding a byte-wide field in `RegisterImpl` and define encoding() this way: >> >> ` int RegisterImpl::encoding() const { return _encoding; }` >> >> This would result in smaller code, but I suspect slower. >> >> If this change is accepted, I propose that all instances of this pattern in HotSpot be treated similarly. > > Andrew Haley has updated the pull request incrementally with two additional commits since the last revision: > > - Whitespace > - Simplify and improve portability. src/hotspot/cpu/aarch64/register_aarch64.hpp line 62: > 60: // accessors > 61: int encoding() const { assert(is_valid(), "invalid register"); return encoding_nocheck(); } > 62: bool is_valid() const { return this >= first() && this - first() < number_of_registers; } A tiny suggestion: an unsigned comparison between `this - first()` and `number_of_registers` would be sufficient here. Suggestion: bool is_valid() const { return (unsigned)(this - first()) < number_of_registers; } ------------- PR: https://git.openjdk.java.net/jdk/pull/6280 From duke at openjdk.java.net Tue Nov 16 18:07:40 2021 From: duke at openjdk.java.net (Mai Đặng Quân Anh) Date: Tue, 16 Nov 2021 18:07:40 GMT Subject: RFR: 8276563: Undefined Behaviour in class Assembler [v7] In-Reply-To: References: Message-ID: <7S0TmQeESOtjMsEWEnTv0x6AF6c0xHg0LmTXcSJrhYQ=.f6db48aa-993b-4530-b93a-932451f275c5@github.com> On Tue, 16 Nov 2021 16:48:06 GMT, Andrew Haley wrote: >> The HotSpot code base contains a number of instances of Undefined Behavior, which can cause all manner of unpleasant surprises.
>> The UB to which this patch relates is in class `Assembler`, in which instances are pointers to (nonexistent) objects defined as, for example, >> >> >> typedef RegisterImpl *Register; >> const Register r10 = ((Register)10); >> >> >> Registers have accessors, e.g.: >> >> ` int RegisterImpl::encoding() const { return (intptr_t)this; }` >> >> This works by an accident of implementation: it is not legal C++. >> >> The most obvious way to this UB bug is to make instances of `Register` point to something, and to use pointer subtraction to find the encoding: (simplified for clarity) >> >> >> extern RegisterImpl all_Registers[num_Registers]; >> int RegisterImpl::encoding() const { return this - all_Registers; } >> >> >> After this patch there is slightly more work to be done when assembling code but it's merely the subtraction of a constant in `encoding()` and the difference in execution time is so small (and the startup variance so large) that I have been unable to measure it, even after averaging 100 runs. It does lead to an increase of about 1% in the size of the stripped libjvm.so, but I think that can be recovered by a subsequent patch. >> >> An alternative way to implement this would be to make the encoding a byte-wide field in `RegisterImpl` and define encoding() this way: >> >> ` int RegisterImpl::encoding() const { return _encoding; }` >> >> This would result in smaller code, but I suspect slower. >> >> If this change is accepted, I propose that all instances of this pattern in HotSpot be treated similarly. > > Andrew Haley has updated the pull request incrementally with two additional commits since the last revision: > > - Whitespace > - Simplify and improve portability. src/hotspot/cpu/aarch64/register_aarch64.hpp line 154: > 152: > 153: // derived registers, offsets, and addresses > 154: FloatRegister successor() const { return as_FloatRegister((encoding() + 1) % 32); } Should this `32` be replaced by `number_of_registers`. 
Furthermore, an `&` would save some instructions here, a `static_assert` to ensure `number_of_registers` is a power of 2, too. ------------- PR: https://git.openjdk.java.net/jdk/pull/6280 From duke at openjdk.java.net Tue Nov 16 18:11:42 2021 From: duke at openjdk.java.net (Mai =?UTF-8?B?xJDhurduZw==?= =?UTF-8?B?IA==?= =?UTF-8?B?UXXDom4=?= Anh) Date: Tue, 16 Nov 2021 18:11:42 GMT Subject: RFR: 8276563: Undefined Behaviour in class Assembler [v7] In-Reply-To: References: Message-ID: On Tue, 16 Nov 2021 16:48:06 GMT, Andrew Haley wrote: >> The HotSpot code base contains a number of instances of Undefined Behavior, which can cause all manner of unpleasant surprises. >> The UB to which this patch relates is in class `Assembler`, in which instances are pointers to (nonexistent) objects defined as, for example, >> >> >> typedef RegisterImpl *Register; >> const Register r10 = ((Register)10); >> >> >> Registers have accessors, e.g.: >> >> ` int RegisterImpl::encoding() const { return (intptr_t)this; }` >> >> This works by an accident of implementation: it is not legal C++. >> >> The most obvious way to this UB bug is to make instances of `Register` point to something, and to use pointer subtraction to find the encoding: (simplified for clarity) >> >> >> extern RegisterImpl all_Registers[num_Registers]; >> int RegisterImpl::encoding() const { return this - all_Registers; } >> >> >> After this patch there is slightly more work to be done when assembling code but it's merely the subtraction of a constant in `encoding()` and the difference in execution time is so small (and the startup variance so large) that I have been unable to measure it, even after averaging 100 runs. It does lead to an increase of about 1% in the size of the stripped libjvm.so, but I think that can be recovered by a subsequent patch. 
>> >> An alternative way to implement this would be to make the encoding a byte-wide field in `RegisterImpl` and define encoding() this way: >> >> ` int RegisterImpl::encoding() const { return _encoding; }` >> >> This would result in smaller code, but I suspect slower. >> >> If this change is accepted, I propose that all instances of this pattern in HotSpot be treated similarly. > > Andrew Haley has updated the pull request incrementally with two additional commits since the last revision: > > - Whitespace > - Simplify and improve portability. src/hotspot/cpu/aarch64/register_aarch64.hpp line 265: > 263: int encoding_nocheck() const { return this - first(); } > 264: bool is_valid() const { return this >= first() && this - first() < number_of_registers; } > 265: bool is_governing() const { return first() <= this && this - first() < number_of_governing_registers; } You missed a space here. Cheers. ------------- PR: https://git.openjdk.java.net/jdk/pull/6280 From dlong at openjdk.java.net Tue Nov 16 22:38:34 2021 From: dlong at openjdk.java.net (Dean Long) Date: Tue, 16 Nov 2021 22:38:34 GMT Subject: RFR: 8254108: ciReplay: Support incremental inlining In-Reply-To: References: Message-ID: On Tue, 16 Nov 2021 15:45:15 GMT, Christian Hagedorn wrote: > This patch adds support to explicitly apply incremental inlining when replay compiling a method if the original compilation of the method was also incrementally inlined. We write a new value when dumping the inline tree to indicate if an inlinee was incrementally inlined (`= 1`) or not (`= 0`). > > To implement this, I updated the `REPLAY_VERSION` to 2 and additionally added a test to verify that old replay file versions are still working. I added some support to modify/remove version numbers of generated replay files in tests. I also refactored the test added by JDK-8275868 to reuse some of the methods. > > Thanks, > Christian Changes requested by dlong (Reviewer). 
src/hotspot/share/ci/ciReplay.cpp line 775: > 773: // Pending exception? > 774: break; > 775: } I don't see how a pending exception is possible here, given the check at L763, and parse_int() doesn't throw any. What do you think about not calling parse_int() if _version < 2, so that there is no error to ignore? src/hotspot/share/opto/bytecodeInfo.cpp line 384: > 382: return false; > 383: } > 384: if (should_not_inline(callee_method, caller_method, caller_bci, NOT_PRODUCT_ARG(should_delay) profile)) { How about a comment for these two calls saying replay may override "should_delay"? src/hotspot/share/opto/bytecodeInfo.cpp line 609: > 607: InlineTree* callee_tree = build_inline_tree_for_callee(callee_method, jvms, caller_bci); > 608: if (should_delay || AlwaysIncrementalInline) { > 609: callee_tree->set_late_inline(); It took me a while to figure out why this is needed: for replay. It bothers me a little that AlwaysIncrementalInline is checked here and again in the caller. If the replay file sets should_delay to false, then we shouldn't let AlwaysIncrementalInline force it to true, right? So I'm wondering if it would be better to pre-set should_delay to true in the caller if AlwaysIncrementalInline is true. ------------- PR: https://git.openjdk.java.net/jdk/pull/6413 From duke at openjdk.java.net Wed Nov 17 00:30:56 2021 From: duke at openjdk.java.net (TatWai Chong) Date: Wed, 17 Nov 2021 00:30:56 GMT Subject: RFR: 8275448: [REDO] AArch64: Implement string_compare intrinsic in SVE [v4] In-Reply-To: <3lLU_Cnsh439U5TjNx9ADxXkvMDnfot4TxHbh7paUQw=.5c7ec35a-8456-4fe6-8b4f-1d7627026d8d@github.com> References: <3lLU_Cnsh439U5TjNx9ADxXkvMDnfot4TxHbh7paUQw=.5c7ec35a-8456-4fe6-8b4f-1d7627026d8d@github.com> Message-ID: > After JDK-8269559 was integrated there are failures in tier1 testing > across Mac OS X 11.4 (aarch64) machines. Please see JDK-8275263.
> > This patch is NOT functional; rather, this tends to verify potential > toolchain issues as the original patch pass testing on other > platforms. > > In this patch, we remove new SVE-related matching rules and register > class introduced in the original patch to minimally affect the > non-SVE part. TatWai Chong has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: - Merge master - Add the matching rule in td file, enable control path in the code stub. - Add the register class and description for this SVE intrinsic. - 8275448: [REDO] AArch64: Implement string_compare intrinsic in SVE After JDK-8269559 was integrated there are failures in tier1 testing across Mac OS X 11.4 (aarch64) machines. Please see JDK-8275263. This patch isn't functional; rather, this tends to verify potential toolchain issues as the original patch passes testing on other platforms. In this patch, we remove new SVE-related matching rules and register class introduced in the original patch to minimally affect the non-SVE part. ------------- Changes: https://git.openjdk.java.net/jdk/pull/6072/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6072&range=03 Stats: 423 lines in 9 files changed: 412 ins; 0 del; 11 mod Patch: https://git.openjdk.java.net/jdk/pull/6072.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6072/head:pull/6072 PR: https://git.openjdk.java.net/jdk/pull/6072 From duke at openjdk.java.net Wed Nov 17 01:53:43 2021 From: duke at openjdk.java.net (Fei Gao) Date: Wed, 17 Nov 2021 01:53:43 GMT Subject: RFR: 8275317: AArch64: Support some type conversion vectorization in SLP In-Reply-To: References: Message-ID: On Thu, 28 Oct 2021 03:39:42 GMT, Fei Gao wrote: > Current SLP vectorizer in C2 compiler doesn't support type conversion > operations. But AArch64 has vector type conversion instructions in > both NEON and SVE. 
> > The type conversion involves two kinds of scenarios, conversion between > the same data sizes and conversion between different data sizes. If we > want to support casts between different data sizes, we need to amend > the code part for identifying adjacent memory references and the code > part for justifying if the combination is profitable. I suppose it > would be easier to review if we split the whole task to support type > conversion into two separate patches, one for the same data sizes and > the other one for different data sizes. The goal of this patch is just > to support conversions within the same data size, including: > int -> float > float -> int > long -> double > double -> long > > A typical test case: > > for (int i = start; i < limit; i++) { > b[i] = (float) a[i]; > } > > To implement it, the patch completed the necessary instructions and > matching rules in the backend and added implementation for SLP in > the middle end. > > The percentage of performance uplift on aarch64 system: > Mode: avgt > Cnt: 15 > Metric: (ns/op) > > benchmark percentage change [(After-Before)/Before] > VectorLoop.convertD2L -48.46% > VectorLoop.convertF2I -55.67% > VectorLoop.convertI2F -55.27% > VectorLoop.convertL2D -48.75% Hi @nick-arm , can I have your review please? ------------- PR: https://git.openjdk.java.net/jdk/pull/6145 From dlong at openjdk.java.net Wed Nov 17 02:04:46 2021 From: dlong at openjdk.java.net (Dean Long) Date: Wed, 17 Nov 2021 02:04:46 GMT Subject: RFR: 8277310: ciReplay: @cpi MethodHandle references not resolved Message-ID: It turns out replay was resolving constant pool entries for "@bci " references, but not for "@cpi" references. These changes fix that. 
------------- Commit messages: - actually resolve @cpi references Changes: https://git.openjdk.java.net/jdk/pull/6423/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6423&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8277310 Stats: 60 lines in 1 file changed: 15 ins; 16 del; 29 mod Patch: https://git.openjdk.java.net/jdk/pull/6423.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6423/head:pull/6423 PR: https://git.openjdk.java.net/jdk/pull/6423 From ngasson at openjdk.java.net Wed Nov 17 03:56:32 2021 From: ngasson at openjdk.java.net (Nick Gasson) Date: Wed, 17 Nov 2021 03:56:32 GMT Subject: RFR: 8275317: AArch64: Support some type conversion vectorization in SLP In-Reply-To: References: Message-ID: On Thu, 28 Oct 2021 03:39:42 GMT, Fei Gao wrote: > Current SLP vectorizer in C2 compiler doesn't support type conversion > operations. But AArch64 has vector type conversion instructions in > both NEON and SVE. > > The type conversion involves two kinds of scenarios, conversion between > the same data sizes and conversion between different data sizes. If we > want to support casts between different data sizes, we need to amend > the code part for identifying adjacent memory references and the code > part for justifying if the combination is profitable. I suppose it > would be easier to review if we split the whole task to support type > conversion into two separate patches, one for the same data sizes and > the other one for different data sizes. The goal of this patch is just > to support conversions within the same data size, including: > int -> float > float -> int > long -> double > double -> long > > A typical test case: > > for (int i = start; i < limit; i++) { > b[i] = (float) a[i]; > } > > To implement it, the patch completed the necessary instructions and > matching rules in the backend and added implementation for SLP in > the middle end. 
> > The percentage of performance uplift on aarch64 system: > Mode: avgt > Cnt: 15 > Metric: (ns/op) > > benchmark percentage change [(After-Before)/Before] > VectorLoop.convertD2L -48.46% > VectorLoop.convertF2I -55.67% > VectorLoop.convertI2F -55.27% > VectorLoop.convertL2D -48.75% Looks OK to me. ------------- Marked as reviewed by ngasson (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6145 From duke at openjdk.java.net Wed Nov 17 04:16:35 2021 From: duke at openjdk.java.net (Fei Gao) Date: Wed, 17 Nov 2021 04:16:35 GMT Subject: RFR: 8275317: AArch64: Support some type conversion vectorization in SLP In-Reply-To: References: Message-ID: <2rfJ99bxg6QbHg8wm9YwtOAyqxRFyRcjbCmG89YPvOo=.2587cbae-7c35-419b-899f-b5439fac575c@github.com> On Wed, 17 Nov 2021 03:53:35 GMT, Nick Gasson wrote: > Looks OK to me. Thanks, @nick-arm ------------- PR: https://git.openjdk.java.net/jdk/pull/6145 From duke at openjdk.java.net Wed Nov 17 06:09:40 2021 From: duke at openjdk.java.net (TatWai Chong) Date: Wed, 17 Nov 2021 06:09:40 GMT Subject: RFR: 8275448: [REDO] AArch64: Implement string_compare intrinsic in SVE [v3] In-Reply-To: References: <3lLU_Cnsh439U5TjNx9ADxXkvMDnfot4TxHbh7paUQw=.5c7ec35a-8456-4fe6-8b4f-1d7627026d8d@github.com> <3O9BeTBG4Z4q3up1VlKGb096qxN2dxSXxTG_FWrXNVE=.ab464e81-34ad-4e8f-aba4-543489015f64@github.com> Message-ID: On Fri, 12 Nov 2021 07:00:05 GMT, Tobias Hartmann wrote: >>> Now, only the `gtest/GTestWrapper.java` fails in `AssemblerAArch64::validate_vm`. >> >> That is still quite strange. I just tried `gtest/GTestWrapper.java` on that commit on an M1 Mac and it passed (both fastdebug and release build). Could you post the log of the failing test? > >> Could you post the log of the failing test? > > The log file is not too helpful, it just contains: > > > java.lang.AssertionError: gtest execution failed; exit code = 2. the failed tests: [AssemblerAArch64::validate_vm, ... 
AssemblerAArch64::validate_vm, AssemblerAArch64::validate_vm] > at GTestWrapper.main(GTestWrapper.java:98) > at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) > at java.base/java.lang.reflect.Method.invoke(Method.java:577) > at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:312) > at java.base/java.lang.Thread.run(Thread.java:833) > > > Where the `AssemblerAArch64::validate_vm` is repeated many times. > >> @tatwaichong the `gtest/GTestWrapper.java` failure was on a re-test of your original commit, [8b1b6f9](https://github.com/openjdk/jdk/commit/8b1b6f9fb375bbc2de339ad8f526ca4d5f83dc70). This latest PR seems fine. > > Yes, exactly. > > As this change seems stable, I'm fine with integrating it. @TobiHartmann, @merykitty Hi, could you have a look at the failure of Linux x86 (hs/tier1 compiler) in compiler/c2/irTests/TestUnsignedComparison.java? It seems that failure isn't introduced by this patch. ------------- PR: https://git.openjdk.java.net/jdk/pull/6072 From thartmann at openjdk.java.net Wed Nov 17 08:01:42 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Wed, 17 Nov 2021 08:01:42 GMT Subject: RFR: 8275448: [REDO] AArch64: Implement string_compare intrinsic in SVE [v4] In-Reply-To: References: <3lLU_Cnsh439U5TjNx9ADxXkvMDnfot4TxHbh7paUQw=.5c7ec35a-8456-4fe6-8b4f-1d7627026d8d@github.com> Message-ID: On Wed, 17 Nov 2021 00:30:56 GMT, TatWai Chong wrote: >> After JDK-8269559 was integrated there are failures in tier1 testing >> across Mac OS X 11.4 (aarch64) machines. Please see JDK-8275263. >> >> This patch is NOT functional; rather, this tends to verify potential >> toolchain issues as the original patch pass testing on other >> platforms. >> >> In this patch, we remove new SVE-related matching rules and register >> class introduced in the original patch to minimally affect the >> non-SVE part. 
> > TatWai Chong has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: > > - Merge master > - Add the matching rule in td file, enable control path in the code stub. > - Add the register class and description for this SVE intrinsic. > - 8275448: [REDO] AArch64: Implement string_compare intrinsic in SVE > > After JDK-8269559 was integrated there are failures in tier1 testing > across Mac OS X 11.4 (aarch64) machines. Please see JDK-8275263. > > This patch isn't functional; rather, this tends to verify potential > toolchain issues as the original patch passes testing on other > platforms. > > In this patch, we remove new SVE-related matching rules and register > class introduced in the original patch to minimally affect the > non-SVE part. I had a look at the artifacts and the test fails because the VM crashes with: # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (/home/runner/work/jdk/jdk/jdk/src/hotspot/share/opto/matcher.cpp:1714), pid=36076, tid=36091 # assert(false) failed: bad AD file # # JRE version: OpenJDK Runtime Environment (18.0) (fastdebug build 18-internal+0-tatwaichong-009443ec6847694c2b0b0cd058dc1768dd5c5e34) # Java VM: OpenJDK Server VM (fastdebug 18-internal+0-tatwaichong-009443ec6847694c2b0b0cd058dc1768dd5c5e34, mixed mode, sharing, tiered, g1 gc, linux-x86) # Problematic frame: # V [libjvm.so+0x11bc845] Matcher::Label_Root(Node const*, State*, Node*, Node*&)+0x565 Current CompileTask: C2: 1067 644 b 4 compiler.c2.irTests.TestUnsignedComparison::testLongVarLT (20 bytes) Stack: [0x92f3f000,0x93000000], sp=0x92ffd130, free space=760k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0x11bc845] Matcher::Label_Root(Node const*, State*, Node*, Node*&)+0x565 V [libjvm.so+0x11bd194] Matcher::match_tree(Node const*)+0x214 V [libjvm.so+0x11c8ee5] Matcher::xform(Node*, int)+0x1115 V [libjvm.so+0x11d2a65] 
Matcher::match()+0x37a5 V [libjvm.so+0x966a72] Compile::Code_Gen()+0xa2 V [libjvm.so+0x973a20] Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x17f0 V [libjvm.so+0x799283] C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x853 V [libjvm.so+0x98499b] CompileBroker::invoke_compiler_on_method(CompileTask*)+0xe6b V [libjvm.so+0x985a31] CompileBroker::compiler_thread_loop()+0x761 V [libjvm.so+0x9ae5fb] CompilerThread::thread_entry(JavaThread*, JavaThread*)+0x2b V [libjvm.so+0x16f60b2] JavaThread::thread_main_inner()+0x262 V [libjvm.so+0x16ff71e] Thread::call_run()+0xfe V [libjvm.so+0x1326943] thread_native_entry(Thread*)+0x123 C [libpthread.so.0+0x860a] start_thread+0xea That looks unrelated to this change, I filed [JDK-8277324](https://bugs.openjdk.java.net/browse/JDK-8277324). ------------- PR: https://git.openjdk.java.net/jdk/pull/6072 From ngasson at openjdk.java.net Wed Nov 17 08:12:36 2021 From: ngasson at openjdk.java.net (Nick Gasson) Date: Wed, 17 Nov 2021 08:12:36 GMT Subject: RFR: 8277180: Intrinsify recursive ObjectMonitor locking for C2 x64 and A64 In-Reply-To: References: Message-ID: On Tue, 16 Nov 2021 08:58:49 GMT, Erik ?sterlund wrote: > The C2 fast_lock and fast_unlock intrinsics don't support recursive ObjectMonitor locking. Some workloads can significantly benefit from this. Recent ObjectMonitor work has changed heuristics such that ObjectMonitors are deflated less aggressively. Therefore we can expect to see more inflated monitors in workloads where we would usually see more stack locks. That in itself is fine, except that C2 doesn't intrinsify the recursive locking paths for object monitors. Enabling those cases in the C2 code, removes a (~17%) regression we have seen with DaCapo h2 -t 1, and makes a few more benchmarks happy as well. 
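For readers less familiar with the term, here is a minimal Java sketch of what "recursive locking" means at the source level: a thread re-entering a monitor it already owns. The class and method names are hypothetical and not taken from the PR, and whether the monitor ends up inflated or handled by the C2 fast path depends on the VM and the workload.

```java
// Hypothetical illustration; not code from the PR under review.
final class RecursiveLockDemo {
    private final Object lock = new Object();
    private int depth = 0;

    // Each level of recursion re-enters the monitor that the current
    // thread already owns, i.e. a recursive monitor enter.
    int enter(int levels) {
        synchronized (lock) {
            depth++;
            if (levels > 1) {
                return enter(levels - 1);
            }
            return depth;
        }
    }

    public static void main(String[] args) {
        RecursiveLockDemo demo = new RecursiveLockDemo();
        // Three nested acquisitions of the same monitor by one thread.
        System.out.println(demo.enter(3));
    }
}
```

Every nested `synchronized (lock)` here is matched by an exit on the way out, which is the balanced case that the recursion counter discussed in the review is meant to track.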
src/hotspot/cpu/aarch64/aarch64.ad line 3924: > 3922: > 3923: __ cmp(disp_hdr, (u1)0); > 3924: __ br(Assembler::EQ, notRecursive); You can replace these two with a single `__ cbz(disp_hdr, notRecursive)` and avoid clobbering the flags. ------------- PR: https://git.openjdk.java.net/jdk/pull/6406 From thartmann at openjdk.java.net Wed Nov 17 08:14:49 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Wed, 17 Nov 2021 08:14:49 GMT Subject: RFR: 8276162: Optimise unsigned comparison pattern [v3] In-Reply-To: <5ok7QJdEz-iYq9SX_e4skYpQCXfmFx7emDEXV0Z7pSA=.066a0c06-e1f6-49d6-9042-cb653adce0cd@github.com> References: <5ok7QJdEz-iYq9SX_e4skYpQCXfmFx7emDEXV0Z7pSA=.066a0c06-e1f6-49d6-9042-cb653adce0cd@github.com> Message-ID: On Tue, 16 Nov 2021 13:57:49 GMT, Mai ??ng Qu?n Anh wrote: >> Thank you for addressing my comments. >> >> I am tentatively approve these changes leaving final approval and testing to @TobiHartmann > > Thank @vnkozlov and @TobiHartmann for your reviews and suggestions, thank @DamonFool for your initial supports. > May I have this PR sponsored, please? > > Thank you very much. This introduced a regression on 32-bit x86, @merykitty could you please have a look? [JDK-8277324](https://bugs.openjdk.java.net/browse/JDK-8277324) ------------- PR: https://git.openjdk.java.net/jdk/pull/6101 From duke at openjdk.java.net Wed Nov 17 08:23:39 2021 From: duke at openjdk.java.net (Fei Gao) Date: Wed, 17 Nov 2021 08:23:39 GMT Subject: Integrated: 8275317: AArch64: Support some type conversion vectorization in SLP In-Reply-To: References: Message-ID: On Thu, 28 Oct 2021 03:39:42 GMT, Fei Gao wrote: > Current SLP vectorizer in C2 compiler doesn't support type conversion > operations. But AArch64 has vector type conversion instructions in > both NEON and SVE. > > The type conversion involves two kinds of scenarios, conversion between > the same data sizes and conversion between different data sizes. 
If we > want to support casts between different data sizes, we need to amend > the code part for identifying adjacent memory references and the code > part for justifying if the combination is profitable. I suppose it > would be easier to review if we split the whole task to support type > conversion into two separate patches, one for the same data sizes and > the other one for different data sizes. The goal of this patch is just > to support conversions within the same data size, including: > int -> float > float -> int > long -> double > double -> long > > A typical test case: > > for (int i = start; i < limit; i++) { > b[i] = (float) a[i]; > } > > To implement it, the patch completed the necessary instructions and > matching rules in the backend and added implementation for SLP in > the middle end. > > The percentage of performance uplift on aarch64 system: > Mode: avgt > Cnt: 15 > Metric: (ns/op) > > benchmark percentage change [(After-Before)/Before] > VectorLoop.convertD2L -48.46% > VectorLoop.convertF2I -55.67% > VectorLoop.convertI2F -55.27% > VectorLoop.convertL2D -48.75% This pull request has now been integrated. Changeset: 9aa30de4 Author: Faye Gao Committer: Tobias Hartmann URL: https://git.openjdk.java.net/jdk/commit/9aa30de4bb55357ddf0900e6103062f02e85753b Stats: 229 lines in 5 files changed: 224 ins; 0 del; 5 mod 8275317: AArch64: Support some type conversion vectorization in SLP Reviewed-by: thartmann, ngasson ------------- PR: https://git.openjdk.java.net/jdk/pull/6145 From eosterlund at openjdk.java.net Wed Nov 17 08:31:36 2021 From: eosterlund at openjdk.java.net (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Wed, 17 Nov 2021 08:31:36 GMT Subject: RFR: 8277180: Intrinsify recursive ObjectMonitor locking for C2 x64 and A64 In-Reply-To: References: Message-ID: On Wed, 17 Nov 2021 07:46:15 GMT, Nick Gasson wrote: >> The C2 fast_lock and fast_unlock intrinsics don't support recursive ObjectMonitor locking. 
Some workloads can significantly benefit from this. Recent ObjectMonitor work has changed heuristics such that ObjectMonitors are deflated less aggressively. Therefore we can expect to see more inflated monitors in workloads where we would usually see more stack locks. That in itself is fine, except that C2 doesn't intrinsify the recursive locking paths for object monitors. Enabling those cases in the C2 code, removes a (~17%) regression we have seen with DaCapo h2 -t 1, and makes a few more benchmarks happy as well. > > src/hotspot/cpu/aarch64/aarch64.ad line 3924: > >> 3922: >> 3923: __ cmp(disp_hdr, (u1)0); >> 3924: __ br(Assembler::EQ, notRecursive); > > You can replace these two with a single `__ cbz(disp_hdr, notRecursive)` and avoid clobbering the flags. That is a good idea. BTW note that in the unlocking path for AArch64 there is an ownership check, while in the x86_64 code there is only a comment saying we definitely need one of those, but it doesn't actually check the owner. @dholmes-ora did some digging and it seems like this was previously controlled by some ancient sync flag that isn't around anymore. It would only exist to check for unbalanced JNI locking, and the JNI spec kind of says you shouldn't do that - that's a programmer error. So it seems like just not doing the ownership check is totally fine, and seems to yield 10% better performance in some workloads where there is contended locking. But I don't want to remove that check as part of this change - just something to keep in mind for a future RFE. ------------- PR: https://git.openjdk.java.net/jdk/pull/6406 From dlong at openjdk.java.net Wed Nov 17 09:12:58 2021 From: dlong at openjdk.java.net (Dean Long) Date: Wed, 17 Nov 2021 09:12:58 GMT Subject: RFR: 8277316: ciReplay: dump_replay_data is not thread-safe In-Reply-To: References: Message-ID: On Wed, 17 Nov 2021 09:05:14 GMT, Dean Long wrote: > Using a static buffer for the file name causes corrupted replay files. Fixed. 
I also changed the formatting for in the file name to use %d instead of %p, so it is consistently output in decimal instead of hex. ------------- PR: https://git.openjdk.java.net/jdk/pull/6426 From dlong at openjdk.java.net Wed Nov 17 09:12:58 2021 From: dlong at openjdk.java.net (Dean Long) Date: Wed, 17 Nov 2021 09:12:58 GMT Subject: RFR: 8277316: ciReplay: dump_replay_data is not thread-safe Message-ID: Using a static buffer for the file name causes corrupted replay files. Fixed. ------------- Commit messages: - use stack buffer instead of static buffer Changes: https://git.openjdk.java.net/jdk/pull/6426/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6426&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8277316 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/6426.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6426/head:pull/6426 PR: https://git.openjdk.java.net/jdk/pull/6426 From thartmann at openjdk.java.net Wed Nov 17 09:31:42 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Wed, 17 Nov 2021 09:31:42 GMT Subject: RFR: 8276162: Optimise unsigned comparison pattern [v4] In-Reply-To: References: Message-ID: On Sat, 13 Nov 2021 05:22:07 GMT, Mai ??ng Qu?n Anh wrote: >> This patch changes operations in the form `x +- Integer.MIN_VALUE <=> y +- Integer.MIN_VALUE`, which is a pattern used to do unsigned comparisons, into `x u<=> y`. >> >> In addition to being basic operations, they may be utilised to implement range checks such as the methods in `jdk.internal.util.Preconditions`, or in places where the compiler cannot deduce the non-negativeness of the bound as in `java.util.ArrayList`. >> >> Thank you very much. 
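As a small illustration of the pattern described in the quoted text, the following Java sketch compares the `Integer.MIN_VALUE` bias idiom with `Integer.compareUnsigned`. The class and method names are hypothetical; the sketch only shows source-level equivalence on a few sample values and says nothing about what C2 actually emits.

```java
// Hypothetical illustration; not code from the PR under review.
final class UnsignedCompareDemo {
    // Unsigned less-than written with the bias idiom: adding
    // Integer.MIN_VALUE flips the sign bit of both operands.
    static boolean ltBiased(int x, int y) {
        return x + Integer.MIN_VALUE < y + Integer.MIN_VALUE;
    }

    // The same comparison expressed through the standard library.
    static boolean ltLibrary(int x, int y) {
        return Integer.compareUnsigned(x, y) < 0;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int x : samples) {
            for (int y : samples) {
                if (ltBiased(x, y) != ltLibrary(x, y)) {
                    throw new AssertionError(x + " vs " + y);
                }
            }
        }
        // 1 is below -1 when -1 is read as the unsigned value 0xFFFFFFFF.
        System.out.println(ltBiased(1, -1));
    }
}
```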
> > Mai ??ng Qu?n Anh has updated the pull request incrementally with two additional commits since the last revision: > > - add tests cover constant comparison and calling library > - add eq/ne, add correction test, refine micro The problem is missing match rules on x86-32: https://github.com/openjdk/jdk/pull/6427 ------------- PR: https://git.openjdk.java.net/jdk/pull/6101 From thartmann at openjdk.java.net Wed Nov 17 09:35:47 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Wed, 17 Nov 2021 09:35:47 GMT Subject: RFR: 8277324: C2 compilation fails with "bad AD file" on x86-32 after JDK-8276162 due to missing match rule Message-ID: [JDK-8276162](https://bugs.openjdk.java.net/browse/JDK-8276162) introduced an optimization that creates `CMoveI (Bool (CmpUL ...) ...)` shapes but x86-32 misses the corresponding match rules in C2's backend. I also fixed two comments incorrectly referring to ints instead of ptrs. Thanks, Tobias ------------- Commit messages: - 8277324: C2 compilation fails with "bad AD file" on x86-32 after JDK-8276162 due to missing match rule Changes: https://git.openjdk.java.net/jdk/pull/6427/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6427&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8277324 Stats: 62 lines in 1 file changed: 60 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/6427.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6427/head:pull/6427 PR: https://git.openjdk.java.net/jdk/pull/6427 From rrich at openjdk.java.net Wed Nov 17 10:07:39 2021 From: rrich at openjdk.java.net (Richard Reingruber) Date: Wed, 17 Nov 2021 10:07:39 GMT Subject: RFR: 8276901: Implement UseHeavyMonitors consistently [v2] In-Reply-To: References: Message-ID: <1f2njrBpRrF6Ks7OISqXs4IDHSBUXkaNDMOIVnZqRWM=.664ee7f3-f856-4efc-8116-4d11ef092945@github.com> On Wed, 10 Nov 2021 19:19:13 GMT, Roman Kennke wrote: >> The flag UseHeavyMonitors seems to imply that it makes Hotspot always use inflated monitors, 
rather than stack locks. However, it is only implemented in the interpreter that way. When it calls into runtime, it would still happily stack-lock. Even worse, C1 uses another flag UseFastLocking to achieve something similar (with the same caveat that runtime would stack-lock anyway). C2 doesn't have any such mechanism at all. >> I would like to experiment with disabling stack-locking, and thus, having this flag work as expected would seem very useful. >> >> The change removes the C1 flag UseFastLocking, and replaces its uses with equivalent (i.e. inverted) UseHeavyMonitors instead. I think it makes sense to make UseHeavyMonitors develop (I wouldn't want anybody to use this in production, not currently without this change, and not with this change). I also added a flag VerifyHeavyMonitors to be able to verify that stack-locking is really disabled. We can't currently verify this uncondiftionally (e.g. in debug builds) because all non-x86_64 platforms would need work. >> >> Testing: >> - [x] tier1 >> - [x] tier2 >> - [x] tier3 >> - [ ] tier4 > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Verify monitors even in non-debug builds Wouldn't a (new) minimal multi-threaded test that explicitly sets UseHeavyMonitors be good and sufficient for this develop feature? ------------- PR: https://git.openjdk.java.net/jdk/pull/6320 From aph at openjdk.java.net Wed Nov 17 10:18:34 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Wed, 17 Nov 2021 10:18:34 GMT Subject: RFR: 8277180: Intrinsify recursive ObjectMonitor locking for C2 x64 and A64 In-Reply-To: References: Message-ID: On Tue, 16 Nov 2021 08:58:49 GMT, Erik ?sterlund wrote: > The C2 fast_lock and fast_unlock intrinsics don't support recursive ObjectMonitor locking. Some workloads can significantly benefit from this. Recent ObjectMonitor work has changed heuristics such that ObjectMonitors are deflated less aggressively. 
Therefore we can expect to see more inflated monitors in workloads where we would usually see more stack locks. That in itself is fine, except that C2 doesn't intrinsify the recursive locking paths for object monitors. Enabling those cases in the C2 code, removes a (~17%) regression we have seen with DaCapo h2 -t 1, and makes a few more benchmarks happy as well. Normally I would hate any code added to our hand-carved assembler sequences, but even I have to admit that this surprisingly simple addition is worthwhile. src/hotspot/cpu/aarch64/aarch64.ad line 3872: > 3870: __ ldr(tmp, Address(disp_hdr, ObjectMonitor::recursions_offset_in_bytes() - markWord::monitor_value)); > 3871: __ add(tmp, tmp, 1u); > 3872: __ str(tmp, Address(disp_hdr, ObjectMonitor::recursions_offset_in_bytes() - markWord::monitor_value)); Suggestion: __ increment(Address(disp_hdr, ObjectMonitor::recursions_offset_in_bytes() - markWord::monitor_value)); ------------- Marked as reviewed by aph (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6406 From eosterlund at openjdk.java.net Wed Nov 17 10:36:09 2021 From: eosterlund at openjdk.java.net (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Wed, 17 Nov 2021 10:36:09 GMT Subject: RFR: 8277180: Intrinsify recursive ObjectMonitor locking for C2 x64 and A64 [v2] In-Reply-To: References: Message-ID: > The C2 fast_lock and fast_unlock intrinsics don't support recursive ObjectMonitor locking. Some workloads can significantly benefit from this. Recent ObjectMonitor work has changed heuristics such that ObjectMonitors are deflated less aggressively. Therefore we can expect to see more inflated monitors in workloads where we would usually see more stack locks. That in itself is fine, except that C2 doesn't intrinsify the recursive locking paths for object monitors. Enabling those cases in the C2 code, removes a (~17%) regression we have seen with DaCapo h2 -t 1, and makes a few more benchmarks happy as well. 
Erik ?sterlund has updated the pull request incrementally with one additional commit since the last revision: optimize AArch64 code ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6406/files - new: https://git.openjdk.java.net/jdk/pull/6406/files/e6feec1e..7832a9f4 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6406&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6406&range=00-01 Stats: 3 lines in 1 file changed: 0 ins; 1 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/6406.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6406/head:pull/6406 PR: https://git.openjdk.java.net/jdk/pull/6406 From eosterlund at openjdk.java.net Wed Nov 17 10:43:35 2021 From: eosterlund at openjdk.java.net (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Wed, 17 Nov 2021 10:43:35 GMT Subject: RFR: 8277180: Intrinsify recursive ObjectMonitor locking for C2 x64 and A64 [v2] In-Reply-To: References: Message-ID: On Wed, 17 Nov 2021 10:13:19 GMT, Andrew Haley wrote: >> Erik ?sterlund has updated the pull request incrementally with one additional commit since the last revision: >> >> optimize AArch64 code > > src/hotspot/cpu/aarch64/aarch64.ad line 3872: > >> 3870: __ ldr(tmp, Address(disp_hdr, ObjectMonitor::recursions_offset_in_bytes() - markWord::monitor_value)); >> 3871: __ add(tmp, tmp, 1u); >> 3872: __ str(tmp, Address(disp_hdr, ObjectMonitor::recursions_offset_in_bytes() - markWord::monitor_value)); > > Suggestion: > > __ increment(Address(disp_hdr, ObjectMonitor::recursions_offset_in_bytes() - markWord::monitor_value)); The increment macro doesn't seem to utilize the fact that 1u can be encoded as an immediate to the add instruction. So it seems to generate worse code here. I'm okay with changing to increment anyway if you prefer that. 
------------- PR: https://git.openjdk.java.net/jdk/pull/6406 From aph at openjdk.java.net Wed Nov 17 10:49:36 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Wed, 17 Nov 2021 10:49:36 GMT Subject: RFR: 8277180: Intrinsify recursive ObjectMonitor locking for C2 x64 and A64 [v2] In-Reply-To: References: Message-ID: On Wed, 17 Nov 2021 10:40:35 GMT, Erik Österlund wrote: > The increment macro doesn't seem to utilize the fact that 1u can be encoded as an immediate to the add instruction. Sure it does. Try it. If it doesn't, we'll change `increment()`! ------------- PR: https://git.openjdk.java.net/jdk/pull/6406 From chagedorn at openjdk.java.net Wed Nov 17 10:49:35 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Wed, 17 Nov 2021 10:49:35 GMT Subject: RFR: 8277316: ciReplay: dump_replay_data is not thread-safe In-Reply-To: References: Message-ID: On Wed, 17 Nov 2021 09:05:14 GMT, Dean Long wrote: > Using a static buffer for the file name causes corrupted replay files. Fixed. Looks good! ------------- Marked as reviewed by chagedorn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6426 From chagedorn at openjdk.java.net Wed Nov 17 10:58:39 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Wed, 17 Nov 2021 10:58:39 GMT Subject: RFR: 8277310: ciReplay: @cpi MethodHandle references not resolved In-Reply-To: References: Message-ID: <4E0sxur8I5SmWVkuMEuR-qIn8aF2MhQdouVZu_15AU8=.019ef56e-4983-4a25-a030-504b4fb20848@github.com> On Wed, 17 Nov 2021 01:57:55 GMT, Dean Long wrote: > It turns out replay was resolving constant pool entries for "@bci " references, but not for "@cpi" references. These changes fix that. Looks good! ------------- Marked as reviewed by chagedorn (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/6423 From eosterlund at openjdk.java.net Wed Nov 17 11:19:09 2021 From: eosterlund at openjdk.java.net (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Wed, 17 Nov 2021 11:19:09 GMT Subject: RFR: 8277180: Intrinsify recursive ObjectMonitor locking for C2 x64 and A64 [v3] In-Reply-To: References: Message-ID: <4oucmwwERGx-tUr1PPfNHb7YMCHFR82eCxyEAga1CzA=.28af13c7-9a9f-489b-a598-de6fcc32828b@github.com> > The C2 fast_lock and fast_unlock intrinsics don't support recursive ObjectMonitor locking. Some workloads can significantly benefit from this. Recent ObjectMonitor work has changed heuristics such that ObjectMonitors are deflated less aggressively. Therefore we can expect to see more inflated monitors in workloads where we would usually see more stack locks. That in itself is fine, except that C2 doesn't intrinsify the recursive locking paths for object monitors. Enabling those cases in the C2 code, removes a (~17%) regression we have seen with DaCapo h2 -t 1, and makes a few more benchmarks happy as well. 
Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: use increment macro ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6406/files - new: https://git.openjdk.java.net/jdk/pull/6406/files/7832a9f4..26e69b28 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6406&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6406&range=01-02 Stats: 3 lines in 1 file changed: 0 ins; 2 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/6406.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6406/head:pull/6406 PR: https://git.openjdk.java.net/jdk/pull/6406 From eosterlund at openjdk.java.net Wed Nov 17 11:19:12 2021 From: eosterlund at openjdk.java.net (Erik Österlund) Date: Wed, 17 Nov 2021 11:19:12 GMT Subject: RFR: 8277180: Intrinsify recursive ObjectMonitor locking for C2 x64 and A64 [v3] In-Reply-To: References: Message-ID: On Wed, 17 Nov 2021 10:15:38 GMT, Andrew Haley wrote: >> Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: >> >> use increment macro > > Normally I would hate any code added to our hand-carved assembler sequences, but even I have to admit that this surprisingly simple addition is worthwhile. Thanks for the review @theRealAph and @nick-arm Think I need someone to review the x86 code as well. ------------- PR: https://git.openjdk.java.net/jdk/pull/6406 From eosterlund at openjdk.java.net Wed Nov 17 11:19:14 2021 From: eosterlund at openjdk.java.net (Erik Österlund) Date: Wed, 17 Nov 2021 11:19:14 GMT Subject: RFR: 8277180: Intrinsify recursive ObjectMonitor locking for C2 x64 and A64 [v3] In-Reply-To: References: Message-ID: On Wed, 17 Nov 2021 10:46:08 GMT, Andrew Haley wrote: >> The increment macro doesn't seem to utilize the fact that 1u can be encoded as an immediate to the add instruction. So it seems to generate worse code here.
I'm okay with changing to increment anyway if you prefer that. > >> The increment macro doesn't seem to utilize the fact that 1u can be encoded as an immediate to the add instruction. > > Sure it does. Try it. If it doesn't, we'll change `increment()`! ?? Oh yeah look at that. I disassembled it and it did the right thing. Thanks for the suggestion. ------------- PR: https://git.openjdk.java.net/jdk/pull/6406 From chagedorn at openjdk.java.net Wed Nov 17 11:19:34 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Wed, 17 Nov 2021 11:19:34 GMT Subject: RFR: 8277324: C2 compilation fails with "bad AD file" on x86-32 after JDK-8276162 due to missing match rule In-Reply-To: References: Message-ID: <3oVHrN1gBDp5SmB45hM3rGR7xhNg_i_EC0bc4ebZOyI=.58f4b349-511a-447d-a750-c03bb24c2753@github.com> On Wed, 17 Nov 2021 09:25:26 GMT, Tobias Hartmann wrote: > [JDK-8276162](https://bugs.openjdk.java.net/browse/JDK-8276162) introduced an optimization that creates `CMoveI (Bool (CmpUL ...) ...)` shapes but x86-32 misses the corresponding match rules in C2's backend. > > I also fixed two comments incorrectly referring to ints instead of ptrs. > > Thanks, > Tobias Otherwise, the fix looks good! src/hotspot/cpu/x86/x86_32.ad line 13160: > 13158: opcode(0x0F,0x40); > 13159: ins_encode( enc_cmov(cmp), RegReg( dst, src ) ); > 13160: ins_pipe( pipe_cmov_reg ); Since this code is shared with `cmovII_reg_LTGE`, we could directly use it with `expand`: expand %{ cmovII_reg_LTGE(cmp, flags, dst, src); %} Could also be done for the other cases. ------------- Marked as reviewed by chagedorn (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/6427 From roland at openjdk.java.net Wed Nov 17 12:23:03 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Wed, 17 Nov 2021 12:23:03 GMT Subject: RFR: 8275330: C2: assert(n->is_Root() || n->is_Region() || n->is_Phi() || n->is_MachMerge() || def_block->dominates(block)) failed: uses must be dominated by definitions Message-ID: This is similar to previous bugs where: - a cast/conv node captures a narrow type in a loop body because of a range check, - the range check is optimized out of the loop, pre/main/post loop are created - overunrolling causes the main loop to become unreachable (the range check, if still in the main loop, would fail), the cast transforms to top but c2 can't optimize the loop out This was fixed by adding predicates above the main loop. With this particular bug, the cast node is in the post loop. The fix I propose is to also add predicates above the post loop. There are a few locations in the code that cause a post loop to be added: either the initial post loop or some other post loops for vectorization support. I think the new predicates are needed in a all cases. To be able to add predicates at these different points in the optimization process, the new predicates are copied from the main loop predicates. I also delayed folding of Opaque4 nodes to macro expansion rather than post loop opts igvn. The reason for that is that I believe there's a risk that an Opaque4 is removed (that is replaced by its input 2) before its input 1 has a chance to constant fold. That wouldn't happen with a debug build because we leave the tests in (that is replace the Opaque4 node by its input 1) so that corner case is not properly tested currently. The reason for leaving the tests in was to sanity check that the tests are indeed correct. 
------------- Commit messages: - fix Changes: https://git.openjdk.java.net/jdk/pull/6429/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6429&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8275330 Stats: 189 lines in 8 files changed: 136 ins; 22 del; 31 mod Patch: https://git.openjdk.java.net/jdk/pull/6429.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6429/head:pull/6429 PR: https://git.openjdk.java.net/jdk/pull/6429 From thartmann at openjdk.java.net Wed Nov 17 12:41:08 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Wed, 17 Nov 2021 12:41:08 GMT Subject: RFR: 8277324: C2 compilation fails with "bad AD file" on x86-32 after JDK-8276162 due to missing match rule [v2] In-Reply-To: References: Message-ID: > [JDK-8276162](https://bugs.openjdk.java.net/browse/JDK-8276162) introduced an optimization that creates `CMoveI (Bool (CmpUL ...) ...)` shapes but x86-32 misses the corresponding match rules in C2's backend. > > I also fixed two comments incorrectly referring to ints instead of ptrs. 
> > Thanks, > Tobias Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: Use expand ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6427/files - new: https://git.openjdk.java.net/jdk/pull/6427/files/8b315cdc..1e55fd19 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6427&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6427&range=00-01 Stats: 34 lines in 1 file changed: 0 ins; 10 del; 24 mod Patch: https://git.openjdk.java.net/jdk/pull/6427.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6427/head:pull/6427 PR: https://git.openjdk.java.net/jdk/pull/6427 From thartmann at openjdk.java.net Wed Nov 17 12:41:13 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Wed, 17 Nov 2021 12:41:13 GMT Subject: RFR: 8277324: C2 compilation fails with "bad AD file" on x86-32 after JDK-8276162 due to missing match rule [v2] In-Reply-To: <3oVHrN1gBDp5SmB45hM3rGR7xhNg_i_EC0bc4ebZOyI=.58f4b349-511a-447d-a750-c03bb24c2753@github.com> References: <3oVHrN1gBDp5SmB45hM3rGR7xhNg_i_EC0bc4ebZOyI=.58f4b349-511a-447d-a750-c03bb24c2753@github.com> Message-ID: On Wed, 17 Nov 2021 11:12:10 GMT, Christian Hagedorn wrote: >> Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: >> >> Use expand > > src/hotspot/cpu/x86/x86_32.ad line 13160: > >> 13158: opcode(0x0F,0x40); >> 13159: ins_encode( enc_cmov(cmp), RegReg( dst, src ) ); >> 13160: ins_pipe( pipe_cmov_reg ); > > Since this code is shared with `cmovII_reg_LTGE`, we could directly use it with `expand`: > > expand %{ > cmovII_reg_LTGE(cmp, flags, dst, src); > %} > > Could also be done for the other cases. Thanks Christian, that's a good suggestion. I thought expand does not work because the argument types differ but it does. I updated the patch. 
------------- PR: https://git.openjdk.java.net/jdk/pull/6427 From chagedorn at openjdk.java.net Wed Nov 17 12:57:34 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Wed, 17 Nov 2021 12:57:34 GMT Subject: RFR: 8277324: C2 compilation fails with "bad AD file" on x86-32 after JDK-8276162 due to missing match rule [v2] In-Reply-To: References: <3oVHrN1gBDp5SmB45hM3rGR7xhNg_i_EC0bc4ebZOyI=.58f4b349-511a-447d-a750-c03bb24c2753@github.com> Message-ID: On Wed, 17 Nov 2021 12:37:13 GMT, Tobias Hartmann wrote: >> src/hotspot/cpu/x86/x86_32.ad line 13160: >> >>> 13158: opcode(0x0F,0x40); >>> 13159: ins_encode( enc_cmov(cmp), RegReg( dst, src ) ); >>> 13160: ins_pipe( pipe_cmov_reg ); >> >> Since this code is shared with `cmovII_reg_LTGE`, we could directly use it with `expand`: >> >> expand %{ >> cmovII_reg_LTGE(cmp, flags, dst, src); >> %} >> >> Could also be done for the other cases. > > Thanks Christian, that's a good suggestion. I thought expand does not work because the argument types differ but it does. I updated the patch. Looks good! AFAICT the two operands `flagsReg_(u)long_LTGE` only differ in their format string which is probably why it works. But maybe someone else can jump in here who knows the code better. ------------- PR: https://git.openjdk.java.net/jdk/pull/6427 From chagedorn at openjdk.java.net Wed Nov 17 12:57:33 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Wed, 17 Nov 2021 12:57:33 GMT Subject: RFR: 8277324: C2 compilation fails with "bad AD file" on x86-32 after JDK-8276162 due to missing match rule [v2] In-Reply-To: References: Message-ID: On Wed, 17 Nov 2021 12:41:08 GMT, Tobias Hartmann wrote: >> [JDK-8276162](https://bugs.openjdk.java.net/browse/JDK-8276162) introduced an optimization that creates `CMoveI (Bool (CmpUL ...) ...)` shapes but x86-32 misses the corresponding match rules in C2's backend. >> >> I also fixed two comments incorrectly referring to ints instead of ptrs. 
>> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Use expand Marked as reviewed by chagedorn (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/6427 From ihse at openjdk.java.net Wed Nov 17 13:20:35 2021 From: ihse at openjdk.java.net (Magnus Ihse Bursie) Date: Wed, 17 Nov 2021 13:20:35 GMT Subject: RFR: 8253757: Add LLVM-based backend for hsdis In-Reply-To: References: Message-ID: On Wed, 13 Oct 2021 00:00:22 GMT, Magnus Ihse Bursie wrote: > This patch expands the newly added system for hsdis backends to include LLVM. > > The actual code in hsdis-llvm.cpp is based heavily on the work by @luhenry, as published in the never integrated PR https://github.com/openjdk/jdk/pull/392. (I have basically just ripped out the binutils-based part of it.) > > Unfortunately I have not been able to make this work properly on Windows. With some additional flags I made it compile without complaints, but it caused hotspot to segfault in `LoadLibrary` (!) in `os::dll_load` when I tried to load the library. This is somewhat ironic, since the initial implementation was created by Ludovic for the very purpose of using it on Windows. > > The lack of Windows support in this patch does not mean it is impossible to get it to work, just that I need to co-operate with someone who has more experience of compiling LLVM on Windows, and/or are more eager to get this combination to work. Yeah bot, I'm still working on it. ------------- PR: https://git.openjdk.java.net/jdk/pull/5920 From jbhateja at openjdk.java.net Wed Nov 17 13:50:48 2021 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Wed, 17 Nov 2021 13:50:48 GMT Subject: RFR: 8277239: SIGSEGV in vrshift_reg_maskedNode::emit Message-ID: Currently instruction selector differentiates between the two kinds of vector shift operations i.e. 
one with vector shift count and other with scalar shift count passed through LShiftCntV/RShiftCntV nodes by looking at the ideal opcode of shift count node. A more robust scheme is to set a flag over vector shift node if it has variable vector shift count and replace the opcode based check with flag based check in various shift instruction selection patterns. ------------- Commit messages: - 8277239: SIGSEGV in vrshift_reg_maskedNode::emit Changes: https://git.openjdk.java.net/jdk/pull/6431/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6431&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8277239 Stats: 134 lines in 5 files changed: 73 ins; 3 del; 58 mod Patch: https://git.openjdk.java.net/jdk/pull/6431.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6431/head:pull/6431 PR: https://git.openjdk.java.net/jdk/pull/6431 From aph at openjdk.java.net Wed Nov 17 14:13:36 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Wed, 17 Nov 2021 14:13:36 GMT Subject: RFR: 8276563: Undefined Behaviour in class Assembler [v7] In-Reply-To: References: Message-ID: On Tue, 16 Nov 2021 17:58:22 GMT, Mai Đặng Quân Anh wrote: >> Andrew Haley has updated the pull request incrementally with two additional commits since the last revision: >> >> - Whitespace >> - Simplify and improve portability. > > src/hotspot/cpu/aarch64/register_aarch64.hpp line 62: > >> 60: // accessors >> 61: int encoding() const { assert(is_valid(), "invalid register"); return encoding_nocheck(); } >> 62: bool is_valid() const { return this >= first() && this - first() < number_of_registers; } > > Some tiny suggestions, an unsigned comparison between `this - first()` and `number_of_registers` would be sufficient here. > Suggestion: > > bool is_valid() const { return (unsigned)(this - first()) < number_of_registers; } OK. To my surprise, this really does generate better code.
I've been a GCC maintainer for a very long time, and I could have sworn that we did this optimization in the last century. ------------- PR: https://git.openjdk.java.net/jdk/pull/6280 From thartmann at openjdk.java.net Wed Nov 17 14:21:41 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Wed, 17 Nov 2021 14:21:41 GMT Subject: RFR: 8277316: ciReplay: dump_replay_data is not thread-safe In-Reply-To: References: Message-ID: On Wed, 17 Nov 2021 09:05:14 GMT, Dean Long wrote: > Using a static buffer for the file name causes corrupted replay files. Fixed. Looks good. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6426 From thartmann at openjdk.java.net Wed Nov 17 14:23:40 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Wed, 17 Nov 2021 14:23:40 GMT Subject: RFR: 8277310: ciReplay: @cpi MethodHandle references not resolved In-Reply-To: References: Message-ID: On Wed, 17 Nov 2021 01:57:55 GMT, Dean Long wrote: > It turns out replay was resolving constant pool entries for "@bci " references, but not for "@cpi" references. These changes fix that. Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6423 From aph at openjdk.java.net Wed Nov 17 14:23:40 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Wed, 17 Nov 2021 14:23:40 GMT Subject: RFR: 8276563: Undefined Behaviour in class Assembler [v7] In-Reply-To: <7S0TmQeESOtjMsEWEnTv0x6AF6c0xHg0LmTXcSJrhYQ=.f6db48aa-993b-4530-b93a-932451f275c5@github.com> References: <7S0TmQeESOtjMsEWEnTv0x6AF6c0xHg0LmTXcSJrhYQ=.f6db48aa-993b-4530-b93a-932451f275c5@github.com> Message-ID: On Tue, 16 Nov 2021 18:04:32 GMT, Mai Đặng Quân Anh wrote: >> Andrew Haley has updated the pull request incrementally with two additional commits since the last revision: >> >> - Whitespace >> - Simplify and improve portability.
> > src/hotspot/cpu/aarch64/register_aarch64.hpp line 154: > >> 152: >> 153: // derived registers, offsets, and addresses >> 154: FloatRegister successor() const { return as_FloatRegister((encoding() + 1) % 32); } > > Should this `32` be replaced by `number_of_registers`. Furthermore, an `&` would save some instructions here, a `static_assert` to ensure `number_of_registers` is a power of 2, too. I think I'll leave that optimization out, but I will change `32` to `number_of_registers`. `successor()` is used only once in release code, it's not worth optimizing. Also, that `% 32` is a kludge I'd like to get rid of, but this patch isn't supposed to affect anything but UB. ------------- PR: https://git.openjdk.java.net/jdk/pull/6280 From aph at openjdk.java.net Wed Nov 17 14:35:11 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Wed, 17 Nov 2021 14:35:11 GMT Subject: RFR: 8276563: Undefined Behaviour in class Assembler [v9] In-Reply-To: References: Message-ID: <69byto1KlbJy9IA5Z6wV3nl8eQTQKyMAEg18sR6q2P8=.dbe9530e-cd25-4aa4-ad90-a66e91e95a0d@github.com> > The HotSpot code base contains a number of instances of Undefined Behavior, which can cause all manner of unpleasant surprises. > The UB to which this patch relates is in class `Assembler`, in which instances are pointers to (nonexistent) objects defined as, for example, > > > typedef RegisterImpl *Register; > const Register r10 = ((Register)10); > > > Registers have accessors, e.g.: > > ` int RegisterImpl::encoding() const { return (intptr_t)this; }` > > This works by an accident of implementation: it is not legal C++. 
> > The most obvious way to fix this UB bug is to make instances of `Register` point to something, and to use pointer subtraction to find the encoding: (simplified for clarity) > > > extern RegisterImpl all_Registers[num_Registers]; > int RegisterImpl::encoding() const { return this - all_Registers; } > > > After this patch there is slightly more work to be done when assembling code but it's merely the subtraction of a constant in `encoding()` and the difference in execution time is so small (and the startup variance so large) that I have been unable to measure it, even after averaging 100 runs. It does lead to an increase of about 1% in the size of the stripped libjvm.so, but I think that can be recovered by a subsequent patch. > > An alternative way to implement this would be to make the encoding a byte-wide field in `RegisterImpl` and define encoding() this way: > > ` int RegisterImpl::encoding() const { return _encoding; }` > > This would result in smaller code, but I suspect slower. > > If this change is accepted, I propose that all instances of this pattern in HotSpot be treated similarly. Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: Tweako stuff.
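[Editor's sketch] The array-based scheme described in the PR text above, together with the unsigned-comparison `is_valid()` and the `number_of_registers`-based `successor()` discussed in the review, might look roughly like the following. This is only a minimal illustration, not the code from the patch: `RegImpl`, `all_regs`, `as_reg` and `noreg` are hypothetical names, and the extra trailing array slot used for "noreg" is an assumption made here so that every pointer subtraction stays inside one array.

```cpp
#include <cassert>

// Hypothetical stand-in for RegisterImpl; every name below is illustrative.
class RegImpl;
constexpr int number_of_registers = 32;

// One real object per register, plus one extra slot used here as "noreg"
// (an assumption of this sketch, so all pointers stay within one array).
extern const RegImpl all_regs[number_of_registers + 1];

class RegImpl {
 public:
  static const RegImpl* first() { return all_regs; }

  // A single unsigned comparison is enough: any index outside
  // [0, number_of_registers) fails the test.
  bool is_valid() const {
    return static_cast<unsigned>(this - first()) <
           static_cast<unsigned>(number_of_registers);
  }

  // The encoding is the element index, obtained by subtracting two
  // pointers into the same array.
  int encoding() const {
    assert(is_valid() && "invalid register");
    return static_cast<int>(this - first());
  }

  // Wraps around from the last register to the first.
  const RegImpl* successor() const {
    return first() + (encoding() + 1) % number_of_registers;
  }
};

const RegImpl all_regs[number_of_registers + 1] = {};

inline const RegImpl* as_reg(int enc) { return &all_regs[enc]; }
inline const RegImpl* noreg() { return &all_regs[number_of_registers]; }
```

With these definitions, `as_reg(10)->encoding()` yields 10 and `noreg()->is_valid()` is false, while no pointer is ever converted to an integer to recover the register number.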
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6280/files - new: https://git.openjdk.java.net/jdk/pull/6280/files/c7ec4ca9..0ce81fd2 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6280&range=08 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6280&range=07-08 Stats: 7 lines in 1 file changed: 2 ins; 0 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/6280.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6280/head:pull/6280 PR: https://git.openjdk.java.net/jdk/pull/6280 From rkennke at openjdk.java.net Wed Nov 17 15:01:00 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Wed, 17 Nov 2021 15:01:00 GMT Subject: RFR: 8276901: Implement UseHeavyMonitors consistently [v3] In-Reply-To: References: Message-ID: > The flag UseHeavyMonitors seems to imply that it makes Hotspot always use inflated monitors, rather than stack locks. However, it is only implemented in the interpreter that way. When it calls into runtime, it would still happily stack-lock. Even worse, C1 uses another flag UseFastLocking to achieve something similar (with the same caveat that runtime would stack-lock anyway). C2 doesn't have any such mechanism at all. > I would like to experiment with disabling stack-locking, and thus, having this flag work as expected would seem very useful. > > The change removes the C1 flag UseFastLocking, and replaces its uses with equivalent (i.e. inverted) UseHeavyMonitors instead. I think it makes sense to make UseHeavyMonitors develop (I wouldn't want anybody to use this in production, not currently without this change, and not with this change). I also added a flag VerifyHeavyMonitors to be able to verify that stack-locking is really disabled. We can't currently verify this unconditionally (e.g. in debug builds) because all non-x86_64 platforms would need work.
> > Testing: > - [x] tier1 > - [x] tier2 > - [x] tier3 > - [ ] tier4 Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Add run configuration using -XX:+UseHeavyMonitors to MapLoops test ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6320/files - new: https://git.openjdk.java.net/jdk/pull/6320/files/49dbc146..6a419ca7 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6320&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6320&range=01-02 Stats: 7 lines in 1 file changed: 7 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/6320.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6320/head:pull/6320 PR: https://git.openjdk.java.net/jdk/pull/6320 From epavlova at openjdk.java.net Wed Nov 17 16:05:45 2021 From: epavlova at openjdk.java.net (Ekaterina Pavlova) Date: Wed, 17 Nov 2021 16:05:45 GMT Subject: RFR: 8277316: ciReplay: dump_replay_data is not thread-safe In-Reply-To: References: Message-ID: On Wed, 17 Nov 2021 09:05:14 GMT, Dean Long wrote: > Using a static buffer for the file name causes corrupted replay files. Fixed. I did patch src/hotspot/share/ci/ciEnv.cpp yesterday the same way it is done in the PR (removed static) and did run the testing overnight. I don't see "Failed on unknown command" anymore. Btw, could it be there are some tools/sources which depend on replay_pid%p_compid%d.log format? ------------- PR: https://git.openjdk.java.net/jdk/pull/6426 From chagedorn at openjdk.java.net Wed Nov 17 16:17:21 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Wed, 17 Nov 2021 16:17:21 GMT Subject: RFR: 8254108: ciReplay: Support incremental inlining [v2] In-Reply-To: References: Message-ID: > This patch adds support to explicitly apply incremental inlining when replay compiling a method if the original compilation of the method was also incrementally inlined.
We write a new value when dumping the inline tree to indicate if an inlinee was incrementally inlined (`= 1`) or not (`= 0`). > > To implement this, I updated the `REPLAY_VERSION` to 2 and additionally added a test to verify that old replay file versions are still working. I added some support to modify/remove version numbers of generated replay files in tests. I also refactored the test added by JDK-8275868 to reuse some of the methods. > > Thanks, > Christian Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: update comments, should_delay and parsing inline_late, fix test if run with -XX:+AlwaysIncrementalInline ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6413/files - new: https://git.openjdk.java.net/jdk/pull/6413/files/ea35d316..0f6b2048 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6413&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6413&range=00-01 Stats: 19 lines in 4 files changed: 5 ins; 7 del; 7 mod Patch: https://git.openjdk.java.net/jdk/pull/6413.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6413/head:pull/6413 PR: https://git.openjdk.java.net/jdk/pull/6413 From chagedorn at openjdk.java.net Wed Nov 17 16:17:26 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Wed, 17 Nov 2021 16:17:26 GMT Subject: RFR: 8254108: ciReplay: Support incremental inlining [v2] In-Reply-To: References: Message-ID: <_V6mer5gF-ZHTU1Ux9Q0waIFMu9UwvWKOmPIuPHq6js=.4e15a326-6777-4cd9-8494-a4625ac5f2e6@github.com> On Tue, 16 Nov 2021 22:06:12 GMT, Dean Long wrote: >> Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: >> >> update comments, should_delay and parsing inline_late, fix test if run with -XX:+AlwaysIncrementalInline > > src/hotspot/share/ci/ciReplay.cpp line 775: > >> 773: // Pending exception? 
>> 774: break; >> 775: } > > I don't see how a pending exception is possible here, given the check at L763, and parse_int() doesn't throw any. > What do you think about not calling parse_int() if _version < 2, that way there is no error to ignore? I think that's a better idea to not parse it at all now that we have version numbers available. > src/hotspot/share/opto/bytecodeInfo.cpp line 609: > >> 607: InlineTree* callee_tree = build_inline_tree_for_callee(callee_method, jvms, caller_bci); >> 608: if (should_delay || AlwaysIncrementalInline) { >> 609: callee_tree->set_late_inline(); > > It took me a while to figure out why this is needed: for replay. It bothers me a little that AlwaysIncrementalInline is check here and again in the caller. If the replay file sets should_delay to false, then we shouldn't let AlwaysIncrementalInline to force it to true, right? So I'm wondering if it would be better to pre-set should_delay to true in the caller if AlwaysIncrementalInline is true. I've added a comment to make it more clear. So, this code is only to record the late inlining decision to later dump it to the replay file. I think initializing `should_delay = AlwaysIncrementalInline` is a good idea. `should_delay` can only become true but not false anymore during normal compilation. But I think we need to leave `|| AlwaysIncrementalInline` in here https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L192 in case someone wants to replay compile with that flag even though the replay file recorded a different late inlining decision. I also fixed the test if run with `-XX:+AlwaysIncrementalInline`. 
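[Editor's sketch] The idea agreed on above, reading the new late-inline value only when the replay file declares a version that contains it, could be sketched as below. This is a simplified illustration under stated assumptions: `InlineRecord`, `parse_inline_record` and the three-field "depth bci inline_late" layout are hypothetical and do not reproduce the actual ciReplay parser or file format; version 0 is used here to stand for a file without a version line.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical record for one inlinee entry; field names are illustrative.
struct InlineRecord {
  int depth = 0;
  int bci = 0;
  bool inline_late = false;
};

// Parses "depth bci [inline_late]". The third field is read only when the
// file declares version >= 2, so older files never trigger a parse error
// that would have to be ignored afterwards.
bool parse_inline_record(const std::string& line, int version, InlineRecord& out) {
  assert(version >= 0);  // assumption: 0 means "no version line present"
  std::istringstream in(line);
  if (!(in >> out.depth >> out.bci)) {
    return false;
  }
  out.inline_late = false;  // default for files that predate the field
  if (version >= 2) {
    int late = 0;
    if (!(in >> late) || (late != 0 && late != 1)) {
      return false;  // a version-2 file must carry a valid 0/1 value
    }
    out.inline_late = (late == 1);
  }
  return true;
}
```

Gating on the version keeps the error path meaningful: a missing value in a version-2 file is reported as a real failure instead of being silently tolerated for the sake of old files.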
------------- PR: https://git.openjdk.java.net/jdk/pull/6413 From coleenp at openjdk.java.net Wed Nov 17 17:02:39 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Wed, 17 Nov 2021 17:02:39 GMT Subject: RFR: 8276901: Implement UseHeavyMonitors consistently [v3] In-Reply-To: References: Message-ID: <9cN8QPNVLsL2Cs88kMjG9XYJuvbPtHKsqQ7sw8CbKws=.38e5ace7-e43f-4582-902e-3a6f5626e85d@github.com> On Wed, 17 Nov 2021 15:01:00 GMT, Roman Kennke wrote: >> The flag UseHeavyMonitors seems to imply that it makes Hotspot always use inflated monitors, rather than stack locks. However, it is only implemented in the interpreter that way. When it calls into runtime, it would still happily stack-lock. Even worse, C1 uses another flag UseFastLocking to achieve something similar (with the same caveat that runtime would stack-lock anyway). C2 doesn't have any such mechanism at all. >> I would like to experiment with disabling stack-locking, and thus, having this flag work as expected would seem very useful. >> >> The change removes the C1 flag UseFastLocking, and replaces its uses with equivalent (i.e. inverted) UseHeavyMonitors instead. I think it makes sense to make UseHeavyMonitors develop (I wouldn't want anybody to use this in production, not currently without this change, and not with this change). I also added a flag VerifyHeavyMonitors to be able to verify that stack-locking is really disabled. We can't currently verify this unconditionally (e.g. in debug builds) because all non-x86_64 platforms would need work. >> >> Testing: >> - [x] tier1 >> - [x] tier2 >> - [x] tier3 >> - [ ] tier4 > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Add run configuration using -XX:+UseHeavyMonitors to MapLoops test I'm happy you did this. test/jdk/java/util/concurrent/ConcurrentHashMap/MapLoops.java line 52: > 50: * @summary Exercise multithreaded maps, using only heavy monitors.
> 51: * @library /test/lib > 52: * @run main/othervm/timeout=1600 -XX:+IgnoreUnrecognizedVMOptions -XX:+UseHeavyMonitors MapLoops Did you want to also add -XX:+VerifyHeavyMonitors to this test? ------------- Marked as reviewed by coleenp (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6320 From psandoz at openjdk.java.net Wed Nov 17 17:26:35 2021 From: psandoz at openjdk.java.net (Paul Sandoz) Date: Wed, 17 Nov 2021 17:26:35 GMT Subject: RFR: 8277239: SIGSEGV in vrshift_reg_maskedNode::emit In-Reply-To: References: Message-ID: On Wed, 17 Nov 2021 13:44:51 GMT, Jatin Bhateja wrote: > Currently instruction selector differentiates between the two kinds of vector shift operations i.e. one with vector shift count and other with scalar shift count passed through LShiftCntV/RShiftCntV nodes by looking at the ideal opcode of shift count node. > > A more robust scheme is to set a flag over vector shift node if it has variable vector shift count and replace the opcode based check with flag based check in various shift instruction selection patterns. Running tests to verify. ------------- PR: https://git.openjdk.java.net/jdk/pull/6431 From dlong at openjdk.java.net Wed Nov 17 20:20:47 2021 From: dlong at openjdk.java.net (Dean Long) Date: Wed, 17 Nov 2021 20:20:47 GMT Subject: RFR: 8277310: ciReplay: @cpi MethodHandle references not resolved In-Reply-To: References: Message-ID: <7QJD4R0b2Zf6UKGWupXXiM65jbEWaoJ4pudls6oKjIU=.6c24f7fc-ee0a-4dac-88bf-d9afce9891b1@github.com> On Wed, 17 Nov 2021 01:57:55 GMT, Dean Long wrote: > It turns out replay was resolving constant pool entries for "@bci " references, but not for "@cpi" references. These changes fix that. Thanks, Christian and Tobias.
------------- PR: https://git.openjdk.java.net/jdk/pull/6423 From dlong at openjdk.java.net Wed Nov 17 20:20:47 2021 From: dlong at openjdk.java.net (Dean Long) Date: Wed, 17 Nov 2021 20:20:47 GMT Subject: Integrated: 8277310: ciReplay: @cpi MethodHandle references not resolved In-Reply-To: References: Message-ID: On Wed, 17 Nov 2021 01:57:55 GMT, Dean Long wrote: > It turns out replay was resolving constant pool entries for "@bci " references, but not for "@cpi" references. These changes fix that. This pull request has now been integrated. Changeset: 8881f29b Author: Dean Long URL: https://git.openjdk.java.net/jdk/commit/8881f29bc83336bcbc0e8ff0cf1d2bbe55172f5c Stats: 60 lines in 1 file changed: 15 ins; 16 del; 29 mod 8277310: ciReplay: @cpi MethodHandle references not resolved Reviewed-by: chagedorn, thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/6423 From dlong at openjdk.java.net Wed Nov 17 20:29:45 2021 From: dlong at openjdk.java.net (Dean Long) Date: Wed, 17 Nov 2021 20:29:45 GMT Subject: RFR: 8277316: ciReplay: dump_replay_data is not thread-safe In-Reply-To: References: Message-ID: On Wed, 17 Nov 2021 16:02:50 GMT, Ekaterina Pavlova wrote: >> Using a static buffer for the file name causes corrupted replay files. Fixed. > > Btw, could it be there are some tools/sources which depend on replay_pid%p_compid%d.log format? Thanks @katyapav for testing this. > Btw, could it be there are some tools/sources which depend on replay_pid%p_compid%d.log format? It's possible, but none that I am aware of. We already use a decimal for the replay file when there is a crash. 
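[Editor's sketch] The thread-safety problem discussed in this thread comes from formatting the replay file name into a buffer shared by all compiler threads. A minimal sketch of the per-call-buffer shape follows; `replay_file_name` is a hypothetical helper, not the function in ciEnv.cpp, and the plain `%d` for the process id is an assumption made here in place of HotSpot's own `%p` placeholder.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Racy shape, shown only as a comment:
//   static char buffer[64];   // one buffer shared by every caller, so two
//                             // threads can overwrite each other's name
//
// Safer shape: the buffer lives on the caller's stack, so concurrent calls
// cannot interfere with one another.
std::string replay_file_name(int pid, int compile_id) {
  char buffer[64];  // per-call storage
  int n = std::snprintf(buffer, sizeof(buffer),
                        "replay_pid%d_compid%d.log", pid, compile_id);
  assert(n > 0 && static_cast<std::size_t>(n) < sizeof(buffer));
  (void)n;  // silences the unused warning when asserts are compiled out
  return std::string(buffer);
}
```

For example, `replay_file_name(42, 7)` produces `replay_pid42_compid7.log`, and each thread gets its own copy of the name.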
------------- PR: https://git.openjdk.java.net/jdk/pull/6426 From dlong at openjdk.java.net Wed Nov 17 20:29:45 2021 From: dlong at openjdk.java.net (Dean Long) Date: Wed, 17 Nov 2021 20:29:45 GMT Subject: RFR: 8277316: ciReplay: dump_replay_data is not thread-safe In-Reply-To: References: Message-ID: On Wed, 17 Nov 2021 09:05:14 GMT, Dean Long wrote: > Using a static buffer for the file name causes corrupted replay files. Fixed. Thanks Tobias and Christian! ------------- PR: https://git.openjdk.java.net/jdk/pull/6426 From dlong at openjdk.java.net Wed Nov 17 20:29:46 2021 From: dlong at openjdk.java.net (Dean Long) Date: Wed, 17 Nov 2021 20:29:46 GMT Subject: Integrated: 8277316: ciReplay: dump_replay_data is not thread-safe In-Reply-To: References: Message-ID: On Wed, 17 Nov 2021 09:05:14 GMT, Dean Long wrote: > Using a static buffer for the file name causes corrupted replay files. Fixed. This pull request has now been integrated. Changeset: d8c02802 Author: Dean Long URL: https://git.openjdk.java.net/jdk/commit/d8c0280273fa9f8e113088d6a43a4af076cd4f87 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod 8277316: ciReplay: dump_replay_data is not thread-safe Reviewed-by: chagedorn, thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/6426 From dlong at openjdk.java.net Wed Nov 17 20:44:43 2021 From: dlong at openjdk.java.net (Dean Long) Date: Wed, 17 Nov 2021 20:44:43 GMT Subject: RFR: 8254108: ciReplay: Support incremental inlining [v2] In-Reply-To: <_V6mer5gF-ZHTU1Ux9Q0waIFMu9UwvWKOmPIuPHq6js=.4e15a326-6777-4cd9-8494-a4625ac5f2e6@github.com> References: <_V6mer5gF-ZHTU1Ux9Q0waIFMu9UwvWKOmPIuPHq6js=.4e15a326-6777-4cd9-8494-a4625ac5f2e6@github.com> Message-ID: On Wed, 17 Nov 2021 16:12:34 GMT, Christian Hagedorn wrote: >> src/hotspot/share/opto/bytecodeInfo.cpp line 609: >> >>> 607: InlineTree* callee_tree = build_inline_tree_for_callee(callee_method, jvms, caller_bci); >>> 608: if (should_delay || AlwaysIncrementalInline) { >>> 609: 
callee_tree->set_late_inline(); >> >> It took me a while to figure out why this is needed: for replay. It bothers me a little that AlwaysIncrementalInline is check here and again in the caller. If the replay file sets should_delay to false, then we shouldn't let AlwaysIncrementalInline to force it to true, right? So I'm wondering if it would be better to pre-set should_delay to true in the caller if AlwaysIncrementalInline is true. > > I've added a comment to make it more clear. So, this code is only to record the late inlining decision to later dump it to the replay file. I think initializing `should_delay = AlwaysIncrementalInline` is a good idea. `should_delay` can only become true but not false anymore during normal compilation. > > But I think we need to leave `|| AlwaysIncrementalInline` in here https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L192 in case someone wants to replay compile with that flag even though the replay file recorded a different late inlining decision. > > I also fixed the test if run with `-XX:+AlwaysIncrementalInline`. Shouldn't the recorded inlining decision always override flags like -XX:+AlwaysIncrementalInline? This brings up the question of how to handle flags. 
If we stored them in the replay file, then the replay compile could compare those to the current flags and if they don't match: 1) give a warning and continue, correct replay not guaranteed 2) give an error and refuse to continue 3) override current flags with saved flags (this could be implemented by having the "ci" layer cache flags settings for each compile) ------------- PR: https://git.openjdk.java.net/jdk/pull/6413 From dlong at openjdk.java.net Wed Nov 17 20:44:42 2021 From: dlong at openjdk.java.net (Dean Long) Date: Wed, 17 Nov 2021 20:44:42 GMT Subject: RFR: 8254108: ciReplay: Support incremental inlining [v2] In-Reply-To: References: Message-ID: <3Wp04_NydzNWhLjDey3X6j0ZbCbLsPrEW_kNDY-d_A0=.2c229407-33d4-479d-8e0b-aa31d96f9c8a@github.com> On Wed, 17 Nov 2021 16:17:21 GMT, Christian Hagedorn wrote: >> This patch adds support to explicitly apply incremental inlining when replay compiling a method if the original compilation of the method was also incrementally inlined. We write a new value when dumping the inline tree to indicate if an inlinee was incrementally inlined (`= 1`) or not (`= 0`). >> >> To implement this, I updated the `REPLAY_VERSION` to 2 and additionally added a test to verify that old replay file versions are still working. I added some support to modify/remove version numbers of generated replay files in tests. I also refactored the test added by JDK-8275868 to reuse some of the methods. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > update comments, should_delay and parsing inline_late, fix test if run with -XX:+AlwaysIncrementalInline Changes requested by dlong (Reviewer). 
------------- PR: https://git.openjdk.java.net/jdk/pull/6413 From rkennke at openjdk.java.net Wed Nov 17 20:50:15 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Wed, 17 Nov 2021 20:50:15 GMT Subject: RFR: 8276901: Implement UseHeavyMonitors consistently [v4] In-Reply-To: References: Message-ID: > The flag UseHeavyMonitors seems to imply that it makes Hotspot always use inflated monitors, rather than stack locks. However, it is only implemented in the interpreter that way. When it calls into runtime, it would still happily stack-lock. Even worse, C1 uses another flag UseFastLocking to achieve something similar (with the same caveat that runtime would stack-lock anyway). C2 doesn't have any such mechanism at all. > I would like to experiment with disabling stack-locking, and thus, having this flag work as expected would seem very useful. > > The change removes the C1 flag UseFastLocking, and replaces its uses with equivalent (i.e. inverted) UseHeavyMonitors instead. I think it makes sense to make UseHeavyMonitors develop (I wouldn't want anybody to use this in production, not currently without this change, and not with this change). I also added a flag VerifyHeavyMonitors to be able to verify that stack-locking is really disabled. We can't currently verify this unconditionally (e.g. in debug builds) because all non-x86_64 platforms would need work.
> > Testing: > - [x] tier1 > - [x] tier2 > - [x] tier3 > - [ ] tier4 Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: - Fix formatting - Keep UseHeavyMonitors as release flag, but deprecate it ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6320/files - new: https://git.openjdk.java.net/jdk/pull/6320/files/6a419ca7..818468e7 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6320&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6320&range=02-03 Stats: 2 lines in 2 files changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/6320.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6320/head:pull/6320 PR: https://git.openjdk.java.net/jdk/pull/6320 From coleenp at openjdk.java.net Wed Nov 17 22:24:46 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Wed, 17 Nov 2021 22:24:46 GMT Subject: RFR: 8276901: Implement UseHeavyMonitors consistently [v4] In-Reply-To: References: Message-ID: On Wed, 17 Nov 2021 22:21:12 GMT, Coleen Phillimore wrote: >> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: >> >> - Fix formatting >> - Keep UseHeavyMonitors as release flag, but deprecate it > > src/hotspot/share/runtime/globals.hpp line 1072: > >> 1070: \ >> 1071: product(bool, UseHeavyMonitors, false, \ >> 1072: "use heavyweight instead of lightweight Java monitors") \ > > For deprecated flags, make the description: > "(Deprecated) Use heavyweight instead of lightweight Java monitors" Also there's some test somewhere that you have to add this flag to. @dholmes-ora would know where that is. 
------------- PR: https://git.openjdk.java.net/jdk/pull/6320 From coleenp at openjdk.java.net Wed Nov 17 22:24:45 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Wed, 17 Nov 2021 22:24:45 GMT Subject: RFR: 8276901: Implement UseHeavyMonitors consistently [v4] In-Reply-To: References: Message-ID: On Wed, 17 Nov 2021 20:50:15 GMT, Roman Kennke wrote: >> The flag UseHeavyMonitors seems to imply that it makes Hotspot always use inflated monitors, rather than stack locks. However, it is only implemented in the interpreter that way. When it calls into runtime, it would still happily stack-lock. Even worse, C1 uses another flag UseFastLocking to achieve something similar (with the same caveat that runtime would stack-lock anyway). C2 doesn't have any such mechanism at all. >> I would like to experiment with disabling stack-locking, and thus, having this flag work as expected would seem very useful. >> >> The change removes the C1 flag UseFastLocking, and replaces its uses with equivalent (i.e. inverted) UseHeavyMonitors instead. I think it makes sense to make UseHeavyMonitors develop (I wouldn't want anybody to use this in production, not currently without this change, and not with this change). I also added a flag VerifyHeavyMonitors to be able to verify that stack-locking is really disabled. We can't currently verify this unconditionally (e.g. in debug builds) because all non-x86_64 platforms would need work.
>> >> Testing: >> - [x] tier1 >> - [x] tier2 >> - [x] tier3 >> - [ ] tier4 > > Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: > > - Fix formatting > - Keep UseHeavyMonitors as release flag, but deprecate it src/hotspot/share/runtime/globals.hpp line 1072: > 1070: \ > 1071: product(bool, UseHeavyMonitors, false, \ > 1072: "use heavyweight instead of lightweight Java monitors") \ For deprecated flags, make the description: "(Deprecated) Use heavyweight instead of lightweight Java monitors" ------------- PR: https://git.openjdk.java.net/jdk/pull/6320 From dlong at openjdk.java.net Thu Nov 18 00:26:37 2021 From: dlong at openjdk.java.net (Dean Long) Date: Thu, 18 Nov 2021 00:26:37 GMT Subject: RFR: 8266368: Inaccurate after_unwind hook in C2 exception handler In-Reply-To: References: Message-ID: On Tue, 16 Nov 2021 08:42:32 GMT, Erik Österlund wrote: > When we throw an exception and unwind into a frame, the exception handler of that frame needs to call an after_unwind hook for the StackWatermark code, to support concurrent stack processing. Unfortunately, for C2 frames, I inaccurately do this in OptoRuntime::rethrow_C, but the exception handler when unwinding into a C2 frame really is OptoRuntime::handle_exception_C. > The handle_exception_C code does walk frames to the caller though, which also pokes the StackWatermark code. So in the end, there is no real bug here, but it works for the wrong reasons. So I'd like to move the hook in rethrow_C to handle_exception_C. Looks good! ------------- Marked as reviewed by dlong (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/6405 From ngasson at openjdk.java.net Thu Nov 18 01:32:45 2021 From: ngasson at openjdk.java.net (Nick Gasson) Date: Thu, 18 Nov 2021 01:32:45 GMT Subject: RFR: 8277180: Intrinsify recursive ObjectMonitor locking for C2 x64 and A64 [v3] In-Reply-To: <4oucmwwERGx-tUr1PPfNHb7YMCHFR82eCxyEAga1CzA=.28af13c7-9a9f-489b-a598-de6fcc32828b@github.com> References: <4oucmwwERGx-tUr1PPfNHb7YMCHFR82eCxyEAga1CzA=.28af13c7-9a9f-489b-a598-de6fcc32828b@github.com> Message-ID: On Wed, 17 Nov 2021 11:19:09 GMT, Erik Österlund wrote: >> The C2 fast_lock and fast_unlock intrinsics don't support recursive ObjectMonitor locking. Some workloads can significantly benefit from this. Recent ObjectMonitor work has changed heuristics such that ObjectMonitors are deflated less aggressively. Therefore we can expect to see more inflated monitors in workloads where we would usually see more stack locks. That in itself is fine, except that C2 doesn't intrinsify the recursive locking paths for object monitors. Enabling those cases in the C2 code, removes a (~17%) regression we have seen with DaCapo h2 -t 1, and makes a few more benchmarks happy as well. > > Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: > > use increment macro AArch64 changes LGTM. ------------- Marked as reviewed by ngasson (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/6406 From duke at openjdk.java.net Thu Nov 18 03:50:12 2021 From: duke at openjdk.java.net (TatWai Chong) Date: Thu, 18 Nov 2021 03:50:12 GMT Subject: RFR: 8275448: [REDO] AArch64: Implement string_compare intrinsic in SVE [v5] In-Reply-To: <3lLU_Cnsh439U5TjNx9ADxXkvMDnfot4TxHbh7paUQw=.5c7ec35a-8456-4fe6-8b4f-1d7627026d8d@github.com> References: <3lLU_Cnsh439U5TjNx9ADxXkvMDnfot4TxHbh7paUQw=.5c7ec35a-8456-4fe6-8b4f-1d7627026d8d@github.com> Message-ID: > After JDK-8269559 was integrated there are failures in tier1 testing > across Mac OS X 11.4 (aarch64) machines. Please see JDK-8275263. > > This patch is NOT functional; rather, this tends to verify potential > toolchain issues as the original patch pass testing on other > platforms. > > In this patch, we remove new SVE-related matching rules and register > class introduced in the original patch to minimally affect the > non-SVE part. TatWai Chong has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: - Merge branch 'master' of https://git.openjdk.java.net/jdk into sve_compareto_redo - Merge master - Add the matching rule in td file, enable control path in the code stub. - Add the register class and description for this SVE intrinsic. - 8275448: [REDO] AArch64: Implement string_compare intrinsic in SVE After JDK-8269559 was integrated there are failures in tier1 testing across Mac OS X 11.4 (aarch64) machines. Please see JDK-8275263. This patch isn't functional; rather, this tends to verify potential toolchain issues as the original patch passes testing on other platforms. In this patch, we remove new SVE-related matching rules and register class introduced in the original patch to minimally affect the non-SVE part. 
------------- Changes: https://git.openjdk.java.net/jdk/pull/6072/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6072&range=04 Stats: 423 lines in 9 files changed: 412 ins; 0 del; 11 mod Patch: https://git.openjdk.java.net/jdk/pull/6072.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6072/head:pull/6072 PR: https://git.openjdk.java.net/jdk/pull/6072 From xliu at openjdk.java.net Thu Nov 18 06:20:56 2021 From: xliu at openjdk.java.net (Xin Liu) Date: Thu, 18 Nov 2021 06:20:56 GMT Subject: RFR: 8274983: Pattern.matcher performance regression after JDK-8238358 Message-ID: The root cause of the C1 regression is that regex generates multiple classes which all implement an interface. In SlowStartupTest.java, the following call happens frequently with different receivers. 9: invokeinterface #25, 3 // InterfaceMethod java/util/regex/Pattern$BmpCharPredicate.lambda$union$2:(Ljava/util/regex/Pattern$CharPredicate;I)Z This patch allows C1 to generate the optimized virtual call for an invokeinterface whose target is a private interface method. Before JDK-8238358, LambdaMetaFactory generated invokespecial in this case. Because private interface methods cannot be overridden, C1 generated the optimized virtual call. After JDK-8238358, LambdaMetaFactory generates invokeinterface instead. C1 generates the regular virtual call because it cannot recognize the new pattern. If multiple subclasses all implement the same interface, it is possible that they thrash the IC stub with their own concrete klass at runtime. An optimized virtual call uses relocInfo::opt_virtual_call_type(3). It calls the VM's 'resolve_opt_virtual_call_C' once and resolves the target to the VEP of the nmethod. Therefore, this patch can prevent the callsite from thrashing.
------------- Commit messages: - 8274983: Pattern.matcher performance regression after JDK-823835 Changes: https://git.openjdk.java.net/jdk/pull/6445/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6445&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8274983 Stats: 23 lines in 3 files changed: 14 ins; 5 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/6445.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6445/head:pull/6445 PR: https://git.openjdk.java.net/jdk/pull/6445 From stuefe at openjdk.java.net Thu Nov 18 06:46:43 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Thu, 18 Nov 2021 06:46:43 GMT Subject: RFR: 8276563: Undefined Behaviour in class Assembler [v9] In-Reply-To: <69byto1KlbJy9IA5Z6wV3nl8eQTQKyMAEg18sR6q2P8=.dbe9530e-cd25-4aa4-ad90-a66e91e95a0d@github.com> References: <69byto1KlbJy9IA5Z6wV3nl8eQTQKyMAEg18sR6q2P8=.dbe9530e-cd25-4aa4-ad90-a66e91e95a0d@github.com> Message-ID: On Wed, 17 Nov 2021 14:35:11 GMT, Andrew Haley wrote: >> The HotSpot code base contains a number of instances of Undefined Behavior, which can cause all manner of unpleasant surprises. >> The UB to which this patch relates is in class `Assembler`, in which instances are pointers to (nonexistent) objects defined as, for example, >> >> >> typedef RegisterImpl *Register; >> const Register r10 = ((Register)10); >> >> >> Registers have accessors, e.g.: >> >> ` int RegisterImpl::encoding() const { return (intptr_t)this; }` >> >> This works by an accident of implementation: it is not legal C++. 
>> >> The most obvious way to fix this UB bug is to make instances of `Register` point to something, and to use pointer subtraction to find the encoding: (simplified for clarity) >> >> >> extern RegisterImpl all_Registers[num_Registers]; >> int RegisterImpl::encoding() const { return this - all_Registers; } >> >> >> After this patch there is slightly more work to be done when assembling code but it's merely the subtraction of a constant in `encoding()` and the difference in execution time is so small (and the startup variance so large) that I have been unable to measure it, even after averaging 100 runs. It does lead to an increase of about 1% in the size of the stripped libjvm.so, but I think that can be recovered by a subsequent patch. >> >> An alternative way to implement this would be to make the encoding a byte-wide field in `RegisterImpl` and define encoding() this way: >> >> ` int RegisterImpl::encoding() const { return _encoding; }` >> >> This would result in smaller code, but I suspect slower. >> >> If this change is accepted, I propose that all instances of this pattern in HotSpot be treated similarly. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > Tweako stuff. I was confused that there were no x86 changes, or is that part of a future RFE? Mostly nits and questions remain. Cheers, Thomas src/hotspot/cpu/aarch64/register_aarch64.hpp line 56: > 54: > 55: // construction > 56: inline friend constexpr Register as_Register(int encoding); This is getting bikesheddy so feel free to ignore: Instead of using friend I would probably either just make `first()` public.
Or, make as_xxx a static class method and wrap it with a global scope wrapper like this: --- a/src/hotspot/cpu/aarch64/register_aarch64.hpp +++ b/src/hotspot/cpu/aarch64/register_aarch64.hpp @@ -53,7 +53,8 @@ public: const Register successor() const { return this + 1; } // construction - inline friend constexpr Register as_Register(int encoding); + static constexpr Register as(int encoding); + diff --git a/src/hotspot/share/asm/register.hpp b/src/hotspot/share/asm/register.hpp index 06a8735f520..7d9036b0ff6 100644 --- a/src/hotspot/share/asm/register.hpp +++ b/src/hotspot/share/asm/register.hpp @@ -59,9 +59,12 @@ enum { name##_##type##EnumValue = (value) } #else // USE_POINTERS_TO_REGISTER_IMPL_ARRAY #define REGISTER_IMPL_DECLARATION(type, impl_type, reg_count) \ -inline constexpr type as_ ## type(int encoding) { \ +inline constexpr type impl_type::as(int encoding) { \ return impl_type::first() + encoding; \ } \ +inline constexpr type as_ ## type(int encoding) { \ + return impl_type::as(encoding); \ +} \ extern impl_type all_ ## type ## s[reg_count + 1] INTERNAL_VISIBILITY; \ inline constexpr type impl_type::first() { return all_ ## type ## s + 1; } src/hotspot/cpu/aarch64/register_aarch64.hpp line 63: > 61: int encoding() const { assert(is_valid(), "invalid register"); return encoding_nocheck(); } > 62: bool is_valid() const { return (unsigned)encoding_nocheck() < number_of_registers; } > 63: bool has_byte_register() const { return this >= first() && this - first() < number_of_byte_registers; } Why not relegate to encoding_nocheck() too: `return encoding_nocheck() >= 0 && encoding_nocheck() < num_byte_regs` ? src/hotspot/cpu/aarch64/register_aarch64.hpp line 156: > 154: FloatRegister successor() const { > 155: return as_FloatRegister((encoding() + 1) % (unsigned)number_of_registers); > 156: } Different from the other two, why? If we need validity checks here, should we not do them with the other types too? 
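For readers following along, the array-based scheme under discussion can be sketched outside HotSpot roughly as follows. This is an illustrative stand-alone approximation, not the patch itself: `RegImpl`, `Reg`, `all_regs`, `first_reg`, `as_Reg` and `num_regs` are made-up names, and the extra slot at index 0 (so that encoding -1 still points at a real object) is assumed from the `reg_count + 1` and `first()` lines in the diff above.

```cpp
#include <cassert>

// Hypothetical stand-in for RegisterImpl; not HotSpot code.
struct RegImpl {
  int  encoding_nocheck() const;  // may return -1 for the "no register" slot
  bool is_valid() const;
  int  encoding() const;          // asserts validity first
  const RegImpl* successor() const { return this + 1; }
};
typedef const RegImpl* Reg;

constexpr int num_regs = 32;      // assumed register count for this sketch

// One extra element so that encoding -1 also refers to a real array slot.
RegImpl all_regs[num_regs + 1];

// Encoding 0 lives at index 1 of the backing array.
inline Reg first_reg() { return all_regs + 1; }

// Construction from an encoding: plain pointer arithmetic inside the array.
inline Reg as_Reg(int encoding) { return first_reg() + encoding; }

// The encoding is recovered by subtracting two pointers into the same array,
// which is well defined, unlike casting a fake pointer to an integer.
inline int RegImpl::encoding_nocheck() const {
  return static_cast<int>(this - first_reg());
}

// The unsigned comparison rejects both negative and too-large encodings.
inline bool RegImpl::is_valid() const {
  return static_cast<unsigned>(encoding_nocheck()) <
         static_cast<unsigned>(num_regs);
}

inline int RegImpl::encoding() const {
  assert(is_valid() && "invalid register");
  return encoding_nocheck();
}
```

With these definitions, `as_Reg(10)->encoding()` yields 10 and `as_Reg(-1)->is_valid()` is false, at the cost of one constant subtraction per `encoding()` call.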
------------- PR: https://git.openjdk.java.net/jdk/pull/6280 From dholmes at openjdk.java.net Thu Nov 18 06:50:39 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Thu, 18 Nov 2021 06:50:39 GMT Subject: RFR: 8276901: Implement UseHeavyMonitors consistently [v4] In-Reply-To: References: Message-ID: <7GuyRoJ653qrQDv-vEnRU7JMcZU6qZJi0j7Ty1b5PE4=.c7d00b3d-c23b-47f9-bfb6-258623c2faae@github.com> On Wed, 17 Nov 2021 20:50:15 GMT, Roman Kennke wrote: >> The flag UseHeavyMonitors seems to imply that it makes Hotspot always use inflated monitors, rather than stack locks. However, it is only implemented in the interpreter that way. When it calls into runtime, it would still happily stack-lock. Even worse, C1 uses another flag UseFastLocking to achieve something similar (with the same caveat that runtime would stack-lock anyway). C2 doesn't have any such mechanism at all. >> I would like to experiment with disabling stack-locking, and thus, having this flag work as expected would seem very useful. >> >> The change removes the C1 flag UseFastLocking, and replaces its uses with equivalent (i.e. inverted) UseHeavyMonitors instead. I think it makes sense to make UseHeavyMonitors develop (I wouldn't want anybody to use this in production, not currently without this change, and not with this change). I also added a flag VerifyHeavyMonitors to be able to verify that stack-locking is really disabled. We can't currently verify this unconditionally (e.g. in debug builds) because all non-x86_64 platforms would need work. >> >> Testing: >> - [x] tier1 >> - [x] tier2 >> - [x] tier3 >> - [ ] tier4 > > Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: > > - Fix formatting > - Keep UseHeavyMonitors as release flag, but deprecate it Hi Roman, I have a number of initial comments/suggestions/requests - see below.
IIUC you are only making UseHeavyMonitors work properly on x86_64, but in that case you cannot convert UseFastLocks to UseHeavyMonitors on all platforms as it won't work correctly on those other platforms. Cheers, David src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 487: > 485: > 486: #if INCLUDE_RTM_OPT > 487: if (!UseHeavyMonitors && UseRTMForStackLocks && use_rtm) { Rather than do this shouldn't `UseHeavyMonitors` and `UseRTMForStackLocks` be mutually exclusive flags, checked in arguments.cpp? src/hotspot/share/runtime/arguments.cpp line 531: > 529: { "FilterSpuriousWakeups", JDK_Version::jdk(18), JDK_Version::jdk(19), JDK_Version::jdk(20) }, > 530: { "MinInliningThreshold", JDK_Version::jdk(18), JDK_Version::jdk(19), JDK_Version::jdk(20) }, > 531: { "UseHeavyMonitors", JDK_Version::jdk(18), JDK_Version::jdk(19), JDK_Version::jdk(20) }, Per my CSR request comment this needs to be product only code. src/hotspot/share/runtime/synchronizer.cpp line 442: > 440: // Fall through to inflate() ... > 441: } else if (mark.has_locker() && > 442: current->is_lock_owned((address)mark.locker())) { indent for expression continuation is wrong src/hotspot/share/runtime/synchronizer.cpp line 816: > 814: intptr_t hash; > 815: markWord mark = read_stable_mark(obj); > 816: if (UseHeavyMonitors && VerifyHeavyMonitors) { VerifyHeavyMonitors should require that UseHeavyMonitors be set. There should be logic in arguments.cpp to check that. ------------- Changes requested by dholmes (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/6320 From dholmes at openjdk.java.net Thu Nov 18 06:50:39 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Thu, 18 Nov 2021 06:50:39 GMT Subject: RFR: 8276901: Implement UseHeavyMonitors consistently [v4] In-Reply-To: References: Message-ID: <3F9reZOh6W87DLW0NA7rqD8q7gBEbgG-fqoOdH2AZX0=.82b8c1e5-80f0-453d-9999-33f909527eb2@github.com> On Wed, 17 Nov 2021 22:21:53 GMT, Coleen Phillimore wrote: >> src/hotspot/share/runtime/globals.hpp line 1072: >> >>> 1070: \ >>> 1071: product(bool, UseHeavyMonitors, false, \ >>> 1072: "use heavyweight instead of lightweight Java monitors") \ >> >> For deprecated flags, make the description: >> "(Deprecated) Use heavyweight instead of lightweight Java monitors" > > Also there's some test somewhere that you have to add this flag to. @dholmes-ora would know where that is. `./runtime/CommandLine/VMDeprecatedOptions.java` :) ------------- PR: https://git.openjdk.java.net/jdk/pull/6320 From dholmes at openjdk.java.net Thu Nov 18 06:50:40 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Thu, 18 Nov 2021 06:50:40 GMT Subject: RFR: 8276901: Implement UseHeavyMonitors consistently [v3] In-Reply-To: <9cN8QPNVLsL2Cs88kMjG9XYJuvbPtHKsqQ7sw8CbKws=.38e5ace7-e43f-4582-902e-3a6f5626e85d@github.com> References: <9cN8QPNVLsL2Cs88kMjG9XYJuvbPtHKsqQ7sw8CbKws=.38e5ace7-e43f-4582-902e-3a6f5626e85d@github.com> Message-ID: On Wed, 17 Nov 2021 16:59:22 GMT, Coleen Phillimore wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Add run configuration using -XX:+UseHeavyMonitors to MapLoops test > > test/jdk/java/util/concurrent/ConcurrentHashMap/MapLoops.java line 52: > >> 50: * @summary Exercise multithreaded maps, using only heavy monitors. 
>> 51: * @library /test/lib >> 52: * @run main/othervm/timeout=1600 -XX:+IgnoreUnrecognizedVMOptions -XX:+UseHeavyMonitors MapLoops > > Did you want to also add -XX:+VerifyHeavyMonitors to this test? If UseHeavyMonitors only works properly on x86_64 you will need an `@requires` to restrict this test run to that platform. ------------- PR: https://git.openjdk.java.net/jdk/pull/6320 From pli at openjdk.java.net Thu Nov 18 06:58:40 2021 From: pli at openjdk.java.net (Pengfei Li) Date: Thu, 18 Nov 2021 06:58:40 GMT Subject: RFR: 8277168: AArch64: Enable arraycopy partial inlining with SVE In-Reply-To: References: Message-ID: On Thu, 18 Nov 2021 03:50:45 GMT, Pengfei Li wrote: > Arraycopy partial inlining is a C2 compiler technique that avoids stub > call overhead in small-sized arraycopy operations by generating masked > vector instructions. So far it works on x86 AVX512 only and this patch > enables it on AArch64 with SVE. > > We add AArch64 matching rule for VectorMaskGenNode and refactor that > node a little bit. The major change is moving the element type field > into its TypeVectMask bottom type. The reason is that AArch64 vector > masks are different for different vector element types. > > E.g., an x86 AVX512 vector mask value masking 3 least significant vector > lanes (of any type) is like > > `0000 0000 ... 0000 0000 0000 0000 0111` > > On AArch64 SVE, this mask value can only be used for masking the 3 least > significant lanes of bytes. But for 3 lanes of ints, the value should be > > `0000 0000 ... 0000 0000 0001 0001 0001` > > where the least significant bit of each lane matters. So AArch64 matcher > needs to know the vector element type to generate right masks. 
> > After this patch, the C2 generated code for copying a 50-byte array on > AArch64 SVE looks like > > mov x12, #0x32 > whilelo p0.b, xzr, x12 > add x11, x11, #0x10 > ld1b {z16.b}, p0/z, [x11] > add x10, x10, #0x10 > st1b {z16.b}, p0, [x10] > > We ran jtreg hotspot::hotspot_all, jdk::tier1~3 and langtools::tier1 on > both x86 AVX512 and AArch64 SVE machines, no issue is found. We tested > JMH org/openjdk/bench/java/lang/ArrayCopyAligned.java with small array > size arguments on a 512-bit SVE-featured CPU. We got below performance > data changes. > > Benchmark (length) (Performance) > ArrayCopyAligned.testByte 10 -2.6% > ArrayCopyAligned.testByte 20 +4.7% > ArrayCopyAligned.testByte 30 +4.8% > ArrayCopyAligned.testByte 40 +21.7% > ArrayCopyAligned.testByte 50 +22.5% > ArrayCopyAligned.testByte 60 +28.4% > > The test machine has SVE vector size of 512 bits, so we see performance > gain for most array sizes less than 64 bytes. For very small arrays we > see a bit regression because a vector load/store may be a bit slower > than 1 or 2 scalar loads/stores. The x86 failure is caused by a recent commit (see [JDK-8277324](https://bugs.openjdk.java.net/browse/JDK-8277324)) and unrelated to this PR. ------------- PR: https://git.openjdk.java.net/jdk/pull/6444 From dnsimon at openjdk.java.net Thu Nov 18 08:37:42 2021 From: dnsimon at openjdk.java.net (Doug Simon) Date: Thu, 18 Nov 2021 08:37:42 GMT Subject: Integrated: 8276314: [JVMCI] check alignment of call displacement during code installation In-Reply-To: <9Y-J6OfeTyxerdoYearxYLNZMzRW3NEVIWEeHFTcq9k=.38bdaf1d-ec1a-49f6-9442-5d4497d23c44@github.com> References: <9Y-J6OfeTyxerdoYearxYLNZMzRW3NEVIWEeHFTcq9k=.38bdaf1d-ec1a-49f6-9442-5d4497d23c44@github.com> Message-ID: On Tue, 2 Nov 2021 21:31:25 GMT, Doug Simon wrote: > This PR add verification of code alignment invariants related to x64 call instructions during code installation. 
> This in turn allows a JVMCI compilation that generates a misaligned call to fail gracefully (i.e. bailout) instead of the VM crashing when it checks alignment before patching the displacement of a call instruction. This pull request has now been integrated. Changeset: 2f4b5405 Author: Doug Simon URL: https://git.openjdk.java.net/jdk/commit/2f4b5405f0b53782f3ed5274f68b31eb968efb6d Stats: 20 lines in 3 files changed: 9 ins; 3 del; 8 mod 8276314: [JVMCI] check alignment of call displacement during code installation Reviewed-by: kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/6218 From dlong at openjdk.java.net Thu Nov 18 08:59:36 2021 From: dlong at openjdk.java.net (Dean Long) Date: Thu, 18 Nov 2021 08:59:36 GMT Subject: RFR: 8254108: ciReplay: Support incremental inlining [v2] In-Reply-To: References: Message-ID: <7svWNFjlSF-Cp8Nm-5_BYoYBSRDS0Pl8_pNYTCkKZTI=.327ee675-e352-4a22-87fd-16e8bc1e31b5@github.com> On Wed, 17 Nov 2021 16:17:21 GMT, Christian Hagedorn wrote: >> This patch adds support to explicitly apply incremental inlining when replay compiling a method if the original compilation of the method was also incrementally inlined. We write a new value when dumping the inline tree to indicate if an inlinee was incrementally inlined (`= 1`) or not (`= 0`). >> >> To implement this, I updated the `REPLAY_VERSION` to 2 and additionally added a test to verify that old replay file versions are still working. I added some support to modify/remove version numbers of generated replay files in tests. I also refactored the test added by JDK-8275868 to reuse some of the methods. 
>> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > update comments, should_delay and parsing inline_late, fix test if run with -XX:+AlwaysIncrementalInline src/hotspot/share/opto/bytecodeInfo.cpp line 573: > 571: bool& should_delay) { > 572: assert(callee_method != NULL, "caller checks for optimized virtual!"); > 573: assert(!should_delay || AlwaysIncrementalInline, "should be initialized to false"); I'm not sure how useful this assert is now. It could be changed to should_delay == AlwaysIncrementalInline, or maybe just removed? ------------- PR: https://git.openjdk.java.net/jdk/pull/6413 From thartmann at openjdk.java.net Thu Nov 18 09:15:42 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 18 Nov 2021 09:15:42 GMT Subject: RFR: 8277042: add test for 8276036 to compiler/codecache [v2] In-Reply-To: References: Message-ID: On Mon, 15 Nov 2021 09:51:16 GMT, Takuya Kiriyama wrote: >> Could you please review the 8277042 code? >> This is the enhancement for 8276036. >> I add a new test to verify the value of full_count in the message of insufficient codecache. > > Takuya Kiriyama has updated the pull request incrementally with one additional commit since the last revision: > > 8277042: add test for 8276036 to compiler/codecache test/hotspot/jtreg/compiler/codecache/CodeCacheFullCountTest.java line 34: > 32: /* > 33: * @test > 34: * @bug 8276036 I have a fix ready for JDK-8277213 (see https://github.com/openjdk/jdk/pull/6449). Could you please add the bug ID to this test? 
------------- PR: https://git.openjdk.java.net/jdk/pull/6364 From thartmann at openjdk.java.net Thu Nov 18 09:16:56 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 18 Nov 2021 09:16:56 GMT Subject: RFR: 8277213: CompileTask_lock is acquired out of order with MethodCompileQueue_lock Message-ID: <8MjyvfvhihKvfx6XBRQ5lZGBNlkjvfFyyOWkM98wEKQ=.b74f4efc-1e45-4b18-8a1e-9735eadbb399@github.com> In the rare case that the compiler threads fail during initialization or the code cache is full and flushing is disabled, we completely disable JIT compilation and shutdown the compiler runtime: https://github.com/openjdk/jdk/blob/2f4b5405f0b53782f3ed5274f68b31eb968efb6d/src/hotspot/share/compiler/compileBroker.cpp#L1813-L1817 In the process, we free all compiler queues and notify potentially waiting compiler threads via the `CompileTask::lock()`: https://github.com/openjdk/jdk/blob/2f4b5405f0b53782f3ed5274f68b31eb968efb6d/src/hotspot/share/compiler/compileBroker.cpp#L397-L408 The problem is that since [JDK-8273917](https://bugs.openjdk.java.net/browse/JDK-8273917) (see [commit](https://github.com/openjdk/jdk/commit/b8af6a9bfb28aaf0fea0cfdaba13236dc8cbaa3a)), the rank of `CompileTask_lock` is `Mutex::safepoint` which is equal to the rank of `MethodCompileQueue_lock` which we are already holding because we modify the compile queue. I propose to fix this by modifying the rank of the `CompileTask_lock` similar to what is done for other locks: https://github.com/openjdk/jdk/blob/2f4b5405f0b53782f3ed5274f68b31eb968efb6d/src/hotspot/share/oops/methodData.cpp#L1211-L1212 The test that triggered this will be added with [PR 6364](https://git.openjdk.java.net/jdk/pull/6364). I verified that it now passes. 
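As a rough illustration of the rank problem described above, here is a stand-alone sketch, not HotSpot code. It assumes the ordering rule is that a thread may only take a lock whose rank is strictly lower than the rank of every lock it already holds, which is how the failure above reads. `RankedLock`, `HeldLocks` and the numeric ranks used with them are invented for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical lock descriptor: a name plus a numeric rank.
struct RankedLock {
  const char* name;
  int rank;
};

// Tracks the locks held by one (imaginary) thread and checks the assumed
// ordering rule on every acquisition.
class HeldLocks {
 public:
  // Returns false if taking `lock` would violate the ordering, i.e. if its
  // rank is not strictly lower than the rank of every lock already held.
  bool try_acquire(const RankedLock& lock) {
    for (const RankedLock* held : held_) {
      if (lock.rank >= held->rank) {
        return false;  // equal or higher rank: out-of-order acquisition
      }
    }
    held_.push_back(&lock);
    return true;
  }

  // Forgets a previously acquired lock.
  void release(const RankedLock& lock) {
    held_.erase(std::remove(held_.begin(), held_.end(), &lock), held_.end());
  }

 private:
  std::vector<const RankedLock*> held_;
};
```

In this model, holding a queue lock of rank 10 and then asking for a task lock that is also rank 10 is rejected, while the same request with rank 9 succeeds. That mirrors the shape of the proposed fix, which changes the rank of one lock so it can be taken while the other is held.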
Thanks, Tobias ------------- Commit messages: - 8277213: CompileTask_lock is acquired out of order with MethodCompileQueue_lock Changes: https://git.openjdk.java.net/jdk/pull/6449/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6449&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8277213 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/6449.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6449/head:pull/6449 PR: https://git.openjdk.java.net/jdk/pull/6449 From thartmann at openjdk.java.net Thu Nov 18 09:25:38 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 18 Nov 2021 09:25:38 GMT Subject: RFR: 8266368: Inaccurate after_unwind hook in C2 exception handler In-Reply-To: References: Message-ID: On Tue, 16 Nov 2021 08:42:32 GMT, Erik Österlund wrote: > When we throw an exception and unwind into a frame, the exception handler of that frame needs to call an after_unwind hook for the StackWatermark code, to support concurrent stack processing. Unfortunately, for C2 frames, I inaccurately do this in OptoRuntime::rethrow_C, but the exception handler when unwinding into a C2 frame really is OptoRuntime::handle_exception_C. > The handle_exception_C code does walk frames to the caller though, which also pokes the StackWatermark code. So in the end, there is no real bug here, but it works for the wrong reasons. So I'd like to move the hook in rethrow_C to handle_exception_C. Looks good. ------------- Marked as reviewed by thartmann (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/6405 From thartmann at openjdk.java.net Thu Nov 18 09:28:38 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 18 Nov 2021 09:28:38 GMT Subject: RFR: 8276976: Rename LIR_OprDesc to LIR_Opr In-Reply-To: References: Message-ID: <0j5uInqnwOy6Upr0fAkn6ym614KaasA9bMfe_IoTJKY=.7ed09c4b-e252-4e8e-9090-59c82f45b056@github.com> On Fri, 12 Nov 2021 23:24:43 GMT, Man Cao wrote: > Hi all, > > Can I have reviews for this mechanical renaming change as a follow up to https://bugs.openjdk.java.net/browse/JDK-8276453? Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6377 From aph at openjdk.java.net Thu Nov 18 09:36:37 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Thu, 18 Nov 2021 09:36:37 GMT Subject: RFR: 8277168: AArch64: Enable arraycopy partial inlining with SVE In-Reply-To: References: Message-ID: On Thu, 18 Nov 2021 03:50:45 GMT, Pengfei Li wrote: > Arraycopy partial inlining is a C2 compiler technique that avoids stub > call overhead in small-sized arraycopy operations by generating masked > vector instructions. So far it works on x86 AVX512 only and this patch > enables it on AArch64 with SVE. > > We add AArch64 matching rule for VectorMaskGenNode and refactor that > node a little bit. The major change is moving the element type field > into its TypeVectMask bottom type. The reason is that AArch64 vector > masks are different for different vector element types. > > E.g., an x86 AVX512 vector mask value masking 3 least significant vector > lanes (of any type) is like > > `0000 0000 ... 0000 0000 0000 0000 0111` > > On AArch64 SVE, this mask value can only be used for masking the 3 least > significant lanes of bytes. But for 3 lanes of ints, the value should be > > `0000 0000 ... 0000 0000 0001 0001 0001` > > where the least significant bit of each lane matters. 
So AArch64 matcher > needs to know the vector element type to generate right masks. > > After this patch, the C2 generated code for copying a 50-byte array on > AArch64 SVE looks like > > mov x12, #0x32 > whilelo p0.b, xzr, x12 > add x11, x11, #0x10 > ld1b {z16.b}, p0/z, [x11] > add x10, x10, #0x10 > st1b {z16.b}, p0, [x10] > > We ran jtreg hotspot::hotspot_all, jdk::tier1~3 and langtools::tier1 on > both x86 AVX512 and AArch64 SVE machines, no issue is found. We tested > JMH org/openjdk/bench/java/lang/ArrayCopyAligned.java with small array > size arguments on a 512-bit SVE-featured CPU. We got below performance > data changes. > > Benchmark (length) (Performance) > ArrayCopyAligned.testByte 10 -2.6% > ArrayCopyAligned.testByte 20 +4.7% > ArrayCopyAligned.testByte 30 +4.8% > ArrayCopyAligned.testByte 40 +21.7% > ArrayCopyAligned.testByte 50 +22.5% > ArrayCopyAligned.testByte 60 +28.4% > > The test machine has SVE vector size of 512 bits, so we see performance > gain for most array sizes less than 64 bytes. For very small arrays we > see a bit regression because a vector load/store may be a bit slower > than 1 or 2 scalar loads/stores. I'll have a look. It'll take me a little time to provision a suitable SVE-enabled AArch64 box. ------------- PR: https://git.openjdk.java.net/jdk/pull/6444 From aph at openjdk.java.net Thu Nov 18 09:48:42 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Thu, 18 Nov 2021 09:48:42 GMT Subject: RFR: 8276563: Undefined Behaviour in class Assembler [v9] In-Reply-To: References: <69byto1KlbJy9IA5Z6wV3nl8eQTQKyMAEg18sR6q2P8=.dbe9530e-cd25-4aa4-ad90-a66e91e95a0d@github.com> Message-ID: On Thu, 18 Nov 2021 06:08:21 GMT, Thomas Stuefe wrote: >> Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: >> >> Tweako stuff. 
> > src/hotspot/cpu/aarch64/register_aarch64.hpp line 63: > >> 61: int encoding() const { assert(is_valid(), "invalid register"); return encoding_nocheck(); } >> 62: bool is_valid() const { return (unsigned)encoding_nocheck() < number_of_registers; } >> 63: bool has_byte_register() const { return this >= first() && this - first() < number_of_byte_registers; } > > Why not relegate to encoding_nocheck() too: `return encoding_nocheck() >= 0 && encoding_nocheck() < num_byte_regs` ? x86 changes are for later. As far as I can tell, `has_byte_register()` isn't used by anything, so I guess I'll take it out. I was trying to minimize the scope of this patch. ------------- PR: https://git.openjdk.java.net/jdk/pull/6280 From simonis at openjdk.java.net Thu Nov 18 10:21:01 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Thu, 18 Nov 2021 10:21:01 GMT Subject: RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow [v10] In-Reply-To: References: Message-ID: <3DyX38fUwXmYfYuInLP-xhm1toijhtr2U7pHK2zhNqU=.b91e17bd-bea6-4323-96e0-03c59e3f0573@github.com> > Currently, if running with `-XX:-OmitStackTraceInFastThrow`, C2 has no possibility to create implicit exceptions like AIOOBE, NullPointerExceptions, etc. in compiled code. This means that such methods will always be deoptimized and re-executed in the interpreter if such exceptions are happening. > > If implicit exceptions are used for normal control flow, that can have a dramatic impact on performance. 
A prominent example of such code is [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274): > > public static boolean isAlpha(int c) { > try { > return IS_ALPHA[c]; > } catch (ArrayIndexOutOfBoundsException ex) { > return false; > } > } > > > ### Solution > > Instead of deoptimizing and resorting to the interpreter, we can generate code which allocates and initializes the corresponding exceptions right in compiled code. This results in a tenfold performance improvement for the above code: > > -XX:-OmitStackTraceInFastThrow -XX:-OptimizeImplicitExceptions > Benchmark (exceptionProbability) Mode Cnt Score Error Units > ImplicitExceptions.bench 0.0 avgt 5 1.430 ± 0.353 ns/op > ImplicitExceptions.bench 0.33 avgt 5 3563.038 ± 77.358 ns/op > ImplicitExceptions.bench 0.66 avgt 5 8609.693 ± 1205.104 ns/op > ImplicitExceptions.bench 1.00 avgt 5 12842.401 ± 1022.728 ns/op > > -XX:-OmitStackTraceInFastThrow -XX:+OptimizeImplicitExceptions > Benchmark (exceptionProbability) Mode Cnt Score Error Units > ImplicitExceptions.bench 0.0 avgt 5 1.432 ± 0.352 ns/op > ImplicitExceptions.bench 0.33 avgt 5 355.723 ± 16.641 ns/op > ImplicitExceptions.bench 0.66 avgt 5 887.068 ± 166.728 ns/op > ImplicitExceptions.bench 1.00 avgt 5 1274.418 ± 88.235 ns/op > > > ### Implementation details > > - The new optimization is guarded by the option `OptimizeImplicitExceptions` which is on by default. > - In `GraphKit::builtin_throw()` we can't simply use `CallGenerator::for_direct_call()` to create a `DirectCallGenerator` for the call to the exception's `<init>` function because `DirectCallGenerator` assumes in various places that calls are only issued at `invoke*` bytecodes. This is not true in general for bytecodes which can cause an implicit exception. > - Instead, we manually wire up the call based on the code in `DirectCallGenerator::generate()`.
> - We use a similar trick like for method handle intrinsics where the callee from the bytecode is replaced by a direct call and this fact is recorded in the call's `_override_symbolic_info` field. For calling constructors of implicit exceptions I've introduced the new field `_implicit_exception_init`. This field is also used in various assertions to prevent queries for the bytecode's symbolic method information which doesn't exist because we're not at an `invoke*` bytecode at the place where we generate the call. > - The PR contains a micro-benchmark which compares the old and the new implementation for [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274). Except for the trivial case where the exception probability is 0 (i.e. no exceptions are happening at all) the new implementation is about 10 times faster. Volker Simonis has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 10 commits: - Fix jit/t/t105/t105.java to also use -XX:-OptimizeImplicitExceptions in addition to -XX:-OmitStacktracesInFastThrow - Fix IR Framework test Traps::classCheck() which now behaves differently with -XX:+OptimizeImplicitExceptions - Fix build issue for minimal/zero build one more time - Minor enhancements and fixes requested by Martin - Add new WhiteBox functionality to sun/hotspot/WhiteBox.java as well to avoid warnings in the tests which are still using it. 
- Fix build issue for minimal/zero build - Added jtreg test and extended the Whitebox API to export decompile, deopt and trap counters - Fix special case where we're creating an implicit exception for a regular invoke* bytecode - Minor updates as requested by @TheRealMDoerr - 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow ------------- Changes: https://git.openjdk.java.net/jdk/pull/5488/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5488&range=09 Stats: 793 lines in 18 files changed: 778 ins; 0 del; 15 mod Patch: https://git.openjdk.java.net/jdk/pull/5488.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5488/head:pull/5488 PR: https://git.openjdk.java.net/jdk/pull/5488 From chagedorn at openjdk.java.net Thu Nov 18 10:33:29 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Thu, 18 Nov 2021 10:33:29 GMT Subject: RFR: 8254108: ciReplay: Support incremental inlining [v3] In-Reply-To: References: Message-ID: > This patch adds support to explicitly apply incremental inlining when replay compiling a method if the original compilation of the method was also incrementally inlined. We write a new value when dumping the inline tree to indicate if an inlinee was incrementally inlined (`= 1`) or not (`= 0`). > > To implement this, I updated the `REPLAY_VERSION` to 2 and additionally added a test to verify that old replay file versions are still working. I added some support to modify/remove version numbers of generated replay files in tests. I also refactored the test added by JDK-8275868 to reuse some of the methods. 
> > Thanks, > Christian Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: update AlwaysIncrementalInline and assert ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6413/files - new: https://git.openjdk.java.net/jdk/pull/6413/files/0f6b2048..ab63bcba Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6413&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6413&range=01-02 Stats: 4 lines in 2 files changed: 1 ins; 2 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/6413.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6413/head:pull/6413 PR: https://git.openjdk.java.net/jdk/pull/6413 From chagedorn at openjdk.java.net Thu Nov 18 10:33:33 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Thu, 18 Nov 2021 10:33:33 GMT Subject: RFR: 8254108: ciReplay: Support incremental inlining [v2] In-Reply-To: <7svWNFjlSF-Cp8Nm-5_BYoYBSRDS0Pl8_pNYTCkKZTI=.327ee675-e352-4a22-87fd-16e8bc1e31b5@github.com> References: <7svWNFjlSF-Cp8Nm-5_BYoYBSRDS0Pl8_pNYTCkKZTI=.327ee675-e352-4a22-87fd-16e8bc1e31b5@github.com> Message-ID: <_06RbqJ6B-30EgRX95od-xJNFTfTH_AXdqT3TuOokxk=.ccbd155b-cab2-4dc4-a506-1a76071b694a@github.com> On Thu, 18 Nov 2021 08:56:45 GMT, Dean Long wrote: >> Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: >> >> update comments, should_delay and parsing inline_late, fix test if run with -XX:+AlwaysIncrementalInline > > src/hotspot/share/opto/bytecodeInfo.cpp line 573: > >> 571: bool& should_delay) { >> 572: assert(callee_method != NULL, "caller checks for optimized virtual!"); >> 573: assert(!should_delay || AlwaysIncrementalInline, "should be initialized to false"); > > I'm not sure how useful this assert is now. It could be changed to should_delay == AlwaysIncrementalInline, or maybe just removed? Agreed, removed. 
------------- PR: https://git.openjdk.java.net/jdk/pull/6413 From chagedorn at openjdk.java.net Thu Nov 18 10:33:34 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Thu, 18 Nov 2021 10:33:34 GMT Subject: RFR: 8254108: ciReplay: Support incremental inlining [v3] In-Reply-To: References: <_V6mer5gF-ZHTU1Ux9Q0waIFMu9UwvWKOmPIuPHq6js=.4e15a326-6777-4cd9-8494-a4625ac5f2e6@github.com> Message-ID: On Wed, 17 Nov 2021 20:41:10 GMT, Dean Long wrote: >> I've added a comment to make it more clear. So, this code is only to record the late inlining decision to later dump it to the replay file. I think initializing `should_delay = AlwaysIncrementalInline` is a good idea. `should_delay` can only become true but not false anymore during normal compilation. >> >> But I think we need to leave `|| AlwaysIncrementalInline` in here https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L192 in case someone wants to replay compile with that flag even though the replay file recorded a different late inlining decision. >> >> I also fixed the test if run with `-XX:+AlwaysIncrementalInline`. > > Shouldn't the recorded inlining decision always override flags like -XX:+AlwaysIncrementalInline? > This brings up the question of how to handle flags. If we stored them in the replay file, then the replay compile could compare those to the current flags and if they don't match: > 1) give a warning and continue, correct replay not guaranteed > 2) give an error and refuse to continue > 3) override current flags with saved flags (this could be implemented by having the "ci" layer cache flags settings for each compile) As we have discussed offline, it's best not to treat `AlwaysIncrementalInline` specially given that we are already enforcing the general inlining decisions based on the replay data. I will therefore remove ` || AlwaysIncrementalInline` from L192. 
About the flag handling in general, as you have suggested offline, I also think it's a good option to put the used flags in the replay file in the future and enforce them. If the user wants to run a different set of flags for any reasons then the replay file can be adapted accordingly. ------------- PR: https://git.openjdk.java.net/jdk/pull/6413 From thartmann at openjdk.java.net Thu Nov 18 09:58:41 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 18 Nov 2021 09:58:41 GMT Subject: RFR: 8277102: Dubious PrintCompilation output In-Reply-To: <4sIBYDkvLT3U7uu8rLoqqLGGAerrYy84Lu57Lx1uO-M=.10f22b0c-b67d-4a46-a503-5c1ffc409656@github.com> References: <4sIBYDkvLT3U7uu8rLoqqLGGAerrYy84Lu57Lx1uO-M=.10f22b0c-b67d-4a46-a503-5c1ffc409656@github.com> Message-ID: On Mon, 15 Nov 2021 11:09:35 GMT, Yi Yang wrote: > The output of PrintCompilation is ill-formed: > > > 22 1 3 java.lang.Object:: (1 bytes) > 25 2 3 java.lang.String::hashCode (60 bytes) > 25 3 3 java.lang.String::coder (15 bytes) > 27 4 3 Reduced::foo (12 bytes) > 27 5 3 java.lang.Boolean::valueOf (14 bytes) > 27 6 3 java.lang.Boolean::hashCode (8 bytes) > 27 8 4 Reduced::foo (12 bytes) > 27 7 2 java.lang.Boolean::hashCode (14 bytes) > 4 3 Reduced::foo (12 bytes) made not entrant > 29 9 % 3 Reduced::main @ 4 (33 bytes) > 29 10 3 Reduced::main (33 bytes) > 29 11 % 4 Reduced::main @ 4 (33 bytes) > 9 % 3 Reduced::main @ 4 (33 bytes) made not entrant > 11 % 4 Reduced::main @ 4 (33 bytes) made not entrant > > > This seems related to [JDK-8272586](https://bugs.openjdk.java.net/browse/JDK-8272586), which print timestamp optionally. As #5446 mentioned, printing timestamp would break DisassembleCodeBlobTest.java since it expects disassembling a given nmethod twice to produce the same result. Maybe we should fix DisassembleCodeBlobTest.java. Looks reasonable to me but a second review (@dougxc?) would be good. 
test/hotspot/jtreg/compiler/jvmci/compilerToVM/DisassembleCodeBlobTest.java line 112: > 110: // Compiled method (c2) 310 463 4 compiler.jvmci.compilerToVM.CompileCodeTestCase$Dummy::staticMethod (1 bytes) > 111: for (int i = 2; i < str2Lines.length; i++) { > 112: Asserts.assertEQ(str2Lines[i], str3Lines[i], Splitting the entire string by lines seems like a bit of an overhead. What about something like that (not tested)? int idx = str2.indexOf(System.lineSeparator()); idx = str2.indexOf(System.lineSeparator(), idx + 1); str2 = str2.substring(idx + 1); ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6386 From dnsimon at openjdk.java.net Thu Nov 18 10:38:28 2021 From: dnsimon at openjdk.java.net (Doug Simon) Date: Thu, 18 Nov 2021 10:38:28 GMT Subject: RFR: 8277102: Dubious PrintCompilation output In-Reply-To: <4sIBYDkvLT3U7uu8rLoqqLGGAerrYy84Lu57Lx1uO-M=.10f22b0c-b67d-4a46-a503-5c1ffc409656@github.com> References: <4sIBYDkvLT3U7uu8rLoqqLGGAerrYy84Lu57Lx1uO-M=.10f22b0c-b67d-4a46-a503-5c1ffc409656@github.com> Message-ID: On Mon, 15 Nov 2021 11:09:35 GMT, Yi Yang wrote: > The output of PrintCompilation is ill-formed: > > > 22 1 3 java.lang.Object:: (1 bytes) > 25 2 3 java.lang.String::hashCode (60 bytes) > 25 3 3 java.lang.String::coder (15 bytes) > 27 4 3 Reduced::foo (12 bytes) > 27 5 3 java.lang.Boolean::valueOf (14 bytes) > 27 6 3 java.lang.Boolean::hashCode (8 bytes) > 27 8 4 Reduced::foo (12 bytes) > 27 7 2 java.lang.Boolean::hashCode (14 bytes) > 4 3 Reduced::foo (12 bytes) made not entrant > 29 9 % 3 Reduced::main @ 4 (33 bytes) > 29 10 3 Reduced::main (33 bytes) > 29 11 % 4 Reduced::main @ 4 (33 bytes) > 9 % 3 Reduced::main @ 4 (33 bytes) made not entrant > 11 % 4 Reduced::main @ 4 (33 bytes) made not entrant > > > This seems related to [JDK-8272586](https://bugs.openjdk.java.net/browse/JDK-8272586), which print timestamp optionally. 
As #5446 mentioned, printing timestamp would break DisassembleCodeBlobTest.java since it expects disassembling a given nmethod twice to produce the same result. Maybe we should fix DisassembleCodeBlobTest.java. Marked as reviewed by dnsimon (Committer). ------------- PR: https://git.openjdk.java.net/jdk/pull/6386 From dnsimon at openjdk.java.net Thu Nov 18 10:38:31 2021 From: dnsimon at openjdk.java.net (Doug Simon) Date: Thu, 18 Nov 2021 10:38:31 GMT Subject: RFR: 8277102: Dubious PrintCompilation output In-Reply-To: References: <4sIBYDkvLT3U7uu8rLoqqLGGAerrYy84Lu57Lx1uO-M=.10f22b0c-b67d-4a46-a503-5c1ffc409656@github.com> Message-ID: On Thu, 18 Nov 2021 09:54:24 GMT, Tobias Hartmann wrote: >> The output of PrintCompilation is ill-formed: >> >> >> 22 1 3 java.lang.Object:: (1 bytes) >> 25 2 3 java.lang.String::hashCode (60 bytes) >> 25 3 3 java.lang.String::coder (15 bytes) >> 27 4 3 Reduced::foo (12 bytes) >> 27 5 3 java.lang.Boolean::valueOf (14 bytes) >> 27 6 3 java.lang.Boolean::hashCode (8 bytes) >> 27 8 4 Reduced::foo (12 bytes) >> 27 7 2 java.lang.Boolean::hashCode (14 bytes) >> 4 3 Reduced::foo (12 bytes) made not entrant >> 29 9 % 3 Reduced::main @ 4 (33 bytes) >> 29 10 3 Reduced::main (33 bytes) >> 29 11 % 4 Reduced::main @ 4 (33 bytes) >> 9 % 3 Reduced::main @ 4 (33 bytes) made not entrant >> 11 % 4 Reduced::main @ 4 (33 bytes) made not entrant >> >> >> This seems related to [JDK-8272586](https://bugs.openjdk.java.net/browse/JDK-8272586), which print timestamp optionally. As #5446 mentioned, printing timestamp would break DisassembleCodeBlobTest.java since it expects disassembling a given nmethod twice to produce the same result. Maybe we should fix DisassembleCodeBlobTest.java. 
> > test/hotspot/jtreg/compiler/jvmci/compilerToVM/DisassembleCodeBlobTest.java line 112: > >> 110: // Compiled method (c2) 310 463 4 compiler.jvmci.compilerToVM.CompileCodeTestCase$Dummy::staticMethod (1 bytes) >> 111: for (int i = 2; i < str2Lines.length; i++) { >> 112: Asserts.assertEQ(str2Lines[i], str3Lines[i], > > Splitting the entire string by lines seems like a bit of an overhead. What about something like that (not tested)? > > int idx = str2.indexOf(System.lineSeparator()); > idx = str2.indexOf(System.lineSeparator(), idx + 1); > str2 = str2.substring(idx + 1); Doing it by lines will actually provide a more focused error message if there's a problem and I cannot imagine the overhead matters for a test like this. ------------- PR: https://git.openjdk.java.net/jdk/pull/6386 From aph at openjdk.java.net Thu Nov 18 10:50:45 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Thu, 18 Nov 2021 10:50:45 GMT Subject: RFR: 8276563: Undefined Behaviour in class Assembler [v9] In-Reply-To: References: <69byto1KlbJy9IA5Z6wV3nl8eQTQKyMAEg18sR6q2P8=.dbe9530e-cd25-4aa4-ad90-a66e91e95a0d@github.com> Message-ID: On Thu, 18 Nov 2021 06:41:36 GMT, Thomas Stuefe wrote: >> Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: >> >> Tweako stuff. > > src/hotspot/cpu/aarch64/register_aarch64.hpp line 156: > >> 154: FloatRegister successor() const { >> 155: return as_FloatRegister((encoding() + 1) % (unsigned)number_of_registers); >> 156: } > > Different from the other two, why? If we need validity checks here, should we not do them with the other types too? The modulo-32 behaviour in `successor()` is because of an ugly hack elsewhere. The aim of this patch is to remove the UB in class Assembler, not to fix anything else. That's just to make this patch as simple as it can be.
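To make the wrap-around concrete, here is a minimal sketch with hypothetical names (`kNumberOfRegisters`, `successor_encoding`). It assumes a 32-entry register file and works on plain encodings; it is not the HotSpot declaration:

```cpp
#include <cassert>

// Hypothetical stand-in for a register file with 32 entries (encodings 0..31).
constexpr unsigned kNumberOfRegisters = 32;

// Returns the encoding that follows `encoding`, wrapping from 31 back to 0.
// Staying in unsigned arithmetic keeps the result inside the valid range.
constexpr unsigned successor_encoding(unsigned encoding) {
  return (encoding + 1) % kNumberOfRegisters;
}
```

So the successor of encoding 31 is encoding 0 rather than an out-of-range value.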
------------- PR: https://git.openjdk.java.net/jdk/pull/6280 From thartmann at openjdk.java.net Thu Nov 18 10:51:44 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 18 Nov 2021 10:51:44 GMT Subject: RFR: 8277102: Dubious PrintCompilation output In-Reply-To: References: <4sIBYDkvLT3U7uu8rLoqqLGGAerrYy84Lu57Lx1uO-M=.10f22b0c-b67d-4a46-a503-5c1ffc409656@github.com> Message-ID: On Thu, 18 Nov 2021 10:07:29 GMT, Doug Simon wrote: >> test/hotspot/jtreg/compiler/jvmci/compilerToVM/DisassembleCodeBlobTest.java line 112: >> >>> 110: // Compiled method (c2) 310 463 4 compiler.jvmci.compilerToVM.CompileCodeTestCase$Dummy::staticMethod (1 bytes) >>> 111: for (int i = 2; i < str2Lines.length; i++) { >>> 112: Asserts.assertEQ(str2Lines[i], str3Lines[i], >> >> Splitting the entire string by lines seems like a bit of an overhead. What about something like that (not tested)? >> >> int idx = str2.indexOf(System.lineSeparator()); >> idx = str2.indexOf(System.lineSeparator(), idx + 1); >> str2 = str2.substring(idx + 1); > > Doing it by lines will actually provide a more focused error message if there's a problem and I cannot imagine the overhead matters for a test like this. Right, good point. ------------- PR: https://git.openjdk.java.net/jdk/pull/6386 From eosterlund at openjdk.java.net Thu Nov 18 11:20:45 2021 From: eosterlund at openjdk.java.net (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Thu, 18 Nov 2021 11:20:45 GMT Subject: RFR: 8266368: Inaccurate after_unwind hook in C2 exception handler In-Reply-To: References: Message-ID: On Thu, 18 Nov 2021 09:22:40 GMT, Tobias Hartmann wrote: >> When we throw an exception and unwind into a frame, the exception handler of that frame needs to call an after_unwind hook for the StackWatermark code, to support concurrent stack processing. Unfortunately, for C2 frames, I inaccurately do this in OptoRuntime::rethrow_C, but the exception handler when unwinding into a C2 frame really is OptoRuntime::handle_exception_C.
>> The handle_exception_C code does walk frames to the caller though, which also pokes the StackWatermark code. So in the end, there is no real bug here, but it works for the wrong reasons. So I'd like to move the hook in rethrow_C to handle_exception_C. > > Looks good. Thanks for the reviews, @TobiHartmann and @dean-long. ------------- PR: https://git.openjdk.java.net/jdk/pull/6405 From eosterlund at openjdk.java.net Thu Nov 18 11:20:46 2021 From: eosterlund at openjdk.java.net (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Thu, 18 Nov 2021 11:20:46 GMT Subject: Integrated: 8266368: Inaccurate after_unwind hook in C2 exception handler In-Reply-To: References: Message-ID: On Tue, 16 Nov 2021 08:42:32 GMT, Erik Österlund wrote: > When we throw an exception and unwind into a frame, the exception handler of that frame needs to call an after_unwind hook for the StackWatermark code, to support concurrent stack processing. Unfortunately, for C2 frames, I inaccurately do this in OptoRuntime::rethrow_C, but the exception handler when unwinding into a C2 frame really is OptoRuntime::handle_exception_C. > The handle_exception_C code does walk frames to the caller though, which also pokes the StackWatermark code. So in the end, there is no real bug here, but it works for the wrong reasons. So I'd like to move the hook in rethrow_C to handle_exception_C. This pull request has now been integrated.
Changeset: 2c06bca9 Author: Erik Österlund URL: https://git.openjdk.java.net/jdk/commit/2c06bca98fcf9d129d6085e26c225fb26368a558 Stats: 12 lines in 2 files changed: 5 ins; 5 del; 2 mod 8266368: Inaccurate after_unwind hook in C2 exception handler Reviewed-by: dlong, thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/6405 From rbackman at openjdk.java.net Thu Nov 18 12:24:40 2021 From: rbackman at openjdk.java.net (Rickard =?UTF-8?B?QsOkY2ttYW4=?=) Date: Thu, 18 Nov 2021 12:24:40 GMT Subject: RFR: 8277213: CompileTask_lock is acquired out of order with MethodCompileQueue_lock In-Reply-To: <8MjyvfvhihKvfx6XBRQ5lZGBNlkjvfFyyOWkM98wEKQ=.b74f4efc-1e45-4b18-8a1e-9735eadbb399@github.com> References: <8MjyvfvhihKvfx6XBRQ5lZGBNlkjvfFyyOWkM98wEKQ=.b74f4efc-1e45-4b18-8a1e-9735eadbb399@github.com> Message-ID: On Thu, 18 Nov 2021 09:09:30 GMT, Tobias Hartmann wrote: > In the rare case that the compiler threads fail during initialization or the code cache is full and flushing is disabled, we completely disable JIT compilation and shutdown the compiler runtime: > https://github.com/openjdk/jdk/blob/2f4b5405f0b53782f3ed5274f68b31eb968efb6d/src/hotspot/share/compiler/compileBroker.cpp#L1813-L1817 > > In the process, we free all compiler queues and notify potentially waiting compiler threads via the `CompileTask::lock()`: > https://github.com/openjdk/jdk/blob/2f4b5405f0b53782f3ed5274f68b31eb968efb6d/src/hotspot/share/compiler/compileBroker.cpp#L397-L408 > > The problem is that since [JDK-8273917](https://bugs.openjdk.java.net/browse/JDK-8273917) (see [commit](https://github.com/openjdk/jdk/commit/b8af6a9bfb28aaf0fea0cfdaba13236dc8cbaa3a)), the rank of `CompileTask_lock` is `Mutex::safepoint` which is equal to the rank of `MethodCompileQueue_lock` which we are already holding because we modify the compile queue.
> > I propose to fix this by modifying the rank of the `CompileTask_lock` similar to what is done for other locks: > https://github.com/openjdk/jdk/blob/2f4b5405f0b53782f3ed5274f68b31eb968efb6d/src/hotspot/share/oops/methodData.cpp#L1211-L1212 > > The test that triggered this will be added with [PR 6364](https://git.openjdk.java.net/jdk/pull/6364). I verified that it now passes. > > Thanks, > Tobias Looks good. ------------- Marked as reviewed by rbackman (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6449 From thartmann at openjdk.java.net Thu Nov 18 12:37:36 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 18 Nov 2021 12:37:36 GMT Subject: RFR: 8277213: CompileTask_lock is acquired out of order with MethodCompileQueue_lock In-Reply-To: <8MjyvfvhihKvfx6XBRQ5lZGBNlkjvfFyyOWkM98wEKQ=.b74f4efc-1e45-4b18-8a1e-9735eadbb399@github.com> References: <8MjyvfvhihKvfx6XBRQ5lZGBNlkjvfFyyOWkM98wEKQ=.b74f4efc-1e45-4b18-8a1e-9735eadbb399@github.com> Message-ID: On Thu, 18 Nov 2021 09:09:30 GMT, Tobias Hartmann wrote: > In the rare case that the compiler threads fail during initialization or the code cache is full and flushing is disabled, we completely disable JIT compilation and shutdown the compiler runtime: > https://github.com/openjdk/jdk/blob/2f4b5405f0b53782f3ed5274f68b31eb968efb6d/src/hotspot/share/compiler/compileBroker.cpp#L1813-L1817 > > In the process, we free all compiler queues and notify potentially waiting compiler threads via the `CompileTask::lock()`: > https://github.com/openjdk/jdk/blob/2f4b5405f0b53782f3ed5274f68b31eb968efb6d/src/hotspot/share/compiler/compileBroker.cpp#L397-L408 > > The problem is that since [JDK-8273917](https://bugs.openjdk.java.net/browse/JDK-8273917) (see [commit](https://github.com/openjdk/jdk/commit/b8af6a9bfb28aaf0fea0cfdaba13236dc8cbaa3a)), the rank of `CompileTask_lock` is `Mutex::safepoint` which is equal to the rank of `MethodCompileQueue_lock` which we are already holding because we modify the compile 
queue. > > I propose to fix this by modifying the rank of the `CompileTask_lock` similar to what is done for other locks: > https://github.com/openjdk/jdk/blob/2f4b5405f0b53782f3ed5274f68b31eb968efb6d/src/hotspot/share/oops/methodData.cpp#L1211-L1212 > > The test that triggered this will be added with [PR 6364](https://git.openjdk.java.net/jdk/pull/6364). I verified that it now passes. > > Thanks, > Tobias Thanks for the review, Rickard! ------------- PR: https://git.openjdk.java.net/jdk/pull/6449 From duke at openjdk.java.net Thu Nov 18 13:47:57 2021 From: duke at openjdk.java.net (Ludvig Janiuk) Date: Thu, 18 Nov 2021 13:47:57 GMT Subject: RFR: JDK-8277382 make c1 BlockMerger use IR::verify only when necessary Message-ID: This PR removes two calls to `IR::verify` which were unnecessary. The reason they are unnecessary is that `try_merge` does not always take any action. There is no need to verify if nothing has changed. In the cases where `try_merge` does do anything, it already calls `IR::verify` afterwards. This PR also switches some deeply nested if statements in `try_merge` to early returns. ------------- Commit messages: - _hir->verify only after modifying Changes: https://git.openjdk.java.net/jdk/pull/6456/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6456&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8277382 Stats: 140 lines in 1 file changed: 24 ins; 29 del; 87 mod Patch: https://git.openjdk.java.net/jdk/pull/6456.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6456/head:pull/6456 PR: https://git.openjdk.java.net/jdk/pull/6456 From duke at openjdk.java.net Thu Nov 18 13:47:57 2021 From: duke at openjdk.java.net (Ludvig Janiuk) Date: Thu, 18 Nov 2021 13:47:57 GMT Subject: RFR: JDK-8277382 make c1 BlockMerger use IR::verify only when necessary In-Reply-To: References: Message-ID: On Thu, 18 Nov 2021 13:30:02 GMT, Ludvig Janiuk wrote: > This PR removes two calls to `IR::verify` which were unnecessary.
The reason they are unnecessary is that `try_merge` does not always take any action. There is no need to verify if nothing has changed. In the cases where `try_merge` does do anything, it already calls `IR::verify` afterwards. > > This PR also switches some deeply nested if statements in `try_merge` to early returns. Passes both tier1 and tier2 in debug build. ------------- PR: https://git.openjdk.java.net/jdk/pull/6456 From aph at openjdk.java.net Thu Nov 18 13:56:38 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Thu, 18 Nov 2021 13:56:38 GMT Subject: RFR: 8277168: AArch64: Enable arraycopy partial inlining with SVE In-Reply-To: References: Message-ID: On Thu, 18 Nov 2021 03:50:45 GMT, Pengfei Li wrote: > Arraycopy partial inlining is a C2 compiler technique that avoids stub > call overhead in small-sized arraycopy operations by generating masked > vector instructions. So far it works on x86 AVX512 only and this patch > enables it on AArch64 with SVE. > > We add AArch64 matching rule for VectorMaskGenNode and refactor that > node a little bit. The major change is moving the element type field > into its TypeVectMask bottom type. The reason is that AArch64 vector > masks are different for different vector element types. > > E.g., an x86 AVX512 vector mask value masking 3 least significant vector > lanes (of any type) is like > > `0000 0000 ... 0000 0000 0000 0000 0111` > > On AArch64 SVE, this mask value can only be used for masking the 3 least > significant lanes of bytes. But for 3 lanes of ints, the value should be > > `0000 0000 ... 0000 0000 0001 0001 0001` > > where the least significant bit of each lane matters. So the AArch64 matcher > needs to know the vector element type to generate the right masks.
> > After this patch, the C2 generated code for copying a 50-byte array on > AArch64 SVE looks like > > mov x12, #0x32 > whilelo p0.b, xzr, x12 > add x11, x11, #0x10 > ld1b {z16.b}, p0/z, [x11] > add x10, x10, #0x10 > st1b {z16.b}, p0, [x10] > > We ran jtreg hotspot::hotspot_all, jdk::tier1~3 and langtools::tier1 on > both x86 AVX512 and AArch64 SVE machines, no issue is found. We tested > JMH org/openjdk/bench/java/lang/ArrayCopyAligned.java with small array > size arguments on a 512-bit SVE-featured CPU. We got below performance > data changes. > > Benchmark (length) (Performance) > ArrayCopyAligned.testByte 10 -2.6% > ArrayCopyAligned.testByte 20 +4.7% > ArrayCopyAligned.testByte 30 +4.8% > ArrayCopyAligned.testByte 40 +21.7% > ArrayCopyAligned.testByte 50 +22.5% > ArrayCopyAligned.testByte 60 +28.4% > > The test machine has SVE vector size of 512 bits, so we see performance > gain for most array sizes less than 64 bytes. For very small arrays we > see a bit regression because a vector load/store may be a bit slower > than 1 or 2 scalar loads/stores. I'm having a lot of difficulty understanding how this is supposed to work. Firstly, I'm not seeing a performance increase on a fujitsu-fx700. Secondly, I'm not surprised: looking at the results of JMH `-prof:perfasm`, it seems to me that the only SVE instructions being executed are _outside_ the timing loop in the `testByte_ArrayCopyAligned_testByte_jmhTest:avgt_jmhStub` method. I'm baffled by what is going on. 
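For reference, the lane-mask difference described in the quoted text can be sketched as below. The function names are invented for illustration, and the SVE layout assumes one predicate bit per vector byte with only the lowest bit of each element's group significant; this is not the matcher code from the patch:

```cpp
#include <cassert>
#include <cstdint>

// AVX512-style mask: one bit per lane, independent of the element size.
inline std::uint64_t avx512_style_mask(unsigned lanes) {
  return lanes >= 64 ? ~std::uint64_t{0} : (std::uint64_t{1} << lanes) - 1;
}

// SVE-style predicate (assumed layout): one bit per byte of the vector,
// so lane i of an element that is elem_bytes wide is governed by bit
// i * elem_bytes. Bits beyond the 64 modelled here are ignored.
inline std::uint64_t sve_style_predicate(unsigned lanes, unsigned elem_bytes) {
  std::uint64_t bits = 0;
  for (unsigned i = 0; i < lanes && i * elem_bytes < 64; ++i) {
    bits |= std::uint64_t{1} << (i * elem_bytes);
  }
  return bits;
}
```

With three lanes, the byte case gives 0x7 and the int case gives 0x111, matching the two bit patterns in the quoted description.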
------------- PR: https://git.openjdk.java.net/jdk/pull/6444 From coleenp at openjdk.java.net Thu Nov 18 14:13:42 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 18 Nov 2021 14:13:42 GMT Subject: RFR: 8277213: CompileTask_lock is acquired out of order with MethodCompileQueue_lock In-Reply-To: <8MjyvfvhihKvfx6XBRQ5lZGBNlkjvfFyyOWkM98wEKQ=.b74f4efc-1e45-4b18-8a1e-9735eadbb399@github.com> References: <8MjyvfvhihKvfx6XBRQ5lZGBNlkjvfFyyOWkM98wEKQ=.b74f4efc-1e45-4b18-8a1e-9735eadbb399@github.com> Message-ID: On Thu, 18 Nov 2021 09:09:30 GMT, Tobias Hartmann wrote: > In the rare case that the compiler threads fail during initialization or the code cache is full and flushing is disabled, we completely disable JIT compilation and shutdown the compiler runtime: > https://github.com/openjdk/jdk/blob/2f4b5405f0b53782f3ed5274f68b31eb968efb6d/src/hotspot/share/compiler/compileBroker.cpp#L1813-L1817 > > In the process, we free all compiler queues and notify potentially waiting compiler threads via the `CompileTask::lock()`: > https://github.com/openjdk/jdk/blob/2f4b5405f0b53782f3ed5274f68b31eb968efb6d/src/hotspot/share/compiler/compileBroker.cpp#L397-L408 > > The problem is that since [JDK-8273917](https://bugs.openjdk.java.net/browse/JDK-8273917) (see [commit](https://github.com/openjdk/jdk/commit/b8af6a9bfb28aaf0fea0cfdaba13236dc8cbaa3a)), the rank of `CompileTask_lock` is `Mutex::safepoint` which is equal to the rank of `MethodCompileQueue_lock` which we are already holding because we modify the compile queue. > > I propose to fix this by modifying the rank of the `CompileTask_lock` similar to what is done for other locks: > https://github.com/openjdk/jdk/blob/2f4b5405f0b53782f3ed5274f68b31eb968efb6d/src/hotspot/share/oops/methodData.cpp#L1211-L1212 > > The test that triggered this will be added with [PR 6364](https://git.openjdk.java.net/jdk/pull/6364). I verified that it now passes. > > Thanks, > Tobias Looks good! 
------------- Marked as reviewed by coleenp (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6449 From thartmann at openjdk.java.net Thu Nov 18 14:18:36 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 18 Nov 2021 14:18:36 GMT Subject: RFR: 8277213: CompileTask_lock is acquired out of order with MethodCompileQueue_lock In-Reply-To: <8MjyvfvhihKvfx6XBRQ5lZGBNlkjvfFyyOWkM98wEKQ=.b74f4efc-1e45-4b18-8a1e-9735eadbb399@github.com> References: <8MjyvfvhihKvfx6XBRQ5lZGBNlkjvfFyyOWkM98wEKQ=.b74f4efc-1e45-4b18-8a1e-9735eadbb399@github.com> Message-ID: On Thu, 18 Nov 2021 09:09:30 GMT, Tobias Hartmann wrote: > In the rare case that the compiler threads fail during initialization or the code cache is full and flushing is disabled, we completely disable JIT compilation and shutdown the compiler runtime: > https://github.com/openjdk/jdk/blob/2f4b5405f0b53782f3ed5274f68b31eb968efb6d/src/hotspot/share/compiler/compileBroker.cpp#L1813-L1817 > > In the process, we free all compiler queues and notify potentially waiting compiler threads via the `CompileTask::lock()`: > https://github.com/openjdk/jdk/blob/2f4b5405f0b53782f3ed5274f68b31eb968efb6d/src/hotspot/share/compiler/compileBroker.cpp#L397-L408 > > The problem is that since [JDK-8273917](https://bugs.openjdk.java.net/browse/JDK-8273917) (see [commit](https://github.com/openjdk/jdk/commit/b8af6a9bfb28aaf0fea0cfdaba13236dc8cbaa3a)), the rank of `CompileTask_lock` is `Mutex::safepoint` which is equal to the rank of `MethodCompileQueue_lock` which we are already holding because we modify the compile queue. > > I propose to fix this by modifying the rank of the `CompileTask_lock` similar to what is done for other locks: > https://github.com/openjdk/jdk/blob/2f4b5405f0b53782f3ed5274f68b31eb968efb6d/src/hotspot/share/oops/methodData.cpp#L1211-L1212 > > The test that triggered this will be added with [PR 6364](https://git.openjdk.java.net/jdk/pull/6364). I verified that it now passes. 
> > Thanks, > Tobias Thanks for the review, Coleen! ------------- PR: https://git.openjdk.java.net/jdk/pull/6449 From eosterlund at openjdk.java.net Thu Nov 18 14:18:42 2021 From: eosterlund at openjdk.java.net (Erik Österlund) Date: Thu, 18 Nov 2021 14:18:42 GMT Subject: RFR: 8277180: Intrinsify recursive ObjectMonitor locking for C2 x64 and A64 [v3] In-Reply-To: <4oucmwwERGx-tUr1PPfNHb7YMCHFR82eCxyEAga1CzA=.28af13c7-9a9f-489b-a598-de6fcc32828b@github.com> References: <4oucmwwERGx-tUr1PPfNHb7YMCHFR82eCxyEAga1CzA=.28af13c7-9a9f-489b-a598-de6fcc32828b@github.com> Message-ID: On Wed, 17 Nov 2021 11:19:09 GMT, Erik Österlund wrote: >> The C2 fast_lock and fast_unlock intrinsics don't support recursive ObjectMonitor locking. Some workloads can significantly benefit from this. Recent ObjectMonitor work has changed heuristics such that ObjectMonitors are deflated less aggressively. Therefore we can expect to see more inflated monitors in workloads where we would usually see more stack locks. That in itself is fine, except that C2 doesn't intrinsify the recursive locking paths for object monitors. Enabling those cases in the C2 code, removes a (~17%) regression we have seen with DaCapo h2 -t 1, and makes a few more benchmarks happy as well. > > Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: > > use increment macro Any takers for the x86_64 code? ------------- PR: https://git.openjdk.java.net/jdk/pull/6406 From aph at openjdk.java.net Thu Nov 18 14:34:43 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Thu, 18 Nov 2021 14:34:43 GMT Subject: RFR: 8277180: Intrinsify recursive ObjectMonitor locking for C2 x64 and A64 [v3] In-Reply-To: References: <4oucmwwERGx-tUr1PPfNHb7YMCHFR82eCxyEAga1CzA=.28af13c7-9a9f-489b-a598-de6fcc32828b@github.com> Message-ID: On Thu, 18 Nov 2021 14:15:57 GMT, Erik Österlund wrote: > Any takers for the x86_64 code?
Sure, and as far as I know no-one took away my x86 programmer's badge yet. LGTM. ------------- PR: https://git.openjdk.java.net/jdk/pull/6406 From eosterlund at openjdk.java.net Thu Nov 18 14:43:39 2021 From: eosterlund at openjdk.java.net (Erik Österlund) Date: Thu, 18 Nov 2021 14:43:39 GMT Subject: RFR: 8277180: Intrinsify recursive ObjectMonitor locking for C2 x64 and A64 [v3] In-Reply-To: References: <4oucmwwERGx-tUr1PPfNHb7YMCHFR82eCxyEAga1CzA=.28af13c7-9a9f-489b-a598-de6fcc32828b@github.com> Message-ID: On Thu, 18 Nov 2021 14:31:21 GMT, Andrew Haley wrote: > > Any takers for the x86_64 code? > > Sure, and as far as I know no-one took away my x86 programmer's badge yet. LGTM. Thanks Andrew. I think we can trust your x86 skills as well. :-) ------------- PR: https://git.openjdk.java.net/jdk/pull/6406 From eosterlund at openjdk.java.net Thu Nov 18 14:47:47 2021 From: eosterlund at openjdk.java.net (Erik Österlund) Date: Thu, 18 Nov 2021 14:47:47 GMT Subject: Integrated: 8277180: Intrinsify recursive ObjectMonitor locking for C2 x64 and A64 In-Reply-To: References: Message-ID: On Tue, 16 Nov 2021 08:58:49 GMT, Erik Österlund wrote: > The C2 fast_lock and fast_unlock intrinsics don't support recursive ObjectMonitor locking. Some workloads can significantly benefit from this. Recent ObjectMonitor work has changed heuristics such that ObjectMonitors are deflated less aggressively. Therefore we can expect to see more inflated monitors in workloads where we would usually see more stack locks. That in itself is fine, except that C2 doesn't intrinsify the recursive locking paths for object monitors. Enabling those cases in the C2 code, removes a (~17%) regression we have seen with DaCapo h2 -t 1, and makes a few more benchmarks happy as well. This pull request has now been integrated.
Changeset: d93b238f Author: Erik Österlund URL: https://git.openjdk.java.net/jdk/commit/d93b238f9725727ae1e2e9f203943b5ddf778f35 Stats: 44 lines in 2 files changed: 31 ins; 6 del; 7 mod 8277180: Intrinsify recursive ObjectMonitor locking for C2 x64 and A64 Reviewed-by: aph, ngasson ------------- PR: https://git.openjdk.java.net/jdk/pull/6406 From shade at openjdk.java.net Thu Nov 18 15:24:56 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 18 Nov 2021 15:24:56 GMT Subject: RFR: 8277385: Zero: Enable CompactStrings support Message-ID: <-tldYMqymQt2bG54787XcYNM1AQmaz7MpMaD7ijAges=.48f22102-66f8-44d7-80be-5674f0638bef@github.com> This enables `CompactStrings` for Zero. When we were doing original Compact Strings in JDK 9, we disabled the support on non-primary platforms, hoping relevant maintainers would follow up with platform-specific work. Here is me following up, as Zero maintainer :) There is little to do on Zero side, as it is pure interpreter without String intrinsics. There is still benefit of doing less work with smaller Strings. There are no regressions on the benchmarks I tried, and some benchmarks improve significantly. Notably, specjvm:{compiler,sunflow,derby,xmlvalidation} improve about 5%, specjvm:{serial,xmltransform} improve about 20% on x86_64.
Additional testing: - [x] Linux x86_64 Zero benchmarks - [ ] Linux x86_64 Zero `tier1` ------------- Commit messages: - Fix Changes: https://git.openjdk.java.net/jdk/pull/6459/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6459&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8277385 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/6459.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6459/head:pull/6459 PR: https://git.openjdk.java.net/jdk/pull/6459 From redestad at openjdk.java.net Thu Nov 18 15:45:43 2021 From: redestad at openjdk.java.net (Claes Redestad) Date: Thu, 18 Nov 2021 15:45:43 GMT Subject: RFR: 8277385: Zero: Enable CompactStrings support In-Reply-To: <-tldYMqymQt2bG54787XcYNM1AQmaz7MpMaD7ijAges=.48f22102-66f8-44d7-80be-5674f0638bef@github.com> References: <-tldYMqymQt2bG54787XcYNM1AQmaz7MpMaD7ijAges=.48f22102-66f8-44d7-80be-5674f0638bef@github.com> Message-ID: On Thu, 18 Nov 2021 15:14:55 GMT, Aleksey Shipilev wrote: > This enables `CompactStrings` for Zero. When we were doing original Compact Strings in JDK 9, we disabled the support on non-primary platforms, hoping relevant maintainers would follow up with platform-specific work. Here is me following up, as Zero maintainer :) > > There is little to do on Zero side, as it is pure interpreter without String intrinsics. Other platforms had old-shaped String intrinsics, so for them enabling the feature would mean implementing Compact-String-shaped intrinsics too. But this is irrelevant for Zero. There is still benefit of doing less work with smaller Strings. > > There are no regressions on the benchmarks I tried, and some benchmarks improve significantly. Notably, specjvm:{compiler,sunflow,derby,xmlvalidation} improve about 5%, specjvm:{serial,xmltransform} improve about 20% on x86_64. > > Additional testing: > - [x] Linux x86_64 Zero benchmarks > - [ ] Linux x86_64 Zero `tier1` Looks good and trivial. 
------------- Marked as reviewed by redestad (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6459 From aph at openjdk.java.net Thu Nov 18 16:55:43 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Thu, 18 Nov 2021 16:55:43 GMT Subject: RFR: 8277168: AArch64: Enable arraycopy partial inlining with SVE In-Reply-To: References: Message-ID: On Thu, 18 Nov 2021 13:53:11 GMT, Andrew Haley wrote: > I'm baffled by what is going on. Sorry, it looks like I managed to confuse myself. The top of the loop looks like: 10c B17: # out( B18 ) <- in( B27 ) Freq: 4.49963 10c # castLL of R2 10c sve_whilelo P0, zr, R2 # sve 110 sve_ldr V16, P0, [R0] # load vector predicated (sve) 114 sve_str [R1], P0, V16 # store vector predicated (sve) 118 B18: # out( B30 B19 ) <- in( B17 B28 B26 ) Freq: 8.99927 118 118 ldarb R10, [R23] # byte ! Field: volatile org/openjdk/jmh/runner/InfraControlL2.isDone ... and the bottom 1a0 cmp R2, #64 1a4 bls B17 # unsigned P=0.500000 C=-1.000000 1a8 B28: # out( B18 ) <- in( B27 ) Freq: 4.49963 1a8 CALL, runtime leaf nofp 0x0000ffff6d1058f8 jbyte_arraycopy No JVM State Info # 1b0 b B18 So only if the length is < 64 (i.e. 512 bits) do we branch back to B17 to do the `SVE WHILELO` to set the predicate. This is confusing only because the code has been rearranged so that the test for < 64 bytes is at the bottom of the loop. ------------- PR: https://git.openjdk.java.net/jdk/pull/6444 From aph at openjdk.java.net Thu Nov 18 17:27:41 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Thu, 18 Nov 2021 17:27:41 GMT Subject: RFR: 8277168: AArch64: Enable arraycopy partial inlining with SVE In-Reply-To: References: Message-ID: On Thu, 18 Nov 2021 03:50:45 GMT, Pengfei Li wrote: > Arraycopy partial inlining is a C2 compiler technique that avoids stub > call overhead in small-sized arraycopy operations by generating masked > vector instructions. So far it works on x86 AVX512 only and this patch > enables it on AArch64 with SVE. 
> > We add AArch64 matching rule for VectorMaskGenNode and refactor that > node a little bit. The major change is moving the element type field > into its TypeVectMask bottom type. The reason is that AArch64 vector > masks are different for different vector element types. > > E.g., an x86 AVX512 vector mask value masking 3 least significant vector > lanes (of any type) is like > > `0000 0000 ... 0000 0000 0000 0000 0111` > > On AArch64 SVE, this mask value can only be used for masking the 3 least > significant lanes of bytes. But for 3 lanes of ints, the value should be > > `0000 0000 ... 0000 0000 0001 0001 0001` > > where the least significant bit of each lane matters. So AArch64 matcher > needs to know the vector element type to generate right masks. > > After this patch, the C2 generated code for copying a 50-byte array on > AArch64 SVE looks like > > mov x12, #0x32 > whilelo p0.b, xzr, x12 > add x11, x11, #0x10 > ld1b {z16.b}, p0/z, [x11] > add x10, x10, #0x10 > st1b {z16.b}, p0, [x10] > > We ran jtreg hotspot::hotspot_all, jdk::tier1~3 and langtools::tier1 on > both x86 AVX512 and AArch64 SVE machines, no issue is found. We tested > JMH org/openjdk/bench/java/lang/ArrayCopyAligned.java with small array > size arguments on a 512-bit SVE-featured CPU. We got below performance > data changes. > > Benchmark (length) (Performance) > ArrayCopyAligned.testByte 10 -2.6% > ArrayCopyAligned.testByte 20 +4.7% > ArrayCopyAligned.testByte 30 +4.8% > ArrayCopyAligned.testByte 40 +21.7% > ArrayCopyAligned.testByte 50 +22.5% > ArrayCopyAligned.testByte 60 +28.4% > > The test machine has SVE vector size of 512 bits, so we see performance > gain for most array sizes less than 64 bytes. For very small arrays we > see a bit regression because a vector load/store may be a bit slower > than 1 or 2 scalar loads/stores. Hurrah! I have managed to duplicate your results. Old: Benchmark (length) Mode Cnt Score Error Units ArrayCopyAligned.testByte 40 avgt 5 23.332 ±
0.016 ns/op New: ArrayCopyAligned.testByte 40 avgt 5 18.092 ± 0.093 ns/op ... and in fact your result is much better than this suggests, because the bulk of the test is fetching all of the arguments to arraycopy, not actually copying the bytes. I get it now. ------------- PR: https://git.openjdk.java.net/jdk/pull/6444 From sviswanathan at openjdk.java.net Thu Nov 18 18:29:41 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Thu, 18 Nov 2021 18:29:41 GMT Subject: RFR: 8277239: SIGSEGV in vrshift_reg_maskedNode::emit In-Reply-To: References: Message-ID: On Wed, 17 Nov 2021 13:44:51 GMT, Jatin Bhateja wrote: > Currently instruction selector differentiates between the two kinds of vector shift operations i.e. one with vector shift count and other with scalar shift count passed though LShiftCntV/RShiftCntV nodes by looking at the ideal opcode of shift count node. > > A more robust scheme is to set a flag over vector shift node if it has variable vector shift count and replace the opcode based check with flag based check in various shift instruction selection patterns. The patch looks good to me. ------------- Marked as reviewed by sviswanathan (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6431 From psandoz at openjdk.java.net Thu Nov 18 19:37:44 2021 From: psandoz at openjdk.java.net (Paul Sandoz) Date: Thu, 18 Nov 2021 19:37:44 GMT Subject: RFR: 8277239: SIGSEGV in vrshift_reg_maskedNode::emit In-Reply-To: References: Message-ID: On Wed, 17 Nov 2021 13:44:51 GMT, Jatin Bhateja wrote: > Currently instruction selector differentiates between the two kinds of vector shift operations i.e. one with vector shift count and other with scalar shift count passed though LShiftCntV/RShiftCntV nodes by looking at the ideal opcode of shift count node.
> > A more robust scheme is to set a flag over vector shift node if it has variable vector shift count and replace the opcode based check with flag based check in various shift instruction selection patterns. Tests passed. Needs another HotSpot reviewer. ------------- PR: https://git.openjdk.java.net/jdk/pull/6431 From dlong at openjdk.java.net Thu Nov 18 20:23:42 2021 From: dlong at openjdk.java.net (Dean Long) Date: Thu, 18 Nov 2021 20:23:42 GMT Subject: RFR: 8254108: ciReplay: Support incremental inlining [v3] In-Reply-To: References: Message-ID: On Thu, 18 Nov 2021 10:33:29 GMT, Christian Hagedorn wrote: >> This patch adds support to explicitly apply incremental inlining when replay compiling a method if the original compilation of the method was also incrementally inlined. We write a new value when dumping the inline tree to indicate if an inlinee was incrementally inlined (`= 1`) or not (`= 0`). >> >> To implement this, I updated the `REPLAY_VERSION` to 2 and additionally added a test to verify that old replay file versions are still working. I added some support to modify/remove version numbers of generated replay files in tests. I also refactored the test added by JDK-8275868 to reuse some of the methods. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > update AlwaysIncrementalInline and assert Marked as reviewed by dlong (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/6413 From iveresov at openjdk.java.net Thu Nov 18 22:52:39 2021 From: iveresov at openjdk.java.net (Igor Veresov) Date: Thu, 18 Nov 2021 22:52:39 GMT Subject: RFR: 8276976: Rename LIR_OprDesc to LIR_Opr In-Reply-To: References: Message-ID: On Fri, 12 Nov 2021 23:24:43 GMT, Man Cao wrote: > Hi all, > > Can I have reviews for this mechanical renaming change as a follow up to https://bugs.openjdk.java.net/browse/JDK-8276453? Looks good. 
------------- Marked as reviewed by iveresov (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6377 From dlong at openjdk.java.net Thu Nov 18 23:30:08 2021 From: dlong at openjdk.java.net (Dean Long) Date: Thu, 18 Nov 2021 23:30:08 GMT Subject: RFR: 8277423: ciReplay: hidden class with comment expected error Message-ID: Refactor code to dump hidden classes consistently. ------------- Commit messages: - refactor to dump hidden classes consistently Changes: https://git.openjdk.java.net/jdk/pull/6467/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6467&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8277423 Stats: 29 lines in 3 files changed: 18 ins; 9 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/6467.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6467/head:pull/6467 PR: https://git.openjdk.java.net/jdk/pull/6467 From manc at openjdk.java.net Thu Nov 18 23:41:42 2021 From: manc at openjdk.java.net (Man Cao) Date: Thu, 18 Nov 2021 23:41:42 GMT Subject: RFR: 8276976: Rename LIR_OprDesc to LIR_Opr In-Reply-To: References: Message-ID: On Fri, 12 Nov 2021 23:24:43 GMT, Man Cao wrote: > Hi all, > > Can I have reviews for this mechanical renaming change as a follow up to https://bugs.openjdk.java.net/browse/JDK-8276453? Thank you for the reviews ------------- PR: https://git.openjdk.java.net/jdk/pull/6377 From manc at openjdk.java.net Thu Nov 18 23:41:43 2021 From: manc at openjdk.java.net (Man Cao) Date: Thu, 18 Nov 2021 23:41:43 GMT Subject: Integrated: 8276976: Rename LIR_OprDesc to LIR_Opr In-Reply-To: References: Message-ID: On Fri, 12 Nov 2021 23:24:43 GMT, Man Cao wrote: > Hi all, > > Can I have reviews for this mechanical renaming change as a follow up to https://bugs.openjdk.java.net/browse/JDK-8276453? This pull request has now been integrated. 
Changeset: 839033ba Author: Man Cao URL: https://git.openjdk.java.net/jdk/commit/839033baf61ca7f10437e8e53b2114b081d97ea9 Stats: 238 lines in 14 files changed: 0 ins; 9 del; 229 mod 8276976: Rename LIR_OprDesc to LIR_Opr Co-authored-by: Chuck Rasbold Reviewed-by: thartmann, iveresov ------------- PR: https://git.openjdk.java.net/jdk/pull/6377 From duke at openjdk.java.net Fri Nov 19 01:19:56 2021 From: duke at openjdk.java.net (Mai Đặng Quân Anh) Date: Fri, 19 Nov 2021 01:19:56 GMT Subject: RFR: 8277426: Optimize mask reduction operations on x86 Message-ID: <9g2nHSo1K9pldGKqKiXBqjFZF3UFqqKHu-6GLyKbclc=.496404e9-f682-470e-a93d-c8fc9ca6f626@github.com> Hi, This patch improves the performance of mask reduction operations on AVX by matching the pattern `VectorMaskReduction (VectorStoreMask mask)` to eliminate the extra `VectorStoreMaskNode`. I have also done some refactoring to unify the logic of `toLong` with the other reduction operations. The patch has been discussed partially in [panama-vector repository](https://github.com/openjdk/panama-vector/pull/158). Passed all tests except the one being fixed by [JDK-8277324](https://bugs.openjdk.java.net/browse/JDK-8277324). Thank you very much.
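As a rough illustration of what the mask reduction operations compute, the sketch below models a vector mask as a plain bitmask with one bit per lane, which is the value `toLong` returns. The function names loosely mirror the Vector API's `VectorMask` methods (`trueCount`, `firstTrue`, `lastTrue`); the code is a scalar model under that assumption and is not the C2 code generated by the patch.

```cpp
#include <cassert>
#include <cstdint>

// Scalar model: bit i of `bits` is set when lane i of the mask is true.
// `lanes` is the number of lanes in the vector (at most 64 in this model).

// Number of true lanes (population count).
int true_count(uint64_t bits) {
  int count = 0;
  while (bits != 0) {
    bits &= bits - 1;  // clear the lowest set bit
    ++count;
  }
  return count;
}

// Index of the first true lane, or `lanes` when no lane is set.
int first_true(uint64_t bits, int lanes) {
  assert(lanes >= 0 && lanes <= 64);
  for (int i = 0; i < lanes; ++i) {
    if ((bits >> i) & 1) {
      return i;
    }
  }
  return lanes;
}

// Index of the last true lane, or -1 when no lane is set.
int last_true(uint64_t bits, int lanes) {
  assert(lanes >= 0 && lanes <= 64);
  for (int i = lanes - 1; i >= 0; --i) {
    if ((bits >> i) & 1) {
      return i;
    }
  }
  return -1;
}
```

Seen this way, every reduction is a cheap scalar operation on the `toLong` bits, which is one plausible reading of why unifying `toLong` with the other reductions is attractive.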
------------- Commit messages: - reduce some dependencies with spare register - improve mask reduction logic on AVX Changes: https://git.openjdk.java.net/jdk/pull/6447/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6447&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8277426 Stats: 214 lines in 5 files changed: 139 ins; 17 del; 58 mod Patch: https://git.openjdk.java.net/jdk/pull/6447.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6447/head:pull/6447 PR: https://git.openjdk.java.net/jdk/pull/6447 From yyang at openjdk.java.net Fri Nov 19 02:07:45 2021 From: yyang at openjdk.java.net (Yi Yang) Date: Fri, 19 Nov 2021 02:07:45 GMT Subject: RFR: 8277102: Dubious PrintCompilation output In-Reply-To: References: <4sIBYDkvLT3U7uu8rLoqqLGGAerrYy84Lu57Lx1uO-M=.10f22b0c-b67d-4a46-a503-5c1ffc409656@github.com> Message-ID: On Thu, 18 Nov 2021 09:55:13 GMT, Tobias Hartmann wrote: >> The output of PrintCompilation is ill-formed: >> >> >> 22 1 3 java.lang.Object:: (1 bytes) >> 25 2 3 java.lang.String::hashCode (60 bytes) >> 25 3 3 java.lang.String::coder (15 bytes) >> 27 4 3 Reduced::foo (12 bytes) >> 27 5 3 java.lang.Boolean::valueOf (14 bytes) >> 27 6 3 java.lang.Boolean::hashCode (8 bytes) >> 27 8 4 Reduced::foo (12 bytes) >> 27 7 2 java.lang.Boolean::hashCode (14 bytes) >> 4 3 Reduced::foo (12 bytes) made not entrant >> 29 9 % 3 Reduced::main @ 4 (33 bytes) >> 29 10 3 Reduced::main (33 bytes) >> 29 11 % 4 Reduced::main @ 4 (33 bytes) >> 9 % 3 Reduced::main @ 4 (33 bytes) made not entrant >> 11 % 4 Reduced::main @ 4 (33 bytes) made not entrant >> >> >> This seems related to [JDK-8272586](https://bugs.openjdk.java.net/browse/JDK-8272586), which print timestamp optionally. As #5446 mentioned, printing timestamp would break DisassembleCodeBlobTest.java since it expects disassembling a given nmethod twice to produce the same result. Maybe we should fix DisassembleCodeBlobTest.java. > > Looks reasonable to me but a second review (@dougxc?) 
would be good. Thanks @TobiHartmann and @dougxc for reviews? ------------- PR: https://git.openjdk.java.net/jdk/pull/6386 From yyang at openjdk.java.net Fri Nov 19 02:07:45 2021 From: yyang at openjdk.java.net (Yi Yang) Date: Fri, 19 Nov 2021 02:07:45 GMT Subject: Integrated: 8277102: Dubious PrintCompilation output In-Reply-To: <4sIBYDkvLT3U7uu8rLoqqLGGAerrYy84Lu57Lx1uO-M=.10f22b0c-b67d-4a46-a503-5c1ffc409656@github.com> References: <4sIBYDkvLT3U7uu8rLoqqLGGAerrYy84Lu57Lx1uO-M=.10f22b0c-b67d-4a46-a503-5c1ffc409656@github.com> Message-ID: On Mon, 15 Nov 2021 11:09:35 GMT, Yi Yang wrote: > The output of PrintCompilation is ill-formed: > > > 22 1 3 java.lang.Object:: (1 bytes) > 25 2 3 java.lang.String::hashCode (60 bytes) > 25 3 3 java.lang.String::coder (15 bytes) > 27 4 3 Reduced::foo (12 bytes) > 27 5 3 java.lang.Boolean::valueOf (14 bytes) > 27 6 3 java.lang.Boolean::hashCode (8 bytes) > 27 8 4 Reduced::foo (12 bytes) > 27 7 2 java.lang.Boolean::hashCode (14 bytes) > 4 3 Reduced::foo (12 bytes) made not entrant > 29 9 % 3 Reduced::main @ 4 (33 bytes) > 29 10 3 Reduced::main (33 bytes) > 29 11 % 4 Reduced::main @ 4 (33 bytes) > 9 % 3 Reduced::main @ 4 (33 bytes) made not entrant > 11 % 4 Reduced::main @ 4 (33 bytes) made not entrant > > > This seems related to [JDK-8272586](https://bugs.openjdk.java.net/browse/JDK-8272586), which print timestamp optionally. As #5446 mentioned, printing timestamp would break DisassembleCodeBlobTest.java since it expects disassembling a given nmethod twice to produce the same result. Maybe we should fix DisassembleCodeBlobTest.java. This pull request has now been integrated. 
Changeset: 2f0bde1a Author: Yi Yang URL: https://git.openjdk.java.net/jdk/commit/2f0bde1a658b0910304c110920a2e8ccbe4557f8 Stats: 19 lines in 4 files changed: 9 ins; 2 del; 8 mod 8277102: Dubious PrintCompilation output Reviewed-by: thartmann, dnsimon ------------- PR: https://git.openjdk.java.net/jdk/pull/6386 From thartmann at openjdk.java.net Fri Nov 19 07:16:43 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Fri, 19 Nov 2021 07:16:43 GMT Subject: Integrated: 8277213: CompileTask_lock is acquired out of order with MethodCompileQueue_lock In-Reply-To: <8MjyvfvhihKvfx6XBRQ5lZGBNlkjvfFyyOWkM98wEKQ=.b74f4efc-1e45-4b18-8a1e-9735eadbb399@github.com> References: <8MjyvfvhihKvfx6XBRQ5lZGBNlkjvfFyyOWkM98wEKQ=.b74f4efc-1e45-4b18-8a1e-9735eadbb399@github.com> Message-ID: On Thu, 18 Nov 2021 09:09:30 GMT, Tobias Hartmann wrote: > In the rare case that the compiler threads fail during initialization or the code cache is full and flushing is disabled, we completely disable JIT compilation and shutdown the compiler runtime: > https://github.com/openjdk/jdk/blob/2f4b5405f0b53782f3ed5274f68b31eb968efb6d/src/hotspot/share/compiler/compileBroker.cpp#L1813-L1817 > > In the process, we free all compiler queues and notify potentially waiting compiler threads via the `CompileTask::lock()`: > https://github.com/openjdk/jdk/blob/2f4b5405f0b53782f3ed5274f68b31eb968efb6d/src/hotspot/share/compiler/compileBroker.cpp#L397-L408 > > The problem is that since [JDK-8273917](https://bugs.openjdk.java.net/browse/JDK-8273917) (see [commit](https://github.com/openjdk/jdk/commit/b8af6a9bfb28aaf0fea0cfdaba13236dc8cbaa3a)), the rank of `CompileTask_lock` is `Mutex::safepoint` which is equal to the rank of `MethodCompileQueue_lock` which we are already holding because we modify the compile queue. 
> > I propose to fix this by modifying the rank of the `CompileTask_lock` similar to what is done for other locks: > https://github.com/openjdk/jdk/blob/2f4b5405f0b53782f3ed5274f68b31eb968efb6d/src/hotspot/share/oops/methodData.cpp#L1211-L1212 > > The test that triggered this will be added with [PR 6364](https://git.openjdk.java.net/jdk/pull/6364). I verified that it now passes. > > Thanks, > Tobias This pull request has now been integrated. Changeset: f34f1190 Author: Tobias Hartmann URL: https://git.openjdk.java.net/jdk/commit/f34f119080b4e8baf396fb26c21d572dd432fd91 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod 8277213: CompileTask_lock is acquired out of order with MethodCompileQueue_lock Reviewed-by: rbackman, coleenp ------------- PR: https://git.openjdk.java.net/jdk/pull/6449 From thartmann at openjdk.java.net Fri Nov 19 07:24:40 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Fri, 19 Nov 2021 07:24:40 GMT Subject: RFR: 8277042: add test for 8276036 to compiler/codecache [v2] In-Reply-To: References: Message-ID: On Mon, 15 Nov 2021 09:51:16 GMT, Takuya Kiriyama wrote: >> Could you please review the 8277042 code? >> This is the enhancement for 8276036. >> I add a new test to verify the value of full_count in the message of insufficient codecache. 
> > Takuya Kiriyama has updated the pull request incrementally with one additional commit since the last revision: > > 8277042: add test for 8276036 to compiler/codecache Now even with the fix for https://github.com/openjdk/jdk/pull/6449, I hit an assert with the new test once that I'm unfortunately not able to reproduce even after hundreds of runs:
```
# A fatal error has been detected by the Java Runtime Environment:
#
# Internal Error (/workspace/open/src/hotspot/share/compiler/compileBroker.cpp:369), pid=25796, tid=6403
# assert(_last->next() == __null) failed: not last

Stack: [0x000070000838a000,0x000070000848a000], sp=0x00007000084862f0, free space=1008k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.dylib+0x1159a39] VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long)+0x6e9
V [libjvm.dylib+0x115a0bb] VMError::report_and_die(Thread*, void*, char const*, int, char const*, char const*, __va_list_tag*)+0x3b
V [libjvm.dylib+0x628fbd] report_vm_error(char const*, int, char const*, char const*, ...)+0xdd
V [libjvm.dylib+0x5e394a] CompileQueue::add(CompileTask*)+0x8a
V [libjvm.dylib+0x5e718a] CompileBroker::compile_method_base(methodHandle const&, int, int, methodHandle const&, int, CompileTask::CompileReason, bool, Thread*)+0xa3a
V [libjvm.dylib+0x5e7f5d] CompileBroker::compile_method(methodHandle const&, int, int, methodHandle const&, int, CompileTask::CompileReason, DirectiveSet*, JavaThread*)+0x6ad
V [libjvm.dylib+0x5e788e] CompileBroker::compile_method(methodHandle const&, int, int, methodHandle const&, int, CompileTask::CompileReason, JavaThread*)+0xbe
V [libjvm.dylib+0x5c744e] CompilationPolicy::compile(methodHandle const&, int, CompLevel, JavaThread*)+0x4ee
V [libjvm.dylib+0x5c69dc] CompilationPolicy::event(methodHandle const&, methodHandle const&, int, int, CompLevel, CompiledMethod*, JavaThread*)+0x20c
V [libjvm.dylib+0x8eea04] InterpreterRuntime::frequency_counter_overflow_inner(JavaThread*, unsigned char*)+0x304
V [libjvm.dylib+0x8ee4ca] InterpreterRuntime::frequency_counter_overflow(JavaThread*, unsigned char*)+0x1a
j Foo3.foo()I+0
j SomeClass.()V+61
v ~StubRoutines::call_stub
```
It seems to be a concurrency issue. I filed [JDK-8277441](https://bugs.openjdk.java.net/browse/JDK-8277441) to track this and will set up a server machine to run it for several days. I'll report back, but until then we should still hold off on integrating this, or integrate and problem-list it right away. @tkiriyama Good work with the test, which already caught two bugs! ------------- PR: https://git.openjdk.java.net/jdk/pull/6364 From jbhateja at openjdk.java.net Fri Nov 19 08:10:42 2021 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Fri, 19 Nov 2021 08:10:42 GMT Subject: RFR: 8277168: AArch64: Enable arraycopy partial inlining with SVE In-Reply-To: References: Message-ID: <82Kgtn4RllwF2ifvmwtaQaeG9ADXeUoq290BKnd8PZ4=.ed410c36-2f5c-4b29-9d96-07d33ac872ee@github.com> On Thu, 18 Nov 2021 06:55:34 GMT, Pengfei Li wrote: >> Arraycopy partial inlining is a C2 compiler technique that avoids stub >> call overhead in small-sized arraycopy operations by generating masked >> vector instructions. So far it works on x86 AVX512 only and this patch >> enables it on AArch64 with SVE. >> >> We add AArch64 matching rule for VectorMaskGenNode and refactor that >> node a little bit. The major change is moving the element type field >> into its TypeVectMask bottom type. The reason is that AArch64 vector >> masks are different for different vector element types. >> >> E.g., an x86 AVX512 vector mask value masking 3 least significant vector >> lanes (of any type) is like >> >> `0000 0000 ... 0000 0000 0000 0000 0111` >> >> On AArch64 SVE, this mask value can only be used for masking the 3 least >> significant lanes of bytes. But for 3 lanes of ints, the value should be >> >> `0000 0000 ...
0000 0000 0001 0001 0001` >> >> where the least significant bit of each lane matters. So AArch64 matcher >> needs to know the vector element type to generate right masks. >> >> After this patch, the C2 generated code for copying a 50-byte array on >> AArch64 SVE looks like >> >> mov x12, #0x32 >> whilelo p0.b, xzr, x12 >> add x11, x11, #0x10 >> ld1b {z16.b}, p0/z, [x11] >> add x10, x10, #0x10 >> st1b {z16.b}, p0, [x10] >> >> We ran jtreg hotspot::hotspot_all, jdk::tier1~3 and langtools::tier1 on >> both x86 AVX512 and AArch64 SVE machines, no issue is found. We tested >> JMH org/openjdk/bench/java/lang/ArrayCopyAligned.java with small array >> size arguments on a 512-bit SVE-featured CPU. We got below performance >> data changes. >> >> Benchmark (length) (Performance) >> ArrayCopyAligned.testByte 10 -2.6% >> ArrayCopyAligned.testByte 20 +4.7% >> ArrayCopyAligned.testByte 30 +4.8% >> ArrayCopyAligned.testByte 40 +21.7% >> ArrayCopyAligned.testByte 50 +22.5% >> ArrayCopyAligned.testByte 60 +28.4% >> >> The test machine has SVE vector size of 512 bits, so we see performance >> gain for most array sizes less than 64 bytes. For very small arrays we >> see a bit regression because a vector load/store may be a bit slower >> than 1 or 2 scalar loads/stores. > > The x86 failure is caused by a recent commit (see [JDK-8277324](https://bugs.openjdk.java.net/browse/JDK-8277324)) and unrelated to this PR. Hi @pfustc , common type system changes looks good to me. 
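The difference between the two mask layouts quoted above can be reproduced with a small scalar model. This is a sketch only, assuming a 512-bit vector (so 64 predicate bits on SVE, one per byte of the vector) and at most 64 lanes; the function names `avx512_mask_bits` and `sve_predicate_bits` are hypothetical and do not correspond to anything in C2.

```cpp
#include <cassert>
#include <cstdint>

// AVX-512 style opmask: one bit per lane, regardless of element size.
uint64_t avx512_mask_bits(unsigned active_lanes) {
  assert(active_lanes <= 64);
  if (active_lanes == 64) {
    return ~0ULL;  // avoid an undefined shift by 64
  }
  return (1ULL << active_lanes) - 1;
}

// SVE style predicate: one bit per byte of the vector. For an element of
// `elem_bytes` bytes, only the lowest bit of each element's group is set.
uint64_t sve_predicate_bits(unsigned active_lanes, unsigned elem_bytes) {
  assert(elem_bytes > 0);
  uint64_t bits = 0;
  for (unsigned lane = 0; lane < active_lanes; ++lane) {
    unsigned bit_position = lane * elem_bytes;
    if (bit_position >= 64) {
      break;  // past the end of a 512-bit vector
    }
    bits |= 1ULL << bit_position;
  }
  return bits;
}
```

With three active lanes, `avx512_mask_bits(3)` and `sve_predicate_bits(3, 1)` both give `0x7` (binary `0111`), while `sve_predicate_bits(3, 4)` gives `0x111` (binary `0001 0001 0001`), matching the int case in the description. This is why the element type has to be known when the mask is generated on SVE.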
------------- PR: https://git.openjdk.java.net/jdk/pull/6444 From roland at openjdk.java.net Fri Nov 19 08:16:49 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Fri, 19 Nov 2021 08:16:49 GMT Subject: RFR: 8277324: C2 compilation fails with "bad AD file" on x86-32 after JDK-8276162 due to missing match rule [v2] In-Reply-To: References: Message-ID: On Wed, 17 Nov 2021 12:41:08 GMT, Tobias Hartmann wrote: >> [JDK-8276162](https://bugs.openjdk.java.net/browse/JDK-8276162) introduced an optimization that creates `CMoveI (Bool (CmpUL ...) ...)` shapes but x86-32 misses the corresponding match rules in C2's backend. >> >> I also fixed two comments incorrectly referring to ints instead of ptrs. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Use expand Looks good to me. ------------- Marked as reviewed by roland (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6427 From chagedorn at openjdk.java.net Fri Nov 19 08:18:41 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Fri, 19 Nov 2021 08:18:41 GMT Subject: RFR: 8277423: ciReplay: hidden class with comment expected error In-Reply-To: References: Message-ID: On Thu, 18 Nov 2021 23:22:19 GMT, Dean Long wrote: > Refactor code to dump hidden classes consistently. Looks good! ------------- Marked as reviewed by chagedorn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6467 From chagedorn at openjdk.java.net Fri Nov 19 08:23:41 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Fri, 19 Nov 2021 08:23:41 GMT Subject: RFR: 8254108: ciReplay: Support incremental inlining [v3] In-Reply-To: References: Message-ID: On Thu, 18 Nov 2021 10:33:29 GMT, Christian Hagedorn wrote: >> This patch adds support to explicitly apply incremental inlining when replay compiling a method if the original compilation of the method was also incrementally inlined. 
We write a new value when dumping the inline tree to indicate if an inlinee was incrementally inlined (`= 1`) or not (`= 0`). >> >> To implement this, I updated the `REPLAY_VERSION` to 2 and additionally added a test to verify that old replay file versions are still working. I added some support to modify/remove version numbers of generated replay files in tests. I also refactored the test added by JDK-8275868 to reuse some of the methods. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > update AlwaysIncrementalInline and assert Thanks Dean for the careful review! ------------- PR: https://git.openjdk.java.net/jdk/pull/6413 From jbhateja at openjdk.java.net Fri Nov 19 08:24:42 2021 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Fri, 19 Nov 2021 08:24:42 GMT Subject: RFR: 8277168: AArch64: Enable arraycopy partial inlining with SVE In-Reply-To: References: Message-ID: On Thu, 18 Nov 2021 03:50:45 GMT, Pengfei Li wrote: > Arraycopy partial inlining is a C2 compiler technique that avoids stub > call overhead in small-sized arraycopy operations by generating masked > vector instructions. So far it works on x86 AVX512 only and this patch > enables it on AArch64 with SVE. > > We add AArch64 matching rule for VectorMaskGenNode and refactor that > node a little bit. The major change is moving the element type field > into its TypeVectMask bottom type. The reason is that AArch64 vector > masks are different for different vector element types. > > E.g., an x86 AVX512 vector mask value masking 3 least significant vector > lanes (of any type) is like > > `0000 0000 ... 0000 0000 0000 0000 0111` > > On AArch64 SVE, this mask value can only be used for masking the 3 least > significant lanes of bytes. But for 3 lanes of ints, the value should be > > `0000 0000 ... 0000 0000 0001 0001 0001` > > where the least significant bit of each lane matters. 
So AArch64 matcher > needs to know the vector element type to generate right masks. > > After this patch, the C2 generated code for copying a 50-byte array on > AArch64 SVE looks like > > mov x12, #0x32 > whilelo p0.b, xzr, x12 > add x11, x11, #0x10 > ld1b {z16.b}, p0/z, [x11] > add x10, x10, #0x10 > st1b {z16.b}, p0, [x10] > > We ran jtreg hotspot::hotspot_all, jdk::tier1~3 and langtools::tier1 on > both x86 AVX512 and AArch64 SVE machines, no issue is found. We tested > JMH org/openjdk/bench/java/lang/ArrayCopyAligned.java with small array > size arguments on a 512-bit SVE-featured CPU. We got below performance > data changes. > > Benchmark (length) (Performance) > ArrayCopyAligned.testByte 10 -2.6% > ArrayCopyAligned.testByte 20 +4.7% > ArrayCopyAligned.testByte 30 +4.8% > ArrayCopyAligned.testByte 40 +21.7% > ArrayCopyAligned.testByte 50 +22.5% > ArrayCopyAligned.testByte 60 +28.4% > > The test machine has SVE vector size of 512 bits, so we see performance > gain for most array sizes less than 64 bytes. For very small arrays we > see a bit regression because a vector load/store may be a bit slower > than 1 or 2 scalar loads/stores. Common type system changes looks good to me. ------------- Marked as reviewed by jbhateja (Committer). PR: https://git.openjdk.java.net/jdk/pull/6444 From thartmann at openjdk.java.net Fri Nov 19 08:28:46 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Fri, 19 Nov 2021 08:28:46 GMT Subject: RFR: 8277324: C2 compilation fails with "bad AD file" on x86-32 after JDK-8276162 due to missing match rule [v2] In-Reply-To: References: Message-ID: <8Jx4A6mm3C-l5Zknh496UVpUT0XwhgOLx6F_h9k3cXg=.70188a27-34cc-4696-a113-ec1ac28fc9ed@github.com> On Wed, 17 Nov 2021 12:41:08 GMT, Tobias Hartmann wrote: >> [JDK-8276162](https://bugs.openjdk.java.net/browse/JDK-8276162) introduced an optimization that creates `CMoveI (Bool (CmpUL ...) ...)` shapes but x86-32 misses the corresponding match rules in C2's backend. 
>> >> I also fixed two comments incorrectly referring to ints instead of ptrs. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Use expand Thanks for the reviews, Christian and Roland! ------------- PR: https://git.openjdk.java.net/jdk/pull/6427 From thartmann at openjdk.java.net Fri Nov 19 08:28:47 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Fri, 19 Nov 2021 08:28:47 GMT Subject: Integrated: 8277324: C2 compilation fails with "bad AD file" on x86-32 after JDK-8276162 due to missing match rule In-Reply-To: References: Message-ID: <8OkdIsfI8BLoJ4Q1OfxpyF1DReocXY6rDNYGaE2ERfI=.785d6a3e-5bcc-427b-bc80-2acdf0740831@github.com> On Wed, 17 Nov 2021 09:25:26 GMT, Tobias Hartmann wrote: > [JDK-8276162](https://bugs.openjdk.java.net/browse/JDK-8276162) introduced an optimization that creates `CMoveI (Bool (CmpUL ...) ...)` shapes but x86-32 misses the corresponding match rules in C2's backend. > > I also fixed two comments incorrectly referring to ints instead of ptrs. > > Thanks, > Tobias This pull request has now been integrated. Changeset: 3a76d397 Author: Tobias Hartmann URL: https://git.openjdk.java.net/jdk/commit/3a76d397949ad22e4786476e583cc9d33c015214 Stats: 66 lines in 1 file changed: 54 ins; 4 del; 8 mod 8277324: C2 compilation fails with "bad AD file" on x86-32 after JDK-8276162 due to missing match rule Reviewed-by: chagedorn, roland ------------- PR: https://git.openjdk.java.net/jdk/pull/6427 From jiefu at openjdk.java.net Fri Nov 19 09:44:59 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Fri, 19 Nov 2021 09:44:59 GMT Subject: RFR: 8277449: compiler/vectorapi/TestLongVectorNeg.java fails with release VMs Message-ID: Hi all, Please review this trivial fix. Thanks. 
Best regards, Jie ------------- Commit messages: - 8277449: compiler/vectorapi/TestLongVectorNeg.java fails with release VMs Changes: https://git.openjdk.java.net/jdk/pull/6476/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6476&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8277449 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/6476.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6476/head:pull/6476 PR: https://git.openjdk.java.net/jdk/pull/6476 From thartmann at openjdk.java.net Fri Nov 19 09:48:43 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Fri, 19 Nov 2021 09:48:43 GMT Subject: RFR: 8277449: compiler/vectorapi/TestLongVectorNeg.java fails with release VMs In-Reply-To: References: Message-ID: On Fri, 19 Nov 2021 09:37:05 GMT, Jie Fu wrote: > Hi all, > > Please review this trivial fix. > > Thanks. > Best regards, > Jie Looks good and trivial. Thanks for fixing. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6476 From chagedorn at openjdk.java.net Fri Nov 19 09:55:25 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Fri, 19 Nov 2021 09:55:25 GMT Subject: RFR: 8275326: C2: assert(no_dead_loop) failed: dead loop detected [v3] In-Reply-To: References: Message-ID: > In the test case, we apply the following optimization in `PhiNode::Ideal()` for the memory phi 989 that is on a dead path but still has both its inputs set to non-top nodes: > https://github.com/openjdk/jdk/blob/3c0faa73522bd004b66cb9e477f43e15a29842e6/src/hotspot/share/opto/cfgnode.cpp#L2269-L2270 > ![Screenshot from 2021-11-05 11-57-49](https://user-images.githubusercontent.com/17833009/140502849-9f00fd62-9714-4f54-8f98-f22f74d11430.png) > > In this process, we create `11853 Phi` for the new `11850 MergeMem` which is going to replace `989 Phi` (`this`). 
We then transform `11853 Phi` before returning: > https://github.com/openjdk/jdk/blob/3c0faa73522bd004b66cb9e477f43e15a29842e6/src/hotspot/share/opto/cfgnode.cpp#L2314 > > During `Ideal()` for `11853 Phi`, we transform `11769 MergeMem` into top (because the base memory is top) and use this as new input instead: > https://github.com/openjdk/jdk/blob/3c0faa73522bd004b66cb9e477f43e15a29842e6/src/hotspot/share/opto/cfgnode.cpp#L2230-L2240 > > But even if the `MergeMem` node would not be transformed into top, the slice itself could be top (L2237) and we would still replace the phi input with top. This replacement by top will fold the `11853 Phi` and we will build a cycle `11850 MergeMem` <-> `1064 StoreB` because `989 Phi` will be replaced by `11850 MergeMem`. This results in the assertion failure. > > I tried some approaches by marking `11853 Phi` and/or `989 Phi` to specially treat them during the optimizations in `Ideal()` (e.g. skipping `989 Phi` during the dead loop detection etc.) or to improve the dead loop detection before applying the `MergeMem` optimization in `Ideal()`. But that seemed rather complicated/fragile. > > I therefore propose to simply not transform the newly created phi nodes directly but wait instead for IGVN to revisit them again. This allows the `this` phi to be replaced with the new `MergeMem` node and the dead loop detection will work correctly when processing the new phis again later in IGVN. > > I could only reproduce this bug with the replay file for the attached test case in the JBS issue. The test case itself did not trigger with repeated runs with `StressIGVN` + `RepeatCompilation`. > > Thanks, > Christian Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: - New fix - Merge branch 'master' into JDK-8275326 - handle GVN - C2: assert(no_dead_loop) failed: dead loop detected ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6276/files - new: https://git.openjdk.java.net/jdk/pull/6276/files/4de74cf7..a1675453 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6276&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6276&range=01-02 Stats: 433576 lines in 942 files changed: 220923 ins; 199771 del; 12882 mod Patch: https://git.openjdk.java.net/jdk/pull/6276.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6276/head:pull/6276 PR: https://git.openjdk.java.net/jdk/pull/6276 From chagedorn at openjdk.java.net Fri Nov 19 09:55:27 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Fri, 19 Nov 2021 09:55:27 GMT Subject: RFR: 8275326: C2: assert(no_dead_loop) failed: dead loop detected [v2] In-Reply-To: References: Message-ID: On Mon, 8 Nov 2021 09:18:14 GMT, Christian Hagedorn wrote: >> In the test case, we apply the following optimization in `PhiNode::Ideal()` for the memory phi 989 that is on a dead path but still has both its inputs set to non-top nodes: >> https://github.com/openjdk/jdk/blob/3c0faa73522bd004b66cb9e477f43e15a29842e6/src/hotspot/share/opto/cfgnode.cpp#L2269-L2270 >> ![Screenshot from 2021-11-05 11-57-49](https://user-images.githubusercontent.com/17833009/140502849-9f00fd62-9714-4f54-8f98-f22f74d11430.png) >> >> In this process, we create `11853 Phi` for the new `11850 MergeMem` which is going to replace `989 Phi` (`this`). 
We then transform `11853 Phi` before returning: >> https://github.com/openjdk/jdk/blob/3c0faa73522bd004b66cb9e477f43e15a29842e6/src/hotspot/share/opto/cfgnode.cpp#L2314 >> >> During `Ideal()` for `11853 Phi`, we transform `11769 MergeMem` into top (because the base memory is top) and use this as new input instead: >> https://github.com/openjdk/jdk/blob/3c0faa73522bd004b66cb9e477f43e15a29842e6/src/hotspot/share/opto/cfgnode.cpp#L2230-L2240 >> >> But even if the `MergeMem` node would not be transformed into top, the slice itself could be top (L2237) and we would still replace the phi input with top. This replacement by top will fold the `11853 Phi` and we will build a cycle `11850 MergeMem` <-> `1064 StoreB` because `989 Phi` will be replaced by `11850 MergeMem`. This results in the assertion failure. >> >> I tried some approaches by marking `11853 Phi` and/or `989 Phi` to specially treat them during the optimizations in `Ideal()` (e.g. skipping `989 Phi` during the dead loop detection etc.) or to improve the dead loop detection before applying the `MergeMem` optimization in `Ideal()`. But that seemed rather complicated/fragile. >> >> I therefore propose to simply not transform the newly created phi nodes directly but wait instead for IGVN to revisit them again. This allows the `this` phi to be replaced with the new `MergeMem` node and the dead loop detection will work correctly when processing the new phis again later in IGVN. >> >> I could only reproduce this bug with the replay file for the attached test case in the JBS issue. The test case itself did not trigger with repeated runs with `StressIGVN` + `RepeatCompilation`. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > handle GVN I had another go at this. I don't think we should somehow change the IGVN worklist to force the new phi nodes to be processed immediately. 
We could be adding new nodes during their transformations which could change our modifications again. I had another idea instead: Eagerly replacing the `this` phi node with the new `MergeMem` node. We would replace it anyways as a next step in IGVN when returning from `Phi::Ideal()`. This cuts the `this` phi entirely off from the graph and does not interfere in the dead loop checks during the transformations of the new phis: https://github.com/openjdk/jdk/blob/3c0faa73522bd004b66cb9e477f43e15a29842e6/src/hotspot/share/opto/cfgnode.cpp#L2314 I think that fix is only needed during IGVN (not at parsing) when control and data can be dying. However, I have not seen that we do something similar at any other place. It appears to be correct to me and testing looks good but I'm still wondering if this is really feasible to do? ------------- PR: https://git.openjdk.java.net/jdk/pull/6276 From chagedorn at openjdk.java.net Fri Nov 19 09:56:46 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Fri, 19 Nov 2021 09:56:46 GMT Subject: RFR: 8277449: compiler/vectorapi/TestLongVectorNeg.java fails with release VMs In-Reply-To: References: Message-ID: On Fri, 19 Nov 2021 09:37:05 GMT, Jie Fu wrote: > Hi all, > > Please review this trivial fix. > > Thanks. > Best regards, > Jie Looks good. ------------- Marked as reviewed by chagedorn (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/6476 From chagedorn at openjdk.java.net Fri Nov 19 09:58:43 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Fri, 19 Nov 2021 09:58:43 GMT Subject: RFR: 8268744: Improve sinking algorithm in partial peeling to avoid redundant clones In-Reply-To: References: Message-ID: On Wed, 28 Jul 2021 15:46:13 GMT, Christian Hagedorn wrote: > The algorithm in step 2 in partial peeling to move data nodes from the peel section to the non-peel section uses a straightforward cloning algorithm which creates redundant clones when the IR contains one or more diamonds of data nodes to be cloned. The number of clones grows exponentially which could lead to a bailout (added by [JDK-8256934](https://bugs.openjdk.java.net/browse/JDK-8256934) for JDK 17 with a testcase). This RFE improves this algorithm to handle node diamonds more efficiently to avoid unnecessary cloning. The testcase for JDK-8256934 does not bail out anymore and uses only a few clones. > > The main idea is to first find all outside of the loop uses `u` of the nodes in the initial peel region to be moved into the non-peel region. We then only need to clone any data node during the algorithm at most `u` times, once for each initial outside of the loop use. If we process a node diamond (following inputs), we can use an already cloned node for the top node of the diamond (node A in the example below). An example with 1 initial outside of the loop use and 4 nodes to be cloned, forming a diamond, is shown as a comment in the code: > https://github.com/openjdk/jdk/blob/8ae0e1a06558a1678521dcb4ed32708a1821b47d/src/hotspot/share/opto/loopopts.cpp#L3605-L3635 > > The algorithm is explained in more detail in the comments in the code (starting in method `move_nodes_to_not_peel()`). > > I also cleaned up the code for step 2 of partial peeling.
I left the bailout code added by JDK-8256934 in place which I think is still required if we enter partial peeling with a huge number of live nodes (quite rare though). > > I additionally ran some standard benchmarks which did not show any improvements but also no regressions. I think it is rather an edge case where the old algorithm creates a huge number of redundant clones. Nevertheless, I think this improved algorithm is still worth to have to handle the more uncommon case of node diamonds. What do you think? > > Thanks, > Christian I haven't had time to pick this up again, yet. I will defer it to JDK 19 for now. ------------- PR: https://git.openjdk.java.net/jdk/pull/4923 From jiefu at openjdk.java.net Fri Nov 19 10:42:47 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Fri, 19 Nov 2021 10:42:47 GMT Subject: RFR: 8277449: compiler/vectorapi/TestLongVectorNeg.java fails with release VMs In-Reply-To: References: Message-ID: On Fri, 19 Nov 2021 09:46:06 GMT, Tobias Hartmann wrote: >> Hi all, >> >> Please review this trivial fix. >> >> Thanks. >> Best regards, >> Jie > > Looks good and trivial. Thanks for fixing. Thanks @TobiHartmann and @chhagedorn . ------------- PR: https://git.openjdk.java.net/jdk/pull/6476 From jiefu at openjdk.java.net Fri Nov 19 10:42:48 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Fri, 19 Nov 2021 10:42:48 GMT Subject: Integrated: 8277449: compiler/vectorapi/TestLongVectorNeg.java fails with release VMs In-Reply-To: References: Message-ID: On Fri, 19 Nov 2021 09:37:05 GMT, Jie Fu wrote: > Hi all, > > Please review this trivial fix. > > Thanks. > Best regards, > Jie This pull request has now been integrated. 
Changeset: b15e6f07 Author: Jie Fu URL: https://git.openjdk.java.net/jdk/commit/b15e6f076afe5ac68e9af68756860d0b25adea4b Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod 8277449: compiler/vectorapi/TestLongVectorNeg.java fails with release VMs Reviewed-by: thartmann, chagedorn ------------- PR: https://git.openjdk.java.net/jdk/pull/6476 From adinn at openjdk.java.net Fri Nov 19 10:57:44 2021 From: adinn at openjdk.java.net (Andrew Dinn) Date: Fri, 19 Nov 2021 10:57:44 GMT Subject: RFR: 8277385: Zero: Enable CompactStrings support In-Reply-To: <-tldYMqymQt2bG54787XcYNM1AQmaz7MpMaD7ijAges=.48f22102-66f8-44d7-80be-5674f0638bef@github.com> References: <-tldYMqymQt2bG54787XcYNM1AQmaz7MpMaD7ijAges=.48f22102-66f8-44d7-80be-5674f0638bef@github.com> Message-ID: <4hJpX3ZaxxafKrgUcMjwkDUJWLpR1TrkYoub9scqr6s=.c8c38531-f576-4670-b041-568d12c06a04@github.com> On Thu, 18 Nov 2021 15:14:55 GMT, Aleksey Shipilev wrote: > This enables `CompactStrings` for Zero. When we were doing original Compact Strings in JDK 9, we disabled the support on non-primary platforms, hoping relevant maintainers would follow up with platform-specific work. Here is me following up, as Zero maintainer :) > > There is little to do on Zero side, as it is pure interpreter without String intrinsics. Other platforms had old-shaped String intrinsics, so for them enabling the feature would mean implementing Compact-String-shaped intrinsics too. But this is irrelevant for Zero. There is still benefit of doing less work with smaller Strings. > > There are no regressions on the benchmarks I tried, and some benchmarks improve significantly. Notably, specjvm:{compiler,sunflow,derby,xmlvalidation} improve about 5%, specjvm:{serial,xmltransform} improve about 20% on x86_64. > > Additional testing: > - [x] Linux x86_64 Zero benchmarks > - [x] Linux x86_64 Zero `tier1` That's a nice result for very little work :-) ------------- Marked as reviewed by adinn (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/6459 From duke at openjdk.java.net Fri Nov 19 11:43:16 2021 From: duke at openjdk.java.net (Mai Đặng Quân Anh) Date: Fri, 19 Nov 2021 11:43:16 GMT Subject: RFR: 8277426: Optimize mask reduction operations on x86 [v2] In-Reply-To: <9g2nHSo1K9pldGKqKiXBqjFZF3UFqqKHu-6GLyKbclc=.496404e9-f682-470e-a93d-c8fc9ca6f626@github.com> References: <9g2nHSo1K9pldGKqKiXBqjFZF3UFqqKHu-6GLyKbclc=.496404e9-f682-470e-a93d-c8fc9ca6f626@github.com> Message-ID: > Hi, > > This patch improves the performance of mask reduction operations on AVX by matching the pattern `VectorMaskReduction (VectorStoreMask mask)` to eliminate the extra `VectorStoreMaskNode`. I have also done some refactoring to unify the logic of `toLong` with the other reduction operations. > > The patch has been discussed partially in [panama-vector repository](https://github.com/openjdk/panama-vector/pull/158). > > Passed all tests except the one being fixed by [JDK-8277324](https://bugs.openjdk.java.net/browse/JDK-8277324). > > Thank you very much. Mai Đặng Quân Anh has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains three additional commits since the last revision: - Merge branch 'master' into vectorMaskReduction - reduce some dependencies with spare register - improve mask reduction logic on AVX ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6447/files - new: https://git.openjdk.java.net/jdk/pull/6447/files/d29b7ba0..1dae02d4 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6447&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6447&range=00-01 Stats: 3137 lines in 166 files changed: 2359 ins; 150 del; 628 mod Patch: https://git.openjdk.java.net/jdk/pull/6447.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6447/head:pull/6447 PR: https://git.openjdk.java.net/jdk/pull/6447 From aph at openjdk.java.net Fri Nov 19 14:03:23 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Fri, 19 Nov 2021 14:03:23 GMT Subject: RFR: 8276563: Undefined Behaviour in class Assembler [v10] In-Reply-To: References: Message-ID: > The HotSpot code base contains a number of instances of Undefined Behavior, which can cause all manner of unpleasant surprises. > The UB to which this patch relates is in class `Assembler`, in which instances are pointers to (nonexistent) objects defined as, for example, > > > typedef RegisterImpl *Register; > const Register r10 = ((Register)10); > > > Registers have accessors, e.g.: > > ` int RegisterImpl::encoding() const { return (intptr_t)this; }` > > This works by an accident of implementation: it is not legal C++. 
> > The most obvious way to fix this UB bug is to make instances of `Register` point to something, and to use pointer subtraction to find the encoding: (simplified for clarity) > > > extern RegisterImpl all_Registers[num_Registers]; > int RegisterImpl::encoding() const { return this - all_Registers; } > > > After this patch there is slightly more work to be done when assembling code but it's merely the subtraction of a constant in `encoding()` and the difference in execution time is so small (and the startup variance so large) that I have been unable to measure it, even after averaging 100 runs. It does lead to an increase of about 1% in the size of the stripped libjvm.so, but I think that can be recovered by a subsequent patch. > > An alternative way to implement this would be to make the encoding a byte-wide field in `RegisterImpl` and define encoding() this way: > > ` int RegisterImpl::encoding() const { return _encoding; }` > > This would result in smaller code, but I suspect slower. > > If this change is accepted, I propose that all instances of this pattern in HotSpot be treated similarly. Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: Delete has_byte_register().
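The pointer-subtraction scheme described above can be sketched as a small self-contained C++ fragment. This is an illustration under stated assumptions rather than the code in the pull request: the register count of 32 and the `as_Register` helper as written here are chosen for the example, while `RegisterImpl`, `Register`, `all_Registers` and `encoding()` follow the names used in the description.

```cpp
#include <cassert>
#include <cstddef>

class RegisterImpl {
 public:
  // Assumed register count, chosen only for this sketch.
  static const int number_of_registers = 32;

  // The encoding is the index of this object in the backing array.
  int encoding() const;
};

typedef RegisterImpl* Register;

// Real storage, so every Register points at an existing object instead of
// being an integer cast to a pointer.
RegisterImpl all_Registers[RegisterImpl::number_of_registers];

int RegisterImpl::encoding() const {
  // Pointer subtraction within one array is well-defined C++.
  return static_cast<int>(this - all_Registers);
}

// Illustrative helper: map an encoding back to its Register.
inline Register as_Register(int encoding) {
  assert(encoding >= 0 && encoding < RegisterImpl::number_of_registers);
  return &all_Registers[encoding];
}
```

Under these assumptions, `as_Register(10)->encoding()` evaluates to 10 without ever dereferencing or comparing a pointer that was manufactured from an integer.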
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6280/files - new: https://git.openjdk.java.net/jdk/pull/6280/files/0ce81fd2..8dfc54f5 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6280&range=09 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6280&range=08-09 Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/6280.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6280/head:pull/6280 PR: https://git.openjdk.java.net/jdk/pull/6280 From aph at openjdk.java.net Fri Nov 19 14:03:25 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Fri, 19 Nov 2021 14:03:25 GMT Subject: RFR: 8276563: Undefined Behaviour in class Assembler [v9] In-Reply-To: References: <69byto1KlbJy9IA5Z6wV3nl8eQTQKyMAEg18sR6q2P8=.dbe9530e-cd25-4aa4-ad90-a66e91e95a0d@github.com> Message-ID: On Thu, 18 Nov 2021 09:45:51 GMT, Andrew Haley wrote: >> src/hotspot/cpu/aarch64/register_aarch64.hpp line 63: >> >>> 61: int encoding() const { assert(is_valid(), "invalid register"); return encoding_nocheck(); } >>> 62: bool is_valid() const { return (unsigned)encoding_nocheck() < number_of_registers; } >>> 63: bool has_byte_register() const { return this >= first() && this - first() < number_of_byte_registers; } >> >> Why not relegate to encoding_nocheck() too: `return encoding_nocheck() >= 0 && encoding_nocheck() < num_byte_regs` ? > > x86 changes are for later. > As far as I can tell, `has_byte_register()` isn't used by anything, so I guess I'll take it out. > I was trying to minimize the scope of this patch. Done. 
------------- PR: https://git.openjdk.java.net/jdk/pull/6280 From aph at openjdk.java.net Fri Nov 19 14:03:29 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Fri, 19 Nov 2021 14:03:29 GMT Subject: RFR: 8276563: Undefined Behaviour in class Assembler [v3] In-Reply-To: References: Message-ID: <10ywVvRMw6cJ2PkBh8fxT_39ZADqOxsg7z24ekyV1K0=.15744e12-d65f-4bb5-9850-4be7307e08cf@github.com> On Sat, 13 Nov 2021 07:03:57 GMT, Thomas Stuefe wrote: >> Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: >> >> Re-establish the FloatRegister::successor() hack. > > src/hotspot/share/asm/register.hpp line 71: > >> 69: >> 70: #define REGISTER_IMPL_DEFINITION(type, impl_type) \ >> 71: impl_type all_ ## type ## s[impl_type::number_of_declared_registers]; > > Would this not need attribute visibility too? No, it's done on the declaration. ------------- PR: https://git.openjdk.java.net/jdk/pull/6280 From stuefe at openjdk.java.net Fri Nov 19 14:11:45 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Fri, 19 Nov 2021 14:11:45 GMT Subject: RFR: 8276563: Undefined Behaviour in class Assembler [v10] In-Reply-To: References: Message-ID: <27Tmrc6KEK-y4tGCgebHni3BSLqLA60qecqmEMNYZIA=.d6446cec-ddb4-45e6-a03d-dcda97b861b4@github.com> On Fri, 19 Nov 2021 14:03:23 GMT, Andrew Haley wrote: >> The HotSpot code base contains a number of instances of Undefined Behavior, which can cause all manner of unpleasant surprises. >> The UB to which this patch relates is in class `Assembler`, in which instances are pointers to (nonexistent) objects defined as, for example, >> >> >> typedef RegisterImpl *Register; >> const Register r10 = ((Register)10); >> >> >> Registers have accessors, e.g.: >> >> ` int RegisterImpl::encoding() const { return (intptr_t)this; }` >> >> This works by an accident of implementation: it is not legal C++. 
>> >> The most obvious way to fix this UB bug is to make instances of `Register` point to something, and to use pointer subtraction to find the encoding: (simplified for clarity) >> >> >> extern RegisterImpl all_Registers[num_Registers]; >> int RegisterImpl::encoding() const { return this - all_Registers; } >> >> >> After this patch there is slightly more work to be done when assembling code but it's merely the subtraction of a constant in `encoding()` and the difference in execution time is so small (and the startup variance so large) that I have been unable to measure it, even after averaging 100 runs. It does lead to an increase of about 1% in the size of the stripped libjvm.so, but I think that can be recovered by a subsequent patch. >> >> An alternative way to implement this would be to make the encoding a byte-wide field in `RegisterImpl` and define encoding() this way: >> >> ` int RegisterImpl::encoding() const { return _encoding; }` >> >> This would result in smaller code, but I suspect slower. >> >> If this change is accepted, I propose that all instances of this pattern in HotSpot be treated similarly. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > Delete has_byte_register(). Still good. ------------- Marked as reviewed by stuefe (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6280 From dlong at openjdk.java.net Sat Nov 20 01:16:10 2021 From: dlong at openjdk.java.net (Dean Long) Date: Sat, 20 Nov 2021 01:16:10 GMT Subject: RFR: 8277239: SIGSEGV in vrshift_reg_maskedNode::emit In-Reply-To: References: Message-ID: On Wed, 17 Nov 2021 13:44:51 GMT, Jatin Bhateja wrote: > Currently instruction selector differentiates between the two kinds of vector shift operations i.e. one with vector shift count and other with scalar shift count passed through LShiftCntV/RShiftCntV nodes by looking at the ideal opcode of shift count node.
> > A more robust scheme is to set a flag over vector shift node if it has variable vector shift count and replace the opcode based check with flag based check in various shift instruction selection patterns. Changes requested by dlong (Reviewer). src/hotspot/share/opto/vectorIntrinsics.cpp line 526: > 524: } else { > 525: const TypeVect* vt = TypeVect::make(elem_bt, num_elem, is_vector_mask(vbox_klass)); > 526: bool is_var_shift = VectorNode::is_shift_opcode(opc); How about moving this down to where it is used? ------------- PR: https://git.openjdk.java.net/jdk/pull/6431 From dlong at openjdk.java.net Sat Nov 20 01:29:08 2021 From: dlong at openjdk.java.net (Dean Long) Date: Sat, 20 Nov 2021 01:29:08 GMT Subject: RFR: 8274983: Pattern.matcher performance regression after JDK-8238358 In-Reply-To: References: Message-ID: On Thu, 18 Nov 2021 06:13:31 GMT, Xin Liu wrote: > The root cause of the C1 regression is that some regexes generate multiple classes which all implement > an interface. In SlowStartupTest.java, the following **invokeinterface** happens frequently with different receivers. The target is a private interface method. > > > 9: invokeinterface #25, 3 // InterfaceMethod java/util/regex/Pattern$BmpCharPredicate.lambda$union$2:(Ljava/util/regex/Pattern$CharPredicate;I)Z > > > This patch allows c1 to generate the optimized virtual call for invokeinterface > whose targets are the private interface methods. > > Before JDK-8238358, LambdaMetaFactory generates invokespecial in this case. Because the private > interface methods can not be overridden, c1 generates the optimized virtual call. After JDK-8238358, > LambdaMetaFactory generates invokeinterface instead. C1 generates the regular virtual call because > it can not recognize the new pattern. If multiple subclasses all implement the same interface, > it is possible that they trash the IC stub using their own concrete klass in runtime.
> > Optimized virtual call uses relocInfo::opt_virtual_call_type(3). It will call VM > 'resolve_opt_virtual_call_C' once and resolve the target to the VEP of the nmethod. > Therefore, this patch can prevent the callsite from trashing. > > Before this patch, SlowStartupTest recorded 38770 _resolve_invoke_virtual_cnt and 38695 _handle_wrong_method_cnt per 10k iterations. To dump `C1Statistics`, we use fastdebug build for comparison. > > > $java -XX:TieredStopAtLevel=1 -XX:+PrintC1Statistics SlowStartupTest 1 > Executed 10000 iterations in 736ms > C1 Runtime statistics: > _resolve_invoke_virtual_cnt: 38770 > _resolve_invoke_opt_virtual_cnt: 186 > _resolve_invoke_static_cnt: 44 > _handle_wrong_method_cnt: 38695 > _ic_miss_cnt: 35 > > > With this patch, only 1 _handle_wrong_method_cnt is triggered but we have 3 more `_resolve_invoke_opt_virtual_cnt` events instead. The total runtime reduces from 736ms to 9ms. > > > $java -XX:TieredStopAtLevel=1 -XX:+PrintC1Statistics SlowStartupTest 1 > Executed 10000 iterations in 9ms > C1 Runtime statistics: > _resolve_invoke_virtual_cnt: 77 > _resolve_invoke_opt_virtual_cnt: 189 > _resolve_invoke_static_cnt: 45 > _handle_wrong_method_cnt: 1 > _ic_miss_cnt: 39 > > > Codegen-wise, before the patch, C1 generates LIR for the invokeinterface whose target is a private interface method as follows. > > __bci__use__tid____instr____________________________________ > . 1 0 v2 a1.invokeinterface() > InvokePrivateInterfaceMethod$I.bar()V > . 6 0 v3 return > > > With this patch, C1 generates LIR as follows. It first checks that a1 is a subtype of `InvokePrivateInterfaceMethod$I`. If so, an optimized virtual call is generated. The callsite will be fixed up exactly once at runtime. > > __bci__use__tid____instr____________________________________ > . 1 1 a2 checkcast(a1) InvokePrivateInterfaceMethod$I > stack [0:a1] > . 1 0 v3 a2.invokeinterface() > InvokePrivateInterfaceMethod$I.bar()V > .
6 0 v4 return It looks like this does what we want for private interface methods, but I'm wondering if we handle all combinations of private/final and invokevirtual/invokeinterface, or are we missing some cases where can_be_statically_bound() would return true? ------------- PR: https://git.openjdk.java.net/jdk/pull/6445 From xliu at openjdk.java.net Sat Nov 20 04:44:12 2021 From: xliu at openjdk.java.net (Xin Liu) Date: Sat, 20 Nov 2021 04:44:12 GMT Subject: RFR: 8274983: Pattern.matcher performance regression after JDK-8238358 In-Reply-To: References: Message-ID: On Sat, 20 Nov 2021 01:26:23 GMT, Dean Long wrote: >> The root cause of the C1 regression is that some regex generate multiple classes which all implement >> an interface. In SlowStartupTest.java, the following **invokeinterface** happens frequently with different receivers. the target is a private interface method. >> >> >> 9: invokeinterface #25, 3 // InterfaceMethod java/util/regex/Pattern$BmpCharPredicate.lambda$union$2:(Ljava/util/regex/Pattern$CharPredicate;I)Z >> >> >> This patch allows c1 to generate the optimized virtual call for invokeinterface >> whose targets are the private interface methods. >> >> Before JDK-823835, LambdaMetaFactory generates invokespecial in this case. Because the private >> interface methods can not be overrided, c1 generates the optimized virtual call. After JDK-823835, >> LambdaMetaFactory generates invokeinterface instead. C1 generates the regular virtual call because >> it can not recognize the new pattern. If a multiple of subclasses all implement a same interface, >> it is possible that they trash the IC stub using their own concrete klass in runtime. >> >> Optimized virtual call uses relocInfo::opt_virtual_call_type(3), It will call VM >> 'resolve_opt_virtual_call_C' once and resolve the target to the VEP of the nmethod. >> Therefore, this patch can prevent the callsite from trashing. 
>> >> Before this patch, SlowStartupTest had 38770 times _resolve_invoke_virtual_cnt and 38695 _handle_wrong_method_cnt per 10k iterations. To dump `C1Statistics`, we use fastdebug build for comparison. >> >> >> $java -XX:TieredStopAtLevel=1 -XX:+PrintC1Statistics SlowStartupTest 1 >> Executed 10000 iterations in 736ms >> C1 Runtime statistics: >> _resolve_invoke_virtual_cnt: 38770 >> _resolve_invoke_opt_virtual_cnt: 186 >> _resolve_invoke_static_cnt: 44 >> _handle_wrong_method_cnt: 38695 >> _ic_miss_cnt: 35 >> >> >> With this patch, only 1 _handle_wrong_method_cnt is triggered but we have 3 more `_resolve_invoke_opt_virtual_cnt` events instead. The total runtime reduces from 736ms to 9ms. >> >> >> $java -XX:TieredStopAtLevel=1 -XX:+PrintC1Statistics SlowStartupTest 1 >> Executed 10000 iterations in 9ms >> C1 Runtime statistics: >> _resolve_invoke_virtual_cnt: 77 >> _resolve_invoke_opt_virtual_cnt: 189 >> _resolve_invoke_static_cnt: 45 >> _handle_wrong_method_cnt: 1 >> _ic_miss_cnt: 39 >> >> >> Codegen wise, before the patch, C1 generates LIR for the invokeinterface whose target is a private interface methoda as follows. >> >> __bci__use__tid____instr____________________________________ >> . 1 0 v2 a1.invokeinterface() >> InvokePrivateInterfaceMethod$I.bar()V >> . 6 0 v3 return >> >> >> With this patch, C1 generates LIR as follows. it first check a1 is a subtype of `InvokePrivateInterfaceMethod$I`. if so, an optimized virtual call is generated. The callsite will be fixed up once and only one time in runtime. >> >> __bci__use__tid____instr____________________________________ >> . 1 1 a2 checkcast(a1) InvokePrivateInterfaceMethod$I >> stack [0:a1] >> . 1 0 v3 a2.invokeinterface() >> InvokePrivateInterfaceMethod$I.bar()V >> . 
6 0 v4 return
>
> It looks like this does what we want for private interface methods, but I'm wondering if we handle all combinations of private/final and invokevirtual/invokeinterface, or are we missing some cases where can_be_statically_bound() would return true?

Hi @dean-long,

I think C1 covers all cases as long as the target method is loaded. I have seen cases where the target method had not been loaded at startup time, but they are rare.

ciMethod::can_be_statically_bound() returns true if the method is private or final. In the matrix below, rows are the invoke bytecodes, columns are the modifiers of the target method, and each cell refers to a numbered case.

|                 | final | private |
|-----------------|-------|---------|
| invokevirtual   | 1     | 2       |
| invokespecial   | N/A1  | 3       |
| invokeinterface | N/A2  | 4       |

1. generates the optimized virtual call because [x->target_is_final()](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/c1/c1_LIRGenerator.cpp#L2799) is true.
2. transforms to `invokespecial` [here](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/c1/c1_GraphBuilder.cpp#L1885), then it will be case 3.
3. generates the optimized virtual call because `x->code() == Bytecodes::_invokespecial` is true.
4. is what this patch covers.

N/A1: I think it's impossible for javac to emit this. Even if it existed, it would be an optimized virtual call like case 3.
N/A2: it's an illegal modifier for an interface method.
https://docs.oracle.com/javase/specs/jls/se17/html/jls-9.html#jls-InterfaceMethodModifier ------------- PR: https://git.openjdk.java.net/jdk/pull/6445 From dlong at openjdk.java.net Sat Nov 20 10:52:07 2021 From: dlong at openjdk.java.net (Dean Long) Date: Sat, 20 Nov 2021 10:52:07 GMT Subject: RFR: 8274983: Pattern.matcher performance regression after JDK-8238358 In-Reply-To: References: Message-ID: On Sat, 20 Nov 2021 04:40:51 GMT, Xin Liu wrote: >> It looks like this does what we want for private interface methods, but I'm wondering if we handle all combinations of private/final and invokevirtual/invokeinterface, or are we missing some cases where can_be_statically_bound() would return true? > > hi, @dean-long, > > I think C1 covers all cases as long as the target method is loaded. I have seen cases which target methods haven't been loaded in startup time, but they are rare. > > ciMethod::can_be_statically_bound() return true if the method is private or final. The matrix shows the modifiers of target methods. > > | | final | private | > |-----------------|-------|---------| > | invokevirtual | 1 | 2 | > | invokespecial | N/A1 | 3 | > | invokeinterface | N/A2 | 4 | > > 1. generates the optimized virtual call because [x->target_is_final()](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/c1/c1_LIRGenerator.cpp#L2799) is true. > 2. transforms to `invokespecial` [here](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/c1/c1_GraphBuilder.cpp#L1885) , then it will be case 3. > 3. generates the optimize virtual call because `x->code() == Bytecodes::_invokespecial` is true. > 4. is what this patch covers. > > NA-1. I think it's impossible for javac. it would be an optimized virtual call like case 3 even it existed. > NA-2: it's an illegal modifier for an interface method. > https://docs.oracle.com/javase/specs/jls/se17/html/jls-9.html#jls-InterfaceMethodModifier Thanks @navyxliu. 
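For readers following the thread, the case matrix quoted above can be sketched as a small decision function. This is a toy model under stated assumptions, not C1 code: `Invoke`, `CallKind`, `ToyTarget`, `select_call` and the `with_patch` switch are all hypothetical names introduced only for illustration, and the real logic lives in the C1 graph builder and LIR generator.

```cpp
#include <cassert>

// Toy model only: these types are illustrative and are not C1's classes.
enum class Invoke { Virtual, Special, Interface };
enum class CallKind { OptimizedVirtual, RegularVirtual };

struct ToyTarget {
  bool loaded;
  bool is_final;
  bool is_private;
  // Mirrors the statement in the thread: true if the method is private or final.
  bool can_be_statically_bound() const { return is_final || is_private; }
};

// with_patch selects the behaviour proposed for case 4 of the matrix.
CallKind select_call(Invoke bc, const ToyTarget& t, bool with_patch) {
  if (bc == Invoke::Special) {
    return CallKind::OptimizedVirtual;            // case 3
  }
  if (!t.loaded || !t.can_be_statically_bound()) {
    return CallKind::RegularVirtual;              // nothing to exploit
  }
  if (bc == Invoke::Virtual) {
    // case 1 (final) or case 2 (private, rewritten to invokespecial)
    return CallKind::OptimizedVirtual;
  }
  // bc == Invoke::Interface: a final target is N/A2, so only private remains.
  assert(t.is_private && "final interface methods are not legal");
  return with_patch ? CallKind::OptimizedVirtual  // case 4 with the patch
                    : CallKind::RegularVirtual;   // case 4 before the patch
}
```

In this model the only cell whose answer depends on `with_patch` is private plus invokeinterface, which matches the claim that case 4 is the one the patch covers.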
I wonder if we can do 2) and 3) for invokeinterface, simplifying the patch. Something like:

    // Some methods are obviously bindable without any type checks so
    // convert them directly to an invokespecial or invokestatic.
    if (target->is_loaded() && !target->is_abstract() && target->can_be_statically_bound()) {
      switch (bc_raw) {
      case Bytecodes::_invokevirtual:
      case Bytecodes::_invokeinterface:
        code = Bytecodes::_invokespecial;
        break;
    [...]
      // invoke-special-super
      if (code == Bytecodes::_invokespecial && !target->is_object_initializer()) {
        ciInstanceKlass* sender_klass = calling_klass;
        if (sender_klass->is_interface()) {
    [...]

What do you think?

-------------

PR: https://git.openjdk.java.net/jdk/pull/6445

From dlong at openjdk.java.net Sun Nov 21 08:57:06 2021
From: dlong at openjdk.java.net (Dean Long)
Date: Sun, 21 Nov 2021 08:57:06 GMT
Subject: RFR: 8277423: ciReplay: hidden class with comment expected error
In-Reply-To:
References:
Message-ID:

On Thu, 18 Nov 2021 23:22:19 GMT, Dean Long wrote:

> Refactor code to dump hidden classes consistently.

Thanks Christian.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6467

From jbhateja at openjdk.java.net Sun Nov 21 19:38:36 2021
From: jbhateja at openjdk.java.net (Jatin Bhateja)
Date: Sun, 21 Nov 2021 19:38:36 GMT
Subject: RFR: 8277239: SIGSEGV in vrshift_reg_maskedNode::emit [v2]
In-Reply-To:
References:
Message-ID:

> Currently the instruction selector differentiates between the two kinds of vector shift operations, i.e. one with a vector shift count and the other with a scalar shift count passed through LShiftCntV/RShiftCntV nodes, by looking at the ideal opcode of the shift count node.
>
> A more robust scheme is to set a flag over vector shift node if it has variable vector shift count and replace the opcode based check with flag based check in various shift instruction selection patterns.
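The difference between the two checks described above can be sketched in a few lines. This is a toy model, not HotSpot code: `Op`, `ToyNode`, `make_shift` and both predicate functions are hypothetical names, and the opcode list is a tiny stand-in for the real ideal opcodes. The point it tries to show is only that an opcode-based predicate depends on what the count input currently looks like, while a flag records the decision once, when the shift node is built.

```cpp
#include <cassert>

// Toy stand-ins for ideal-graph opcodes; these are NOT HotSpot's definitions.
enum class Op { LShiftCntV, RShiftCntV, LoadVector, VectorBlend, LShiftVI };

struct ToyNode {
  Op op = Op::LoadVector;
  const ToyNode* shift_count = nullptr;  // count input, only used by shifts
  bool is_var_shift = false;             // recorded when the shift is built
};

// Opcode-based check: infers "variable shift" from the count input's opcode.
bool is_var_shift_by_opcode(const ToyNode& shift) {
  assert(shift.shift_count != nullptr);
  Op c = shift.shift_count->op;
  return c != Op::LShiftCntV && c != Op::RShiftCntV;
}

// Flag-based check: trusts what the creator of the shift node recorded.
bool is_var_shift_by_flag(const ToyNode& shift) {
  return shift.is_var_shift;
}

// Hypothetical factory that sets the flag exactly once, at creation time.
ToyNode make_shift(const ToyNode* count, bool variable_count) {
  ToyNode n;
  n.op = Op::LShiftVI;
  n.shift_count = count;
  n.is_var_shift = variable_count;
  return n;
}
```

If some later transformation replaced a scalar count input with a node of a different opcode (a contrived case here, modelled with `VectorBlend`), the opcode-based predicate would change its answer while the flag-based one would not.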
Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: 8277239: Review comments resolution. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6431/files - new: https://git.openjdk.java.net/jdk/pull/6431/files/97dce5ca..41d33df0 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6431&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6431&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/6431.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6431/head:pull/6431 PR: https://git.openjdk.java.net/jdk/pull/6431 From yadongwang at openjdk.java.net Mon Nov 22 01:03:23 2021 From: yadongwang at openjdk.java.net (Yadong Wang) Date: Mon, 22 Nov 2021 01:03:23 GMT Subject: RFR: 8277508: need to check has_predicated_vectors before calling scalable_predicate_reg_slots Message-ID: Hi, Team, A separate set of predicate registers is not mandatory for an implementation of scalable vectors. It will cause a failure in some platform which supports scalable vectors without explicit predicated registers, like riscv. All code about RegVectMask should be covered by has_predicated_vectors here in Matcher::init_first_stack_mask(). 
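The guard being asked for can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual `Matcher` code: `ToyPlatform` and `mask_spill_slots` are hypothetical, and the member names merely echo the functions mentioned in the description.

```cpp
#include <cassert>

// Hypothetical platform description; not the real Matcher interface.
struct ToyPlatform {
  bool scalable_vectors;
  bool predicate_registers;
  int  predicate_slots;

  bool supports_scalable_vector() const { return scalable_vectors; }
  bool has_predicated_vectors() const { return predicate_registers; }

  // Only meaningful when the platform has predicate registers at all.
  int scalable_predicate_reg_slots() const {
    assert(predicate_registers && "no predicate registers on this platform");
    return predicate_slots;
  }
};

// Returns the slots to reserve for mask spills, or 0 when not applicable.
int mask_spill_slots(const ToyPlatform& p) {
  if (!p.supports_scalable_vector()) return 0;
  if (!p.has_predicated_vectors()) return 0;  // the guard comes first
  return p.scalable_predicate_reg_slots();
}
```

With the guard in place, a platform that has scalable vectors but no separate predicate registers never reaches the asserting call.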
Yadong ------------- Commit messages: - 8277508: missed to check has_predicated_vectors before calling scalable_predicate_reg_slots Changes: https://git.openjdk.java.net/jdk/pull/6492/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6492&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8277508 Stats: 13 lines in 1 file changed: 2 ins; 0 del; 11 mod Patch: https://git.openjdk.java.net/jdk/pull/6492.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6492/head:pull/6492 PR: https://git.openjdk.java.net/jdk/pull/6492 From yadongwang at openjdk.java.net Mon Nov 22 01:15:30 2021 From: yadongwang at openjdk.java.net (Yadong Wang) Date: Mon, 22 Nov 2021 01:15:30 GMT Subject: RFR: 8277508: need to check has_predicated_vectors before calling scalable_predicate_reg_slots [v2] In-Reply-To: References: Message-ID: > Hi, Team, > A separate set of predicate registers is not mandatory for an implementation of scalable vectors. It will cause a failure in some platform which supports scalable vectors without explicit predicated registers, like riscv. All code about RegVectMask should be covered by has_predicated_vectors here in Matcher::init_first_stack_mask(). > > Yadong Yadong Wang has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. 
The pull request contains one new commit since the last revision: 8277508: need to check has_predicated_vectors before calling scalable_predicate_reg_slots ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6492/files - new: https://git.openjdk.java.net/jdk/pull/6492/files/b3171c26..5f54b7fc Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6492&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6492&range=00-01 Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/6492.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6492/head:pull/6492 PR: https://git.openjdk.java.net/jdk/pull/6492 From njian at openjdk.java.net Mon Nov 22 01:30:08 2021 From: njian at openjdk.java.net (Ningsheng Jian) Date: Mon, 22 Nov 2021 01:30:08 GMT Subject: RFR: 8277508: need to check has_predicated_vectors before calling scalable_predicate_reg_slots [v2] In-Reply-To: References: Message-ID: <7jHlgvYPtNCxyeDmNRpiS6503iRAGurYuRBDkntrkno=.d4304f60-ab58-44a3-96ff-89c261f341eb@github.com> On Mon, 22 Nov 2021 01:15:30 GMT, Yadong Wang wrote: >> Hi, Team, >> A separate set of predicate registers is not mandatory for an implementation of scalable vectors. It will cause a failure in some platform which supports scalable vectors without explicit predicated registers, like riscv. All code about RegVectMask should be covered by has_predicated_vectors here in Matcher::init_first_stack_mask(). >> >> Yadong > > Yadong Wang has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. Looks good to me. ------------- Marked as reviewed by njian (Committer). 
PR: https://git.openjdk.java.net/jdk/pull/6492 From duke at openjdk.java.net Mon Nov 22 02:34:12 2021 From: duke at openjdk.java.net (TatWai Chong) Date: Mon, 22 Nov 2021 02:34:12 GMT Subject: Integrated: 8275448: [REDO] AArch64: Implement string_compare intrinsic in SVE In-Reply-To: <3lLU_Cnsh439U5TjNx9ADxXkvMDnfot4TxHbh7paUQw=.5c7ec35a-8456-4fe6-8b4f-1d7627026d8d@github.com> References: <3lLU_Cnsh439U5TjNx9ADxXkvMDnfot4TxHbh7paUQw=.5c7ec35a-8456-4fe6-8b4f-1d7627026d8d@github.com> Message-ID: On Fri, 22 Oct 2021 00:34:03 GMT, TatWai Chong wrote: > After JDK-8269559 was integrated there are failures in tier1 testing > across Mac OS X 11.4 (aarch64) machines. Please see JDK-8275263. > > This patch is NOT functional; rather, this tends to verify potential > toolchain issues as the original patch pass testing on other > platforms. > > In this patch, we remove new SVE-related matching rules and register > class introduced in the original patch to minimally affect the > non-SVE part. This pull request has now been integrated. Changeset: ca31ed53 Author: TatWai Chong URL: https://git.openjdk.java.net/jdk/commit/ca31ed5335f6fa7229c94ba20d9d6031b930d69a Stats: 423 lines in 9 files changed: 412 ins; 0 del; 11 mod 8275448: [REDO] AArch64: Implement string_compare intrinsic in SVE Reviewed-by: ngasson, aph ------------- PR: https://git.openjdk.java.net/jdk/pull/6072 From duke at openjdk.java.net Mon Nov 22 08:01:37 2021 From: duke at openjdk.java.net (Takuya Kiriyama) Date: Mon, 22 Nov 2021 08:01:37 GMT Subject: RFR: 8277042: add test for 8276036 to compiler/codecache [v3] In-Reply-To: References: Message-ID: <6DAeTyhPeBksnUnR_xPSfnuADN1ZxUJjV6t8LcExGgY=.244666c5-5331-4144-bb06-82e230c3377a@github.com> > Could you please review the 8277042 code? > This is the enhancement for 8276036. > I add a new test to verify the value of full_count in the message of insufficient codecache. 
Takuya Kiriyama has updated the pull request incrementally with one additional commit since the last revision: 8277042: add test for 8276036 to compiler/codecache ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6364/files - new: https://git.openjdk.java.net/jdk/pull/6364/files/07aa7c86..d6352295 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6364&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6364&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/6364.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6364/head:pull/6364 PR: https://git.openjdk.java.net/jdk/pull/6364 From duke at openjdk.java.net Mon Nov 22 08:01:41 2021 From: duke at openjdk.java.net (Takuya Kiriyama) Date: Mon, 22 Nov 2021 08:01:41 GMT Subject: RFR: 8277042: add test for 8276036 to compiler/codecache [v2] In-Reply-To: References: Message-ID: On Thu, 18 Nov 2021 09:12:35 GMT, Tobias Hartmann wrote: >> Takuya Kiriyama has updated the pull request incrementally with one additional commit since the last revision: >> >> 8277042: add test for 8276036 to compiler/codecache > > test/hotspot/jtreg/compiler/codecache/CodeCacheFullCountTest.java line 34: > >> 32: /* >> 33: * @test >> 34: * @bug 8276036 > > I have a fix ready for JDK-8277213 (see https://github.com/openjdk/jdk/pull/6449). Could you please add the bug ID to this test? I see. I added the bug ID to this test for JDK-8277213. I will probably add JDK-8277441 currently being discussed. ------------- PR: https://git.openjdk.java.net/jdk/pull/6364 From rah.v.ragh at gmail.com Mon Nov 22 08:49:15 2021 From: rah.v.ragh at gmail.com (Rahul) Date: Mon, 22 Nov 2021 14:19:15 +0530 Subject: Queries on AVX512 support in Hotspot Message-ID: Hi, Request help with questions on AVX512 support in Hotspot. Please note I am trying to find existing AVX512 support in hotspot. Understood that the support started with JDK-8076276 enhancement. 
When compared with the instruction set manuals, it seems that not all AVX512 instructions are supported for now. (For example, the AVX512_IFMA and AVX512_BF16 instruction sets do not seem to be supported. Also, although feature sets such as CPU_AVX512F and AVX512PF are enabled, it again seems that not all instructions in each set are supported.)

So is the existing support added so far documented somewhere? Also, are there any details of ongoing or future plans to add the remaining AVX-512 support?

I am also trying to find the available jtreg tests and benchmarks related to AVX-512. Are the main related tests located at test/hotspot/jtreg/compiler/loopopts/superword/ ? (I also found the test/hotspot/jtreg/compiler/loopopts/superword/TestArrayCopyConjoint.java and TestArrayCopyDisjoint.java tests.)

Are there any other functional or unit tests that check the exact AVX instructions generated? For example, to catch situations where AVX2 instructions are wrongly generated instead of the available/expected AVX3 instructions?

Request guidance on existing AVX-512 support, tests and benchmarks.

Thanks,
Rahul

From shade at openjdk.java.net Mon Nov 22 09:12:11 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Mon, 22 Nov 2021 09:12:11 GMT
Subject: RFR: 8277385: Zero: Enable CompactStrings support
In-Reply-To: <-tldYMqymQt2bG54787XcYNM1AQmaz7MpMaD7ijAges=.48f22102-66f8-44d7-80be-5674f0638bef@github.com>
References: <-tldYMqymQt2bG54787XcYNM1AQmaz7MpMaD7ijAges=.48f22102-66f8-44d7-80be-5674f0638bef@github.com>
Message-ID:

On Thu, 18 Nov 2021 15:14:55 GMT, Aleksey Shipilev wrote:

> This enables `CompactStrings` for Zero. When we were doing original Compact Strings in JDK 9, we disabled the support on non-primary platforms, hoping relevant maintainers would follow up with platform-specific work. Here is me following up, as Zero maintainer :)
>
> There is little to do on Zero side, as it is pure interpreter without String intrinsics.
Other platforms had old-shaped String intrinsics, so for them enabling the feature would mean implementing Compact-String-shaped intrinsics too. But this is irrelevant for Zero. There is still benefit of doing less work with smaller Strings. > > There are no regressions on the benchmarks I tried, and some benchmarks improve significantly. Notably, specjvm:{compiler,sunflow,derby,xmlvalidation} improve about 5%, specjvm:{serial,xmltransform} improve about 20% on x86_64. > > Additional testing: > - [x] Linux x86_64 Zero benchmarks > - [x] Linux x86_64 Zero `tier1` Thanks! Extended Zero testing seems fine. ------------- PR: https://git.openjdk.java.net/jdk/pull/6459 From shade at openjdk.java.net Mon Nov 22 09:12:11 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 22 Nov 2021 09:12:11 GMT Subject: Integrated: 8277385: Zero: Enable CompactStrings support In-Reply-To: <-tldYMqymQt2bG54787XcYNM1AQmaz7MpMaD7ijAges=.48f22102-66f8-44d7-80be-5674f0638bef@github.com> References: <-tldYMqymQt2bG54787XcYNM1AQmaz7MpMaD7ijAges=.48f22102-66f8-44d7-80be-5674f0638bef@github.com> Message-ID: On Thu, 18 Nov 2021 15:14:55 GMT, Aleksey Shipilev wrote: > This enables `CompactStrings` for Zero. When we were doing original Compact Strings in JDK 9, we disabled the support on non-primary platforms, hoping relevant maintainers would follow up with platform-specific work. Here is me following up, as Zero maintainer :) > > There is little to do on Zero side, as it is pure interpreter without String intrinsics. Other platforms had old-shaped String intrinsics, so for them enabling the feature would mean implementing Compact-String-shaped intrinsics too. But this is irrelevant for Zero. There is still benefit of doing less work with smaller Strings. > > There are no regressions on the benchmarks I tried, and some benchmarks improve significantly. Notably, specjvm:{compiler,sunflow,derby,xmlvalidation} improve about 5%, specjvm:{serial,xmltransform} improve about 20% on x86_64. 
> > Additional testing: > - [x] Linux x86_64 Zero benchmarks > - [x] Linux x86_64 Zero `tier1` This pull request has now been integrated. Changeset: 3f847fe8 Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/3f847fe89a088d6921107ca887a7a1bace871bd6 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod 8277385: Zero: Enable CompactStrings support Reviewed-by: redestad, adinn ------------- PR: https://git.openjdk.java.net/jdk/pull/6459 From duke at openjdk.java.net Mon Nov 22 09:34:15 2021 From: duke at openjdk.java.net (Takuya Kiriyama) Date: Mon, 22 Nov 2021 09:34:15 GMT Subject: RFR: 8276036: The value of full_count in the message of insufficient codecache is wrong In-Reply-To: References: Message-ID: On Wed, 27 Oct 2021 02:35:29 GMT, Takuya Kiriyama wrote: > Could you please review the 8276036 bug fixes? > > This bug is caused by the wrong place to add the value of full_count. > The initial value of full_count is 0, so it needs to be added before outputting the message. JDK-8276036 was already fixed. I close this pull request. ------------- PR: https://git.openjdk.java.net/jdk/pull/6129 From duke at openjdk.java.net Mon Nov 22 09:34:15 2021 From: duke at openjdk.java.net (Takuya Kiriyama) Date: Mon, 22 Nov 2021 09:34:15 GMT Subject: Withdrawn: 8276036: The value of full_count in the message of insufficient codecache is wrong In-Reply-To: References: Message-ID: On Wed, 27 Oct 2021 02:35:29 GMT, Takuya Kiriyama wrote: > Could you please review the 8276036 bug fixes? > > This bug is caused by the wrong place to add the value of full_count. > The initial value of full_count is 0, so it needs to be added before outputting the message. This pull request has been closed without being integrated. 
-------------

PR: https://git.openjdk.java.net/jdk/pull/6129

From dlong at openjdk.java.net Mon Nov 22 10:27:08 2021
From: dlong at openjdk.java.net (Dean Long)
Date: Mon, 22 Nov 2021 10:27:08 GMT
Subject: RFR: 8277239: SIGSEGV in vrshift_reg_maskedNode::emit [v2]
In-Reply-To:
References:
Message-ID:

On Sun, 21 Nov 2021 19:38:36 GMT, Jatin Bhateja wrote:

>> Currently the instruction selector differentiates between the two kinds of vector shift operations, i.e. one with a vector shift count and the other with a scalar shift count passed through LShiftCntV/RShiftCntV nodes, by looking at the ideal opcode of the shift count node.
>>
>> A more robust scheme is to set a flag over vector shift node if it has variable vector shift count and replace the opcode based check with flag based check in various shift instruction selection patterns.
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>
> 8277239: Review comments resolution.

Marked as reviewed by dlong (Reviewer).

-------------

PR: https://git.openjdk.java.net/jdk/pull/6431

From eosterlund at openjdk.java.net Mon Nov 22 11:12:22 2021
From: eosterlund at openjdk.java.net (Erik Österlund)
Date: Mon, 22 Nov 2021 11:12:22 GMT
Subject: RFR: 8277411: C2 fast_unlock intrinsic on AArch64 has unnecessary ownership check
Message-ID:

The AArch64 fast_unlock C2 code checks if the current thread owns the lock. This can be surprisingly expensive in workloads where locking is contended. The check is however optional (helpful only for finding JNI code bugs), and indeed not emitted for x86_64. This patch removes the check on AArch64 as well.
------------- Commit messages: - 8277411: C2 fast_unlock intrinsic on AArch64 has unnecessary ownership check Changes: https://git.openjdk.java.net/jdk/pull/6498/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6498&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8277411 Stats: 5 lines in 1 file changed: 0 ins; 4 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/6498.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6498/head:pull/6498 PR: https://git.openjdk.java.net/jdk/pull/6498 From aph at openjdk.java.net Mon Nov 22 13:12:03 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Mon, 22 Nov 2021 13:12:03 GMT Subject: RFR: 8276108: Wrong instruction generation in aarch64 backend In-Reply-To: References: <9P964HSPi8geJ3GVJwbsGo_t_lASstJkxsrX76Xv8K8=.898bf726-d71b-4a90-a6b7-83895eec3494@github.com> Message-ID: <-cKJ36p7ENbWiSZrA6e1MdTWT8wnVEvQ4njxZwbJ-Mw=.43615cd9-907e-4142-9dec-9b337ef9b754@github.com> On Mon, 8 Nov 2021 18:20:10 GMT, Andrew Haley wrote: > We surely need a reproducer for this one. Well, I think I do anyway. Have you got anything? ------------- PR: https://git.openjdk.java.net/jdk/pull/6212 From thartmann at openjdk.java.net Mon Nov 22 14:04:54 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 22 Nov 2021 14:04:54 GMT Subject: RFR: 8277042: add test for 8276036 to compiler/codecache [v2] In-Reply-To: References: Message-ID: On Mon, 22 Nov 2021 07:57:47 GMT, Takuya Kiriyama wrote: >> test/hotspot/jtreg/compiler/codecache/CodeCacheFullCountTest.java line 34: >> >>> 32: /* >>> 33: * @test >>> 34: * @bug 8276036 >> >> I have a fix ready for JDK-8277213 (see https://github.com/openjdk/jdk/pull/6449). Could you please add the bug ID to this test? > > I see. I added the bug ID to this test for JDK-8277213. > I will probably add JDK-8277441 currently being discussed. Thanks, yes, please add 8277441 as well. 
-------------

PR: https://git.openjdk.java.net/jdk/pull/6364

From thartmann at openjdk.java.net Mon Nov 22 14:08:59 2021
From: thartmann at openjdk.java.net (Tobias Hartmann)
Date: Mon, 22 Nov 2021 14:08:59 GMT
Subject: RFR: 8277441: CompileQueue::add fails with assert(_last->next() == __null) failed: not last
Message-ID: <8B-fG4oCz3vVuq2qJHCTLNeJdvZj_YCInjHSsx4P7bQ=.c1216330-9179-4103-8fa5-aff03a3aa5cf@github.com>

In the rare case that the compiler threads fail during initialization or the code cache is full and flushing is disabled, we completely disable JIT compilation and shut down the compiler runtime:
https://github.com/openjdk/jdk/blob/2f4b5405f0b53782f3ed5274f68b31eb968efb6d/src/hotspot/share/compiler/compileBroker.cpp#L1813-L1817

In the process, we free all compiler queues and set `CompileQueue::_first` to `NULL` and put the `CompileTasks` on the free list:
https://github.com/openjdk/jdk/blob/2f4b5405f0b53782f3ed5274f68b31eb968efb6d/src/hotspot/share/compiler/compileTask.cpp#L84

In rare cases, although compilation is disabled, another waiting thread might still call `CompileQueue::add`. That code then fails because `_last != NULL` and `_last->next()` is set to `_task_free_list`.
https://github.com/openjdk/jdk/blob/2f4b5405f0b53782f3ed5274f68b31eb968efb6d/src/hotspot/share/compiler/compileBroker.cpp#L362-L369

The fix is to set `_last` to `NULL` in `CompileQueue::free_all`. Adding to the compile queue then succeeds which is harmless because queues have only been freed to make the compiler threads exit faster:
https://github.com/openjdk/jdk/blob/2f4b5405f0b53782f3ed5274f68b31eb968efb6d/src/hotspot/share/compiler/compileBroker.cpp#L1823

The test that triggered this will be added with [PR 6364](https://github.com/openjdk/jdk/pull/6364). I verified that it now passes.
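The failure mode and the one-line fix described above can be reproduced with a toy queue. This is a simplified sketch, not the `CompileQueue` implementation: `ToyTask` and `ToyQueue` are hypothetical, locking is omitted, and only the `_first`/`_last`/free-list bookkeeping is modelled.

```cpp
#include <cassert>

struct ToyTask {
  ToyTask* next = nullptr;
};

// Minimal singly linked queue with a free list; a stand-in, not CompileQueue.
class ToyQueue {
  ToyTask* _first = nullptr;
  ToyTask* _last = nullptr;
  ToyTask* _free_list = nullptr;

 public:
  void add(ToyTask* t) {
    t->next = nullptr;
    if (_last == nullptr) {
      _first = t;  // queue was empty
    } else {
      // This is the check that fires when _last is stale after free_all().
      assert(_last->next == nullptr && "not last");
      _last->next = t;
    }
    _last = t;
  }

  void free_all() {
    // Push every queued task onto the free list, which rewrites task->next.
    ToyTask* cur = _first;
    while (cur != nullptr) {
      ToyTask* nxt = cur->next;
      cur->next = _free_list;
      _free_list = cur;
      cur = nxt;
    }
    _first = nullptr;
    _last = nullptr;  // the fix: without this line _last points into the free list
  }

  bool is_empty() const { return _first == nullptr; }
  ToyTask* first() const { return _first; }
  ToyTask* last() const { return _last; }
};
```

Dropping the `_last = nullptr;` line and then calling `add` after `free_all` on a queue that held two or more tasks trips the "not last" assert in this model, because the old tail's `next` now points at another freed task.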
Thanks, Tobias ------------- Commit messages: - 8277441: CompileQueue::add fails with assert(_last->next() == __null) failed: not last Changes: https://git.openjdk.java.net/jdk/pull/6503/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6503&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8277441 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/6503.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6503/head:pull/6503 PR: https://git.openjdk.java.net/jdk/pull/6503 From thartmann at openjdk.java.net Mon Nov 22 14:22:11 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 22 Nov 2021 14:22:11 GMT Subject: RFR: JDK-8277139 Improve code readability in PredecessorValidator (c1_IR.cpp) In-Reply-To: References: Message-ID: On Mon, 15 Nov 2021 18:26:36 GMT, Ludvig Janiuk wrote: > Refactor PredecessorValidator, more or less applying the following: > > declare variables where used > redeclare instead of reuse variables > move assert to a more logical place > remove unused length variable > inline variables where senseful > split loops > extract methods > > this is done in preparation for work on optimizing IR::verify. IR::verify calls PredecessorValidator. If the work of PredecessorValidator is made clearer, it will be easier to reason about where IR::verify doesn't need to be called (or where a subset of it would suffice). Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6394 From thartmann at openjdk.java.net Mon Nov 22 14:25:47 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 22 Nov 2021 14:25:47 GMT Subject: RFR: JDK-8277382 make c1 BlockMerger use IR::verify only when necessary In-Reply-To: References: Message-ID: On Thu, 18 Nov 2021 13:30:02 GMT, Ludvig Janiuk wrote: > This PR removes two calls to `IR::verify` which were unnecessary. 
The reason they are unnecessary is that `try_merge` does not always take action. There is no need to verify if nothing has changed. In the cases where `try_merge` does do something, it already calls `IR::verify` afterwards.
>
> This PR also switches some deeply nested if statements in `try_merge` to early returns.

Changes requested by thartmann (Reviewer).

src/hotspot/share/c1/c1_Optimizer.cpp line 374:

> 372: assert(sux_value == end_state->local_at(index), "locals not equal");
> 373: }
> 374: assert(sux_state->caller_state() == end_state->caller_state(), "caller not equal");

The indentation is wrong.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6456

From thartmann at openjdk.java.net Mon Nov 22 14:30:12 2021
From: thartmann at openjdk.java.net (Tobias Hartmann)
Date: Mon, 22 Nov 2021 14:30:12 GMT
Subject: RFR: 8277423: ciReplay: hidden class with comment expected error
In-Reply-To:
References:
Message-ID:

On Thu, 18 Nov 2021 23:22:19 GMT, Dean Long wrote:

> Refactor code to dump hidden classes consistently.

Looks good.

-------------

Marked as reviewed by thartmann (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/6467 From chagedorn at openjdk.java.net Mon Nov 22 14:36:16 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Mon, 22 Nov 2021 14:36:16 GMT Subject: RFR: 8277441: CompileQueue::add fails with assert(_last->next() == __null) failed: not last In-Reply-To: <8B-fG4oCz3vVuq2qJHCTLNeJdvZj_YCInjHSsx4P7bQ=.c1216330-9179-4103-8fa5-aff03a3aa5cf@github.com> References: <8B-fG4oCz3vVuq2qJHCTLNeJdvZj_YCInjHSsx4P7bQ=.c1216330-9179-4103-8fa5-aff03a3aa5cf@github.com> Message-ID: On Mon, 22 Nov 2021 13:57:21 GMT, Tobias Hartmann wrote: > In the rare case that the compiler threads fail during initialization or the code cache is full and flushing is disabled, we completely disable JIT compilation and shut down the compiler runtime: > https://github.com/openjdk/jdk/blob/2f4b5405f0b53782f3ed5274f68b31eb968efb6d/src/hotspot/share/compiler/compileBroker.cpp#L1813-L1817 > > In the process, we free all compiler queues and set `CompileQueue::_first` to `NULL` and put the `CompileTasks` on the free list: https://github.com/openjdk/jdk/blob/2f4b5405f0b53782f3ed5274f68b31eb968efb6d/src/hotspot/share/compiler/compileTask.cpp#L84 > > In rare cases, although compilation is disabled, another waiting thread might still call `CompileQueue::add`. That code then fails because `_last != NULL` and `_last->next()` is set to `_task_free_list`. > https://github.com/openjdk/jdk/blob/2f4b5405f0b53782f3ed5274f68b31eb968efb6d/src/hotspot/share/compiler/compileBroker.cpp#L362-L369 > > The fix is to set `_last` to `NULL` in `CompileQueue::free_all`. Adding to the compile queue then succeeds which is harmless because queues have only been freed to make the compiler threads exit faster: > https://github.com/openjdk/jdk/blob/2f4b5405f0b53782f3ed5274f68b31eb968efb6d/src/hotspot/share/compiler/compileBroker.cpp#L1823 > > The test that triggered this will be added with [PR 6364](https://github.com/openjdk/jdk/pull/6364). I verified that it now passes. 
> > Thanks, > Tobias Looks good! ------------- Marked as reviewed by chagedorn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6503 From thartmann at openjdk.java.net Mon Nov 22 14:49:17 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 22 Nov 2021 14:49:17 GMT Subject: RFR: 8277441: CompileQueue::add fails with assert(_last->next() == __null) failed: not last In-Reply-To: <8B-fG4oCz3vVuq2qJHCTLNeJdvZj_YCInjHSsx4P7bQ=.c1216330-9179-4103-8fa5-aff03a3aa5cf@github.com> References: <8B-fG4oCz3vVuq2qJHCTLNeJdvZj_YCInjHSsx4P7bQ=.c1216330-9179-4103-8fa5-aff03a3aa5cf@github.com> Message-ID: On Mon, 22 Nov 2021 13:57:21 GMT, Tobias Hartmann wrote: > In the rare case that the compiler threads fail during initialization or the code cache is full and flushing is disabled, we completely disable JIT compilation and shut down the compiler runtime: > https://github.com/openjdk/jdk/blob/2f4b5405f0b53782f3ed5274f68b31eb968efb6d/src/hotspot/share/compiler/compileBroker.cpp#L1813-L1817 > > In the process, we free all compiler queues and set `CompileQueue::_first` to `NULL` and put the `CompileTasks` on the free list: https://github.com/openjdk/jdk/blob/2f4b5405f0b53782f3ed5274f68b31eb968efb6d/src/hotspot/share/compiler/compileTask.cpp#L84 > > In rare cases, although compilation is disabled, another waiting thread might still call `CompileQueue::add`. That code then fails because `_last != NULL` and `_last->next()` is set to `_task_free_list`. > https://github.com/openjdk/jdk/blob/2f4b5405f0b53782f3ed5274f68b31eb968efb6d/src/hotspot/share/compiler/compileBroker.cpp#L362-L369 > > The fix is to set `_last` to `NULL` in `CompileQueue::free_all`. 
Adding to the compile queue then succeeds which is harmless because queues have only been freed to make the compiler threads exit faster: > https://github.com/openjdk/jdk/blob/2f4b5405f0b53782f3ed5274f68b31eb968efb6d/src/hotspot/share/compiler/compileBroker.cpp#L1823 > > The test that triggered this will be added with [PR 6364](https://github.com/openjdk/jdk/pull/6364). I verified that it now passes. > > Thanks, > Tobias Thanks for the review, Christian! ------------- PR: https://git.openjdk.java.net/jdk/pull/6503 From thartmann at openjdk.java.net Mon Nov 22 15:51:20 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 22 Nov 2021 15:51:20 GMT Subject: RFR: 8254108: ciReplay: Support incremental inlining [v3] In-Reply-To: References: Message-ID: On Thu, 18 Nov 2021 10:33:29 GMT, Christian Hagedorn wrote: >> This patch adds support to explicitly apply incremental inlining when replay compiling a method if the original compilation of the method was also incrementally inlined. We write a new value when dumping the inline tree to indicate if an inlinee was incrementally inlined (`= 1`) or not (`= 0`). >> >> To implement this, I updated the `REPLAY_VERSION` to 2 and additionally added a test to verify that old replay file versions are still working. I added some support to modify/remove version numbers of generated replay files in tests. I also refactored the test added by JDK-8275868 to reuse some of the methods. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > update AlwaysIncrementalInline and assert The changes look good to me. Nice test! ------------- Marked as reviewed by thartmann (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/6413 From chagedorn at openjdk.java.net Mon Nov 22 16:02:18 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Mon, 22 Nov 2021 16:02:18 GMT Subject: RFR: 8254108: ciReplay: Support incremental inlining [v3] In-Reply-To: References: Message-ID: On Thu, 18 Nov 2021 10:33:29 GMT, Christian Hagedorn wrote: >> This patch adds support to explicitly apply incremental inlining when replay compiling a method if the original compilation of the method was also incrementally inlined. We write a new value when dumping the inline tree to indicate if an inlinee was incrementally inlined (`= 1`) or not (`= 0`). >> >> To implement this, I updated the `REPLAY_VERSION` to 2 and additionally added a test to verify that old replay file versions are still working. I added some support to modify/remove version numbers of generated replay files in tests. I also refactored the test added by JDK-8275868 to reuse some of the methods. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > update AlwaysIncrementalInline and assert Thanks Tobias for your review! 
------------- PR: https://git.openjdk.java.net/jdk/pull/6413 From chagedorn at openjdk.java.net Mon Nov 22 16:08:25 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Mon, 22 Nov 2021 16:08:25 GMT Subject: RFR: JDK-8277139 Improve code readability in PredecessorValidator (c1_IR.cpp) In-Reply-To: References: Message-ID: <8zsyGZ1BGai-zfgnG9K5m_8S3vLQX-S6f-pIOfjyRow=.e5aa7aaf-96c2-47a5-8015-8f50466a982b@github.com> On Mon, 15 Nov 2021 18:26:36 GMT, Ludvig Janiuk wrote: > Refactor PredecessorValidator, more or less applying the following: > > declare variables where used > redeclare instead of reuse variables > move assert to a more logical place > remove unused length variable > inline variables where senseful > split loops > extract methods > > this is done in preparation for work on optimizing IR::verify. IR::verify calls PredecessorValidator. If the work of PredecessorValidator is made clearer, it will be easier to reason about where IR::verify doesn't need to be called (or where a subset of it would suffice). Otherwise, looks good! src/hotspot/share/c1/c1_IR.cpp line 1303: > 1301: > 1302: private: > 1303: void verify_successor_xentry_flag(const BlockBegin *block) const { For this and other methods below: Asterisk should be at the type: `BlockBegin* block`. ------------- Marked as reviewed by chagedorn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6394 From jbhateja at openjdk.java.net Mon Nov 22 16:15:19 2021 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Mon, 22 Nov 2021 16:15:19 GMT Subject: RFR: 8277239: SIGSEGV in vrshift_reg_maskedNode::emit [v2] In-Reply-To: References: Message-ID: On Thu, 18 Nov 2021 18:26:06 GMT, Sandhya Viswanathan wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> 8277239: Review comments resolution. > > The patch looks good to me. 
Thanks @sviswa7 and @dean-long ------------- PR: https://git.openjdk.java.net/jdk/pull/6431 From jbhateja at openjdk.java.net Mon Nov 22 16:42:20 2021 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Mon, 22 Nov 2021 16:42:20 GMT Subject: Integrated: 8277239: SIGSEGV in vrshift_reg_maskedNode::emit In-Reply-To: References: Message-ID: On Wed, 17 Nov 2021 13:44:51 GMT, Jatin Bhateja wrote: > Currently, the instruction selector differentiates between the two kinds of vector shift operations, i.e. one with a vector shift count and the other with a scalar shift count passed through LShiftCntV/RShiftCntV nodes, by looking at the ideal opcode of the shift count node. > > A more robust scheme is to set a flag on the vector shift node if it has a variable vector shift count and replace the opcode-based check with a flag-based check in the various shift instruction selection patterns. This pull request has now been integrated. Changeset: e5298655 Author: Jatin Bhateja URL: https://git.openjdk.java.net/jdk/commit/e529865531d0eb5a2119a1d220b195d088794226 Stats: 133 lines in 5 files changed: 72 ins; 3 del; 58 mod 8277239: SIGSEGV in vrshift_reg_maskedNode::emit Reviewed-by: sviswanathan, dlong ------------- PR: https://git.openjdk.java.net/jdk/pull/6431 From neliasso at openjdk.java.net Mon Nov 22 16:48:17 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Mon, 22 Nov 2021 16:48:17 GMT Subject: RFR: 8277441: CompileQueue::add fails with assert(_last->next() == __null) failed: not last In-Reply-To: <8B-fG4oCz3vVuq2qJHCTLNeJdvZj_YCInjHSsx4P7bQ=.c1216330-9179-4103-8fa5-aff03a3aa5cf@github.com> References: <8B-fG4oCz3vVuq2qJHCTLNeJdvZj_YCInjHSsx4P7bQ=.c1216330-9179-4103-8fa5-aff03a3aa5cf@github.com> Message-ID: On Mon, 22 Nov 2021 13:57:21 GMT, Tobias Hartmann wrote: > In the rare case that the compiler threads fail during initialization or the code cache is full and flushing is disabled, we completely disable JIT compilation and shut down the compiler runtime: >
https://github.com/openjdk/jdk/blob/2f4b5405f0b53782f3ed5274f68b31eb968efb6d/src/hotspot/share/compiler/compileBroker.cpp#L1813-L1817 > > In the process, we free all compiler queues and set `CompileQueue::_first` to `NULL` and put the `CompileTasks` on the free list: https://github.com/openjdk/jdk/blob/2f4b5405f0b53782f3ed5274f68b31eb968efb6d/src/hotspot/share/compiler/compileTask.cpp#L84 > > In rare cases, although compilation is disabled, another waiting thread might still call `CompileQueue::add`. That code then fails because `_last != NULL` and `_last->next()` is set to `_task_free_list`. > https://github.com/openjdk/jdk/blob/2f4b5405f0b53782f3ed5274f68b31eb968efb6d/src/hotspot/share/compiler/compileBroker.cpp#L362-L369 > > The fix is to set `_last` to `NULL` in `CompileQueue::free_all`. Adding to the compile queue then succeeds which is harmless because queues have only been freed to make the compiler threads exit faster: > https://github.com/openjdk/jdk/blob/2f4b5405f0b53782f3ed5274f68b31eb968efb6d/src/hotspot/share/compiler/compileBroker.cpp#L1823 > > The test that triggered this will be added with [PR 6364](https://github.com/openjdk/jdk/pull/6364). I verified that it now passes. > > Thanks, > Tobias Looks good! ------------- Marked as reviewed by neliasso (Reviewer). 
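For readers following the 8277441 thread, the stale-tail problem described above can be sketched with a toy queue. This is only an illustration under stated assumptions: `Task`, `Queue`, `task_free_list`, `can_add` and `add_is_safe_after_free_all` are hypothetical names, not the HotSpot `CompileTask`/`CompileQueue` code, and locking is left out entirely.

```cpp
#include <cassert>

// Hypothetical stand-in for a compile task: a singly linked node.
struct Task {
  Task* next = nullptr;
};

// Hypothetical global free list that released tasks are pushed onto.
static Task* task_free_list = nullptr;

// Hypothetical queue reduced to the head and tail pointers from the thread.
class Queue {
  Task* _first = nullptr;
  Task* _last = nullptr;

 public:
  // True if the tail is either absent or really the last element.
  bool can_add() const { return _last == nullptr || _last->next == nullptr; }

  void add(Task* task) {
    task->next = nullptr;
    if (_last == nullptr) {
      _first = task;  // queue was empty
    } else {
      assert(_last->next == nullptr && "not last");
      _last->next = task;
    }
    _last = task;
  }

  // Moves every task to the free list. Without reset_last the tail pointer
  // keeps pointing at a task whose next field now links into the free list.
  void free_all(bool reset_last) {
    Task* t = _first;
    while (t != nullptr) {
      Task* n = t->next;
      t->next = task_free_list;
      task_free_list = t;
      t = n;
    }
    _first = nullptr;
    if (reset_last) {
      _last = nullptr;  // the one-line fix discussed in the thread
    }
  }
};

// Returns whether a later add() would pass its "not last" check.
bool add_is_safe_after_free_all(bool reset_last) {
  task_free_list = nullptr;
  Task a, b, c;
  Queue q;
  q.add(&a);
  q.add(&b);
  q.free_all(reset_last);
  bool safe = q.can_add();
  if (safe) {
    q.add(&c);  // succeeds on the now-empty queue
  }
  return safe;
}
```

In this sketch, `add_is_safe_after_free_all(false)` reports the stale tail that the assert in the subject line complains about, while `add_is_safe_after_free_all(true)` lets the late `add` go through.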
PR: https://git.openjdk.java.net/jdk/pull/6503 From dcubed at openjdk.java.net Mon Nov 22 19:07:44 2021 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Mon, 22 Nov 2021 19:07:44 GMT Subject: Integrated: 8277576: ProblemList runtime/ErrorHandling/CreateCoredumpOnCrash.java on macosx-X64 Message-ID: A few trivial ProblemListings: 8277576 ProblemList runtime/ErrorHandling/CreateCoredumpOnCrash.java on macosx-X64 8277577 ProblemList compiler/onSpinWait/TestOnSpinWaitAArch64DefaultFlags.java on linux-aarch64 8277578 ProblemList applications/jcstress/acqrel.java on linux-aarch64 ------------- Commit messages: - 8277578: ProblemList applications/jcstress/acqrel.java on linux-aarch64 - 8277577: ProblemList compiler/onSpinWait/TestOnSpinWaitAArch64DefaultFlags.java on linux-aarch64 - 8277576: ProblemList runtime/ErrorHandling/CreateCoredumpOnCrash.java on macosx-X64 Changes: https://git.openjdk.java.net/jdk/pull/6507/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6507&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8277576 Stats: 5 lines in 1 file changed: 5 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/6507.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6507/head:pull/6507 PR: https://git.openjdk.java.net/jdk/pull/6507 From mikael at openjdk.java.net Mon Nov 22 19:07:44 2021 From: mikael at openjdk.java.net (Mikael Vidstedt) Date: Mon, 22 Nov 2021 19:07:44 GMT Subject: Integrated: 8277576: ProblemList runtime/ErrorHandling/CreateCoredumpOnCrash.java on macosx-X64 In-Reply-To: References: Message-ID: On Mon, 22 Nov 2021 18:46:14 GMT, Daniel D. Daugherty wrote: > A few trivial ProblemListings: > > 8277576 ProblemList runtime/ErrorHandling/CreateCoredumpOnCrash.java on macosx-X64 > 8277577 ProblemList compiler/onSpinWait/TestOnSpinWaitAArch64DefaultFlags.java on linux-aarch64 > 8277578 ProblemList applications/jcstress/acqrel.java on linux-aarch64 Marked as reviewed by mikael (Reviewer). 
------------- PR: https://git.openjdk.java.net/jdk/pull/6507 From dcubed at openjdk.java.net Mon Nov 22 19:07:45 2021 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Mon, 22 Nov 2021 19:07:45 GMT Subject: Integrated: 8277576: ProblemList runtime/ErrorHandling/CreateCoredumpOnCrash.java on macosx-X64 In-Reply-To: References: Message-ID: On Mon, 22 Nov 2021 18:52:20 GMT, Mikael Vidstedt wrote: >> A few trivial ProblemListings: >> >> 8277576 ProblemList runtime/ErrorHandling/CreateCoredumpOnCrash.java on macosx-X64 >> 8277577 ProblemList compiler/onSpinWait/TestOnSpinWaitAArch64DefaultFlags.java on linux-aarch64 >> 8277578 ProblemList applications/jcstress/acqrel.java on linux-aarch64 > > Marked as reviewed by mikael (Reviewer). @vidmik - Thanks for the fast review! ------------- PR: https://git.openjdk.java.net/jdk/pull/6507 From dcubed at openjdk.java.net Mon Nov 22 19:07:47 2021 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Mon, 22 Nov 2021 19:07:47 GMT Subject: Integrated: 8277576: ProblemList runtime/ErrorHandling/CreateCoredumpOnCrash.java on macosx-X64 In-Reply-To: References: Message-ID: <4uPl1LAAipGVsPSsdFq43CGDgxgXhy8VpCn1PKbu60E=.4cc6a6da-ac51-4efb-aaff-bc4cbcb7d304@github.com> On Mon, 22 Nov 2021 18:46:14 GMT, Daniel D. Daugherty wrote: > A few trivial ProblemListings: > > 8277576 ProblemList runtime/ErrorHandling/CreateCoredumpOnCrash.java on macosx-X64 > 8277577 ProblemList compiler/onSpinWait/TestOnSpinWaitAArch64DefaultFlags.java on linux-aarch64 > 8277578 ProblemList applications/jcstress/acqrel.java on linux-aarch64 This pull request has now been integrated. Changeset: 1049aba1 Author: Daniel D. 
Daugherty URL: https://git.openjdk.java.net/jdk/commit/1049aba1fb65fd70bd723c80a84250512a68d653 Stats: 5 lines in 1 file changed: 5 ins; 0 del; 0 mod 8277576: ProblemList runtime/ErrorHandling/CreateCoredumpOnCrash.java on macosx-X64 8277577: ProblemList compiler/onSpinWait/TestOnSpinWaitAArch64DefaultFlags.java on linux-aarch64 8277578: ProblemList applications/jcstress/acqrel.java on linux-aarch64 Reviewed-by: mikael ------------- PR: https://git.openjdk.java.net/jdk/pull/6507 From dlong at openjdk.java.net Mon Nov 22 20:54:20 2021 From: dlong at openjdk.java.net (Dean Long) Date: Mon, 22 Nov 2021 20:54:20 GMT Subject: RFR: 8277423: ciReplay: hidden class with comment expected error In-Reply-To: References: Message-ID: On Thu, 18 Nov 2021 23:22:19 GMT, Dean Long wrote: > Refactor code to dump hidden classes consistently. Thanks Tobias. ------------- PR: https://git.openjdk.java.net/jdk/pull/6467 From dlong at openjdk.java.net Mon Nov 22 20:54:21 2021 From: dlong at openjdk.java.net (Dean Long) Date: Mon, 22 Nov 2021 20:54:21 GMT Subject: Integrated: 8277423: ciReplay: hidden class with comment expected error In-Reply-To: References: Message-ID: On Thu, 18 Nov 2021 23:22:19 GMT, Dean Long wrote: > Refactor code to dump hidden classes consistently. This pull request has now been integrated. 
Changeset: 05a9a51d Author: Dean Long URL: https://git.openjdk.java.net/jdk/commit/05a9a51dbfc46eb52bc28f1f9a618c75ee2597e9 Stats: 29 lines in 3 files changed: 18 ins; 9 del; 2 mod 8277423: ciReplay: hidden class with comment expected error Reviewed-by: chagedorn, thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/6467 From sviswanathan at openjdk.java.net Mon Nov 22 23:15:10 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Mon, 22 Nov 2021 23:15:10 GMT Subject: RFR: 8277426: Optimize mask reduction operations on x86 [v2] In-Reply-To: References: <9g2nHSo1K9pldGKqKiXBqjFZF3UFqqKHu-6GLyKbclc=.496404e9-f682-470e-a93d-c8fc9ca6f626@github.com> Message-ID: On Fri, 19 Nov 2021 11:43:16 GMT, Mai Đặng Quân Anh wrote: >> Hi, >> >> This patch improves the performance of mask reduction operations on AVX by matching the pattern `VectorMaskReduction (VectorStoreMask mask)` to eliminate the extra `VectorStoreMaskNode`. I have also done some refactoring to unify the logic of `toLong` with the other reduction operations. >> >> The patch has been discussed partially in [panama-vector repository](https://github.com/openjdk/panama-vector/pull/158). >> >> Thank you very much. > > Mai Đặng Quân Anh has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge branch 'master' into vectorMaskReduction > - reduce some dependencies with spare register > - improve mask reduction logic on AVX @merykitty Thanks for contributing this optimization. The patch looks good to me.
------------- PR: https://git.openjdk.java.net/jdk/pull/6447 From sviswanathan at openjdk.java.net Mon Nov 22 23:19:11 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Mon, 22 Nov 2021 23:19:11 GMT Subject: RFR: 8277426: Optimize mask reduction operations on x86 [v2] In-Reply-To: References: <9g2nHSo1K9pldGKqKiXBqjFZF3UFqqKHu-6GLyKbclc=.496404e9-f682-470e-a93d-c8fc9ca6f626@github.com> Message-ID: On Fri, 19 Nov 2021 11:43:16 GMT, Mai Đặng Quân Anh wrote: >> Hi, >> >> This patch improves the performance of mask reduction operations on AVX by matching the pattern `VectorMaskReduction (VectorStoreMask mask)` to eliminate the extra `VectorStoreMaskNode`. I have also done some refactoring to unify the logic of `toLong` with the other reduction operations. >> >> The patch has been discussed partially in [panama-vector repository](https://github.com/openjdk/panama-vector/pull/158). >> >> Thank you very much. > > Mai Đặng Quân Anh has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge branch 'master' into vectorMaskReduction > - reduce some dependencies with spare register > - improve mask reduction logic on AVX @PaulSandoz Could you please run this through Oracle testing?
------------- PR: https://git.openjdk.java.net/jdk/pull/6447 From psandoz at openjdk.java.net Mon Nov 22 23:22:19 2021 From: psandoz at openjdk.java.net (Paul Sandoz) Date: Mon, 22 Nov 2021 23:22:19 GMT Subject: RFR: 8277426: Optimize mask reduction operations on x86 [v2] In-Reply-To: References: <9g2nHSo1K9pldGKqKiXBqjFZF3UFqqKHu-6GLyKbclc=.496404e9-f682-470e-a93d-c8fc9ca6f626@github.com> Message-ID: <6Quq9p9Qh1T1-77IKdhUjS69PRXmSlEYjNCVSwJ9Y8o=.944179d8-0237-48e0-8494-63d2ac096e5b@github.com> On Fri, 19 Nov 2021 11:43:16 GMT, Mai Đặng Quân Anh wrote: >> Hi, >> >> This patch improves the performance of mask reduction operations on AVX by matching the pattern `VectorMaskReduction (VectorStoreMask mask)` to eliminate the extra `VectorStoreMaskNode`. I have also done some refactoring to unify the logic of `toLong` with the other reduction operations. >> >> The patch has been discussed partially in [panama-vector repository](https://github.com/openjdk/panama-vector/pull/158). >> >> Thank you very much. > > Mai Đặng Quân Anh has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge branch 'master' into vectorMaskReduction > - reduce some dependencies with spare register > - improve mask reduction logic on AVX @merykitty does this PR still disable the operations on Neon?
------------- PR: https://git.openjdk.java.net/jdk/pull/6447 From sviswanathan at openjdk.java.net Tue Nov 23 00:19:09 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Tue, 23 Nov 2021 00:19:09 GMT Subject: RFR: 8277426: Optimize mask reduction operations on x86 [v2] In-Reply-To: <6Quq9p9Qh1T1-77IKdhUjS69PRXmSlEYjNCVSwJ9Y8o=.944179d8-0237-48e0-8494-63d2ac096e5b@github.com> References: <9g2nHSo1K9pldGKqKiXBqjFZF3UFqqKHu-6GLyKbclc=.496404e9-f682-470e-a93d-c8fc9ca6f626@github.com> <6Quq9p9Qh1T1-77IKdhUjS69PRXmSlEYjNCVSwJ9Y8o=.944179d8-0237-48e0-8494-63d2ac096e5b@github.com> Message-ID: On Mon, 22 Nov 2021 23:19:12 GMT, Paul Sandoz wrote: >> Mai Đặng Quân Anh has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - Merge branch 'master' into vectorMaskReduction >> - reduce some dependencies with spare register >> - improve mask reduction logic on AVX > > @merykitty does this PR still disable the operations on Neon? > > Answering myself, I think not from looking at the changes in the panama repo version: https://github.com/openjdk/panama-vector/pull/158/commits/9b578a3a59d577ac95b50b485bedd91463ca5ce8 @PaulSandoz The changes are specific to x86 code gen. Only x86 backend files are changed.
------------- PR: https://git.openjdk.java.net/jdk/pull/6447 From psandoz at openjdk.java.net Tue Nov 23 02:41:13 2021 From: psandoz at openjdk.java.net (Paul Sandoz) Date: Tue, 23 Nov 2021 02:41:13 GMT Subject: RFR: 8277426: Optimize mask reduction operations on x86 [v2] In-Reply-To: <6Quq9p9Qh1T1-77IKdhUjS69PRXmSlEYjNCVSwJ9Y8o=.944179d8-0237-48e0-8494-63d2ac096e5b@github.com> References: <9g2nHSo1K9pldGKqKiXBqjFZF3UFqqKHu-6GLyKbclc=.496404e9-f682-470e-a93d-c8fc9ca6f626@github.com> <6Quq9p9Qh1T1-77IKdhUjS69PRXmSlEYjNCVSwJ9Y8o=.944179d8-0237-48e0-8494-63d2ac096e5b@github.com> Message-ID: On Mon, 22 Nov 2021 23:19:12 GMT, Paul Sandoz wrote: >> Mai Đặng Quân Anh has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - Merge branch 'master' into vectorMaskReduction >> - reduce some dependencies with spare register >> - improve mask reduction logic on AVX > > @merykitty does this PR still disable the operations on Neon? > > Answering myself, I think not from looking at the changes in the panama repo version: https://github.com/openjdk/panama-vector/pull/158/commits/9b578a3a59d577ac95b50b485bedd91463ca5ce8 > @PaulSandoz Could you please run this through Oracle testing. Tests passed.
------------- PR: https://git.openjdk.java.net/jdk/pull/6447 From ngasson at openjdk.java.net Tue Nov 23 02:55:09 2021 From: ngasson at openjdk.java.net (Nick Gasson) Date: Tue, 23 Nov 2021 02:55:09 GMT Subject: RFR: 8276108: Wrong instruction generation in aarch64 backend In-Reply-To: <-cKJ36p7ENbWiSZrA6e1MdTWT8wnVEvQ4njxZwbJ-Mw=.43615cd9-907e-4142-9dec-9b337ef9b754@github.com> References: <9P964HSPi8geJ3GVJwbsGo_t_lASstJkxsrX76Xv8K8=.898bf726-d71b-4a90-a6b7-83895eec3494@github.com> <-cKJ36p7ENbWiSZrA6e1MdTWT8wnVEvQ4njxZwbJ-Mw=.43615cd9-907e-4142-9dec-9b337ef9b754@github.com> Message-ID: On Mon, 22 Nov 2021 13:08:54 GMT, Andrew Haley wrote: > We surely need a reproducer for this one. For this to cause a problem we'd need C1 to generate an unaligned load/store with a constant offset. I don't think that can happen in current mainline JDK (or else we would have already seen failures). However, `addr->scale()` is always zero when the offset is a constant, so the current code isn't functioning as intended. The valhalla branch below asserts with "Field too big for insn" due to this when you run the hotspot_valhalla jtreg group (commit e903390): https://github.com/fparain/valhalla/tree/c1_cleanup ------------- PR: https://git.openjdk.java.net/jdk/pull/6212 From sviswanathan at openjdk.java.net Tue Nov 23 05:15:10 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Tue, 23 Nov 2021 05:15:10 GMT Subject: RFR: 8277426: Optimize mask reduction operations on x86 [v2] In-Reply-To: References: <9g2nHSo1K9pldGKqKiXBqjFZF3UFqqKHu-6GLyKbclc=.496404e9-f682-470e-a93d-c8fc9ca6f626@github.com> Message-ID: On Fri, 19 Nov 2021 11:43:16 GMT, Mai Đặng Quân Anh wrote: >> Hi, >> >> This patch improves the performance of mask reduction operations on AVX by matching the pattern `VectorMaskReduction (VectorStoreMask mask)` to eliminate the extra `VectorStoreMaskNode`. I have also done some refactoring to unify the logic of `toLong` with the other reduction operations.
>> >> The patch has been discussed partially in [panama-vector repository](https://github.com/openjdk/panama-vector/pull/158). >> >> Thank you very much. > > Mai Đặng Quân Anh has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge branch 'master' into vectorMaskReduction > - reduce some dependencies with spare register > - improve mask reduction logic on AVX Marked as reviewed by sviswanathan (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/6447 From sviswanathan at openjdk.java.net Tue Nov 23 05:15:11 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Tue, 23 Nov 2021 05:15:11 GMT Subject: RFR: 8277426: Optimize mask reduction operations on x86 [v2] In-Reply-To: References: <9g2nHSo1K9pldGKqKiXBqjFZF3UFqqKHu-6GLyKbclc=.496404e9-f682-470e-a93d-c8fc9ca6f626@github.com> <6Quq9p9Qh1T1-77IKdhUjS69PRXmSlEYjNCVSwJ9Y8o=.944179d8-0237-48e0-8494-63d2ac096e5b@github.com> Message-ID: <2EBQHu7_CgxC2S1VIj0ci2-6RTd24WBQZ1_CzIVnXck=.9adf6531-4383-4295-85d0-e9ea2b10a176@github.com> On Tue, 23 Nov 2021 02:37:49 GMT, Paul Sandoz wrote: >> @merykitty does this PR still disable the operations on Neon? >> >> Answering myself, I think not from looking at the changes in the panama repo version: https://github.com/openjdk/panama-vector/pull/158/commits/9b578a3a59d577ac95b50b485bedd91463ca5ce8 > >> @PaulSandoz Could you please run this through Oracle testing. > > Tests passed. Thanks a lot @PaulSandoz for the testing.
------------- PR: https://git.openjdk.java.net/jdk/pull/6447 From thartmann at openjdk.java.net Tue Nov 23 07:38:08 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 23 Nov 2021 07:38:08 GMT Subject: RFR: 8275330: C2: assert(n->is_Root() || n->is_Region() || n->is_Phi() || n->is_MachMerge() || def_block->dominates(block)) failed: uses must be dominated by definitions In-Reply-To: References: Message-ID: On Wed, 17 Nov 2021 12:16:32 GMT, Roland Westrelin wrote: > This is similar to previous bugs where: > > - a cast/conv node captures a narrow type in a loop body because of a > range check, > > - the range check is optimized out of the loop, pre/main/post loop are > created > > - overunrolling causes the main loop to become unreachable (the range > check, if still in the main loop, would fail), the cast transforms to > top but c2 can't optimize the loop out > > This was fixed by adding predicates above the main loop. With this > particular bug, the cast node is in the post loop. The fix I propose > is to also add predicates above the post loop. There are a few > locations in the code that cause a post loop to be added: either the > initial post loop or some other post loops for vectorization > support. I think the new predicates are needed in all cases. To be > able to add predicates at these different points in the optimization > process, the new predicates are copied from the main loop predicates. > > I also delayed folding of Opaque4 nodes to macro expansion rather than > post loop opts igvn. The reason for that is that I believe there's a > risk that an Opaque4 is removed (that is replaced by its input 2) > before its input 1 has a chance to constant fold. That wouldn't happen > with a debug build because we leave the tests in (that is replace the > Opaque4 node by its input 1) so that corner case is not properly > tested currently. The reason for leaving the tests in was to sanity > check that the tests are indeed correct.
That looks good to me. src/hotspot/share/opto/loopTransform.cpp line 1997: > 1995: post_loop, prev_proj); > 1996: assert(!skeleton_predicate_has_opaque(prev_proj->in(0)->as_If()), "unexpected"); > 1997: } Indentation is wrong. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6429 From thartmann at openjdk.java.net Tue Nov 23 07:39:13 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 23 Nov 2021 07:39:13 GMT Subject: RFR: 8277441: CompileQueue::add fails with assert(_last->next() == __null) failed: not last In-Reply-To: <8B-fG4oCz3vVuq2qJHCTLNeJdvZj_YCInjHSsx4P7bQ=.c1216330-9179-4103-8fa5-aff03a3aa5cf@github.com> References: <8B-fG4oCz3vVuq2qJHCTLNeJdvZj_YCInjHSsx4P7bQ=.c1216330-9179-4103-8fa5-aff03a3aa5cf@github.com> Message-ID: On Mon, 22 Nov 2021 13:57:21 GMT, Tobias Hartmann wrote: > In the rare case that the compiler threads fail during initialization or the code cache is full and flushing is disabled, we completely disable JIT compilation and shut down the compiler runtime: > https://github.com/openjdk/jdk/blob/2f4b5405f0b53782f3ed5274f68b31eb968efb6d/src/hotspot/share/compiler/compileBroker.cpp#L1813-L1817 > > In the process, we free all compiler queues and set `CompileQueue::_first` to `NULL` and put the `CompileTasks` on the free list: https://github.com/openjdk/jdk/blob/2f4b5405f0b53782f3ed5274f68b31eb968efb6d/src/hotspot/share/compiler/compileTask.cpp#L84 > > In rare cases, although compilation is disabled, another waiting thread might still call `CompileQueue::add`. That code then fails because `_last != NULL` and `_last->next()` is set to `_task_free_list`. > https://github.com/openjdk/jdk/blob/2f4b5405f0b53782f3ed5274f68b31eb968efb6d/src/hotspot/share/compiler/compileBroker.cpp#L362-L369 > > The fix is to set `_last` to `NULL` in `CompileQueue::free_all`.
Adding to the compile queue then succeeds which is harmless because queues have only been freed to make the compiler threads exit faster: > https://github.com/openjdk/jdk/blob/2f4b5405f0b53782f3ed5274f68b31eb968efb6d/src/hotspot/share/compiler/compileBroker.cpp#L1823 > > The test that triggered this will be added with [PR 6364](https://github.com/openjdk/jdk/pull/6364). I verified that it now passes. > > Thanks, > Tobias Thanks for the review, Nils! ------------- PR: https://git.openjdk.java.net/jdk/pull/6503 From thartmann at openjdk.java.net Tue Nov 23 07:56:16 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 23 Nov 2021 07:56:16 GMT Subject: RFR: 8275326: C2: assert(no_dead_loop) failed: dead loop detected [v3] In-Reply-To: References: Message-ID: On Fri, 19 Nov 2021 09:55:25 GMT, Christian Hagedorn wrote: >> In the test case, we apply the following optimization in `PhiNode::Ideal()` for the memory phi 989 that is on a dead path but still has both its inputs set to non-top nodes: >> https://github.com/openjdk/jdk/blob/3c0faa73522bd004b66cb9e477f43e15a29842e6/src/hotspot/share/opto/cfgnode.cpp#L2269-L2270 >> ![Screenshot from 2021-11-05 11-57-49](https://user-images.githubusercontent.com/17833009/140502849-9f00fd62-9714-4f54-8f98-f22f74d11430.png) >> >> In this process, we create `11853 Phi` for the new `11850 MergeMem` which is going to replace `989 Phi` (`this`). 
We then transform `11853 Phi` before returning: >> https://github.com/openjdk/jdk/blob/3c0faa73522bd004b66cb9e477f43e15a29842e6/src/hotspot/share/opto/cfgnode.cpp#L2314 >> >> During `Ideal()` for `11853 Phi`, we transform `11769 MergeMem` into top (because the base memory is top) and use this as new input instead: >> https://github.com/openjdk/jdk/blob/3c0faa73522bd004b66cb9e477f43e15a29842e6/src/hotspot/share/opto/cfgnode.cpp#L2230-L2240 >> >> But even if the `MergeMem` node would not be transformed into top, the slice itself could be top (L2237) and we would still replace the phi input with top. This replacement by top will fold the `11853 Phi` and we will build a cycle `11850 MergeMem` <-> `1064 StoreB` because `989 Phi` will be replaced by `11850 MergeMem`. This results in the assertion failure. >> >> I tried some approaches by marking `11853 Phi` and/or `989 Phi` to specially treat them during the optimizations in `Ideal()` (e.g. skipping `989 Phi` during the dead loop detection etc.) or to improve the dead loop detection before applying the `MergeMem` optimization in `Ideal()`. But that seemed rather complicated/fragile. >> >> I therefore propose to simply not transform the newly created phi nodes directly but wait instead for IGVN to revisit them again. This allows the `this` phi to be replaced with the new `MergeMem` node and the dead loop detection will work correctly when processing the new phis again later in IGVN. >> >> I could only reproduce this bug with the replay file for the attached test case in the JBS issue. The test case itself did not trigger with repeated runs with `StressIGVN` + `RepeatCompilation`. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: > > - New fix > - Merge branch 'master' into JDK-8275326 > - handle GVN > - C2: assert(no_dead_loop) failed: dead loop detected That looks reasonable to me. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6276 From pli at openjdk.java.net Tue Nov 23 08:12:07 2021 From: pli at openjdk.java.net (Pengfei Li) Date: Tue, 23 Nov 2021 08:12:07 GMT Subject: RFR: 8277168: AArch64: Enable arraycopy partial inlining with SVE In-Reply-To: <82Kgtn4RllwF2ifvmwtaQaeG9ADXeUoq290BKnd8PZ4=.ed410c36-2f5c-4b29-9d96-07d33ac872ee@github.com> References: <82Kgtn4RllwF2ifvmwtaQaeG9ADXeUoq290BKnd8PZ4=.ed410c36-2f5c-4b29-9d96-07d33ac872ee@github.com> Message-ID: <0NpXDvx0PPQgOnuxjlDayD-n5Y9nojMQhPRul1ysKqk=.b4e4fdc9-10bc-485f-843e-22c4cc360647@github.com> On Fri, 19 Nov 2021 08:07:13 GMT, Jatin Bhateja wrote: >> The x86 failure is caused by a recent commit (see [JDK-8277324](https://bugs.openjdk.java.net/browse/JDK-8277324)) and unrelated to this PR. > > Hi @pfustc , common type system changes looks good to me. Thank you for looking at my PR. This C2 technique was originally developed by @jatin-bhateja from Intel to optimize small-sized memory copy with x86 AVX-512 masked vector instructions. Now I propose to enable it on AArch64 with SVE. Yes, it has benefit only if the copy size is less than the size of a vector. It's 512 bits on x86, but on AArch64 SVE the max copy size it can benefit depends on the hardware's implementation of the scalable vector register (from 128 bits to 2048 bits). @theRealAph , do you approve this PR? or any specific feedback or suggestion? 
------------- PR: https://git.openjdk.java.net/jdk/pull/6444 From thartmann at openjdk.java.net Tue Nov 23 08:35:09 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 23 Nov 2021 08:35:09 GMT Subject: RFR: 8277508: need to check has_predicated_vectors before calling scalable_predicate_reg_slots [v2] In-Reply-To: References: Message-ID: On Mon, 22 Nov 2021 01:15:30 GMT, Yadong Wang wrote: >> Hi, Team, >> A separate set of predicate registers is not mandatory for an implementation of scalable vectors. It will cause a failure in some platform which supports scalable vectors without explicit predicated registers, like riscv. All code about RegVectMask should be covered by has_predicated_vectors here in Matcher::init_first_stack_mask(). >> >> Yadong > > Yadong Wang has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6492 From mdoerr at openjdk.java.net Tue Nov 23 09:32:08 2021 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Tue, 23 Nov 2021 09:32:08 GMT Subject: RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow [v10] In-Reply-To: <3DyX38fUwXmYfYuInLP-xhm1toijhtr2U7pHK2zhNqU=.b91e17bd-bea6-4323-96e0-03c59e3f0573@github.com> References: <3DyX38fUwXmYfYuInLP-xhm1toijhtr2U7pHK2zhNqU=.b91e17bd-bea6-4323-96e0-03c59e3f0573@github.com> Message-ID: On Thu, 18 Nov 2021 10:21:01 GMT, Volker Simonis wrote: >> Currently, if running with `-XX:-OmitStackTraceInFastThrow`, C2 has no possibility to create implicit exceptions like AIOOBE, NullPointerExceptions, etc. in compiled code. This means that such methods will always be deoptimized and re-executed in the interpreter if such exceptions are happening. 
>>
>> If implicit exceptions are used for normal control flow, that can have a dramatic impact on performance. A prominent example for such code is [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274):
>>
>>     public static boolean isAlpha(int c) {
>>         try {
>>             return IS_ALPHA[c];
>>         } catch (ArrayIndexOutOfBoundsException ex) {
>>             return false;
>>         }
>>     }
>>
>>
>> ### Solution
>>
>> Instead of deoptimizing and resorting to the interpreter, we can generate code which allocates and initializes the corresponding exceptions right in compiled code. This results in a ten-times performance improvement for the above code:
>>
>>     -XX:-OmitStackTraceInFastThrow -XX:-OptimizeImplicitExceptions
>>     Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>>     ImplicitExceptions.bench                     0.0  avgt    5      1.430 ±    0.353  ns/op
>>     ImplicitExceptions.bench                    0.33  avgt    5   3563.038 ±   77.358  ns/op
>>     ImplicitExceptions.bench                    0.66  avgt    5   8609.693 ± 1205.104  ns/op
>>     ImplicitExceptions.bench                    1.00  avgt    5  12842.401 ± 1022.728  ns/op
>>
>>     -XX:-OmitStackTraceInFastThrow -XX:+OptimizeImplicitExceptions
>>     Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>>     ImplicitExceptions.bench                     0.0  avgt    5      1.432 ±    0.352  ns/op
>>     ImplicitExceptions.bench                    0.33  avgt    5    355.723 ±   16.641  ns/op
>>     ImplicitExceptions.bench                    0.66  avgt    5    887.068 ±  166.728  ns/op
>>     ImplicitExceptions.bench                    1.00  avgt    5   1274.418 ±   88.235  ns/op
>>
>>
>> ### Implementation details
>>
>> - The new optimization is guarded by the option `OptimizeImplicitExceptions` which is on by default.
>> - In `GraphKit::builtin_throw()` we can't simply use `CallGenerator::for_direct_call()` to create a `DirectCallGenerator` for the call to the exception's `<init>` function because `DirectCallGenerator` assumes in various places that calls are only issued at `invoke*` bytecodes.
This is not true in general for bytecodes which can cause an implicit exception. >> - Instead, we manually wire up the call based on the code in `DirectCallGenerator::generate()`. >> - We use a trick similar to the one for method handle intrinsics, where the callee from the bytecode is replaced by a direct call and this fact is recorded in the call's `_override_symbolic_info` field. For calling constructors of implicit exceptions I've introduced the new field `_implicit_exception_init`. This field is also used in various assertions to prevent queries for the bytecode's symbolic method information which doesn't exist because we're not at an `invoke*` bytecode at the place where we generate the call. >> - The PR contains a micro-benchmark which compares the old and the new implementation for [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274). Except for the trivial case where the exception probability is 0 (i.e. no exceptions are happening at all) the new implementation is about 10 times faster. > > Volker Simonis has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 10 commits: > > - Fix jit/t/t105/t105.java to also use -XX:-OptimizeImplicitExceptions in addition to -XX:-OmitStacktracesInFastThrow > - Fix IR Framework test Traps::classCheck() which now behaves differently with -XX:+OptimizeImplicitExceptions > - Fix build issue for minimal/zero build one more time > - Minor enhancements and fixes requested by Martin > - Add new WhiteBox functionality to sun/hotspot/WhiteBox.java as well to avoid warnings in the tests which are still using it.
> - Fix build issue for minimal/zero build > - Added jtreg test and extended the Whitebox API to export decompile, deopt and trap counters > - Fix special case where we're creating an implicit exception for a regular invoke* bytecode > - Minor updates as requested by @TheRealMDoerr > - 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow I think this workaround is ok. C2 currently doesn't support extended exception messages other than NullPointerExceptions. If this change gets accepted, I think we should add C2 support for other primitive Exceptions. ------------- PR: https://git.openjdk.java.net/jdk/pull/5488 From chagedorn at openjdk.java.net Tue Nov 23 10:06:22 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Tue, 23 Nov 2021 10:06:22 GMT Subject: RFR: 8275326: C2: assert(no_dead_loop) failed: dead loop detected [v3] In-Reply-To: References: Message-ID: On Fri, 19 Nov 2021 09:55:25 GMT, Christian Hagedorn wrote: >> In the test case, we apply the following optimization in `PhiNode::Ideal()` for the memory phi 989 that is on a dead path but still has both its inputs set to non-top nodes: >> https://github.com/openjdk/jdk/blob/3c0faa73522bd004b66cb9e477f43e15a29842e6/src/hotspot/share/opto/cfgnode.cpp#L2269-L2270 >> ![Screenshot from 2021-11-05 11-57-49](https://user-images.githubusercontent.com/17833009/140502849-9f00fd62-9714-4f54-8f98-f22f74d11430.png) >> >> In this process, we create `11853 Phi` for the new `11850 MergeMem` which is going to replace `989 Phi` (`this`). 
We then transform `11853 Phi` before returning: >> https://github.com/openjdk/jdk/blob/3c0faa73522bd004b66cb9e477f43e15a29842e6/src/hotspot/share/opto/cfgnode.cpp#L2314 >> >> During `Ideal()` for `11853 Phi`, we transform `11769 MergeMem` into top (because the base memory is top) and use this as new input instead: >> https://github.com/openjdk/jdk/blob/3c0faa73522bd004b66cb9e477f43e15a29842e6/src/hotspot/share/opto/cfgnode.cpp#L2230-L2240 >> >> But even if the `MergeMem` node would not be transformed into top, the slice itself could be top (L2237) and we would still replace the phi input with top. This replacement by top will fold the `11853 Phi` and we will build a cycle `11850 MergeMem` <-> `1064 StoreB` because `989 Phi` will be replaced by `11850 MergeMem`. This results in the assertion failure. >> >> I tried some approaches by marking `11853 Phi` and/or `989 Phi` to specially treat them during the optimizations in `Ideal()` (e.g. skipping `989 Phi` during the dead loop detection etc.) or to improve the dead loop detection before applying the `MergeMem` optimization in `Ideal()`. But that seemed rather complicated/fragile. >> >> I therefore propose to simply not transform the newly created phi nodes directly but wait instead for IGVN to revisit them again. This allows the `this` phi to be replaced with the new `MergeMem` node and the dead loop detection will work correctly when processing the new phis again later in IGVN. >> >> I could only reproduce this bug with the replay file for the attached test case in the JBS issue. The test case itself did not trigger with repeated runs with `StressIGVN` + `RepeatCompilation`. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: > > - New fix > - Merge branch 'master' into JDK-8275326 > - handle GVN > - C2: assert(no_dead_loop) failed: dead loop detected Thanks Tobias for reviewing it again! ------------- PR: https://git.openjdk.java.net/jdk/pull/6276 From aph at openjdk.java.net Tue Nov 23 10:31:07 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Tue, 23 Nov 2021 10:31:07 GMT Subject: RFR: 8276108: Wrong instruction generation in aarch64 backend In-Reply-To: References: <9P964HSPi8geJ3GVJwbsGo_t_lASstJkxsrX76Xv8K8=.898bf726-d71b-4a90-a6b7-83895eec3494@github.com> <-cKJ36p7ENbWiSZrA6e1MdTWT8wnVEvQ4njxZwbJ-Mw=.43615cd9-907e-4142-9dec-9b337ef9b754@github.com> Message-ID: On Tue, 23 Nov 2021 02:52:09 GMT, Nick Gasson wrote: > > The valhalla branch below asserts with "Field too big for insn" due to this when you run the hotspot_valhalla jtreg group (commit e903390): > > https://github.com/fparain/valhalla/tree/c1_cleanup Thanks. I'll try that. ------------- PR: https://git.openjdk.java.net/jdk/pull/6212 From duke at openjdk.java.net Tue Nov 23 11:00:26 2021 From: duke at openjdk.java.net (Ludvig Janiuk) Date: Tue, 23 Nov 2021 11:00:26 GMT Subject: RFR: JDK-8277562 Remove dead method c1 If::swap_sux Message-ID: swap_sux in c1 is never used or referenced. Let's remove it. This will facilitate further refactorings. 
------------- Commit messages: - Remove dead code swap_sux Changes: https://git.openjdk.java.net/jdk/pull/6517/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6517&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8277562 Stats: 8 lines in 1 file changed: 0 ins; 8 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/6517.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6517/head:pull/6517 PR: https://git.openjdk.java.net/jdk/pull/6517 From thartmann at openjdk.java.net Tue Nov 23 11:24:07 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 23 Nov 2021 11:24:07 GMT Subject: RFR: JDK-8277562 Remove dead method c1 If::swap_sux In-Reply-To: References: Message-ID: On Tue, 23 Nov 2021 10:51:54 GMT, Ludvig Janiuk wrote: > swap_sux in c1 is never used or referenced. Let's remove it. This will facilitate further refactorings. Looks good and trivial. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6517 From chagedorn at openjdk.java.net Tue Nov 23 11:50:10 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Tue, 23 Nov 2021 11:50:10 GMT Subject: RFR: 8275330: C2: assert(n->is_Root() || n->is_Region() || n->is_Phi() || n->is_MachMerge() || def_block->dominates(block)) failed: uses must be dominated by definitions In-Reply-To: References: Message-ID: On Wed, 17 Nov 2021 12:16:32 GMT, Roland Westrelin wrote: > This is similar to previous bugs where: > > - a cast/conv node captures a narrow type in a loop body because of a > range check, > > - the range check is optimized out of the loop, pre/main/post loop are > created > > - overunrolling causes the main loop to become unreachable (the range > check, if still in the main loop, would fail), the cast transforms to > top but c2 can't optimize the loop out > > This was fixed by adding predicates above the main loop. With this > particular bug, the cast node is in the post loop. 
The fix I propose > is to also add predicates above the post loop. There are a few > locations in the code that cause a post loop to be added: either the > initial post loop or some other post loops for vectorization > support. I think the new predicates are needed in all cases. To be > able to add predicates at these different points in the optimization > process, the new predicates are copied from the main loop predicates. > > I also delayed folding of Opaque4 nodes to macro expansion rather than > post loop opts igvn. The reason for that is that I believe there's a > risk that an Opaque4 is removed (that is replaced by its input 2) > before its input 1 has a chance to constant fold. That wouldn't happen > with a debug build because we leave the tests in (that is replace the > Opaque4 node by its input 1) so that corner case is not properly > tested currently. The reason for leaving the tests in was to sanity > check that the tests are indeed correct. Otherwise, the fix looks reasonable to me! When I was fixing related bugs before, I found myself wondering if the post loop does not need these predicates as well - turns out now it does. src/hotspot/share/opto/loopTransform.cpp line 1549: > 1547: CountedLoopNode *post_head = NULL; > 1548: Node* post_incr = incr; > 1549: Node *main_exit = insert_post_loop(loop, old_new, main_head, main_end, post_incr, limit, post_head); Could also be updated: `Node *main_exit` -> `Node* main_exit`. src/hotspot/share/opto/loopTransform.cpp line 1923: > 1921: Node* castii = cast_incr_before_loop(zer_opaq->in(1), zer_taken, post_head); > 1922: assert(castii != NULL, "no castII inserted"); > 1923: incr = castii; You could directly assign `incr` on L1921 instead of using the additional `castii` variable.
src/hotspot/share/opto/loopTransform.cpp line 1978: > 1976: } > 1977: > 1978: void PhaseIdealLoop::insert_post_loop_skeleton_predicates(LoopNode* main_loop_head, CountedLoopNode* post_loop_head, Node* init, Node* stride) { Maybe this could be renamed to `copy_skeleton_predicates_to_post_loop()` to be consistent with the method `copy_skeleton_predicates_to_main_loop()`? ------------- Marked as reviewed by chagedorn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6429 From duke at openjdk.java.net Tue Nov 23 12:02:14 2021 From: duke at openjdk.java.net (Mai Đặng Quân Anh) Date: Tue, 23 Nov 2021 12:02:14 GMT Subject: RFR: 8277426: Optimize mask reduction operations on x86 [v2] In-Reply-To: References: <9g2nHSo1K9pldGKqKiXBqjFZF3UFqqKHu-6GLyKbclc=.496404e9-f682-470e-a93d-c8fc9ca6f626@github.com> <6Quq9p9Qh1T1-77IKdhUjS69PRXmSlEYjNCVSwJ9Y8o=.944179d8-0237-48e0-8494-63d2ac096e5b@github.com> Message-ID: On Tue, 23 Nov 2021 02:37:49 GMT, Paul Sandoz wrote: >> @merykitty does this PR still disable the operations on Neon? >> >> Answering myself, i think not from looking at the changes in the panama repo version: https://github.com/openjdk/panama-vector/pull/158/commits/9b578a3a59d577ac95b50b485bedd91463ca5ce8 > >> @PaulSandoz Could you please run this through Oracle testing. > > Tests passed. @PaulSandoz @sviswa7 Thank you very much for the reviews and testing. Could I get this PR sponsored, please? ------------- PR: https://git.openjdk.java.net/jdk/pull/6447 From yadongwang at openjdk.java.net Tue Nov 23 12:17:07 2021 From: yadongwang at openjdk.java.net (Yadong Wang) Date: Tue, 23 Nov 2021 12:17:07 GMT Subject: RFR: 8277508: need to check has_predicated_vectors before calling scalable_predicate_reg_slots [v2] In-Reply-To: References: Message-ID: On Mon, 22 Nov 2021 01:15:30 GMT, Yadong Wang wrote: >> Hi, Team, >> A separate set of predicate registers is not mandatory for an implementation of scalable vectors.
It will cause a failure in some platform which supports scalable vectors without explicit predicated registers, like riscv. All code about RegVectMask should be covered by has_predicated_vectors here in Matcher::init_first_stack_mask(). >> >> Yadong > > Yadong Wang has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > 8277508: need to check has_predicated_vectors before calling scalable_predicate_reg_slots Thanks. Could I have another (R)eviewer? ------------- PR: https://git.openjdk.java.net/jdk/pull/6492 From thartmann at openjdk.java.net Tue Nov 23 12:49:13 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 23 Nov 2021 12:49:13 GMT Subject: Integrated: 8277441: CompileQueue::add fails with assert(_last->next() == __null) failed: not last In-Reply-To: <8B-fG4oCz3vVuq2qJHCTLNeJdvZj_YCInjHSsx4P7bQ=.c1216330-9179-4103-8fa5-aff03a3aa5cf@github.com> References: <8B-fG4oCz3vVuq2qJHCTLNeJdvZj_YCInjHSsx4P7bQ=.c1216330-9179-4103-8fa5-aff03a3aa5cf@github.com> Message-ID: <5N0PLvEPekJ0Q2xslUBMeGznzB5clWMyuGIithK4Tq0=.5370fba7-b7e8-4fe5-bf70-3af1ad9de7cc@github.com> On Mon, 22 Nov 2021 13:57:21 GMT, Tobias Hartmann wrote: > In the rare case that the compiler threads fail during initialization or the code cache is full and flushing is disabled, we completely disable JIT compilation and shut down the compiler runtime: > https://github.com/openjdk/jdk/blob/2f4b5405f0b53782f3ed5274f68b31eb968efb6d/src/hotspot/share/compiler/compileBroker.cpp#L1813-L1817 > > In the process, we free all compiler queues and set `CompileQueue::_first` to `NULL` and put the `CompileTasks` on the free list: https://github.com/openjdk/jdk/blob/2f4b5405f0b53782f3ed5274f68b31eb968efb6d/src/hotspot/share/compiler/compileTask.cpp#L84 > > In rare cases, although compilation is disabled, another 
waiting thread might still call `CompileQueue::add`. That code then fails because `_last != NULL` and `_last->next()` is set to `_task_free_list`. > https://github.com/openjdk/jdk/blob/2f4b5405f0b53782f3ed5274f68b31eb968efb6d/src/hotspot/share/compiler/compileBroker.cpp#L362-L369 > > The fix is to set `_last` to `NULL` in `CompileQueue::free_all`. Adding to the compile queue then succeeds which is harmless because queues have only been freed to make the compiler threads exit faster: > https://github.com/openjdk/jdk/blob/2f4b5405f0b53782f3ed5274f68b31eb968efb6d/src/hotspot/share/compiler/compileBroker.cpp#L1823 > > The test that triggered this will be added with [PR 6364](https://github.com/openjdk/jdk/pull/6364). I verified that it now passes. > > Thanks, > Tobias This pull request has now been integrated. Changeset: 90f96fb4 Author: Tobias Hartmann URL: https://git.openjdk.java.net/jdk/commit/90f96fb4db174e50cc2510f292fe69fc995add26 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod 8277441: CompileQueue::add fails with assert(_last->next() == __null) failed: not last Reviewed-by: chagedorn, neliasso ------------- PR: https://git.openjdk.java.net/jdk/pull/6503 From thartmann at openjdk.java.net Tue Nov 23 12:54:17 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 23 Nov 2021 12:54:17 GMT Subject: RFR: 8277042: add test for 8276036 to compiler/codecache [v2] In-Reply-To: References: Message-ID: On Mon, 22 Nov 2021 14:02:18 GMT, Tobias Hartmann wrote: >> I see. I added the bug ID to this test for JDK-8277213. >> I will probably add JDK-8277441 currently being discussed. > > Thanks, yes, please add 8277441 as well. I just pushed JDK-8277441 and verified that the test now always passed. 
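The JDK-8277441 failure referred to above can be pictured with a minimal, self-contained sketch. It is not HotSpot code: `Task`, `Queue`, and the `reset_last` switch are hypothetical stand-ins for `CompileTask`, `CompileQueue`, and the one-line fix, and only model the behaviour described in the integrated change (freeing a task links it onto the free list through its `next` pointer, so a stale tail pointer ends up with a non-null successor).

```cpp
#include <cassert>

// Hypothetical stand-in for a compile task; only the links matter here.
struct Task {
  Task* next = nullptr;
  Task* prev = nullptr;
};

// Hypothetical stand-in for a compile queue with a shared free list.
struct Queue {
  Task* first = nullptr;
  Task* last = nullptr;
  Task* free_list = nullptr;  // recycled tasks, linked through 'next'
  int size = 0;

  // Append a task. The tail of the queue must not have a successor.
  void add(Task* task) {
    task->next = nullptr;
    task->prev = nullptr;
    if (last == nullptr) {
      first = task;  // queue was empty
    } else {
      assert(last->next == nullptr && "not last");
      last->next = task;
      task->prev = last;
    }
    last = task;
    ++size;
  }

  // Move every queued task onto the free list and empty the queue.
  // reset_last == false models the behaviour before the fix.
  void free_all(bool reset_last) {
    Task* t = first;
    while (t != nullptr) {
      Task* following = t->next;
      t->next = free_list;  // pushing onto the free list rewrites 'next'
      t->prev = nullptr;
      free_list = t;
      t = following;
    }
    first = nullptr;
    size = 0;
    if (reset_last) {
      last = nullptr;  // the fix: no stale tail pointer remains
    }
  }
};
```

Without the reset, `last` still points at a freed task whose `next` now refers to another free-list entry, so a later `add` would trip the "not last" assertion. With the reset, the late `add` simply starts a fresh queue.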
------------- PR: https://git.openjdk.java.net/jdk/pull/6364 From duke at openjdk.java.net Tue Nov 23 14:35:41 2021 From: duke at openjdk.java.net (Evgeny Astigeevich) Date: Tue, 23 Nov 2021 14:35:41 GMT Subject: RFR: 8277503: compiler/onSpinWait/TestOnSpinWaitAArch64DefaultFlags.java failed with "OnSpinWaitInst with the expected value isb not found." Message-ID: OnSpinWaitInst/OnSpinWaitInstCount are diagnostic options. 'XX:+PrintFlagsFinal' does not print diagnostic options in release builds which causes the test failure. The fix is to use '-XX:+UnlockDiagnosticVMOptions'. Testing: release and fastdebug builds - `make test TEST=hotspot/jtreg/compiler/onSpinWait/TestOnSpinWaitAArch64DefaultFlags.java`: Passed ------------- Commit messages: - 8277503: compiler/onSpinWait/TestOnSpinWaitAArch64DefaultFlags.java failed with "OnSpinWaitInst with the expected value isb not found." Changes: https://git.openjdk.java.net/jdk/pull/6521/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6521&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8277503 Stats: 4 lines in 2 files changed: 1 ins; 2 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/6521.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6521/head:pull/6521 PR: https://git.openjdk.java.net/jdk/pull/6521 From chagedorn at openjdk.java.net Tue Nov 23 15:09:09 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Tue, 23 Nov 2021 15:09:09 GMT Subject: RFR: 8277503: compiler/onSpinWait/TestOnSpinWaitAArch64DefaultFlags.java failed with "OnSpinWaitInst with the expected value isb not found." In-Reply-To: References: Message-ID: <93tgdqTBDJJBDu9SphxsfFeB8I0UGYVXRTlGZR1bhXc=.b455befb-9635-43d4-90c7-c246043f3ab3@github.com> On Tue, 23 Nov 2021 14:23:20 GMT, Evgeny Astigeevich wrote: > OnSpinWaitInst/OnSpinWaitInstCount are diagnostic options. > 'XX:+PrintFlagsFinal' does not print diagnostic options in release builds which causes the test failure. 
> The fix is to use '-XX:+UnlockDiagnosticVMOptions'. > > Testing: release and fastdebug builds > - `make test TEST=hotspot/jtreg/compiler/onSpinWait/TestOnSpinWaitAArch64DefaultFlags.java`: Passed Looks good! ------------- Marked as reviewed by chagedorn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6521 From chagedorn at openjdk.java.net Tue Nov 23 15:25:28 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Tue, 23 Nov 2021 15:25:28 GMT Subject: Integrated: 8254108: ciReplay: Support incremental inlining In-Reply-To: References: Message-ID: <8in5XsdEQAHKzOFYhHbgaS4V5i-E5wrtxLwEEijWmPI=.31f9061e-a586-4b49-85ca-a2cb3be93f60@github.com> On Tue, 16 Nov 2021 15:45:15 GMT, Christian Hagedorn wrote: > This patch adds support to explicitly apply incremental inlining when replay compiling a method if the original compilation of the method was also incrementally inlined. We write a new value when dumping the inline tree to indicate if an inlinee was incrementally inlined (`= 1`) or not (`= 0`). > > To implement this, I updated the `REPLAY_VERSION` to 2 and additionally added a test to verify that old replay file versions are still working. I added some support to modify/remove version numbers of generated replay files in tests. I also refactored the test added by JDK-8275868 to reuse some of the methods. > > Thanks, > Christian This pull request has now been integrated. 
Changeset: 38802ad5 Author: Christian Hagedorn URL: https://git.openjdk.java.net/jdk/commit/38802ad56a31efc90733cb75ea27f019e2c4f5a4 Stats: 629 lines in 9 files changed: 451 ins; 134 del; 44 mod 8254108: ciReplay: Support incremental inlining Reviewed-by: dlong, thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/6413 From psandoz at openjdk.java.net Tue Nov 23 15:54:09 2021 From: psandoz at openjdk.java.net (Paul Sandoz) Date: Tue, 23 Nov 2021 15:54:09 GMT Subject: RFR: 8277426: Optimize mask reduction operations on x86 [v2] In-Reply-To: References: <9g2nHSo1K9pldGKqKiXBqjFZF3UFqqKHu-6GLyKbclc=.496404e9-f682-470e-a93d-c8fc9ca6f626@github.com> Message-ID: On Fri, 19 Nov 2021 11:43:16 GMT, Mai Đặng Quân Anh wrote: >> Hi, >> >> This patch improves the performance of mask reduction operations on AVX by matching the pattern `VectorMaskReduction (VectorStoreMask mask)` to eliminate the extra `VectorStoreMaskNode`. I have also done some refactoring to unify the logic of `toLong` with the other reduction operations. >> >> The patch has been discussed partially in [panama-vector repository](https://github.com/openjdk/panama-vector/pull/158). >> >> Thank you very much. > > Mai Đặng Quân Anh has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge branch 'master' into vectorMaskReduction > - reduce some dependencies with spare register > - improve mask reduction logic on AVX This needs another hotspot reviewer to review before integration.
------------- PR: https://git.openjdk.java.net/jdk/pull/6447 From aph at openjdk.java.net Tue Nov 23 15:59:07 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Tue, 23 Nov 2021 15:59:07 GMT Subject: RFR: 8276108: Wrong instruction generation in aarch64 backend In-Reply-To: <9P964HSPi8geJ3GVJwbsGo_t_lASstJkxsrX76Xv8K8=.898bf726-d71b-4a90-a6b7-83895eec3494@github.com> References: <9P964HSPi8geJ3GVJwbsGo_t_lASstJkxsrX76Xv8K8=.898bf726-d71b-4a90-a6b7-83895eec3494@github.com> Message-ID: On Tue, 2 Nov 2021 14:02:48 GMT, Patric Hedlin wrote: > C1 code generation on AArch64 may produce bad LDR/STR immediate offset instructions when the actual operand (datum) size is unknown. This change will alter the code generated for the problematic immediate offset to use the register offset version (requiring additional instructions). > > Contributed by Nick Gasson. > > Added assert in Address::encode() to emphasise the use of a valid immediate (in base_plus_offset). > > Added clarifying comment to Address::offset_ok_for_immed() emphasising favouring of the scaled unsigned 12-bit encoding for aligned offsets. I think this patch is too complicated. I suggest:

--- a/src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp
+++ b/src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp
@@ -191,12 +191,10 @@ Address LIR_Assembler::as_Address(LIR_Address* addr, Register tmp) {
     }
   } else {
     intptr_t addr_offset = intptr_t(addr->disp());
-    if (Address::offset_ok_for_immed(addr_offset, addr->scale()))
-      return Address(base, addr_offset, Address::lsl(addr->scale()));
-    else {
-      __ mov(tmp, addr_offset);
-      return Address(base, tmp, Address::lsl(addr->scale()));
-    }
+    Address result(base, addr_offset, Address::lsl(addr->scale()));
+    // NOTE: Does not handle any 16 byte vector access.
+    const uint type_size = type2aelembytes(addr->type(), true);
+    return __ legitimize_address(result, type_size, tmp);
   }
   return Address();
 }

------------- PR: https://git.openjdk.java.net/jdk/pull/6212 From duke at openjdk.java.net Tue Nov 23 16:02:12 2021 From: duke at openjdk.java.net (Evgeny Astigeevich) Date: Tue, 23 Nov 2021 16:02:12 GMT Subject: RFR: 8277503: compiler/onSpinWait/TestOnSpinWaitAArch64DefaultFlags.java failed with "OnSpinWaitInst with the expected value 'isb' not found." In-Reply-To: References: Message-ID: On Tue, 23 Nov 2021 15:57:24 GMT, Andrew Haley wrote: >> OnSpinWaitInst/OnSpinWaitInstCount are diagnostic options. >> 'XX:+PrintFlagsFinal' does not print diagnostic options in release builds which causes the test failure. >> The fix is to use '-XX:+UnlockDiagnosticVMOptions'. >> >> Testing: release and fastdebug builds >> - `make test TEST=hotspot/jtreg/compiler/onSpinWait/TestOnSpinWaitAArch64DefaultFlags.java`: Passed > > Marked as reviewed by aph (Reviewer). @theRealAph, @chhagedorn Thank you for reviewing. ------------- PR: https://git.openjdk.java.net/jdk/pull/6521 From aph at openjdk.java.net Tue Nov 23 16:02:12 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Tue, 23 Nov 2021 16:02:12 GMT Subject: RFR: 8277503: compiler/onSpinWait/TestOnSpinWaitAArch64DefaultFlags.java failed with "OnSpinWaitInst with the expected value 'isb' not found." In-Reply-To: References: Message-ID: On Tue, 23 Nov 2021 14:23:20 GMT, Evgeny Astigeevich wrote: > OnSpinWaitInst/OnSpinWaitInstCount are diagnostic options. > 'XX:+PrintFlagsFinal' does not print diagnostic options in release builds which causes the test failure. > The fix is to use '-XX:+UnlockDiagnosticVMOptions'. > > Testing: release and fastdebug builds > - `make test TEST=hotspot/jtreg/compiler/onSpinWait/TestOnSpinWaitAArch64DefaultFlags.java`: Passed Marked as reviewed by aph (Reviewer).
------------- PR: https://git.openjdk.java.net/jdk/pull/6521 From jvernee at openjdk.java.net Tue Nov 23 17:15:19 2021 From: jvernee at openjdk.java.net (Jorn Vernee) Date: Tue, 23 Nov 2021 17:15:19 GMT Subject: RFR: 8277602: Deopt code does not extend the stack enough if the caller is an optimize entry blob Message-ID: <515lMydWeiRbYVwp7x4kW33eiOBrMdoobOaO4wW79TE=.6655f6f7-16eb-413d-a958-7225b0ce0b7a@github.com> Deoptimization code does not recreate c2i adapter 'frames'. For compiled callers this means that the stack needs to be adjusted manually to make room for the parameters when the callee is converted to an interpreter frame (essentially emulating what a c2i adapter would do). To check if the caller does a compiled call, the current code uses `frame::is_compiled_frame()`, which is true if the codeblob of the caller frame is an instance of `CompiledMethod`. However, optimized entry blobs also do compiled calls, are not detected by this test, and therefore don't get their stack adjusted correctly. To address this, I've added a new `frame::is_compiled_caller` function to determine if the caller is doing a compiled call, and I use that in the deopt code instead of `is_compiled_frame`. This patch also removes an old workaround that tried to fix the issue by allocating some spill space in the optimized entry blob frame, but this only accounts for the first argument. If there are more arguments we still have a problem. The suggested patch fixes this the right way I think. 
Thanks, Jorn Testing: run-test-jdk_foreign on Windows x64 and Linux x64 (afaik these are the only tests that use optimized entry blobs) ------------- Commit messages: - Properly handle optimized entry frame callers during deopt Changes: https://git.openjdk.java.net/jdk/pull/6522/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6522&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8277602 Stats: 13 lines in 3 files changed: 5 ins; 6 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/6522.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6522/head:pull/6522 PR: https://git.openjdk.java.net/jdk/pull/6522 From jvernee at openjdk.java.net Tue Nov 23 17:15:19 2021 From: jvernee at openjdk.java.net (Jorn Vernee) Date: Tue, 23 Nov 2021 17:15:19 GMT Subject: RFR: 8277602: Deopt code does not extend the stack enough if the caller is an optimize entry blob In-Reply-To: <515lMydWeiRbYVwp7x4kW33eiOBrMdoobOaO4wW79TE=.6655f6f7-16eb-413d-a958-7225b0ce0b7a@github.com> References: <515lMydWeiRbYVwp7x4kW33eiOBrMdoobOaO4wW79TE=.6655f6f7-16eb-413d-a958-7225b0ce0b7a@github.com> Message-ID: <683IqzGnLCvQzyJ4cB-CvJPMM2eXUEUrT2JibG1i2Qc=.012e73ee-eee9-4802-9c23-546ec25c51da@github.com> On Tue, 23 Nov 2021 15:01:24 GMT, Jorn Vernee wrote: > Deoptimization code does not recreate c2i adapter 'frames'. For compiled callers this means that the stack needs to be adjusted manually to make room for the parameters when the callee is converted to an interpreter frame (essentially emulating what a c2i adapter would do). > > To check if the caller does a compiled call, the current code uses `frame::is_compiled_frame()`, which is true if the codeblob of the caller frame is an instance of `CompiledMethod`. > > However, optimized entry blobs also do compiled calls, are not detected by this test, and therefore don't get their stack adjusted correctly. 
> > To address this, I've added a new `frame::is_compiled_caller` function to determine if the caller is doing a compiled call, and I use that in the deopt code instead of `is_compiled_frame`. > > This patch also removes an old workaround that tried to fix the issue by allocating some spill space in the optimized entry blob frame, but this only accounts for the first argument. If there are more arguments we still have a problem. The suggested patch fixes this the right way I think. > > Thanks, > Jorn > > Testing: run-test-jdk_foreign on Windows x64 and Linux x64 (afaik these are the only tests that use optimized entry blobs) I've CC'd hotspot-compiler on this since compiler folks are probably the most familiar with deoptimization code. ------------- PR: https://git.openjdk.java.net/jdk/pull/6522 From phedlin at openjdk.java.net Tue Nov 23 18:44:06 2021 From: phedlin at openjdk.java.net (Patric Hedlin) Date: Tue, 23 Nov 2021 18:44:06 GMT Subject: RFR: 8276108: Wrong instruction generation in aarch64 backend In-Reply-To: <9P964HSPi8geJ3GVJwbsGo_t_lASstJkxsrX76Xv8K8=.898bf726-d71b-4a90-a6b7-83895eec3494@github.com> References: <9P964HSPi8geJ3GVJwbsGo_t_lASstJkxsrX76Xv8K8=.898bf726-d71b-4a90-a6b7-83895eec3494@github.com> Message-ID: On Tue, 2 Nov 2021 14:02:48 GMT, Patric Hedlin wrote: > C1 code generation on AArch64 may produce bad LDR/STR immediate offset instructions when the actual operand (datum) size is unknown. This change will alter the code generated for the problematic immediate offset to use the register offset version (requiring additional instructions). > > Contributed by Nick Gasson. > > Added assert in Address::encode() to emphasise the use of a valid immediate (in base_plus_offset). > > Added clarifying comment to Address::offset_ok_for_immed() emphasising favouring of the scaled unsigned 12-bit encoding for aligned offsets. Besides the use of addr->scale(), using legitimize_address() is of course cleaner (and adds block comments to the assembly). 
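A minimal sketch of why an unknown operand size matters for the LDR/STR immediate form discussed above. It models only the architectural encoding ranges (a scaled unsigned 12-bit offset for aligned, non-negative offsets, otherwise an unscaled signed 9-bit offset); it is not the HotSpot `Address::offset_ok_for_immed()` source, and the class and method names are hypothetical.

```java
// Illustration only: models the AArch64 LDR/STR immediate offset ranges.
// ImmOffsetSketch and its methods are hypothetical names, not HotSpot code.
final class ImmOffsetSketch {

    // shift is log2 of the access size in bytes (0..3 for 1-, 2-, 4-, 8-byte accesses).
    static boolean fitsScaledUnsigned12(long offset, int shift) {
        long mask = (1L << shift) - 1;
        return offset >= 0
            && (offset & mask) == 0              // must be aligned to the access size
            && (offset >> shift) < (1L << 12);   // scaled value must fit in 12 bits
    }

    // The unscaled form takes a signed 9-bit byte offset, independent of access size.
    static boolean fitsUnscaledSigned9(long offset) {
        return offset >= -256 && offset <= 255;
    }

    static boolean okForImmed(long offset, int shift) {
        return fitsScaledUnsigned12(offset, shift) || fitsUnscaledSigned9(offset);
    }

    public static void main(String[] args) {
        // The same byte offset is encodable for a 4-byte access but not for an 8-byte one.
        System.out.println("offset 1028, 4-byte access: " + okForImmed(1028, 2));
        System.out.println("offset 1028, 8-byte access: " + okForImmed(1028, 3));
        System.out.println("offset -8,   8-byte access: " + okForImmed(-8, 3));
    }
}
```

If the code generator validates an offset against one access size and the instruction is later emitted with another, the immediate may not be encodable, which is presumably the situation the register-offset fallback avoids.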
------------- PR: https://git.openjdk.java.net/jdk/pull/6212 From phh at openjdk.java.net Tue Nov 23 20:09:04 2021 From: phh at openjdk.java.net (Paul Hohensee) Date: Tue, 23 Nov 2021 20:09:04 GMT Subject: RFR: 8277503: compiler/onSpinWait/TestOnSpinWaitAArch64DefaultFlags.java failed with "OnSpinWaitInst with the expected value 'isb' not found." In-Reply-To: References: Message-ID: On Tue, 23 Nov 2021 14:23:20 GMT, Evgeny Astigeevich wrote: > OnSpinWaitInst/OnSpinWaitInstCount are diagnostic options. > 'XX:+PrintFlagsFinal' does not print diagnostic options in release builds which causes the test failure. > The fix is to use '-XX:+UnlockDiagnosticVMOptions'. > > Testing: release and fastdebug builds > - `make test TEST=hotspot/jtreg/compiler/onSpinWait/TestOnSpinWaitAArch64DefaultFlags.java`: Passed Marked as reviewed by phh (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/6521 From duke at openjdk.java.net Tue Nov 23 20:09:05 2021 From: duke at openjdk.java.net (Evgeny Astigeevich) Date: Tue, 23 Nov 2021 20:09:05 GMT Subject: Integrated: 8277503: compiler/onSpinWait/TestOnSpinWaitAArch64DefaultFlags.java failed with "OnSpinWaitInst with the expected value 'isb' not found." In-Reply-To: References: Message-ID: On Tue, 23 Nov 2021 14:23:20 GMT, Evgeny Astigeevich wrote: > OnSpinWaitInst/OnSpinWaitInstCount are diagnostic options. > 'XX:+PrintFlagsFinal' does not print diagnostic options in release builds which causes the test failure. > The fix is to use '-XX:+UnlockDiagnosticVMOptions'. > > Testing: release and fastdebug builds > - `make test TEST=hotspot/jtreg/compiler/onSpinWait/TestOnSpinWaitAArch64DefaultFlags.java`: Passed This pull request has now been integrated. 
Changeset: 7b2d823e Author: Evgeny Astigeevich Committer: Paul Hohensee URL: https://git.openjdk.java.net/jdk/commit/7b2d823e842e6a66dbe46b048da44ca9e5485c75 Stats: 4 lines in 2 files changed: 1 ins; 2 del; 1 mod 8277503: compiler/onSpinWait/TestOnSpinWaitAArch64DefaultFlags.java failed with "OnSpinWaitInst with the expected value 'isb' not found." Reviewed-by: chagedorn, aph, phh ------------- PR: https://git.openjdk.java.net/jdk/pull/6521 From rkennke at openjdk.java.net Tue Nov 23 21:02:32 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Tue, 23 Nov 2021 21:02:32 GMT Subject: RFR: 8276901: Implement UseHeavyMonitors consistently [v5] In-Reply-To: References: Message-ID: <92eQ9tf37yQfrEQX2iWlFSMxHjZAv8lDUrgb9CJ0NeE=.93f43cec-9f54-49e3-8823-4356e1021d8b@github.com> > The flag UseHeavyMonitors seems to imply that it makes HotSpot always use inflated monitors, rather than stack locks. However, it is only implemented in the interpreter that way. When it calls into runtime, it would still happily stack-lock. Even worse, C1 uses another flag UseFastLocking to achieve something similar (with the same caveat that runtime would stack-lock anyway). C2 doesn't have any such mechanism at all. > I would like to experiment with disabling stack-locking, and thus, having this flag work as expected would seem very useful. > > The change removes the C1 flag UseFastLocking, and replaces its uses with equivalent (i.e. inverted) UseHeavyMonitors instead. I think it makes sense to make UseHeavyMonitors a develop flag (I wouldn't want anybody to use this in production, not currently without this change, and not with this change). I also added a flag VerifyHeavyMonitors to be able to verify that stack-locking is really disabled. We can't currently verify this unconditionally (e.g. in debug builds) because all non-x86_64 platforms would need work.
> > Testing: > - [x] tier1 > - [x] tier2 > - [x] tier3 > - [ ] tier4 Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Make flag deprecation product-only; Add flag to VMDeprecatedOptions test ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6320/files - new: https://git.openjdk.java.net/jdk/pull/6320/files/818468e7..4a95e033 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6320&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6320&range=03-04 Stats: 26 lines in 2 files changed: 14 ins; 0 del; 12 mod Patch: https://git.openjdk.java.net/jdk/pull/6320.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6320/head:pull/6320 PR: https://git.openjdk.java.net/jdk/pull/6320 From dlong at openjdk.java.net Tue Nov 23 21:44:07 2021 From: dlong at openjdk.java.net (Dean Long) Date: Tue, 23 Nov 2021 21:44:07 GMT Subject: RFR: 8277602: Deopt code does not extend the stack enough if the caller is an optimize entry blob In-Reply-To: <515lMydWeiRbYVwp7x4kW33eiOBrMdoobOaO4wW79TE=.6655f6f7-16eb-413d-a958-7225b0ce0b7a@github.com> References: <515lMydWeiRbYVwp7x4kW33eiOBrMdoobOaO4wW79TE=.6655f6f7-16eb-413d-a958-7225b0ce0b7a@github.com> Message-ID: On Tue, 23 Nov 2021 15:01:24 GMT, Jorn Vernee wrote: > Deoptimization code does not recreate c2i adapter 'frames'. For compiled callers this means that the stack needs to be adjusted manually to make room for the parameters when the callee is converted to an interpreter frame (essentially emulating what a c2i adapter would do). > > To check if the caller does a compiled call, the current code uses `frame::is_compiled_frame()`, which is true if the codeblob of the caller frame is an instance of `CompiledMethod`. > > However, optimized entry blobs also do compiled calls, are not detected by this test, and therefore don't get their stack adjusted correctly. 
> > To address this, I've added a new `frame::is_compiled_caller` function to determine if the caller is doing a compiled call, and I use that in the deopt code instead of `is_compiled_frame`. > > This patch also removes an old workaround that tried to fix the issue by allocating some spill space in the optimized entry blob frame, but this only accounts for the first argument. If there are more arguments we still have a problem. The suggested patch fixes this the right way I think. > > Thanks, > Jorn > > Testing: run-test-jdk_foreign on Windows x64 and Linux x64 (afaik these are the only tests that use optimized entry blobs) Marked as reviewed by dlong (Reviewer). This seems safe enough, since it should only affect optimized entry blob frames. ------------- PR: https://git.openjdk.java.net/jdk/pull/6522 From jvernee at openjdk.java.net Tue Nov 23 22:24:06 2021 From: jvernee at openjdk.java.net (Jorn Vernee) Date: Tue, 23 Nov 2021 22:24:06 GMT Subject: RFR: 8277602: Deopt code does not extend the stack enough if the caller is an optimize entry blob In-Reply-To: <515lMydWeiRbYVwp7x4kW33eiOBrMdoobOaO4wW79TE=.6655f6f7-16eb-413d-a958-7225b0ce0b7a@github.com> References: <515lMydWeiRbYVwp7x4kW33eiOBrMdoobOaO4wW79TE=.6655f6f7-16eb-413d-a958-7225b0ce0b7a@github.com> Message-ID: <7TLJl3AzPhmU0ZxcfAdz7pJp9knEhikwG-THzRCH32g=.7ca8ebc1-615b-4406-a046-b7fedf531b3b@github.com> On Tue, 23 Nov 2021 15:01:24 GMT, Jorn Vernee wrote: > Deoptimization code does not recreate c2i adapter 'frames'. For compiled callers this means that the stack needs to be adjusted manually to make room for the parameters when the callee is converted to an interpreter frame (essentially emulating what a c2i adapter would do). > > To check if the caller does a compiled call, the current code uses `frame::is_compiled_frame()`, which is true if the codeblob of the caller frame is an instance of `CompiledMethod`. 
> > However, optimized entry blobs also do compiled calls, are not detected by this test, and therefore don't get their stack adjusted correctly. > > To address this, I've added a new `frame::is_compiled_caller` function to determine if the caller is doing a compiled call, and I use that in the deopt code instead of `is_compiled_frame`. > > This patch also removes an old workaround that tried to fix the issue by allocating some spill space in the optimized entry blob frame, but this only accounts for the first argument. If there are more arguments we still have a problem. The suggested patch fixes this the right way I think. > > Thanks, > Jorn > > Testing: run-test-jdk_foreign on Windows x64 and Linux x64 (afaik these are the only tests that use optimized entry blobs) Thanks for the review. FWIW, we have a test case which is a tomcat benchmark that uses a custom panama-foreign based SSL library. A memory corruption crash occurs in that benchmark, gets more frequent with -XX:+DeoptimizeALot, and goes away with this fix. There's a test that used to also test this deopt code path in the jdk_foreign test suite as well, but the conditions for the failure are quite specific (optimized entry blob needs to be the deoptee's caller), and due to some code changing elsewhere the test seems to no longer hit the offending code path. This doesn't need to be integrated right away (though I would like to get it into 18). In the mean time I'll also see if I can improve the existing test to catch this case again. 
------------- PR: https://git.openjdk.java.net/jdk/pull/6522 From sviswanathan at openjdk.java.net Tue Nov 23 22:49:04 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Tue, 23 Nov 2021 22:49:04 GMT Subject: RFR: 8277426: Optimize mask reduction operations on x86 [v2] In-Reply-To: References: <9g2nHSo1K9pldGKqKiXBqjFZF3UFqqKHu-6GLyKbclc=.496404e9-f682-470e-a93d-c8fc9ca6f626@github.com> Message-ID: On Fri, 19 Nov 2021 11:43:16 GMT, Mai Đặng Quân Anh wrote: >> Hi, >> >> This patch improves the performance of mask reduction operations on AVX by matching the pattern `VectorMaskReduction (VectorStoreMask mask)` to eliminate the extra `VectorStoreMaskNode`. I have also done some refactoring to unify the logic of `toLong` with the other reduction operations. >> >> The patch has been discussed partially in [panama-vector repository](https://github.com/openjdk/panama-vector/pull/158). >> >> Thank you very much. > > Mai Đặng Quân Anh has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge branch 'master' into vectorMaskReduction > - reduce some dependencies with spare register > - improve mask reduction logic on AVX @DamonFool Could you please review this patch? ------------- PR: https://git.openjdk.java.net/jdk/pull/6447 From jiefu at openjdk.java.net Tue Nov 23 23:27:06 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Tue, 23 Nov 2021 23:27:06 GMT Subject: RFR: 8277426: Optimize mask reduction operations on x86 [v2] In-Reply-To: References: <9g2nHSo1K9pldGKqKiXBqjFZF3UFqqKHu-6GLyKbclc=.496404e9-f682-470e-a93d-c8fc9ca6f626@github.com> Message-ID: On Tue, 23 Nov 2021 22:45:51 GMT, Sandhya Viswanathan wrote: > @DamonFool Could you please review this patch? I will do some testing and feedback. Thanks.
------------- PR: https://git.openjdk.java.net/jdk/pull/6447 From jiefu at openjdk.java.net Wed Nov 24 03:55:06 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Wed, 24 Nov 2021 03:55:06 GMT Subject: RFR: 8277426: Optimize mask reduction operations on x86 [v2] In-Reply-To: References: <9g2nHSo1K9pldGKqKiXBqjFZF3UFqqKHu-6GLyKbclc=.496404e9-f682-470e-a93d-c8fc9ca6f626@github.com> Message-ID: <5iLsLwcfgpd5v0Ns-6jzNDhrNJorbfsV0wvBmB5Gml4=.037dd4a6-dd0f-4385-adbe-05532d913c3d@github.com> On Fri, 19 Nov 2021 11:43:16 GMT, Mai Đặng Quân Anh wrote: >> Hi, >> >> This patch improves the performance of mask reduction operations on AVX by matching the pattern `VectorMaskReduction (VectorStoreMask mask)` to eliminate the extra `VectorStoreMaskNode`. I have also done some refactoring to unify the logic of `toLong` with the other reduction operations. >> >> The patch has been discussed partially in [panama-vector repository](https://github.com/openjdk/panama-vector/pull/158). >> >> Thank you very much. > > Mai Đặng Quân Anh has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge branch 'master' into vectorMaskReduction > - reduce some dependencies with spare register > - improve mask reduction logic on AVX src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4065: > 4063: void C2_MacroAssembler::vector_mask_operation(int opc, Register dst, KRegister mask, > 4064: int masklen, int masksize, int vec_enc) { > 4065: assert(VM_Version::supports_popcnt() && New instructions like `lzcntq` and `tzcntq` are used for the optimized code gen without detecting the availability. I'm a bit worried about that. So do all AVX512 platforms support them? Thanks.
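A minimal sketch of why population-count and leading/trailing-zero-count operations come up in this review: once a mask is available as a long (the `toLong` form), the other mask reductions reduce to scalar bit operations on it. The patch's assembler code is not shown in this thread, so the class and method names below are hypothetical; the empty-mask results follow the documented behaviour of `VectorMask.firstTrue()` (returns the vector length) and `VectorMask.lastTrue()` (returns -1).

```java
// Illustration only: scalar equivalents of mask reductions on a mask held as a long.
// MaskReductionSketch and its methods are hypothetical names, not code from the patch.
final class MaskReductionSketch {

    // Keep only the low maskLen bits, one bit per lane.
    static long laneBits(long mask, int maskLen) {
        return maskLen >= 64 ? mask : mask & ((1L << maskLen) - 1);
    }

    // Number of set lanes: a population count.
    static int trueCount(long mask, int maskLen) {
        return Long.bitCount(laneBits(mask, maskLen));
    }

    // Index of the lowest set lane, or maskLen if no lane is set: a trailing-zero count.
    static int firstTrue(long mask, int maskLen) {
        long bits = laneBits(mask, maskLen);
        return bits == 0 ? maskLen : Long.numberOfTrailingZeros(bits);
    }

    // Index of the highest set lane, or -1 if no lane is set: a leading-zero count.
    // numberOfLeadingZeros(0) is 64, so the empty mask yields 63 - 64 = -1.
    static int lastTrue(long mask, int maskLen) {
        return 63 - Long.numberOfLeadingZeros(laneBits(mask, maskLen));
    }

    public static void main(String[] args) {
        long mask = 0b0110_1000L; // lanes 3, 5 and 6 set in an 8-lane mask
        System.out.println("trueCount = " + trueCount(mask, 8));
        System.out.println("firstTrue = " + firstTrue(mask, 8));
        System.out.println("lastTrue  = " + lastTrue(mask, 8));
    }
}
```

Whether these operations map to `popcnt`, `tzcnt` and `lzcnt` on a given CPU depends on the features that CPU reports, which is the availability question raised in the review comment.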
------------- PR: https://git.openjdk.java.net/jdk/pull/6447 From jiefu at openjdk.java.net Wed Nov 24 04:01:03 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Wed, 24 Nov 2021 04:01:03 GMT Subject: RFR: 8277426: Optimize mask reduction operations on x86 [v2] In-Reply-To: References: <9g2nHSo1K9pldGKqKiXBqjFZF3UFqqKHu-6GLyKbclc=.496404e9-f682-470e-a93d-c8fc9ca6f626@github.com> Message-ID: On Tue, 23 Nov 2021 23:23:46 GMT, Jie Fu wrote: > > @DamonFool Could you please review this patch? > > I will do some testing and feedback. Thanks. All the tests passed. But I'm not sure whether all avx512 machines support bmi1 and lzcnt features. ------------- PR: https://git.openjdk.java.net/jdk/pull/6447 From xliu at openjdk.java.net Wed Nov 24 07:48:37 2021 From: xliu at openjdk.java.net (Xin Liu) Date: Wed, 24 Nov 2021 07:48:37 GMT Subject: RFR: 8274983: C1 optimizes the invocation of private interface methods [v2] In-Reply-To: References: Message-ID: > The root cause of the C1 regression is that some regexes generate multiple classes that all implement > an interface. In SlowStartupTest.java, the following **invokeinterface** happens frequently with different receivers. The target is a private interface method. > > > 9: invokeinterface #25, 3 // InterfaceMethod java/util/regex/Pattern$BmpCharPredicate.lambda$union$2:(Ljava/util/regex/Pattern$CharPredicate;I)Z > > > This patch allows C1 to generate an optimized virtual call for an invokeinterface > whose target is a private interface method. > > Before JDK-823835, LambdaMetafactory generated invokespecial in this case. Because private > interface methods cannot be overridden, C1 generated an optimized virtual call. After JDK-823835, > LambdaMetafactory generates invokeinterface instead. C1 generates a regular virtual call because > it cannot recognize the new pattern. If multiple subclasses all implement the same interface,
> it is possible that they thrash the IC stub using their own concrete klass at runtime. > > An optimized virtual call uses relocInfo::opt_virtual_call_type(3). It calls the VM's > 'resolve_opt_virtual_call_C' once and resolves the target to the VEP of the nmethod. > Therefore, this patch can prevent the callsite from thrashing. > > Before this patch, SlowStartupTest had 38770 _resolve_invoke_virtual_cnt events and 38695 _handle_wrong_method_cnt events per 10k iterations. To dump `C1Statistics`, we use a fastdebug build for comparison. > > > $java -XX:TieredStopAtLevel=1 -XX:+PrintC1Statistics SlowStartupTest 1 > Executed 10000 iterations in 736ms > C1 Runtime statistics: > _resolve_invoke_virtual_cnt: 38770 > _resolve_invoke_opt_virtual_cnt: 186 > _resolve_invoke_static_cnt: 44 > _handle_wrong_method_cnt: 38695 > _ic_miss_cnt: 35 > > > With this patch, only 1 _handle_wrong_method_cnt is triggered, but we have 3 more `_resolve_invoke_opt_virtual_cnt` events instead. The total runtime drops from 736ms to 9ms. > > > $java -XX:TieredStopAtLevel=1 -XX:+PrintC1Statistics SlowStartupTest 1 > Executed 10000 iterations in 9ms > C1 Runtime statistics: > _resolve_invoke_virtual_cnt: 77 > _resolve_invoke_opt_virtual_cnt: 189 > _resolve_invoke_static_cnt: 45 > _handle_wrong_method_cnt: 1 > _ic_miss_cnt: 39 > > > Codegen-wise, before the patch, C1 generates LIR for an invokeinterface whose target is a private interface method as follows. > > __bci__use__tid____instr____________________________________ > . 1 0 v2 a1.invokeinterface() > InvokePrivateInterfaceMethod$I.bar()V > . 6 0 v3 return > > > With this patch, C1 generates LIR as follows. It first checks that a1 is a subtype of `InvokePrivateInterfaceMethod$I`. If so, an optimized virtual call is generated. The callsite will be fixed up exactly once at runtime. > > __bci__use__tid____instr____________________________________ > .
1 1 a2 checkcast(a1) InvokePrivateInterfaceMethod$I > stack [0:a1] > . 1 0 v3 a2.invokeinterface() > InvokePrivateInterfaceMethod$I.bar()V > . 6 0 v4 return Xin Liu has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - 8274983: C1 optimizes the invocation of private interface methods - Merge branch 'master' into JDK-8274983 - 8274983: Pattern.matcher performance regression after JDK-823835 This patch allows C1 to generate an optimized virtual call for an invokeinterface whose target is a private interface method. Before JDK-823835, LambdaMetafactory generated invokespecial in this case. Because private interface methods cannot be overridden, C1 generated an optimized virtual call. After JDK-823835, LambdaMetafactory generates invokeinterface instead. C1 generates a regular virtual call because it cannot recognize the new pattern. If multiple subclasses all implement the same interface, it is possible that they thrash the IC stub using their own concrete klass at runtime. An optimized virtual call uses relocInfo::opt_virtual_call_type(3). It calls the VM's 'resolve_opt_virtual_call_C' once and resolves the target to the VEP of the nmethod. Therefore, this patch can prevent the callsite from thrashing.
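A minimal sketch of the call shape this change is about, assuming a source-level reproduction rather than the lambda-generated classes from the regex case: one call site that targets a private interface method and is reached with receivers of several concrete classes. The test used in the PR is not shown in this thread, so every name below is hypothetical. It also assumes a javac targeting JDK 11 or later, which compiles the call to `bar` as invokeinterface.

```java
// Illustration only: all class and method names are hypothetical.
final class PrivateInterfaceCallSketch {

    interface I {
        // Private interface methods cannot be overridden, so this target is
        // statically known even though the receiver class varies.
        private int bar(int x) {
            return x + 1;
        }

        // The call to bar() below is the call site of interest
        // (assumed to be compiled as invokeinterface by javac for JDK 11+).
        default int foo(int x) {
            return bar(x);
        }
    }

    static final class A implements I { }
    static final class B implements I { }
    static final class C implements I { }

    // Drives the same call site with alternating receiver classes.
    static int run(int iterations) {
        I[] receivers = { new A(), new B(), new C() };
        int sum = 0;
        for (int i = 0; i < iterations; i++) {
            sum += receivers[i % receivers.length].foo(i) - i; // each call contributes 1
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println("sum = " + run(10_000));
    }
}
```

Running such a program with `-XX:TieredStopAtLevel=1 -XX:+PrintC1Statistics` on a fastdebug build, as in the measurements above, would be one way to look at the resolution counters; whether this exact shape reproduces the reported numbers is not something the thread confirms.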
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6445/files - new: https://git.openjdk.java.net/jdk/pull/6445/files/5a00e1f7..acf7b9f8 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6445&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6445&range=00-01 Stats: 11800 lines in 386 files changed: 9368 ins; 1058 del; 1374 mod Patch: https://git.openjdk.java.net/jdk/pull/6445.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6445/head:pull/6445 PR: https://git.openjdk.java.net/jdk/pull/6445 From xliu at openjdk.java.net Wed Nov 24 07:55:02 2021 From: xliu at openjdk.java.net (Xin Liu) Date: Wed, 24 Nov 2021 07:55:02 GMT Subject: RFR: 8274983: C1 optimizes the invocation of private interface methods In-Reply-To: References: Message-ID: <1O62Kt_KNQBtpCUP1ulJ5OC6HpQgW9xIYKe1CBWV7tk=.8558db6b-fc6f-4255-a659-121cdc5c4e25@github.com> On Sat, 20 Nov 2021 10:49:12 GMT, Dean Long wrote: >> hi, @dean-long, >> >> I think C1 covers all cases as long as the target method is loaded. I have seen cases in which target methods haven't been loaded at startup time, but they are rare. >> >> ciMethod::can_be_statically_bound() returns true if the method is private or final. The matrix shows the modifiers of target methods. >> >> | | final | private | >> |-----------------|-------|---------| >> | invokevirtual | 1 | 2 | >> | invokespecial | NA-1 | 3 | >> | invokeinterface | NA-2 | 4 | >> >> 1. generates the optimized virtual call because [x->target_is_final()](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/c1/c1_LIRGenerator.cpp#L2799) is true. >> 2. transforms to `invokespecial` [here](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/c1/c1_GraphBuilder.cpp#L1885), then it will be case 3. >> 3. generates the optimized virtual call because `x->code() == Bytecodes::_invokespecial` is true. >> 4. is what this patch covers. >> >> NA-1: I think it's impossible for javac.
It would be an optimized virtual call like case 3 even if it existed. >> NA-2: it's an illegal modifier for an interface method. >> https://docs.oracle.com/javase/specs/jls/se17/html/jls-9.html#jls-InterfaceMethodModifier > > Thanks @navyxliu. I wonder if we can do 2) and 3) for invokeinterface, simplifying the patch. Something like: > > > // Some methods are obviously bindable without any type checks so > // convert them directly to an invokespecial or invokestatic. > if (target->is_loaded() && !target->is_abstract() && target->can_be_statically_bound()) { > switch (bc_raw) { > case Bytecodes::_invokevirtual: > case Bytecodes::_invokeinterface: // XXX add invokeinterface here > code = Bytecodes::_invokespecial; > break; > > [...] > > // invoke-special-super > if (code == Bytecodes::_invokespecial && !target->is_object_initializer()) { // XXX use "code" here > ciInstanceKlass* sender_klass = calling_klass; > if (sender_klass->is_interface()) { > > [...] > > What do you think? hi, @dean-long, I updated the PR. The new revision generates Invoke HIR with code == Invokespecial. I don't need to touch the LIR part, so I reverted those changes. I call `c->set_incompatible_class_change_check()` instead of set_invokespecial_receiver_check(). This calls into the runtime to throw ICCError directly. Before that, it went to uncommon_trap and reinterpreted the bytecode. ------------- PR: https://git.openjdk.java.net/jdk/pull/6445 From jiefu at openjdk.java.net Wed Nov 24 09:01:25 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Wed, 24 Nov 2021 09:01:25 GMT Subject: RFR: 8277753: Long*VectorTests.java fail with "bad AD file" on x86_32 Message-ID: Hi all, The following vector API tests fail with "bad AD file" on x86_32. jdk/incubator/vector/Long128VectorTests.java jdk/incubator/vector/Long256VectorTests.java jdk/incubator/vector/Long512VectorTests.java jdk/incubator/vector/LongMaxVectorTests.java jdk/incubator/vector/Long64VectorTests.java Let's fix it. Thanks.
Best regards, Jie ------------- Commit messages: - x86_32: multiple fastdebug failures with "bad AD file" Changes: https://git.openjdk.java.net/jdk/pull/6533/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6533&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8277753 Stats: 30 lines in 1 file changed: 30 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/6533.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6533/head:pull/6533 PR: https://git.openjdk.java.net/jdk/pull/6533 From ngasson at openjdk.java.net Wed Nov 24 10:16:07 2021 From: ngasson at openjdk.java.net (Nick Gasson) Date: Wed, 24 Nov 2021 10:16:07 GMT Subject: RFR: 8277411: C2 fast_unlock intrinsic on AArch64 has unnecessary ownership check In-Reply-To: References: Message-ID: On Mon, 22 Nov 2021 11:04:37 GMT, Erik Österlund wrote: > The AArch64 fast_unlock C2 code checks if the current thread owns the lock. This can be surprisingly expensive in workloads where locking is contended. The check is however optional (helpful only for finding JNI code bugs), and indeed not emitted for x86_64. This patch removes the check on AArch64 as well. Marked as reviewed by ngasson (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/6498 From dlong at openjdk.java.net Wed Nov 24 10:22:10 2021 From: dlong at openjdk.java.net (Dean Long) Date: Wed, 24 Nov 2021 10:22:10 GMT Subject: RFR: 8274983: C1 optimizes the invocation of private interface methods [v2] In-Reply-To: References: Message-ID: On Wed, 24 Nov 2021 07:48:37 GMT, Xin Liu wrote: >> The root cause of the C1 regression is that some regexes generate multiple classes that all implement >> an interface. In SlowStartupTest.java, the following **invokeinterface** happens frequently with different receivers. The target is a private interface method.
>> >> >> 9: invokeinterface #25, 3 // InterfaceMethod java/util/regex/Pattern$BmpCharPredicate.lambda$union$2:(Ljava/util/regex/Pattern$CharPredicate;I)Z >> >> >> This patch allows C1 to generate an optimized virtual call for an invokeinterface >> whose target is a private interface method. >> >> Before JDK-823835, LambdaMetafactory generated invokespecial in this case. Because private >> interface methods cannot be overridden, C1 generated an optimized virtual call. After JDK-823835, >> LambdaMetafactory generates invokeinterface instead. C1 generates a regular virtual call because >> it cannot recognize the new pattern. If multiple subclasses all implement the same interface, >> it is possible that they thrash the IC stub using their own concrete klass at runtime. >> >> An optimized virtual call uses relocInfo::opt_virtual_call_type(3). It calls the VM's >> 'resolve_opt_virtual_call_C' once and resolves the target to the VEP of the nmethod. >> Therefore, this patch can prevent the callsite from thrashing. >> >> Before this patch, SlowStartupTest had 38770 _resolve_invoke_virtual_cnt events and 38695 _handle_wrong_method_cnt events per 10k iterations. To dump `C1Statistics`, we use a fastdebug build for comparison. >> >> >> $java -XX:TieredStopAtLevel=1 -XX:+PrintC1Statistics SlowStartupTest 1 >> Executed 10000 iterations in 736ms >> C1 Runtime statistics: >> _resolve_invoke_virtual_cnt: 38770 >> _resolve_invoke_opt_virtual_cnt: 186 >> _resolve_invoke_static_cnt: 44 >> _handle_wrong_method_cnt: 38695 >> _ic_miss_cnt: 35 >> >> >> With this patch, only 1 _handle_wrong_method_cnt is triggered, but we have 3 more `_resolve_invoke_opt_virtual_cnt` events instead. The total runtime drops from 736ms to 9ms.
>> >> >> $java -XX:TieredStopAtLevel=1 -XX:+PrintC1Statistics SlowStartupTest 1 >> Executed 10000 iterations in 9ms >> C1 Runtime statistics: >> _resolve_invoke_virtual_cnt: 77 >> _resolve_invoke_opt_virtual_cnt: 189 >> _resolve_invoke_static_cnt: 45 >> _handle_wrong_method_cnt: 1 >> _ic_miss_cnt: 39 >> >> >> Codegen-wise, before the patch, C1 generates LIR for an invokeinterface whose target is a private interface method as follows. >> >> __bci__use__tid____instr____________________________________ >> . 1 0 v2 a1.invokeinterface() >> InvokePrivateInterfaceMethod$I.bar()V >> . 6 0 v3 return >> >> >> With this patch, C1 generates LIR as follows. It first checks that a1 is a subtype of `InvokePrivateInterfaceMethod$I`. If so, an optimized virtual call is generated. The callsite will be fixed up exactly once at runtime. >> >> __bci__use__tid____instr____________________________________ >> . 1 1 a2 checkcast(a1) InvokePrivateInterfaceMethod$I >> stack [0:a1] >> . 1 0 v3 a2.invokeinterface() >> InvokePrivateInterfaceMethod$I.bar()V >> . 6 0 v4 return > > Xin Liu has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - 8274983: C1 optimizes the invocation of private interface methods > - Merge branch 'master' into JDK-8274983 > - 8274983: Pattern.matcher performance regression after JDK-823835 > > This patch allows C1 to generate an optimized virtual call for an invokeinterface > whose target is a private interface method. > > Before JDK-823835, LambdaMetafactory generated invokespecial in this case. Because private > interface methods cannot be overridden, C1 generated an optimized virtual call. After JDK-823835, > LambdaMetafactory generates invokeinterface instead. C1 generates a regular virtual call because > it cannot recognize the new pattern.
If multiple subclasses all implement the same interface, > it is possible that they thrash the IC stub using their own concrete klass at runtime. > > An optimized virtual call uses relocInfo::opt_virtual_call_type(3). It calls the VM's > 'resolve_opt_virtual_call_C' once and resolves the target to the VEP of the nmethod. > Therefore, this patch can prevent the callsite from thrashing. Changes requested by dlong (Reviewer). src/hotspot/share/c1/c1_GraphBuilder.cpp line 1902: > 1900: Value receiver = state()->stack_at(index); > 1901: CheckCast* c = new CheckCast(receiver_constraint, receiver, copy_state_before()); > 1902: c->set_incompatible_class_change_check(); set_incompatible_class_change_check() seems OK for invokeinterface, but for invokespecial, I believe the interpreter throws IllegalAccessError. ------------- PR: https://git.openjdk.java.net/jdk/pull/6445 From duke at openjdk.java.net Wed Nov 24 11:06:10 2021 From: duke at openjdk.java.net (Ludvig Janiuk) Date: Wed, 24 Nov 2021 11:06:10 GMT Subject: Integrated: JDK-8277562 Remove dead method c1 If::swap_sux In-Reply-To: References: Message-ID: On Tue, 23 Nov 2021 10:51:54 GMT, Ludvig Janiuk wrote: > swap_sux in c1 is never used or referenced. Let's remove it. This will facilitate further refactorings. This pull request has now been integrated.
Changeset: 8a8bc29f Author: Ludvig Janiuk Committer: Nils Eliasson URL: https://git.openjdk.java.net/jdk/commit/8a8bc29f203fa4aaa29303a778fd388e32ca651a Stats: 8 lines in 1 file changed: 0 ins; 8 del; 0 mod 8277562: Remove dead method c1 If::swap_sux Reviewed-by: thartmann, neliasso ------------- PR: https://git.openjdk.java.net/jdk/pull/6517 From neliasso at openjdk.java.net Wed Nov 24 11:06:09 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Wed, 24 Nov 2021 11:06:09 GMT Subject: RFR: JDK-8277562 Remove dead method c1 If::swap_sux In-Reply-To: References: Message-ID: On Tue, 23 Nov 2021 10:51:54 GMT, Ludvig Janiuk wrote: > swap_sux in c1 is never used or referenced. Let's remove it. This will facilitate further refactorings. Looks good! ------------- Marked as reviewed by neliasso (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6517 From duke at openjdk.java.net Wed Nov 24 11:14:45 2021 From: duke at openjdk.java.net (Takuya Kiriyama) Date: Wed, 24 Nov 2021 11:14:45 GMT Subject: RFR: 8277042: add test for 8276036 to compiler/codecache [v4] In-Reply-To: References: Message-ID: <1vDWHsq_8HSkVX8XMmc7Adjll4dtAvg7wOCfipYtK-g=.450cf159-70bd-4fbd-9d99-4ce573212453@github.com> > Could you please review the 8277042 code? > This is the enhancement for 8276036. > I add a new test to verify the value of full_count in the message of insufficient codecache. 
Takuya Kiriyama has updated the pull request incrementally with one additional commit since the last revision: 8277042: add test for 8276036 to compiler/codecache ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6364/files - new: https://git.openjdk.java.net/jdk/pull/6364/files/d6352295..bd09c8d7 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6364&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6364&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/6364.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6364/head:pull/6364 PR: https://git.openjdk.java.net/jdk/pull/6364 From duke at openjdk.java.net Wed Nov 24 11:14:46 2021 From: duke at openjdk.java.net (Takuya Kiriyama) Date: Wed, 24 Nov 2021 11:14:46 GMT Subject: RFR: 8277042: add test for 8276036 to compiler/codecache [v2] In-Reply-To: References: Message-ID: <6xF6aRmhN1uHq8EpkM_yZJJ2iWby6NEMXwaUw8ClA28=.bdc1afee-80f9-4eea-9c6a-21c254f00771@github.com> On Tue, 23 Nov 2021 12:50:44 GMT, Tobias Hartmann wrote: >> Thanks, yes, please add 8277441 as well. > > I just pushed JDK-8277441 and verified that the test now always passed. Thank you. I'm glad the bug was fixed. I added 8277441 to this test. ------------- PR: https://git.openjdk.java.net/jdk/pull/6364 From thartmann at openjdk.java.net Wed Nov 24 11:26:14 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Wed, 24 Nov 2021 11:26:14 GMT Subject: RFR: 8277042: add test for 8276036 to compiler/codecache [v4] In-Reply-To: <1vDWHsq_8HSkVX8XMmc7Adjll4dtAvg7wOCfipYtK-g=.450cf159-70bd-4fbd-9d99-4ce573212453@github.com> References: <1vDWHsq_8HSkVX8XMmc7Adjll4dtAvg7wOCfipYtK-g=.450cf159-70bd-4fbd-9d99-4ce573212453@github.com> Message-ID: On Wed, 24 Nov 2021 11:14:45 GMT, Takuya Kiriyama wrote: >> Could you please review the 8277042 code? >> This is the enhancement for 8276036. 
>> I add a new test to verify the value of full_count in the message of insufficient codecache. > > Takuya Kiriyama has updated the pull request incrementally with one additional commit since the last revision: > > 8277042: add test for 8276036 to compiler/codecache Thanks, looks good. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6364 From duke at openjdk.java.net Wed Nov 24 11:26:16 2021 From: duke at openjdk.java.net (Takuya Kiriyama) Date: Wed, 24 Nov 2021 11:26:16 GMT Subject: Integrated: 8277042: add test for 8276036 to compiler/codecache In-Reply-To: References: Message-ID: On Fri, 12 Nov 2021 08:54:57 GMT, Takuya Kiriyama wrote: > Could you please review the 8277042 code? > This is the enhancement for 8276036. > I add a new test to verify the value of full_count in the message of insufficient codecache. This pull request has now been integrated. Changeset: 17e68caa Author: KIRIYAMA Takuya Committer: Tobias Hartmann URL: https://git.openjdk.java.net/jdk/commit/17e68caad727b04f5e7efde59fce960c66558504 Stats: 129 lines in 1 file changed: 129 ins; 0 del; 0 mod 8277042: add test for 8276036 to compiler/codecache Reviewed-by: chagedorn, thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/6364 From phedlin at openjdk.java.net Wed Nov 24 11:52:45 2021 From: phedlin at openjdk.java.net (Patric Hedlin) Date: Wed, 24 Nov 2021 11:52:45 GMT Subject: RFR: 8276108: Wrong instruction generation in aarch64 backend [v2] In-Reply-To: <9P964HSPi8geJ3GVJwbsGo_t_lASstJkxsrX76Xv8K8=.898bf726-d71b-4a90-a6b7-83895eec3494@github.com> References: <9P964HSPi8geJ3GVJwbsGo_t_lASstJkxsrX76Xv8K8=.898bf726-d71b-4a90-a6b7-83895eec3494@github.com> Message-ID: > C1 code generation on AArch64 may produce bad LDR/STR immediate offset instructions when the actual operand (datum) size is unknown. 
This change will alter the code generated for the problematic immediate offset to use the register offset version (requiring additional instructions). > > Contributed by Nick Gasson. > > Added assert in Address::encode() to emphasise the use of a valid immediate (in base_plus_offset). > > Added clarifying comment to Address::offset_ok_for_immed() emphasising favouring of the scaled unsigned 12-bit encoding for aligned offsets. Patric Hedlin has updated the pull request incrementally with one additional commit since the last revision: Clean-up address calculation via use of legitimize_address(). ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6212/files - new: https://git.openjdk.java.net/jdk/pull/6212/files/44d78779..4c48b421 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6212&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6212&range=00-01 Stats: 7 lines in 1 file changed: 0 ins; 6 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/6212.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6212/head:pull/6212 PR: https://git.openjdk.java.net/jdk/pull/6212 From jiefu at openjdk.java.net Wed Nov 24 12:04:16 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Wed, 24 Nov 2021 12:04:16 GMT Subject: RFR: 8277777: [Vector API] assert(r->is_XMMRegister()) failed: must be in x86_32.ad Message-ID: Hi all, The following vector api tests fail on x86_32/AVX512 with `assert(r->is_XMMRegister()) failed: must be`. 
jdk/incubator/vector/Byte64VectorLoadStoreTests.java
jdk/incubator/vector/Byte256VectorLoadStoreTests.java
jdk/incubator/vector/Byte128VectorLoadStoreTests.java
jdk/incubator/vector/ByteMaxVectorLoadStoreTests.java
jdk/incubator/vector/Double256VectorTests.java
jdk/incubator/vector/Double512VectorTests.java
jdk/incubator/vector/DoubleMaxVectorTests.java
jdk/incubator/vector/Float512VectorTests.java
jdk/incubator/vector/Float256VectorTests.java
jdk/incubator/vector/FloatMaxVectorTests.java
jdk/incubator/vector/Float128VectorTests.java
jdk/incubator/vector/Short128VectorLoadStoreTests.java
jdk/incubator/vector/Short256VectorLoadStoreTests.java
jdk/incubator/vector/Short64VectorLoadStoreTests.java
jdk/incubator/vector/ShortMaxVectorLoadStoreTests.java

The reason is that `static enum RC rc_class( OptoReg::Name reg )` [1] missed the case for KRegister. And the AVX-512 opmask specific spilling code [2] should be located before the size assert [3].

Thanks.
Best regards,
Jie

[1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86_32.ad#L747
[2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86_32.ad#L1272
[3] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86_32.ad#L1252

-------------

Commit messages:
 - 8277777: [Vector API] assert(r->is_XMMRegister()) failed: must be in x86_32.ad

Changes: https://git.openjdk.java.net/jdk/pull/6535/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6535&range=00
Issue: https://bugs.openjdk.java.net/browse/JDK-8277777
Stats: 41 lines in 1 file changed: 21 ins; 20 del; 0 mod
Patch: https://git.openjdk.java.net/jdk/pull/6535.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/6535/head:pull/6535

PR: https://git.openjdk.java.net/jdk/pull/6535

From aph at openjdk.java.net Wed Nov 24 12:25:09 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Wed, 24 Nov 2021 12:25:09 GMT
Subject: RFR: 8276108: Wrong instruction generation in aarch64 backend
In-Reply-To:
References:
<9P964HSPi8geJ3GVJwbsGo_t_lASstJkxsrX76Xv8K8=.898bf726-d71b-4a90-a6b7-83895eec3494@github.com> Message-ID: On Tue, 23 Nov 2021 18:41:00 GMT, Patric Hedlin wrote: > Besides the use of addr->scale(), using legitimize_address() is of course cleaner (and adds block comments to the assembly). Generates slightly different code. I don't understand why the new version, that depends on `addr->scale()` being zero, should be better. Surely having something that works with all values of scale is more robust, or at least no worse, than the suggested change. What's the point of changing this code so that you need `addr->scale()` being zero as a precondition? ------------- PR: https://git.openjdk.java.net/jdk/pull/6212 From duke at openjdk.java.net Wed Nov 24 12:50:38 2021 From: duke at openjdk.java.net (Ludvig Janiuk) Date: Wed, 24 Nov 2021 12:50:38 GMT Subject: RFR: JDK-8277139 Improve code readability in PredecessorValidator (c1_IR.cpp) [v2] In-Reply-To: References: Message-ID: > Refactor PredecessorValidator, more or less applying the following: > > declare variables where used > redeclare instead of reuse variables > move assert to a more logical place > remove unused length variable > inline variables where senseful > split loops > extract methods > > this is done in preparation for work on optimizing IR::verify. IR::verify calls PredecessorValidator. If the work of PredecessorValidator is made clearer, it will be easier to reason about where IR::verify doesn't need to be called (or where a subset of it would suffice). 
Ludvig Janiuk has updated the pull request incrementally with one additional commit since the last revision: formatting: type asterisks ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6394/files - new: https://git.openjdk.java.net/jdk/pull/6394/files/71067986..8ed15361 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6394&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6394&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/6394.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6394/head:pull/6394 PR: https://git.openjdk.java.net/jdk/pull/6394 From duke at openjdk.java.net Wed Nov 24 12:50:42 2021 From: duke at openjdk.java.net (Ludvig Janiuk) Date: Wed, 24 Nov 2021 12:50:42 GMT Subject: RFR: JDK-8277139 Improve code readability in PredecessorValidator (c1_IR.cpp) [v2] In-Reply-To: <8zsyGZ1BGai-zfgnG9K5m_8S3vLQX-S6f-pIOfjyRow=.e5aa7aaf-96c2-47a5-8015-8f50466a982b@github.com> References: <8zsyGZ1BGai-zfgnG9K5m_8S3vLQX-S6f-pIOfjyRow=.e5aa7aaf-96c2-47a5-8015-8f50466a982b@github.com> Message-ID: On Mon, 22 Nov 2021 15:51:02 GMT, Christian Hagedorn wrote: >> Ludvig Janiuk has updated the pull request incrementally with one additional commit since the last revision: >> >> formatting: type asterisks > > src/hotspot/share/c1/c1_IR.cpp line 1303: > >> 1301: >> 1302: private: >> 1303: void verify_successor_xentry_flag(const BlockBegin *block) const { > > For this and other methods below: Asterisk should be at the type: `BlockBegin* block`. Done ------------- PR: https://git.openjdk.java.net/jdk/pull/6394 From duke at openjdk.java.net Wed Nov 24 12:52:32 2021 From: duke at openjdk.java.net (Ludvig Janiuk) Date: Wed, 24 Nov 2021 12:52:32 GMT Subject: RFR: JDK-8277382 make c1 BlockMerger use IR::verify only when necessary [v2] In-Reply-To: References: Message-ID: > This PR removes two calls to `IR::verify` which were unnecessary. 
The reason they are unnecessary is that `try_merge` does not always take any action. There is not need to verify if nothing has changed. In the cases that `try_merge` does do anything, it already calls `IR::verify` afterwards. > > This PR also switches some deeply nested if statements in `try_merge` to early returns. Ludvig Janiuk has updated the pull request incrementally with one additional commit since the last revision: indentation error fixed ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6456/files - new: https://git.openjdk.java.net/jdk/pull/6456/files/61d7be5e..e17460c6 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6456&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6456&range=00-01 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/6456.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6456/head:pull/6456 PR: https://git.openjdk.java.net/jdk/pull/6456 From duke at openjdk.java.net Wed Nov 24 12:52:36 2021 From: duke at openjdk.java.net (Ludvig Janiuk) Date: Wed, 24 Nov 2021 12:52:36 GMT Subject: RFR: JDK-8277382 make c1 BlockMerger use IR::verify only when necessary [v2] In-Reply-To: References: Message-ID: On Mon, 22 Nov 2021 14:22:22 GMT, Tobias Hartmann wrote: >> Ludvig Janiuk has updated the pull request incrementally with one additional commit since the last revision: >> >> indentation error fixed > > src/hotspot/share/c1/c1_Optimizer.cpp line 374: > >> 372: assert(sux_value == end_state->local_at(index), "locals not equal"); >> 373: } >> 374: assert(sux_state->caller_state() == end_state->caller_state(), "caller not equal"); > > The indentation is wrong. 
done ------------- PR: https://git.openjdk.java.net/jdk/pull/6456 From neliasso at openjdk.java.net Wed Nov 24 13:24:04 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Wed, 24 Nov 2021 13:24:04 GMT Subject: RFR: JDK-8277382 make c1 BlockMerger use IR::verify only when necessary [v2] In-Reply-To: References: Message-ID: On Wed, 24 Nov 2021 12:52:32 GMT, Ludvig Janiuk wrote: >> This PR removes two calls to `IR::verify` which were unnecessary. The reason they are unnecessary is that `try_merge` does not always take any action. There is not need to verify if nothing has changed. In the cases that `try_merge` does do anything, it already calls `IR::verify` afterwards. >> >> This PR also switches some deeply nested if statements in `try_merge` to early returns. > > Ludvig Janiuk has updated the pull request incrementally with one additional commit since the last revision: > > indentation error fixed Looks good! ------------- Marked as reviewed by neliasso (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6456 From phedlin at openjdk.java.net Wed Nov 24 13:40:06 2021 From: phedlin at openjdk.java.net (Patric Hedlin) Date: Wed, 24 Nov 2021 13:40:06 GMT Subject: RFR: 8276108: Wrong instruction generation in aarch64 backend [v2] In-Reply-To: References: <9P964HSPi8geJ3GVJwbsGo_t_lASstJkxsrX76Xv8K8=.898bf726-d71b-4a90-a6b7-83895eec3494@github.com> Message-ID: <7IURH4of06nAIMsMnZjL8057mgJ_fg5T_FHKQ-bxo7o=.bd8da053-2044-4a1f-8884-8722f98ae91e@github.com> On Wed, 24 Nov 2021 11:52:45 GMT, Patric Hedlin wrote: >> C1 code generation on AArch64 may produce bad LDR/STR immediate offset instructions when the actual operand (datum) size is unknown. This change will alter the code generated for the problematic immediate offset to use the register offset version (requiring additional instructions). >> >> Contributed by Nick Gasson. >> >> Added assert in Address::encode() to emphasise the use of a valid immediate (in base_plus_offset). 
>> >> Added clarifying comment to Address::offset_ok_for_immed() emphasising favouring of the scaled unsigned 12-bit encoding for aligned offsets. > > Patric Hedlin has updated the pull request incrementally with one additional commit since the last revision: > > Clean-up address calculation via use of legitimize_address(). The implementation of 'lea' does not seem to handle scale (for base_plus_offset). Am I missing something? ------------- PR: https://git.openjdk.java.net/jdk/pull/6212 From duke at openjdk.java.net Wed Nov 24 14:05:34 2021 From: duke at openjdk.java.net (Mai Đặng Quân Anh) Date: Wed, 24 Nov 2021 14:05:34 GMT Subject: RFR: 8277426: Optimize mask reduction operations on x86 [v3] In-Reply-To: <9g2nHSo1K9pldGKqKiXBqjFZF3UFqqKHu-6GLyKbclc=.496404e9-f682-470e-a93d-c8fc9ca6f626@github.com> References: <9g2nHSo1K9pldGKqKiXBqjFZF3UFqqKHu-6GLyKbclc=.496404e9-f682-470e-a93d-c8fc9ca6f626@github.com> Message-ID: > Hi, > > This patch improves the performance of mask reduction operations on AVX by matching the pattern `VectorMaskReduction (VectorStoreMask mask)` to eliminate the extra `VectorStoreMaskNode`. I have also done some refactoring to unify the logic of `toLong` with the other reduction operations. > > The patch has been discussed partially in [panama-vector repository](https://github.com/openjdk/panama-vector/pull/158). > > Thank you very much.
Mai Đặng Quân Anh has updated the pull request incrementally with one additional commit since the last revision: add check bmi ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6447/files - new: https://git.openjdk.java.net/jdk/pull/6447/files/1dae02d4..ad675ed9 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6447&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6447&range=01-02 Stats: 123 lines in 3 files changed: 46 ins; 55 del; 22 mod Patch: https://git.openjdk.java.net/jdk/pull/6447.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6447/head:pull/6447 PR: https://git.openjdk.java.net/jdk/pull/6447 From duke at openjdk.java.net Wed Nov 24 14:18:12 2021 From: duke at openjdk.java.net (Mai Đặng Quân Anh) Date: Wed, 24 Nov 2021 14:18:12 GMT Subject: RFR: 8277426: Optimize mask reduction operations on x86 [v2] In-Reply-To: <5iLsLwcfgpd5v0Ns-6jzNDhrNJorbfsV0wvBmB5Gml4=.037dd4a6-dd0f-4385-adbe-05532d913c3d@github.com> References: <9g2nHSo1K9pldGKqKiXBqjFZF3UFqqKHu-6GLyKbclc=.496404e9-f682-470e-a93d-c8fc9ca6f626@github.com> <5iLsLwcfgpd5v0Ns-6jzNDhrNJorbfsV0wvBmB5Gml4=.037dd4a6-dd0f-4385-adbe-05532d913c3d@github.com> Message-ID: On Wed, 24 Nov 2021 03:52:13 GMT, Jie Fu wrote: >> Mai Đặng Quân Anh has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains three additional commits since the last revision: >> >> - Merge branch 'master' into vectorMaskReduction >> - reduce some dependencies with spare register >> - improve mask reduction logic on AVX > > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4065: > >> 4063: void C2_MacroAssembler::vector_mask_operation(int opc, Register dst, KRegister mask, >> 4064: int masklen, int masksize, int vec_enc) { >> 4065: assert(VM_Version::supports_popcnt() && > > New instructions like `lzcntq` and `tzcntq` are used for the optimized code gen without detecting the availability. > I'm a bit worried about that. > > So do all AVX512 platforms support them? > Thanks. Yes, you are right. I can't find concrete evidence that AVX512 implies BMI1. In addition, `VectorMaskGen` does a check for AVX3 simultaneously with a check for BMI2. So it seems safer to do the same as we did with AVX1 - 2. As a result, I refactored the reduction step into a separate function. Thank you very much. ------------- PR: https://git.openjdk.java.net/jdk/pull/6447 From phedlin at openjdk.java.net Wed Nov 24 14:19:04 2021 From: phedlin at openjdk.java.net (Patric Hedlin) Date: Wed, 24 Nov 2021 14:19:04 GMT Subject: RFR: 8276108: Wrong instruction generation in aarch64 backend [v2] In-Reply-To: References: <9P964HSPi8geJ3GVJwbsGo_t_lASstJkxsrX76Xv8K8=.898bf726-d71b-4a90-a6b7-83895eec3494@github.com> Message-ID: On Wed, 24 Nov 2021 11:52:45 GMT, Patric Hedlin wrote: >> C1 code generation on AArch64 may produce bad LDR/STR immediate offset instructions when the actual operand (datum) size is unknown. This change will alter the code generated for the problematic immediate offset to use the register offset version (requiring additional instructions). >> >> Contributed by Nick Gasson. >> >> Added assert in Address::encode() to emphasise the use of a valid immediate (in base_plus_offset). 
>> >> Added clarifying comment to Address::offset_ok_for_immed() emphasising favouring of the scaled unsigned 12-bit encoding for aligned offsets. > > Patric Hedlin has updated the pull request incrementally with one additional commit since the last revision: > > Clean-up address calculation via use of legitimize_address(). On second thought, we should only use legitimize_address() when it's necessary, to avoid the additional overhead when actual base+offset addressing modes are available. ------------- PR: https://git.openjdk.java.net/jdk/pull/6212 From duke at openjdk.java.net Wed Nov 24 16:17:34 2021 From: duke at openjdk.java.net (Ludvig Janiuk) Date: Wed, 24 Nov 2021 16:17:34 GMT Subject: RFR: JDK-8277139 Improve code readability in PredecessorValidator (c1_IR.cpp) [v3] In-Reply-To: References: Message-ID: > Refactor PredecessorValidator, more or less applying the following: > > declare variables where used > redeclare instead of reuse variables > move assert to a more logical place > remove unused length variable > inline variables where senseful > split loops > extract methods > > this is done in preparation for work on optimizing IR::verify. IR::verify calls PredecessorValidator. If the work of PredecessorValidator is made clearer, it will be easier to reason about where IR::verify doesn't need to be called (or where a subset of it would suffice).
Ludvig Janiuk has updated the pull request incrementally with two additional commits since the last revision: - last one now - more asterisks ;) ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6394/files - new: https://git.openjdk.java.net/jdk/pull/6394/files/8ed15361..cabdf9a9 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6394&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6394&range=01-02 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/6394.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6394/head:pull/6394 PR: https://git.openjdk.java.net/jdk/pull/6394 From duke at openjdk.java.net Wed Nov 24 16:17:37 2021 From: duke at openjdk.java.net (Ludvig Janiuk) Date: Wed, 24 Nov 2021 16:17:37 GMT Subject: RFR: JDK-8277139 Improve code readability in PredecessorValidator (c1_IR.cpp) [v2] In-Reply-To: References: Message-ID: On Wed, 24 Nov 2021 12:50:38 GMT, Ludvig Janiuk wrote: >> Refactor PredecessorValidator, more or less applying the following: >> >> declare variables where used >> redeclare instead of reuse variables >> move assert to a more logical place >> remove unused length variable >> inline variables where senseful >> split loops >> extract methods >> >> this is done in preparation for work on optimizing IR::verify. IR::verify calls PredecessorValidator. If the work of PredecessorValidator is made clearer, it will be easier to reason about where IR::verify doesn't need to be called (or where a subset of it would suffice). 
> > Ludvig Janiuk has updated the pull request incrementally with one additional commit since the last revision: > > formatting: type asterisks Okay, all the asterisks should be in their appropriate places now :) ------------- PR: https://git.openjdk.java.net/jdk/pull/6394 From chagedorn at openjdk.java.net Wed Nov 24 16:46:05 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Wed, 24 Nov 2021 16:46:05 GMT Subject: RFR: JDK-8277139 Improve code readability in PredecessorValidator (c1_IR.cpp) [v3] In-Reply-To: References: Message-ID: On Wed, 24 Nov 2021 16:17:34 GMT, Ludvig Janiuk wrote: >> Refactor PredecessorValidator, more or less applying the following: >> >> declare variables where used >> redeclare instead of reuse variables >> move assert to a more logical place >> remove unused length variable >> inline variables where senseful >> split loops >> extract methods >> >> this is done in preparation for work on optimizing IR::verify. IR::verify calls PredecessorValidator. If the work of PredecessorValidator is made clearer, it will be easier to reason about where IR::verify doesn't need to be called (or where a subset of it would suffice). > > Ludvig Janiuk has updated the pull request incrementally with two additional commits since the last revision: > > - last one now > - more asterisks ;) Thanks for doing the updates! src/hotspot/share/c1/c1_IR.cpp line 1308: > 1306: } > 1307: for (int i = 0; i < block->number_of_exception_handlers(); i++) { > 1308: assert(block->exception_handler_at(i)->is_set(BlockBegin::exception_entry_flag), "must be xhandler"); Could these two assertions also directly be moved before the `collect_predecessor()` calls below? 
-------------

PR: https://git.openjdk.java.net/jdk/pull/6394

From jbhateja at openjdk.java.net Wed Nov 24 18:57:22 2021
From: jbhateja at openjdk.java.net (Jatin Bhateja)
Date: Wed, 24 Nov 2021 18:57:22 GMT
Subject: RFR: 8277793: Support vector F2I and D2L cast operations for X86
Message-ID:

- JDK-8275317 extended the auto-vectorizer to infer Vector Cast operations if the source and destination primitive types have the same size.
- This patch adds the backend support for vector CastF2I and CastD2L on X86 AVX512 and legacy targets.

Following are the performance measurements of an existing JMH benchmark (test/micro/org/openjdk/bench/vm/compiler/TypeVectorOperations.java)

System Configuration : Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (40C 2S Icelake Server)

BENCHMARK | SIZE | BASELINE (AVX3) ns/op | WithOpt (AVX3) ns/op | Gain AVX3 (baseline/opt) | BASELINE (AVX2) ns/op | WithOpt (AVX2) ns/op | Gain AVX2 (baseline/opt)
-- | -- | -- | -- | -- | -- | -- | --
TypeVectorOperations.TypeVectorOperationsSuperWord.convert_d2l | 512.00 | 256.26 | 77.50 | 3.31 | 275.49 | 275.65 | 1.00
TypeVectorOperations.TypeVectorOperationsSuperWord.convert_d2l | 1024.00 | 501.87 | 150.35 | 3.34 | 540.47 | 541.22 | 1.00
TypeVectorOperations.TypeVectorOperationsSuperWord.convert_d2l | 2048.00 | 993.05 | 293.23 | 3.39 | 1070.56 | 1070.14 | 1.00
TypeVectorOperations.TypeVectorOperationsSuperWord.convert_f2i | 512.00 | 227.83 | 39.36 | 5.79 | 248.25 | 45.01 | 5.52
TypeVectorOperations.TypeVectorOperationsSuperWord.convert_f2i | 1024.00 | 449.70 | 77.88 | 5.77 | 487.33 | 86.15 | 5.66
TypeVectorOperations.TypeVectorOperationsSuperWord.convert_f2i | 2048.00 | 884.95 | 149.58 | 5.92 | 956.58 | 152.45 | 6.27

Kindly review and share your feedback.
Best Regards, Jatin ------------- Commit messages: - 8277793: Support vector F2I and D2L cast operations for X86 Changes: https://git.openjdk.java.net/jdk/pull/6544/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6544&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8277793 Stats: 172 lines in 5 files changed: 166 ins; 1 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/6544.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6544/head:pull/6544 PR: https://git.openjdk.java.net/jdk/pull/6544 From xliu at openjdk.java.net Wed Nov 24 23:42:27 2021 From: xliu at openjdk.java.net (Xin Liu) Date: Wed, 24 Nov 2021 23:42:27 GMT Subject: RFR: 8274983: C1 optimizes the invocation of private interface methods [v3] In-Reply-To: References: Message-ID: > The root cause of the C1 regression is that some regex generate multiple classes which all implement > an interface. In SlowStartupTest.java, the following **invokeinterface** happens frequently with different receivers. the target is a private interface method. > > > 9: invokeinterface #25, 3 // InterfaceMethod java/util/regex/Pattern$BmpCharPredicate.lambda$union$2:(Ljava/util/regex/Pattern$CharPredicate;I)Z > > > This patch allows c1 to generate the optimized virtual call for invokeinterface > whose targets are the private interface methods. > > Before JDK-823835, LambdaMetaFactory generates invokespecial in this case. Because the private > interface methods can not be overrided, c1 generates the optimized virtual call. After JDK-823835, > LambdaMetaFactory generates invokeinterface instead. C1 generates the regular virtual call because > it can not recognize the new pattern. If a multiple of subclasses all implement a same interface, > it is possible that they trash the IC stub using their own concrete klass in runtime. > > Optimized virtual call uses relocInfo::opt_virtual_call_type(3), It will call VM > 'resolve_opt_virtual_call_C' once and resolve the target to the VEP of the nmethod. 
> Therefore, this patch can prevent the callsite from trashing.
>
> Before this patch, SlowStartupTest had 38770 _resolve_invoke_virtual_cnt events and 38695 _handle_wrong_method_cnt events per 10k iterations. To dump `C1Statistics`, we use a fastdebug build for comparison.
>
> $java -XX:TieredStopAtLevel=1 -XX:+PrintC1Statistics SlowStartupTest 1
> Executed 10000 iterations in 736ms
> C1 Runtime statistics:
> _resolve_invoke_virtual_cnt: 38770
> _resolve_invoke_opt_virtual_cnt: 186
> _resolve_invoke_static_cnt: 44
> _handle_wrong_method_cnt: 38695
> _ic_miss_cnt: 35
>
> With this patch, only 1 _handle_wrong_method_cnt is triggered but we have 3 more `_resolve_invoke_opt_virtual_cnt` events instead. The total runtime drops from 736ms to 9ms.
>
> $java -XX:TieredStopAtLevel=1 -XX:+PrintC1Statistics SlowStartupTest 1
> Executed 10000 iterations in 9ms
> C1 Runtime statistics:
> _resolve_invoke_virtual_cnt: 77
> _resolve_invoke_opt_virtual_cnt: 189
> _resolve_invoke_static_cnt: 45
> _handle_wrong_method_cnt: 1
> _ic_miss_cnt: 39
>
> Codegen-wise, before the patch, C1 generates LIR for the invokeinterface whose target is a private interface method as follows.
>
> __bci__use__tid____instr____________________________________
> . 1 0 v2 a1.invokeinterface()
> InvokePrivateInterfaceMethod$I.bar()V
> . 6 0 v3 return
>
> With this patch, C1 generates LIR as follows. It first checks that a1 is a subtype of `InvokePrivateInterfaceMethod$I`. If so, an optimized virtual call is generated. The callsite is fixed up only once at runtime.
>
> __bci__use__tid____instr____________________________________
> . 1 1 a2 checkcast(a1) InvokePrivateInterfaceMethod$I
> stack [0:a1]
> . 1 0 v3 a2.invokeinterface()
> InvokePrivateInterfaceMethod$I.bar()V
> . 6 0 v4 return

Xin Liu has updated the pull request incrementally with one additional commit since the last revision:

Call set_invokespecial_receiver_check() so invokespecial throws IllegalAccessError.
We need to checkcast for invokespecial even target->can_be_statically_bound() is false. eg. https://github.com/openjdk/jdk/blob/master/test/jdk/java/lang/invoke/SpecialInterfaceCallI4.jasm#L33 ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6445/files - new: https://git.openjdk.java.net/jdk/pull/6445/files/acf7b9f8..6a10e772 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6445&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6445&range=01-02 Stats: 30 lines in 1 file changed: 16 ins; 13 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/6445.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6445/head:pull/6445 PR: https://git.openjdk.java.net/jdk/pull/6445 From xliu at openjdk.java.net Thu Nov 25 00:03:11 2021 From: xliu at openjdk.java.net (Xin Liu) Date: Thu, 25 Nov 2021 00:03:11 GMT Subject: RFR: 8274983: C1 optimizes the invocation of private interface methods [v2] In-Reply-To: References: Message-ID: On Wed, 24 Nov 2021 10:19:18 GMT, Dean Long wrote: >> Xin Liu has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - 8274983: C1 optimizes the invocation of private interface methods >> - Merge branch 'master' into JDK-8274983 >> - 8274983: Pattern.matcher performance regression after JDK-823835 >> >> This patch allows c1 to generate the optimized virtual call for invokeinterface >> whose targets are the private interface methods. >> >> Before JDK-823835, LambdaMetaFactory generates invokespecial in this case. Because the private >> interface methods can not be overrided, c1 generates the optimized virtual call. After JDK-823835, >> LambdaMetaFactory generates invokeinterface instead. C1 generates the regular virtual call because >> it can not recognize the new pattern. 
If a multiple of subclasses all implement a same interface, >> it is possible that they trash the IC stub using their own concrete klass in runtime. >> >> Optimized virtual call uses relocInfo::opt_virtual_call_type(3), It will call VM >> 'resolve_opt_virtual_call_C' once and resolve the target to the VEP of the nmethod. >> Therefore, this patch can prevent the callsite from trashing. > > src/hotspot/share/c1/c1_GraphBuilder.cpp line 1902: > >> 1900: Value receiver = state()->stack_at(index); >> 1901: CheckCast* c = new CheckCast(receiver_constraint, receiver, copy_state_before()); >> 1902: c->set_incompatible_class_change_check(); > > set_incompatible_class_change_check() seems OK for invokeinterface, but for invokespecial, I believe the interpreter throws IllegalAccessError. Thanks to point it out. I take a look at invokespecial and invokeinterface part of JVM spec. There are multiple cases. I think I better use uncommon trap and let interpreter to cover them all. It's not a hot path anyway. By reverting this code, `java/lang/invoke/SpecialInterfaceCall.java` is fixed. ------------- PR: https://git.openjdk.java.net/jdk/pull/6445 From dlong at openjdk.java.net Thu Nov 25 03:55:08 2021 From: dlong at openjdk.java.net (Dean Long) Date: Thu, 25 Nov 2021 03:55:08 GMT Subject: RFR: 8274983: C1 optimizes the invocation of private interface methods [v3] In-Reply-To: References: Message-ID: On Wed, 24 Nov 2021 23:42:27 GMT, Xin Liu wrote: >> The root cause of the C1 regression is that some regex generate multiple classes which all implement >> an interface. In SlowStartupTest.java, the following **invokeinterface** happens frequently with different receivers. the target is a private interface method. 
>> >> >> 9: invokeinterface #25, 3 // InterfaceMethod java/util/regex/Pattern$BmpCharPredicate.lambda$union$2:(Ljava/util/regex/Pattern$CharPredicate;I)Z >> >> >> This patch allows c1 to generate the optimized virtual call for invokeinterface >> whose targets are the private interface methods. >> >> Before JDK-823835, LambdaMetaFactory generates invokespecial in this case. Because the private >> interface methods can not be overrided, c1 generates the optimized virtual call. After JDK-823835, >> LambdaMetaFactory generates invokeinterface instead. C1 generates the regular virtual call because >> it can not recognize the new pattern. If a multiple of subclasses all implement a same interface, >> it is possible that they trash the IC stub using their own concrete klass in runtime. >> >> Optimized virtual call uses relocInfo::opt_virtual_call_type(3), It will call VM >> 'resolve_opt_virtual_call_C' once and resolve the target to the VEP of the nmethod. >> Therefore, this patch can prevent the callsite from trashing. >> >> Before this patch, SlowStartupTest had 38770 times _resolve_invoke_virtual_cnt and 38695 _handle_wrong_method_cnt per 10k iterations. To dump `C1Statistics`, we use fastdebug build for comparison. >> >> >> $java -XX:TieredStopAtLevel=1 -XX:+PrintC1Statistics SlowStartupTest 1 >> Executed 10000 iterations in 736ms >> C1 Runtime statistics: >> _resolve_invoke_virtual_cnt: 38770 >> _resolve_invoke_opt_virtual_cnt: 186 >> _resolve_invoke_static_cnt: 44 >> _handle_wrong_method_cnt: 38695 >> _ic_miss_cnt: 35 >> >> >> With this patch, only 1 _handle_wrong_method_cnt is triggered but we have 3 more `_resolve_invoke_opt_virtual_cnt` events instead. The total runtime reduces from 736ms to 9ms. 
>> >> >> $java -XX:TieredStopAtLevel=1 -XX:+PrintC1Statistics SlowStartupTest 1 >> Executed 10000 iterations in 9ms >> C1 Runtime statistics: >> _resolve_invoke_virtual_cnt: 77 >> _resolve_invoke_opt_virtual_cnt: 189 >> _resolve_invoke_static_cnt: 45 >> _handle_wrong_method_cnt: 1 >> _ic_miss_cnt: 39 >> >> >> Codegen wise, before the patch, C1 generates LIR for the invokeinterface whose target is a private interface methoda as follows. >> >> __bci__use__tid____instr____________________________________ >> . 1 0 v2 a1.invokeinterface() >> InvokePrivateInterfaceMethod$I.bar()V >> . 6 0 v3 return >> >> >> With this patch, C1 generates LIR as follows. it first check a1 is a subtype of `InvokePrivateInterfaceMethod$I`. if so, an optimized virtual call is generated. The callsite will be fixed up once and only one time in runtime. >> >> __bci__use__tid____instr____________________________________ >> . 1 1 a2 checkcast(a1) InvokePrivateInterfaceMethod$I >> stack [0:a1] >> . 1 0 v3 a2.invokeinterface() >> InvokePrivateInterfaceMethod$I.bar()V >> . 6 0 v4 return > > Xin Liu has updated the pull request incrementally with one additional commit since the last revision: > > Call set_invokespecial_receiver_check() so invokespecial throws IllegalAccessError. > > We need to checkcast for invokespecial even target->can_be_statically_bound() is false. > eg. https://github.com/openjdk/jdk/blob/master/test/jdk/java/lang/invoke/SpecialInterfaceCallI4.jasm#L33 As far as I can tell, this now matches what C2 does. ------------- Marked as reviewed by dlong (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/6445 From jiefu at openjdk.java.net Thu Nov 25 06:37:08 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Thu, 25 Nov 2021 06:37:08 GMT Subject: RFR: 8277426: Optimize mask reduction operations on x86 [v3] In-Reply-To: References: <9g2nHSo1K9pldGKqKiXBqjFZF3UFqqKHu-6GLyKbclc=.496404e9-f682-470e-a93d-c8fc9ca6f626@github.com> Message-ID: On Wed, 24 Nov 2021 14:05:34 GMT, Mai Đặng Quân Anh wrote: >> Hi, >> >> This patch improves the performance of mask reduction operations on AVX by matching the pattern `VectorMaskReduction (VectorStoreMask mask)` to eliminate the extra `VectorStoreMaskNode`. I have also done some refactoring to unify the logic of `toLong` with the other reduction operations. >> >> The patch has been discussed partially in [panama-vector repository](https://github.com/openjdk/panama-vector/pull/158). >> >> Thank you very much. > > Mai Đặng Quân Anh has updated the pull request incrementally with one additional commit since the last revision: > > add check bmi Thanks for your update. All my tests passed with the latest version. So LGTM. ------------- Marked as reviewed by jiefu (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6447 From duke at openjdk.java.net Thu Nov 25 07:36:06 2021 From: duke at openjdk.java.net (Mai Đặng Quân Anh) Date: Thu, 25 Nov 2021 07:36:06 GMT Subject: RFR: 8277426: Optimize mask reduction operations on x86 [v2] In-Reply-To: References: <9g2nHSo1K9pldGKqKiXBqjFZF3UFqqKHu-6GLyKbclc=.496404e9-f682-470e-a93d-c8fc9ca6f626@github.com> Message-ID: On Wed, 24 Nov 2021 03:58:00 GMT, Jie Fu wrote: >>> @DamonFool Could you please review this patch? >> >> I will do some testing and feedback. >> Thanks. > >> > @DamonFool Could you please review this patch? >> >> I will do some testing and feedback. Thanks. > > All the tests passed. > But I'm not sure whether all avx512 machines support bmi1 and lzcnt features.
@DamonFool Thanks a lot for your review. Do I need a re-review from @sviswa7 now? ------------- PR: https://git.openjdk.java.net/jdk/pull/6447 From duke at openjdk.java.net Thu Nov 25 08:09:04 2021 From: duke at openjdk.java.net (Ludvig Janiuk) Date: Thu, 25 Nov 2021 08:09:04 GMT Subject: RFR: JDK-8277139 Improve code readability in PredecessorValidator (c1_IR.cpp) [v3] In-Reply-To: References: Message-ID: On Wed, 24 Nov 2021 16:41:35 GMT, Christian Hagedorn wrote: >> Ludvig Janiuk has updated the pull request incrementally with two additional commits since the last revision: >> >> - last one now >> - more asterisks ;) > > src/hotspot/share/c1/c1_IR.cpp line 1308: > >> 1306: } >> 1307: for (int i = 0; i < block->number_of_exception_handlers(); i++) { >> 1308: assert(block->exception_handler_at(i)->is_set(BlockBegin::exception_entry_flag), "must be xhandler"); > > Could these two assertions also directly be moved before the `collect_predecessor()` calls below? I've split these loops up on purpose because I want to separate concerns. I appreciate the urge to optimize for fewer iterations, but I think the added readability will enable other, bigger optimizations. ------------- PR: https://git.openjdk.java.net/jdk/pull/6394 From jiefu at openjdk.java.net Thu Nov 25 08:19:04 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Thu, 25 Nov 2021 08:19:04 GMT Subject: RFR: 8277426: Optimize mask reduction operations on x86 [v2] In-Reply-To: References: <9g2nHSo1K9pldGKqKiXBqjFZF3UFqqKHu-6GLyKbclc=.496404e9-f682-470e-a93d-c8fc9ca6f626@github.com> Message-ID: On Wed, 24 Nov 2021 03:58:00 GMT, Jie Fu wrote: >>> @DamonFool Could you please review this patch? >> >> I will do some testing and feedback. >> Thanks. > >> > @DamonFool Could you please review this patch? >> >> I will do some testing and feedback. Thanks. > > All the tests passed. > But I'm not sure whether all avx512 machines support bmi1 and lzcnt features. > @DamonFool Thanks a lot for your review. 
Do I need a re-review from @sviswa7 now? Yes, I think so. Thanks. ------------- PR: https://git.openjdk.java.net/jdk/pull/6447 From thartmann at openjdk.java.net Thu Nov 25 08:32:03 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 25 Nov 2021 08:32:03 GMT Subject: RFR: 8277602: Deopt code does not extend the stack enough if the caller is an optimize entry blob In-Reply-To: <515lMydWeiRbYVwp7x4kW33eiOBrMdoobOaO4wW79TE=.6655f6f7-16eb-413d-a958-7225b0ce0b7a@github.com> References: <515lMydWeiRbYVwp7x4kW33eiOBrMdoobOaO4wW79TE=.6655f6f7-16eb-413d-a958-7225b0ce0b7a@github.com> Message-ID: On Tue, 23 Nov 2021 15:01:24 GMT, Jorn Vernee wrote: > Deoptimization code does not recreate c2i adapter 'frames'. For compiled callers this means that the stack needs to be adjusted manually to make room for the parameters when the callee is converted to an interpreter frame (essentially emulating what a c2i adapter would do). > > To check if the caller does a compiled call, the current code uses `frame::is_compiled_frame()`, which is true if the codeblob of the caller frame is an instance of `CompiledMethod`. > > However, optimized entry blobs also do compiled calls, are not detected by this test, and therefore don't get their stack adjusted correctly. > > To address this, I've added a new `frame::is_compiled_caller` function to determine if the caller is doing a compiled call, and I use that in the deopt code instead of `is_compiled_frame`. > > This patch also removes an old workaround that tried to fix the issue by allocating some spill space in the optimized entry blob frame, but this only accounts for the first argument. If there are more arguments we still have a problem. The suggested patch fixes this the right way I think. > > Thanks, > Jorn > > Testing: run-test-jdk_foreign on Windows x64 and Linux x64 (afaik these are the only tests that use optimized entry blobs) Looks good but a targeted regression test (or a noreg-* label) is required. 
------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6522 From chagedorn at openjdk.java.net Thu Nov 25 08:34:08 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Thu, 25 Nov 2021 08:34:08 GMT Subject: RFR: JDK-8277139 Improve code readability in PredecessorValidator (c1_IR.cpp) [v3] In-Reply-To: References: Message-ID: On Thu, 25 Nov 2021 08:06:28 GMT, Ludvig Janiuk wrote: >> src/hotspot/share/c1/c1_IR.cpp line 1308: >> >>> 1306: } >>> 1307: for (int i = 0; i < block->number_of_exception_handlers(); i++) { >>> 1308: assert(block->exception_handler_at(i)->is_set(BlockBegin::exception_entry_flag), "must be xhandler"); >> >> Could these two assertions also directly be moved before the `collect_predecessor()` calls below? > > I've split these loops up on purpose because I want to separate concerns. I appreciate the urge to optimize for fewer iterations, but I think the added readability will enable other, bigger optimizations. I don't have a strict opinion here but I thought the assertions belong to the `block->end()->sux_at(i)/block->exception_handler_at(i)` calls below as an additional verification of the kind of blocks. But I'm fine with both. ------------- PR: https://git.openjdk.java.net/jdk/pull/6394 From chagedorn at openjdk.java.net Thu Nov 25 09:03:07 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Thu, 25 Nov 2021 09:03:07 GMT Subject: RFR: JDK-8277139 Improve code readability in PredecessorValidator (c1_IR.cpp) [v3] In-Reply-To: References: Message-ID: On Wed, 24 Nov 2021 16:17:34 GMT, Ludvig Janiuk wrote: >> Refactor PredecessorValidator, more or less applying the following: >> >> declare variables where used >> redeclare instead of reuse variables >> move assert to a more logical place >> remove unused length variable >> inline variables where senseful >> split loops >> extract methods >> >> this is done in preparation for work on optimizing IR::verify. 
IR::verify calls PredecessorValidator. If the work of PredecessorValidator is made clearer, it will be easier to reason about where IR::verify doesn't need to be called (or where a subset of it would suffice). > > Ludvig Janiuk has updated the pull request incrementally with two additional commits since the last revision: > > - last one now > - more asterisks ;) Marked as reviewed by chagedorn (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/6394 From duke at openjdk.java.net Thu Nov 25 10:50:10 2021 From: duke at openjdk.java.net (duke) Date: Thu, 25 Nov 2021 10:50:10 GMT Subject: Withdrawn: 8273392: Improve usability of stack-less exceptions due to -XX:+OmitStackTraceInFastThrow In-Reply-To: References: Message-ID: On Tue, 7 Sep 2021 15:25:46 GMT, Volker Simonis wrote: > If running with `-XX:+OmitStackTraceInFastThrow` (which is the default) C2 will optimize certain "hot" implicit exceptions (i.e. AIOOBE, NullPointerExceptions,..) and replace them by a static, pre-allocated exception without any stacktrace. > > However, we can actually do better. Instead of using a single, pre-allocated exception object for all methods we can let the compiler allocate specific exceptions for each compilation unit (i.e. nmethod) and fill them with at least one stack frame with the method /line-number information of the currently compiled method. If the method in question is being inlined (which often happens), we can add stackframes for all callers up to the inlining depth of the method in question. 
> > For the attached JTreg test, we get the following exception in interpreter mode: > > java.lang.NullPointerException: Cannot read the array length because "" is null > at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76) > at compiler.exceptions.StackFrameInFastThrow.level2(StackFrameInFastThrow.java:95) > at compiler.exceptions.StackFrameInFastThrow.level1(StackFrameInFastThrow.java:99) > at compiler.exceptions.StackFrameInFastThrow.main(StackFrameInFastThrow.java:233) > > Once the method gets compiled with `-XX:+OmitStackTraceInFastThrow` the same exception will look as follows: > > java.lang.NullPointerException > > After this change, if `StackFrameInFastThrow.throwImplicitException()` will be compiled stand alone, we will get: > > java.lang.NullPointerException > at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76) > > and if `StackFrameInFastThrow.throwImplicitException()` will be inlined into `level2()` and `level2()` into `level1()` we will get the following exception (altough we're still running with `-XX:+OmitStackTraceInFastThrow`): > > java.lang.NullPointerException > at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76) > at compiler.exceptions.StackFrameInFastThrow.level2(StackFrameInFastThrow.java:95) > at compiler.exceptions.StackFrameInFastThrow.level1(StackFrameInFastThrow.java:99) > > The new functionality is guarded by `-XX:+/-StackFrameInFastThrow`, but switched on by default (I'll create a CSR for the new option once reviewers are comfortable with the change). Notice that the optimization comes at no run-time costs because all the extra work will be done at compile time. > > ## Implementation details > > - Already the current implementation of `-XX:+OmitStackTraceInFastThrow` potentially lazy-allocates the empty singleton exceptions like AIOOBE in `ciEnv::ArrayStoreException_instance()`. 
With this change, if running with `-XX:+StackFrameInFastThrow` we will always allocate new exception objects and populate them with the stack frames which are statically available at compile time (see `java_lang_Throwable::fill_in_stack_trace_of_implicit_exception()`). > - Because nmethods don't act as strong GC roots, we have to create a global JNI handle for every newly generated exception to prevent GC from collecting them. > - In order to avoid a memory leak we have to release these global JNI handles once a nmethod gets unloaded. In order to achieve this, I've added a new section "implicit exceptions" to the nmethod which holds these JNI handles. > - While adding the new "implicit exceptions" section to the corresponding stats (`print_nmethod_stats()` and printing routines (`nmethod::print()`) I realized that a previous change ([JDK-8254231: Implementation of Foreign Linker API (Incubator)](https://bugs.openjdk.java.net/browse/JDK-8254231)) had already introduced a new nmethod section ("native invokers") but missed to add it to the corresponding stats and printing routines so I've added that section as well. > - The `#ifdef COMPILER2` guards are only required to not break the `zero`/`minimal` builds. > - The JTreg test is using `-XX:PerMethodTrapLimit=0` to handle all implicit exceptions as "hot". This makes the test simpler and at the same time provokes the allocation of more implicit exceptions. > - Manually verified that the created Exception objects are freed by GC once the corresponding nmethods have been flushed. > - Manual "stress" test with a very small heap and continuous recompilation of methods with explicit exceptions to provoke GCs during compilation didn't reveal any issues. This pull request has been closed without being integrated. 
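The stack-less exceptions discussed in the quoted proposal can be observed with a small stand-alone program. What follows is only a minimal sketch, not the JTreg test from the pull request: the class name `FastThrowDemo`, its methods, and the iteration count are made up for illustration. Whether any exception actually loses its stack trace depends on the JVM, on whether C2 compiles the method, and on flags such as `-XX:-OmitStackTraceInFastThrow`, so no particular output is claimed.

```java
public class FastThrowDemo {
    // Triggers an implicit NullPointerException when array is null
    // (there is no explicit "throw" in this method).
    static int lengthOf(int[] array) {
        return array.length;
    }

    // Calls the throwing method repeatedly and counts how many of the
    // caught exceptions carry no stack frames at all.
    static int countStackless(int iterations) {
        int stackless = 0;
        for (int i = 0; i < iterations; i++) {
            try {
                lengthOf(null);
            } catch (NullPointerException e) {
                if (e.getStackTrace().length == 0) {
                    stackless++;
                }
            }
        }
        return stackless;
    }

    public static void main(String[] args) {
        int iterations = 200_000; // arbitrary; chosen to give C2 a chance to kick in
        int stackless = countStackless(iterations);
        System.out.println(stackless + " of " + iterations
                + " exceptions had an empty stack trace");
    }
}
```

Running the same program once with default flags and once with `-XX:-OmitStackTraceInFastThrow` is one way to compare the two behaviours on a given JVM.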
------------- PR: https://git.openjdk.java.net/jdk/pull/5392 From aph at openjdk.java.net Thu Nov 25 12:33:09 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Thu, 25 Nov 2021 12:33:09 GMT Subject: RFR: 8276108: Wrong instruction generation in aarch64 backend [v2] In-Reply-To: <7IURH4of06nAIMsMnZjL8057mgJ_fg5T_FHKQ-bxo7o=.bd8da053-2044-4a1f-8884-8722f98ae91e@github.com> References: <9P964HSPi8geJ3GVJwbsGo_t_lASstJkxsrX76Xv8K8=.898bf726-d71b-4a90-a6b7-83895eec3494@github.com> <7IURH4of06nAIMsMnZjL8057mgJ_fg5T_FHKQ-bxo7o=.bd8da053-2044-4a1f-8884-8722f98ae91e@github.com> Message-ID: On Wed, 24 Nov 2021 13:36:41 GMT, Patric Hedlin wrote: > The implementation of 'lea' does not seem to handle scale (for base_plus_offset). Am I missing something? No, you aren't! Good point. Please add an assertion to `Address::lea (base_plus_offset)` that the shift is zero. Let this patch stand, then I withdraw my objection. ------------- PR: https://git.openjdk.java.net/jdk/pull/6212 From aph at openjdk.java.net Thu Nov 25 12:36:07 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Thu, 25 Nov 2021 12:36:07 GMT Subject: RFR: 8276108: Wrong instruction generation in aarch64 backend [v2] In-Reply-To: References: <9P964HSPi8geJ3GVJwbsGo_t_lASstJkxsrX76Xv8K8=.898bf726-d71b-4a90-a6b7-83895eec3494@github.com> Message-ID: On Wed, 24 Nov 2021 14:15:34 GMT, Patric Hedlin wrote: > On second thought, we should only use legitimate_address() when it's necessary, to avoid the additional overhead when actual base+offset addressing modes are available. No, we should not do that. `legitimize_address()` is the oracle that decides when `legitimize_address()` is necessary. Please use it in the way that it is intended to be used. 
-------------

PR: https://git.openjdk.java.net/jdk/pull/6212

From jiefu at openjdk.java.net Thu Nov 25 14:49:28 2021
From: jiefu at openjdk.java.net (Jie Fu)
Date: Thu, 25 Nov 2021 14:49:28 GMT
Subject: RFR: 8277843: [Vector API] scalar2vector shouldn't be used for mask operations if Op_MaskAll is unavailable
Message-ID:

Hi all,

This bug was first observed on x86_32/AVX512.
It caused 62 vector api test failures.

==============================
Test summary
==============================
   TEST                                  TOTAL  PASS  FAIL ERROR
>> jtreg:test/jdk/jdk/incubator/vector      74    12    62     0 <<
==============================

You can easily reproduce this bug on an AVX512 machine with x86_32.
Or you can also reproduce it on an AVX512 machine with x86_64 if you disable `Op_MaskAll` like this.

diff --git a/src/hotspot/cpu/x86/x86.ad b/src/hotspot/cpu/x86/x86.ad
index 3f6d5a44b0d..d5a751b310d 100644
--- a/src/hotspot/cpu/x86/x86.ad
+++ b/src/hotspot/cpu/x86/x86.ad
@@ -1819,6 +1819,7 @@ const bool Matcher::match_rule_supported_vector(int opcode, int vlen, BasicType
       }
       break;
     case Op_MaskAll:
+      return false;
       if (!is_LP64 || !VM_Version::supports_evex()) {
         return false;
       }

The failure reason is that `VectorNode::scalar2vector` generates incorrect IR for mask operations if `Op_MaskAll` is unavailable.
So it shouldn't be used for mask operations if `Op_MaskAll` is unavailable.

Testing (with two more bug fixes https://github.com/openjdk/jdk/pull/6535 and https://github.com/openjdk/jdk/pull/6533):
 - vector api tests on {x86_64, x86_32}/{AVX512, AVX256}, all passed
 - vector api tests on aarch64, all passed

Thanks.
Best regards, Jie ------------- Commit messages: - 8277843: [Vector API] scalar2vector shouldn't be used for mask operations if Op_MaskAll is unavailable Changes: https://git.openjdk.java.net/jdk/pull/6562/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6562&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8277843 Stats: 12 lines in 2 files changed: 11 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/6562.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6562/head:pull/6562 PR: https://git.openjdk.java.net/jdk/pull/6562 From chagedorn at openjdk.java.net Thu Nov 25 14:54:22 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Thu, 25 Nov 2021 14:54:22 GMT Subject: RFR: 8277842: IGV: Add jvms property to know where a node came from Message-ID: When dumping a node with `node->dump()`, it also prints the JVM state which tells us to which bci and inlinee method the node belongs to: 38 StoreI === 5 7 37 35 [[ 15 ]] @java/lang/Class:exact+116 *, name=y, idx=5; Memory: @java/lang/Class:exact+116 *, name=y, idx=5; !jvms: Test::inlinee @ bci:1 (line 16) Test::test @ bci:4 (line 12) IGV only shows the line and bci information with which it is sometimes hard to tell where exactly the node came from, especially with deep inlining. This patch adds the entire JVM state as a `jvms` property field to IGV: ![Screenshot from 2021-11-25 15-21-53](https://user-images.githubusercontent.com/17833009/143460385-baf5ee3a-31b0-4693-bfa4-0de91c3c4822.png) This helps to better analyze a graph. 
Thanks, Christian ------------- Commit messages: - 8277842: IGV: Add jvms property to know where a node came from Changes: https://git.openjdk.java.net/jdk/pull/6563/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6563&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8277842 Stats: 8 lines in 1 file changed: 8 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/6563.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6563/head:pull/6563 PR: https://git.openjdk.java.net/jdk/pull/6563 From phedlin at openjdk.java.net Thu Nov 25 15:04:08 2021 From: phedlin at openjdk.java.net (Patric Hedlin) Date: Thu, 25 Nov 2021 15:04:08 GMT Subject: RFR: 8276108: Wrong instruction generation in aarch64 backend [v2] In-Reply-To: References: <9P964HSPi8geJ3GVJwbsGo_t_lASstJkxsrX76Xv8K8=.898bf726-d71b-4a90-a6b7-83895eec3494@github.com> Message-ID: On Wed, 24 Nov 2021 11:52:45 GMT, Patric Hedlin wrote: >> C1 code generation on AArch64 may produce bad LDR/STR immediate offset instructions when the actual operand (datum) size is unknown. This change will alter the code generated for the problematic immediate offset to use the register offset version (requiring additional instructions). >> >> Contributed by Nick Gasson. >> >> Added assert in Address::encode() to emphasise the use of a valid immediate (in base_plus_offset). >> >> Added clarifying comment to Address::offset_ok_for_immed() emphasising favouring of the scaled unsigned 12-bit encoding for aligned offsets. > > Patric Hedlin has updated the pull request incrementally with one additional commit since the last revision: > > Clean-up address calculation via use of legitimize_address(). I do not think adding an assert in the 'Address::lea' implementation is all that straight forward. I've noticed that there are plenty of initialisation missing around 'Address' and in particular the extended part. I think cleaning up that part is a task in its own right. But, I will have a look. 
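To illustrate why an unknown operand size matters for the immediate-offset forms mentioned in the quoted description, here is a rough Java model of the encoding rule. It is a sketch under assumptions, not the HotSpot code: the class and method names are hypothetical, and it only models the two architectural encodings (an unsigned 12-bit offset scaled by the access size, and a signed 9-bit unscaled offset), preferring the scaled form for aligned non-negative offsets as the quoted comment describes.

```java
public class ImmediateOffsetSketch {
    // True if value fits a signed 9-bit field (-256..255), the unscaled form.
    static boolean isSimm9(long value) {
        return value >= -256 && value <= 255;
    }

    // True if value fits an unsigned 12-bit field (0..4095), the scaled form.
    static boolean isUimm12(long value) {
        return value >= 0 && value <= 4095;
    }

    // shift is log2 of the access size in bytes (0 = 1 byte, 3 = 8 bytes).
    // Aligned, non-negative offsets are checked against the scaled encoding;
    // everything else must fit the unscaled signed encoding.
    static boolean offsetOkForImmed(long offset, int shift) {
        long mask = (1L << shift) - 1;
        if (offset < 0 || (offset & mask) != 0) {
            return isSimm9(offset);
        }
        return isUimm12(offset >> shift);
    }

    public static void main(String[] args) {
        long offset = 32760; // 4095 * 8
        System.out.println(offsetOkForImmed(offset, 3)); // 8-byte access: prints true
        System.out.println(offsetOkForImmed(offset, 0)); // 1-byte access: prints false
    }
}
```

The same offset is encodable for one access size and not for another, which is why a code generator that does not know the size cannot safely pick the immediate form and has to fall back to a register offset.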
------------- PR: https://git.openjdk.java.net/jdk/pull/6212 From mdoerr at openjdk.java.net Thu Nov 25 15:42:28 2021 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Thu, 25 Nov 2021 15:42:28 GMT Subject: RFR: 8277846: Implement fast-path for ASCII-compatible CharsetEncoders on ppc64 Message-ID: PPC64 port of 8274242: Implement fast-path for ASCII-compatible CharsetEncoders on x86 I moved the `encode_iso_array` implementation into `C2_MacroAssembler` and reused it for the new ASCII node. The algorithm is unchanged. We only need to change the mask because (non-extended) ASCII uses 7 bit (also see x86 implementation). ------------- Commit messages: - 8277846: Implement fast-path for ASCII-compatible CharsetEncoders on ppc64 Changes: https://git.openjdk.java.net/jdk/pull/6565/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6565&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8277846 Stats: 77 lines in 3 files changed: 46 ins; 14 del; 17 mod Patch: https://git.openjdk.java.net/jdk/pull/6565.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6565/head:pull/6565 PR: https://git.openjdk.java.net/jdk/pull/6565 From mdoerr at openjdk.java.net Thu Nov 25 15:46:29 2021 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Thu, 25 Nov 2021 15:46:29 GMT Subject: RFR: 8277846: Implement fast-path for ASCII-compatible CharsetEncoders on ppc64 [v2] In-Reply-To: References: Message-ID: > PPC64 port of 8274242: Implement fast-path for ASCII-compatible CharsetEncoders on x86 > I moved the `encode_iso_array` implementation into `C2_MacroAssembler` and reused it for the new ASCII node. The algorithm is unchanged. We only need to change the mask because (non-extended) ASCII uses 7 bit (also see x86 implementation). Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Enable new ASCII node. 
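The remark that only the mask changes between the ISO-8859-1 and ASCII variants can be pictured with a scalar Java model. This is a sketch for illustration, not the PPC64 assembly from the pull request: the class name, method name, and constant names are invented here, and the loop encodes one char at a time where the intrinsic may process several.

```java
public class EncodeArraySketch {
    // A char is encodable when none of the mask bits are set.
    static final int ISO_8859_1_MASK = 0xFF00; // rejects chars above 0xFF
    static final int ASCII_MASK = 0xFF80;      // rejects chars above 0x7F (7-bit)

    // Copies chars to bytes until a non-encodable char is found and
    // returns the number of chars that were encoded.
    static int encode(char[] src, byte[] dst, int len, int mask) {
        int i = 0;
        while (i < len && (src[i] & mask) == 0) {
            dst[i] = (byte) src[i];
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        char[] src = "caf\u00e9!".toCharArray(); // U+00E9 is Latin-1 but not ASCII
        byte[] dst = new byte[src.length];
        System.out.println(encode(src, dst, src.length, ISO_8859_1_MASK)); // prints 5
        System.out.println(encode(src, dst, src.length, ASCII_MASK));      // prints 3
    }
}
```

Both calls run the same loop; the only difference is the constant that decides which chars stop the encoding.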
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6565/files - new: https://git.openjdk.java.net/jdk/pull/6565/files/96b7e6c8..b41e9d97 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6565&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6565&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/6565.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6565/head:pull/6565 PR: https://git.openjdk.java.net/jdk/pull/6565 From duke at openjdk.java.net Thu Nov 25 16:21:02 2021 From: duke at openjdk.java.net (Ludvig Janiuk) Date: Thu, 25 Nov 2021 16:21:02 GMT Subject: RFR: JDK-8277139 Improve code readability in PredecessorValidator (c1_IR.cpp) [v3] In-Reply-To: References: Message-ID: On Thu, 25 Nov 2021 08:30:19 GMT, Christian Hagedorn wrote: >> I've split these loops up on purpose because I want to separate concerns. I appreciate the urge to optimize for fewer iterations, but I think the added readability will enable other, bigger optimizations. > > I don't have a strict opinion here but I thought the assertions belong to the `block->end()->sux_at(i)/block->exception_handler_at(i)` calls below as an additional verification of the kind of blocks. But I'm fine with both. My reading was that `PredecessorValidator` validates several things. Validating the flags in one concern, the other is what happens in `verify_block_preds_against_collected_preds`. And `collect_predecessors` is just necessary to make `verify_block_preds_against_collected_preds` possible. I can sort of imagine what you mean, but those assertions aren't necessary for a call to `block->end()->sux_at(i)`. So I'll keep it as is if it's fine by you. 
------------- PR: https://git.openjdk.java.net/jdk/pull/6394 From aph at openjdk.java.net Thu Nov 25 16:40:03 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Thu, 25 Nov 2021 16:40:03 GMT Subject: RFR: 8276108: Wrong instruction generation in aarch64 backend [v2] In-Reply-To: References: <9P964HSPi8geJ3GVJwbsGo_t_lASstJkxsrX76Xv8K8=.898bf726-d71b-4a90-a6b7-83895eec3494@github.com> Message-ID: <8IARS8e2P8EzZBsSJDbphrwoFZn6egFUW-86KgZTX4k=.9888c2d4-0bd5-4306-9c75-b82e30eb6859@github.com> On Thu, 25 Nov 2021 15:01:22 GMT, Patric Hedlin wrote: > I do not think adding an assert in the 'Address::lea' implementation is all that straight forward. I've noticed that there are plenty of initialisation missing around 'Address' and in particular the extended part. I think cleaning up that part is a task in its own right. But, I will have a look. OK. It can wait. ------------- PR: https://git.openjdk.java.net/jdk/pull/6212 From roland at openjdk.java.net Thu Nov 25 17:03:38 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Thu, 25 Nov 2021 17:03:38 GMT Subject: RFR: 8275330: C2: assert(n->is_Root() || n->is_Region() || n->is_Phi() || n->is_MachMerge() || def_block->dominates(block)) failed: uses must be dominated by definitions [v2] In-Reply-To: References: Message-ID: > This is similar to previous bugs where: > > - a cast/conv node captures a narrow type in a loop body because of a > range check, > > - the range check is optimized out of the loop, pre/main/post loop are > created > > - overunrolling causes the main loop to become unreachable (the range > check, if still in the main loop, would fail), the cast transforms to > top but c2 can't optimize the loop out > > This was fixed by adding predicates above the main loop. With this > particular bug, the cast node is in the post loop. The fix I propose > is to also add predicates above the post loop. 
There are a few > locations in the code that cause a post loop to be added: either the > initial post loop or some other post loops for vectorization > support. I think the new predicates are needed in all cases. To be > able to add predicates at these different points in the optimization > process, the new predicates are copied from the main loop predicates. > > I also delayed folding of Opaque4 nodes to macro expansion rather than > post loop opts igvn. The reason for that is that I believe there's a > risk that an Opaque4 is removed (that is replaced by its input 2) > before its input 1 has a chance to constant fold. That wouldn't happen > with a debug build because we leave the tests in (that is replace the > Opaque4 node by its input 1) so that corner case is not properly > tested currently. The reason for leaving the tests in was to sanity > check that the tests are indeed correct. Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains three additional commits since the last revision: - Merge branch 'master' into JDK-8275330 - reviews - fix ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6429/files - new: https://git.openjdk.java.net/jdk/pull/6429/files/50c8c797..b194700e Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6429&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6429&range=00-01 Stats: 57869 lines in 931 files changed: 38122 ins; 10933 del; 8814 mod Patch: https://git.openjdk.java.net/jdk/pull/6429.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6429/head:pull/6429 PR: https://git.openjdk.java.net/jdk/pull/6429 From neliasso at openjdk.java.net Thu Nov 25 18:09:16 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Thu, 25 Nov 2021 18:09:16 GMT Subject: RFR: JDK-8264838: IGV: enhance graph export functionality Message-ID: Hi, This patch adds SVG and searchable PDF export functionality to IGV. It's originally contributed by rcastanedalo at openjdk.java.net. I have updated the patch with new library versions, rebased and tested it. 
Please review, Nils Eliasson ------------- Commit messages: - Update versions - Merge master - Remove unused import - Check whether writer is null before closing - Update copyright years - Add option to export graphs as searchable PDF files - Enable SVG exports by default Changes: https://git.openjdk.java.net/jdk/pull/6564/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6564&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8264838 Stats: 362 lines in 11 files changed: 83 ins; 262 del; 17 mod Patch: https://git.openjdk.java.net/jdk/pull/6564.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6564/head:pull/6564 PR: https://git.openjdk.java.net/jdk/pull/6564 From jvernee at openjdk.java.net Thu Nov 25 18:48:45 2021 From: jvernee at openjdk.java.net (Jorn Vernee) Date: Thu, 25 Nov 2021 18:48:45 GMT Subject: RFR: 8277602: Deopt code does not extend the stack enough if the caller is an optimize entry blob [v2] In-Reply-To: <515lMydWeiRbYVwp7x4kW33eiOBrMdoobOaO4wW79TE=.6655f6f7-16eb-413d-a958-7225b0ce0b7a@github.com> References: <515lMydWeiRbYVwp7x4kW33eiOBrMdoobOaO4wW79TE=.6655f6f7-16eb-413d-a958-7225b0ce0b7a@github.com> Message-ID: > Deoptimization code does not recreate c2i adapter 'frames'. For compiled callers this means that the stack needs to be adjusted manually to make room for the parameters when the callee is converted to an interpreter frame (essentially emulating what a c2i adapter would do). > > To check if the caller does a compiled call, the current code uses `frame::is_compiled_frame()`, which is true if the codeblob of the caller frame is an instance of `CompiledMethod`. > > However, optimized entry blobs also do compiled calls, are not detected by this test, and therefore don't get their stack adjusted correctly. > > To address this, I've added a new `frame::is_compiled_caller` function to determine if the caller is doing a compiled call, and I use that in the deopt code instead of `is_compiled_frame`. 
> > This patch also removes an old workaround that tried to fix the issue by allocating some spill space in the optimized entry blob frame, but this only accounts for the first argument. If there are more arguments we still have a problem. The suggested patch fixes this the right way I think. > > Thanks, > Jorn > > Testing: run-test-jdk_foreign on Windows x64 and Linux x64 (afaik these are the only tests that use optimized entry blobs) Jorn Vernee has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - Fix after merge - Merge branch 'master' into Deopt_Stack_Fix - Add test + asserts - Properly handle optimized entry frame callers during deopt ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6522/files - new: https://git.openjdk.java.net/jdk/pull/6522/files/443b93e3..c998ef2a Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6522&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6522&range=00-01 Stats: 19311 lines in 322 files changed: 10899 ins; 5442 del; 2970 mod Patch: https://git.openjdk.java.net/jdk/pull/6522.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6522/head:pull/6522 PR: https://git.openjdk.java.net/jdk/pull/6522 From jvernee at openjdk.java.net Thu Nov 25 19:35:03 2021 From: jvernee at openjdk.java.net (Jorn Vernee) Date: Thu, 25 Nov 2021 19:35:03 GMT Subject: RFR: 8277602: Deopt code does not extend the stack enough if the caller is an optimize entry blob [v2] In-Reply-To: References: <515lMydWeiRbYVwp7x4kW33eiOBrMdoobOaO4wW79TE=.6655f6f7-16eb-413d-a958-7225b0ce0b7a@github.com> Message-ID: On Thu, 25 Nov 2021 18:48:45 GMT, Jorn Vernee wrote: >> Deoptimization code does not recreate c2i adapter 'frames'. 
For compiled callers this means that the stack needs to be adjusted manually to make room for the parameters when the callee is converted to an interpreter frame (essentially emulating what a c2i adapter would do). >> >> To check if the caller does a compiled call, the current code uses `frame::is_compiled_frame()`, which is true if the codeblob of the caller frame is an instance of `CompiledMethod`. >> >> However, optimized entry blobs also do compiled calls, are not detected by this test, and therefore don't get their stack adjusted correctly. >> >> To address this, I've added a new `frame::is_compiled_caller` function to determine if the caller is doing a compiled call, and I use that in the deopt code instead of `is_compiled_frame`. >> >> This patch also removes an old workaround that tried to fix the issue by allocating some spill space in the optimized entry blob frame, but this only accounts for the first argument. If there are more arguments we still have a problem. The suggested patch fixes this the right way I think. >> >> Thanks, >> Jorn >> >> Testing: run-test-jdk_foreign on Windows x64 and Linux x64 (afaik these are the only tests that use optimized entry blobs) > > Jorn Vernee has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Fix after merge > - Merge branch 'master' into Deopt_Stack_Fix > - Add test + asserts > - Properly handle optimized entry frame callers during deopt I've added a test and a couple of asserts that catch the case I'm trying to fix (mostly so that the test fails in a more obvious way). If I revert the fix in deoptimization.cpp the latter assert fires (that's the case I found during debugging as well), and when I re-add the fix tests pass again. I ran this through tier1-3 as well. 
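As a rough illustration of the distinction described above (a toy model, not the HotSpot code itself), the two predicates can be sketched as follows. `BlobKind`, `ToyFrame`, `extends_stack` and `self_check` are hypothetical names made up for this sketch; only `is_compiled_frame` and `is_compiled_caller` are named in the PR description, and their bodies here are simplified assumptions.

```cpp
#include <cassert>

// Toy model, not HotSpot code: the kinds of caller frame discussed in the PR.
enum class BlobKind { Interpreter, CompiledMethod, OptimizedEntryBlob };

struct ToyFrame {
  BlobKind kind;

  // Old test: true only if the frame's code blob is a CompiledMethod.
  bool is_compiled_frame() const {
    return kind == BlobKind::CompiledMethod;
  }

  // New test (assumed shape): also true for optimized entry blobs,
  // since those perform compiled calls as well.
  bool is_compiled_caller() const {
    return kind == BlobKind::CompiledMethod ||
           kind == BlobKind::OptimizedEntryBlob;
  }
};

// Hypothetical helper: would the deopt code make room for the callee's
// parameters, given which predicate it consults?
bool extends_stack(const ToyFrame& caller, bool use_new_check) {
  return use_new_check ? caller.is_compiled_caller()
                       : caller.is_compiled_frame();
}

// Small self-check of the model.
void self_check() {
  ToyFrame entry_blob{BlobKind::OptimizedEntryBlob};
  assert(!extends_stack(entry_blob, /*use_new_check=*/false));  // old check misses it
  assert(extends_stack(entry_blob, /*use_new_check=*/true));    // new check catches it
}
```

In this model, an optimized entry blob caller is the only case where the two predicates disagree, which is the case the new test and asserts are meant to exercise.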
The asserts I've added only check for 'overflow' in the case of compiled callers. I played around with adding a similar check for interpreted callers as well, but I wasn't able to provoke an assertion failure with that, and I'm not 100% sure what the right check should be. I suspect interpreted callers are rare when we deopt due to an uncommon trap. For these reasons, I've left the asserts to check for overflow in the case of compiled callers only, for now.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6522

From jbhateja at openjdk.java.net Fri Nov 26 06:54:07 2021
From: jbhateja at openjdk.java.net (Jatin Bhateja)
Date: Fri, 26 Nov 2021 06:54:07 GMT
Subject: RFR: 8277843: [Vector API] scalar2vector shouldn't be used for mask operations if Op_MaskAll is unavailable
In-Reply-To:
References:
Message-ID:

On Thu, 25 Nov 2021 14:41:27 GMT, Jie Fu wrote:

> Hi all,
>
> This bug was first observed on x86_32/AVX512.
> It caused 62 vector api test failures.
>
> ==============================
> Test summary
> ==============================
>    TEST                                  TOTAL  PASS  FAIL ERROR
> >> jtreg:test/jdk/jdk/incubator/vector      74    12    62     0 <<
> ==============================
>
>
> You can easily reproduce this bug on an AVX512 machine with x86_32.
> Or you can also reproduce it on an AVX512 machine with x86_64 if you disable `Op_MaskAll` like this.
>
> diff --git a/src/hotspot/cpu/x86/x86.ad b/src/hotspot/cpu/x86/x86.ad
> index 3f6d5a44b0d..d5a751b310d 100644
> --- a/src/hotspot/cpu/x86/x86.ad
> +++ b/src/hotspot/cpu/x86/x86.ad
> @@ -1819,6 +1819,7 @@ const bool Matcher::match_rule_supported_vector(int opcode, int vlen, BasicType
>        }
>        break;
>      case Op_MaskAll:
> +      return false;
>        if (!is_LP64 || !VM_Version::supports_evex()) {
>          return false;
>        }
>
>
> The failure reason is that `VectorNode::scalar2vector` generates incorrect IR for mask operations if `Op_MaskAll` is unavailable.
> So it shouldn't be used for mask operations if `Op_MaskAll` is unavailable.
>
> Testing (with two more bug fixes https://github.com/openjdk/jdk/pull/6535 and https://github.com/openjdk/jdk/pull/6533):
> - vector api tests on {x86_64, x86_32}/{AVX512, AVX256}, all passed
> - vector api tests on aarch64, all passed
>
> Thanks.
> Best regards,
> Jie

Hi @DamonFool ,

MaskAll is only supported for targets having predicate registers; in all other cases a Replicate node should be generated. The ideal type of the Replicate node should be TypeVect; the reported problem seems to occur because the ideal type of the Replicate node is TypeVectMask.

I think the following fix should be sufficient:

diff --git a/src/hotspot/share/opto/vectornode.cpp b/src/hotspot/share/opto/vectornode.cpp
index 6a5b0b9b014..f6c483cf62c 100644
--- a/src/hotspot/share/opto/vectornode.cpp
+++ b/src/hotspot/share/opto/vectornode.cpp
@@ -588,13 +588,13 @@ VectorNode* VectorNode::make(int opc, Node* n1, Node* n2, Node* n3, uint vlen, B
 // Scalar promotion
 VectorNode* VectorNode::scalar2vector(Node* s, uint vlen, const Type* opd_t, bool is_mask) {
   BasicType bt = opd_t->array_element_basic_type();
-  const TypeVect* vt = opd_t->singleton() ? TypeVect::make(opd_t, vlen, is_mask)
-                                          : TypeVect::make(bt, vlen, is_mask);
-
   if (is_mask && Matcher::match_rule_supported_vector(Op_MaskAll, vlen, bt)) {
+    const TypeVect* vt = TypeVect::make(opd_t, vlen, is_mask);
     return new MaskAllNode(s, vt);
   }
+  const TypeVect* vt = opd_t->singleton() ? TypeVect::make(opd_t, vlen)
+                                          : TypeVect::make(bt, vlen);
   switch (bt) {
   case T_BOOLEAN:
   case T_BYTE:

Best Regards,
Jatin

-------------

PR: https://git.openjdk.java.net/jdk/pull/6562

From jbhateja at openjdk.java.net Fri Nov 26 06:57:03 2021
From: jbhateja at openjdk.java.net (Jatin Bhateja)
Date: Fri, 26 Nov 2021 06:57:03 GMT
Subject: RFR: 8277843: [Vector API] scalar2vector shouldn't be used for mask operations if Op_MaskAll is unavailable
In-Reply-To:
References:
Message-ID: <6yMz0DFCdJ0hHZ3BYvCz3ESG7LLe2nn-NRTXDFuPoGE=.2f858a27-427e-4ac7-b41d-e83f76c72840@github.com>

On Thu, 25 Nov 2021 14:41:27 GMT, Jie Fu wrote:

> Hi all,
>
> This bug was first observed on x86_32/AVX512.
> It caused 62 vector api test failures.
>
> ==============================
> Test summary
> ==============================
>    TEST                                  TOTAL  PASS  FAIL ERROR
> >> jtreg:test/jdk/jdk/incubator/vector      74    12    62     0 <<
> ==============================
>
>
> You can easily reproduce this bug on an AVX512 machine with x86_32.
> Or you can also reproduce it on an AVX512 machine with x86_64 if you disable `Op_MaskAll` like this.
>
> diff --git a/src/hotspot/cpu/x86/x86.ad b/src/hotspot/cpu/x86/x86.ad
> index 3f6d5a44b0d..d5a751b310d 100644
> --- a/src/hotspot/cpu/x86/x86.ad
> +++ b/src/hotspot/cpu/x86/x86.ad
> @@ -1819,6 +1819,7 @@ const bool Matcher::match_rule_supported_vector(int opcode, int vlen, BasicType
>        }
>        break;
>      case Op_MaskAll:
> +      return false;
>        if (!is_LP64 || !VM_Version::supports_evex()) {
>          return false;
>        }
>
>
> The failure reason is that `VectorNode::scalar2vector` generates incorrect IR for mask operations if `Op_MaskAll` is unavailable.
> So it shouldn't be used for mask operations if `Op_MaskAll` is unavailable.
>
> Testing (with two more bug fixes https://github.com/openjdk/jdk/pull/6535 and https://github.com/openjdk/jdk/pull/6533):
> - vector api tests on {x86_64, x86_32}/{AVX512, AVX256}, all passed
> - vector api tests on aarch64, all passed
>
> Thanks.
> Best regards,
> Jie

src/hotspot/share/opto/vectorIntrinsics.cpp line 842:

> 840:   // Op_MaskAll is required in VectorNode::scalar2vector for mask operations.
> 841:   // So bail out if Op_MaskAll is unavailable.
> 842:   if (is_vector_mask(vbox_klass) && !Matcher::match_rule_supported_vector(Op_MaskAll, num_elem, elem_bt)) {

For targets that do not support the MaskAll operation, the Replicate node should be wrapped in a mask box.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6562

From jiefu at openjdk.java.net Fri Nov 26 07:44:26 2021
From: jiefu at openjdk.java.net (Jie Fu)
Date: Fri, 26 Nov 2021 07:44:26 GMT
Subject: RFR: 8277843: [Vector API] scalar2vector generates incorrect type info for mask operations if Op_MaskAll is unavailable [v2]
In-Reply-To:
References:
Message-ID:

> Hi all,
>
> This bug was first observed on x86_32/AVX512.
> It caused 62 vector api test failures.
>
> ==============================
> Test summary
> ==============================
>    TEST                                  TOTAL  PASS  FAIL ERROR
> >> jtreg:test/jdk/jdk/incubator/vector      74    12    62     0 <<
> ==============================
>
>
> You can easily reproduce this bug on an AVX512 machine with x86_32.
> Or you can also reproduce it on an AVX512 machine with x86_64 if you disable `Op_MaskAll` like this.
>
> diff --git a/src/hotspot/cpu/x86/x86.ad b/src/hotspot/cpu/x86/x86.ad
> index 3f6d5a44b0d..d5a751b310d 100644
> --- a/src/hotspot/cpu/x86/x86.ad
> +++ b/src/hotspot/cpu/x86/x86.ad
> @@ -1819,6 +1819,7 @@ const bool Matcher::match_rule_supported_vector(int opcode, int vlen, BasicType
>        }
>        break;
>      case Op_MaskAll:
> +      return false;
>        if (!is_LP64 || !VM_Version::supports_evex()) {
>          return false;
>        }
>
>
> The failure reason is that `VectorNode::scalar2vector` generates incorrect IR for mask operations if `Op_MaskAll` is unavailable.
> So it shouldn't be used for mask operations if `Op_MaskAll` is unavailable.
>
> Testing (with two more bug fixes https://github.com/openjdk/jdk/pull/6535 and https://github.com/openjdk/jdk/pull/6533):
> - vector api tests on {x86_64, x86_32}/{AVX512, AVX256}, all passed
> - vector api tests on aarch64, all passed
>
> Thanks.
> Best regards,
> Jie

Jie Fu has updated the pull request incrementally with one additional commit since the last revision:

  Address review comments

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6562/files
  - new: https://git.openjdk.java.net/jdk/pull/6562/files/35c908b8..3772300b

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6562&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6562&range=00-01

Stats: 17 lines in 2 files changed: 2 ins; 13 del; 2 mod
Patch: https://git.openjdk.java.net/jdk/pull/6562.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/6562/head:pull/6562

PR: https://git.openjdk.java.net/jdk/pull/6562

From jiefu at openjdk.java.net Fri Nov 26 07:44:26 2021
From: jiefu at openjdk.java.net (Jie Fu)
Date: Fri, 26 Nov 2021 07:44:26 GMT
Subject: RFR: 8277843: [Vector API] scalar2vector generates incorrect type info for mask operations if Op_MaskAll is unavailable
In-Reply-To:
References:
Message-ID:

On Fri, 26 Nov 2021 06:50:43 GMT, Jatin Bhateja wrote:

> MaskAll is only supported for targets having predicate registers, in all other cases a Replicate node should be generated.

Thanks @jatin-bhateja for your clarification.
Fixed.
Thanks.
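The rule quoted above (MaskAll only where the target supports it, otherwise a Replicate node carrying an ordinary vector type) can be sketched as a small toy model. This is not the HotSpot implementation: `IdealType`, `ToyNode`, `scalar2vector_sketch` and `self_check` are hypothetical names, and the boolean `mask_all_supported` merely stands in for the result of `Matcher::match_rule_supported_vector(Op_MaskAll, vlen, bt)`.

```cpp
#include <cassert>
#include <string>

// Toy stand-ins for the ideal types in the discussion; not HotSpot classes.
enum class IdealType { TypeVect, TypeVectMask };

struct ToyNode {
  std::string name;  // which node the sketch would create
  IdealType type;    // which ideal type that node would carry
};

// mask_all_supported stands in for
// Matcher::match_rule_supported_vector(Op_MaskAll, vlen, bt).
ToyNode scalar2vector_sketch(bool is_mask, bool mask_all_supported) {
  if (is_mask && mask_all_supported) {
    // Assumed: targets with predicate registers get a MaskAll node
    // with a mask type.
    return ToyNode{"MaskAll", IdealType::TypeVectMask};
  }
  // Everything else: a Replicate node typed as an ordinary vector,
  // never as a vector mask.
  return ToyNode{"Replicate", IdealType::TypeVect};
}

// Small self-check of the case from the bug report.
void self_check() {
  ToyNode n = scalar2vector_sketch(/*is_mask=*/true, /*mask_all_supported=*/false);
  assert(n.name == "Replicate");
  assert(n.type == IdealType::TypeVect);
}
```

The case from the bug report corresponds to `is_mask == true` with `mask_all_supported == false`: in this model it yields a Replicate node with a plain vector type rather than a mask type.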
-------------

PR: https://git.openjdk.java.net/jdk/pull/6562

From roland at openjdk.java.net Fri Nov 26 07:52:04 2021
From: roland at openjdk.java.net (Roland Westrelin)
Date: Fri, 26 Nov 2021 07:52:04 GMT
Subject: RFR: 8275330: C2: assert(n->is_Root() || n->is_Region() || n->is_Phi() || n->is_MachMerge() || def_block->dominates(block)) failed: uses must be dominated by definitions [v2]
In-Reply-To:
References:
Message-ID: <327J-sCVvBDRPOIOFZJ9Sx2te7xGXnqrbcbSsyq-Lps=.4c7af5b2-32da-4486-bcfb-fafef20b0571@github.com>

On Tue, 23 Nov 2021 07:34:58 GMT, Tobias Hartmann wrote:

>> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
>>
>> - Merge branch 'master' into JDK-8275330
>> - reviews
>> - fix
>
> That looks good to me.

@TobiHartmann @chhagedorn Thanks for the reviews. All comments should be addressed in the updated change.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6429

From neliasso at openjdk.java.net Fri Nov 26 08:22:01 2021
From: neliasso at openjdk.java.net (Nils Eliasson)
Date: Fri, 26 Nov 2021 08:22:01 GMT
Subject: RFR: 8277793: Support vector F2I and D2L cast operations for X86
In-Reply-To:
References:
Message-ID:

On Wed, 24 Nov 2021 18:50:17 GMT, Jatin Bhateja wrote:

> - JDK-8275317 extended auto-vectorizer to infer Vector Cast operations if source and destination primitive types have the same size.
> - This patch adds the backend support for vector CastF2I and CastD2L on X86 AVX512 and legacy targets.
>
> Following are the performance measurements of an existing JMH benchmark (test/micro/org/openjdk/bench/vm/compiler/TypeVectorOperations.java)
>
> System Configuration : Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (40C 2S Icelake Server)
>
> BENCHMARK | SIZE | BASELINE (AVX3) ns/op | WithOpt (AVX3) ns/op | Gain AVX3 (baseline/opt) | BASELINE (AVX2) ns/op | WithOpt (AVX2) ns/op | Gain AVX2 (baseline/opt)
> -- | -- | -- | -- | -- | -- | -- | --
> TypeVectorOperations.TypeVectorOperationsSuperWord.convert_d2l | 512.00 | 256.26 | 77.50 | 3.31 | 275.49 | 275.65 | 1.00
> TypeVectorOperations.TypeVectorOperationsSuperWord.convert_d2l | 1024.00 | 501.87 | 150.35 | 3.34 | 540.47 | 541.22 | 1.00
> TypeVectorOperations.TypeVectorOperationsSuperWord.convert_d2l | 2048.00 | 993.05 | 293.23 | 3.39 | 1070.56 | 1070.14 | 1.00
> TypeVectorOperations.TypeVectorOperationsSuperWord.convert_f2i | 512.00 | 227.83 | 39.36 | 5.79 | 248.25 | 45.01 | 5.52
> TypeVectorOperations.TypeVectorOperationsSuperWord.convert_f2i | 1024.00 | 449.70 | 77.88 | 5.77 | 487.33 | 86.15 | 5.66
> TypeVectorOperations.TypeVectorOperationsSuperWord.convert_f2i | 2048.00 | 884.95 | 149.58 | 5.92 | 956.58 | 152.45 | 6.27
>
> Kindly review and share your feedback.
>
> Best Regards,
> Jatin

Is this change already covered by tests and micros?
------------- PR: https://git.openjdk.java.net/jdk/pull/6544 From chagedorn at openjdk.java.net Fri Nov 26 08:27:08 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Fri, 26 Nov 2021 08:27:08 GMT Subject: RFR: 8275330: C2: assert(n->is_Root() || n->is_Region() || n->is_Phi() || n->is_MachMerge() || def_block->dominates(block)) failed: uses must be dominated by definitions [v2] In-Reply-To: References: Message-ID: On Thu, 25 Nov 2021 17:03:38 GMT, Roland Westrelin wrote: >> This is similar to previous bugs where: >> >> - a cast/conv node captures a narrow type in a loop body because of a >> range check, >> >> - the range check is optimized out of the loop, pre/main/post loop are >> created >> >> - overunrolling causes the main loop to become unreachable (the range >> check, if still in the main loop, would fail), the cast transforms to >> top but c2 can't optimize the loop out >> >> This was fixed by adding predicates above the main loop. With this >> particular bug, the cast node is in the post loop. The fix I propose >> is to also add predicates above the post loop. There are a few >> locations in the code that cause a post loop to be added: either the >> initial post loop or some other post loops for vectorization >> support. I think the new predicates are needed in a all cases. To be >> able to add predicates at these different points in the optimization >> process, the new predicates are copied from the main loop predicates. >> >> I also delayed folding of Opaque4 nodes to macro expansion rather than >> post loop opts igvn. The reason for that is that I believe there's a >> risk that an Opaque4 is removed (that is replaced by its input 2) >> before its input 1 has a chance to constant fold. That wouldn't happen >> with a debug build because we leave the tests in (that is replace the >> Opaque4 node by its input 1) so that corner case is not properly >> tested currently. 
The reason for leaving the tests in was to sanity >> check that the tests are indeed correct. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge branch 'master' into JDK-8275330 > - reviews > - fix Thanks for doing the updates, looks good! ------------- Marked as reviewed by chagedorn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6429 From ngasson at openjdk.java.net Fri Nov 26 08:59:04 2021 From: ngasson at openjdk.java.net (Nick Gasson) Date: Fri, 26 Nov 2021 08:59:04 GMT Subject: RFR: 8277508: need to check has_predicated_vectors before calling scalable_predicate_reg_slots [v2] In-Reply-To: References: Message-ID: On Mon, 22 Nov 2021 01:15:30 GMT, Yadong Wang wrote: >> Hi, Team, >> A separate set of predicate registers is not mandatory for an implementation of scalable vectors. It will cause a failure in some platform which supports scalable vectors without explicit predicated registers, like riscv. All code about RegVectMask should be covered by has_predicated_vectors here in Matcher::init_first_stack_mask(). >> >> Yadong > > Yadong Wang has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. Looks ok to me. ------------- Marked as reviewed by ngasson (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/6492 From yadongwang at openjdk.java.net Fri Nov 26 09:13:07 2021 From: yadongwang at openjdk.java.net (Yadong Wang) Date: Fri, 26 Nov 2021 09:13:07 GMT Subject: Integrated: 8277508: need to check has_predicated_vectors before calling scalable_predicate_reg_slots In-Reply-To: References: Message-ID: On Sun, 21 Nov 2021 13:32:18 GMT, Yadong Wang wrote: > Hi, Team, > A separate set of predicate registers is not mandatory for an implementation of scalable vectors. It will cause a failure in some platform which supports scalable vectors without explicit predicated registers, like riscv. All code about RegVectMask should be covered by has_predicated_vectors here in Matcher::init_first_stack_mask(). > > Yadong This pull request has now been integrated. Changeset: 00a6238d Author: Yadong Wang Committer: Fei Yang URL: https://git.openjdk.java.net/jdk/commit/00a6238daed4a4aaa6001275ce620646cdabfeb5 Stats: 13 lines in 1 file changed: 2 ins; 0 del; 11 mod 8277508: need to check has_predicated_vectors before calling scalable_predicate_reg_slots Reviewed-by: njian, thartmann, ngasson ------------- PR: https://git.openjdk.java.net/jdk/pull/6492 From thartmann at openjdk.java.net Fri Nov 26 09:24:13 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Fri, 26 Nov 2021 09:24:13 GMT Subject: RFR: 8275330: C2: assert(n->is_Root() || n->is_Region() || n->is_Phi() || n->is_MachMerge() || def_block->dominates(block)) failed: uses must be dominated by definitions [v2] In-Reply-To: References: Message-ID: On Thu, 25 Nov 2021 17:03:38 GMT, Roland Westrelin wrote: >> This is similar to previous bugs where: >> >> - a cast/conv node captures a narrow type in a loop body because of a >> range check, >> >> - the range check is optimized out of the loop, pre/main/post loop are >> created >> >> - overunrolling causes the main loop to become unreachable (the range >> check, if still in the main loop, would fail), the cast transforms to >> top 
but c2 can't optimize the loop out >> >> This was fixed by adding predicates above the main loop. With this >> particular bug, the cast node is in the post loop. The fix I propose >> is to also add predicates above the post loop. There are a few >> locations in the code that cause a post loop to be added: either the >> initial post loop or some other post loops for vectorization >> support. I think the new predicates are needed in a all cases. To be >> able to add predicates at these different points in the optimization >> process, the new predicates are copied from the main loop predicates. >> >> I also delayed folding of Opaque4 nodes to macro expansion rather than >> post loop opts igvn. The reason for that is that I believe there's a >> risk that an Opaque4 is removed (that is replaced by its input 2) >> before its input 1 has a chance to constant fold. That wouldn't happen >> with a debug build because we leave the tests in (that is replace the >> Opaque4 node by its input 1) so that corner case is not properly >> tested currently. The reason for leaving the tests in was to sanity >> check that the tests are indeed correct. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge branch 'master' into JDK-8275330 > - reviews > - fix Looks good. All testing passed. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6429 From neliasso at openjdk.java.net Fri Nov 26 09:43:32 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Fri, 26 Nov 2021 09:43:32 GMT Subject: RFR: JDK-8264838: IGV: enhance graph export functionality [v2] In-Reply-To: References: Message-ID: > Hi, > > This patch adds SVG and searchable PDF export functionality to IGV. 
> > It's originally contributed by rcastanedalo at openjdk.java.net. > I have updated the patch with new library versions, rebased and tested it. > > Please review, > Nils Eliasson Nils Eliasson has updated the pull request incrementally with one additional commit since the last revision: Clean up ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6564/files - new: https://git.openjdk.java.net/jdk/pull/6564/files/607a5d19..14378ddc Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6564&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6564&range=00-01 Stats: 12 lines in 2 files changed: 2 ins; 8 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/6564.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6564/head:pull/6564 PR: https://git.openjdk.java.net/jdk/pull/6564 From neliasso at openjdk.java.net Fri Nov 26 09:48:03 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Fri, 26 Nov 2021 09:48:03 GMT Subject: RFR: JDK-8264838: IGV: enhance graph export functionality [v2] In-Reply-To: References: Message-ID: On Fri, 26 Nov 2021 09:43:32 GMT, Nils Eliasson wrote: >> Hi, >> >> This patch adds SVG and searchable PDF export functionality to IGV. >> >> It's originally contributed by rcastanedalo at openjdk.java.net. >> I have updated the patch with new library versions, rebased and tested it. >> >> Please review, >> Nils Eliasson > > Nils Eliasson has updated the pull request incrementally with one additional commit since the last revision: > > Clean up I updated this PR with a change based on feedback by @chhagedorn from the previous (withdrawn) PR. 
------------- PR: https://git.openjdk.java.net/jdk/pull/6564 From roland at openjdk.java.net Fri Nov 26 09:51:13 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Fri, 26 Nov 2021 09:51:13 GMT Subject: Integrated: 8275330: C2: assert(n->is_Root() || n->is_Region() || n->is_Phi() || n->is_MachMerge() || def_block->dominates(block)) failed: uses must be dominated by definitions In-Reply-To: References: Message-ID: On Wed, 17 Nov 2021 12:16:32 GMT, Roland Westrelin wrote: > This is similar to previous bugs where: > > - a cast/conv node captures a narrow type in a loop body because of a > range check, > > - the range check is optimized out of the loop, pre/main/post loop are > created > > - overunrolling causes the main loop to become unreachable (the range > check, if still in the main loop, would fail), the cast transforms to > top but c2 can't optimize the loop out > > This was fixed by adding predicates above the main loop. With this > particular bug, the cast node is in the post loop. The fix I propose > is to also add predicates above the post loop. There are a few > locations in the code that cause a post loop to be added: either the > initial post loop or some other post loops for vectorization > support. I think the new predicates are needed in a all cases. To be > able to add predicates at these different points in the optimization > process, the new predicates are copied from the main loop predicates. > > I also delayed folding of Opaque4 nodes to macro expansion rather than > post loop opts igvn. The reason for that is that I believe there's a > risk that an Opaque4 is removed (that is replaced by its input 2) > before its input 1 has a chance to constant fold. That wouldn't happen > with a debug build because we leave the tests in (that is replace the > Opaque4 node by its input 1) so that corner case is not properly > tested currently. The reason for leaving the tests in was to sanity > check that the tests are indeed correct. 
This pull request has now been integrated. Changeset: 3e798dd4 Author: Roland Westrelin URL: https://git.openjdk.java.net/jdk/commit/3e798dd40c68439f3220445e679b9e0e495435d8 Stats: 190 lines in 8 files changed: 135 ins; 22 del; 33 mod 8275330: C2: assert(n->is_Root() || n->is_Region() || n->is_Phi() || n->is_MachMerge() || def_block->dominates(block)) failed: uses must be dominated by definitions Reviewed-by: thartmann, chagedorn ------------- PR: https://git.openjdk.java.net/jdk/pull/6429 From duke at openjdk.java.net Fri Nov 26 09:55:09 2021 From: duke at openjdk.java.net (Ludvig Janiuk) Date: Fri, 26 Nov 2021 09:55:09 GMT Subject: Integrated: JDK-8277139 Improve code readability in PredecessorValidator (c1_IR.cpp) In-Reply-To: References: Message-ID: On Mon, 15 Nov 2021 18:26:36 GMT, Ludvig Janiuk wrote: > Refactor PredecessorValidator, more or less applying the following: > > declare variables where used > redeclare instead of reuse variables > move assert to a more logical place > remove unused length variable > inline variables where senseful > split loops > extract methods > > this is done in preparation for work on optimizing IR::verify. IR::verify calls PredecessorValidator. If the work of PredecessorValidator is made clearer, it will be easier to reason about where IR::verify doesn't need to be called (or where a subset of it would suffice). This pull request has now been integrated. 
Changeset: 040b2c52 Author: Ludvig Janiuk Committer: Nils Eliasson URL: https://git.openjdk.java.net/jdk/commit/040b2c52d3e82048630fbd45a7db48a5e65204b7 Stats: 75 lines in 1 file changed: 26 ins; 23 del; 26 mod 8277139: Improve code readability in PredecessorValidator (c1_IR.cpp) Reviewed-by: thartmann, chagedorn ------------- PR: https://git.openjdk.java.net/jdk/pull/6394 From rkennke at openjdk.java.net Fri Nov 26 10:03:21 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Fri, 26 Nov 2021 10:03:21 GMT Subject: RFR: 8277860: PPC: Remove duplicate info != NULL check Message-ID: I made a mistake: I should have fixed the duplicated info != NULL check in LIR_Assembler::emit_load_klass() in c1_LIRAssembler_ppc.cpp but forgot until the second I sent the integrate command. I have no access to PPC hardware, I am sending this blindly, relying only on GHA tests. ------------- Commit messages: - 8277860: PPC: Remove duplicate info != NULL check Changes: https://git.openjdk.java.net/jdk/pull/6571/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6571&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8277860 Stats: 6 lines in 1 file changed: 0 ins; 2 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/6571.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6571/head:pull/6571 PR: https://git.openjdk.java.net/jdk/pull/6571 From chagedorn at openjdk.java.net Fri Nov 26 10:03:22 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Fri, 26 Nov 2021 10:03:22 GMT Subject: RFR: 8277860: PPC: Remove duplicate info != NULL check In-Reply-To: References: Message-ID: On Fri, 26 Nov 2021 09:52:22 GMT, Roman Kennke wrote: > I made a mistake: I should have fixed the duplicated info != NULL check in LIR_Assembler::emit_load_klass() in c1_LIRAssembler_ppc.cpp but forgot until the second I sent the integrate command. > > I have no access to PPC hardware, I am sending this blindly, relying only on GHA tests. Looks good and trivial! 
------------- Marked as reviewed by chagedorn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6571 From chagedorn at openjdk.java.net Fri Nov 26 10:06:06 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Fri, 26 Nov 2021 10:06:06 GMT Subject: RFR: JDK-8277139 Improve code readability in PredecessorValidator (c1_IR.cpp) [v3] In-Reply-To: References: Message-ID: On Thu, 25 Nov 2021 16:17:51 GMT, Ludvig Janiuk wrote: >> I don't have a strict opinion here but I thought the assertions belong to the `block->end()->sux_at(i)/block->exception_handler_at(i)` calls below as an additional verification of the kind of blocks. But I'm fine with both. > > My reading was that `PredecessorValidator` validates several things. Validating the flags in one concern, the other is what happens in `verify_block_preds_against_collected_preds`. And `collect_predecessors` is just necessary to make `verify_block_preds_against_collected_preds` possible. > > I can sort of imagine what you mean, but those assertions aren't necessary for a call to `block->end()->sux_at(i)`. So I'll keep it as is if it's fine by you. That's fine :-) ------------- PR: https://git.openjdk.java.net/jdk/pull/6394 From rkennke at openjdk.java.net Fri Nov 26 10:10:25 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Fri, 26 Nov 2021 10:10:25 GMT Subject: RFR: 8276901: Implement UseHeavyMonitors consistently [v6] In-Reply-To: References: Message-ID: <9SRkOBufN9mV5JC6rw8_0vx7CwkeDYlN-RPj95p5_FU=.097bfe60-5ed1-4504-b603-1529d8587b25@github.com> > The flag UseHeavyMonitors seems to imply that it makes Hotspot always use inflated monitors, rather than stack locks. However, it is only implemented in the interpreter that way. When it calls into runtime, it would still happily stack-lock. Even worse, C1 uses another flag UseFastLocking to achieve something similar (with the same caveat that runtime would stack-lock anyway). C2 doesn't have any such mechanism at all. 
> I would like to experiment with disabling stack-locking, and thus, having this flag work as expected would seem very useful.
>
> The change removes the C1 flag UseFastLocking, and replaces its uses with equivalent (i.e. inverted) UseHeavyMonitors instead. I think it makes sense to make UseHeavyMonitors a develop flag (I wouldn't want anybody to use this in production, not currently without this change, and not with this change). I also added a flag VerifyHeavyMonitors to be able to verify that stack-locking is really disabled. We can't currently verify this unconditionally (e.g. in debug builds) because all non-x86_64 platforms would need work.
>
> Testing:
> - [x] tier1
> - [x] tier2
> - [x] tier3
> - [ ] tier4

Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains ten commits:

 - Merge remote-tracking branch 'origin/JDK-8276901' into JDK-8276901
 - Make flag deprecation product-only; Add flag to VMDeprecatedOptions test
 - Merge branch 'master' into JDK-8276901
 - Fix formatting
 - Keep UseHeavyMonitors as release flag, but deprecate it
 - Add run configuration using -XX:+UseHeavyMonitors to MapLoops test
 - Verify monitors even in non-debug builds
 - Change VerifyHeavyMonitors flag to diagnostic
 - 8276901: Implement UseHeavyMonitors consistently

-------------

Changes: https://git.openjdk.java.net/jdk/pull/6320/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6320&range=05
Stats: 225 lines in 15 files changed: 75 ins; 18 del; 132 mod
Patch: https://git.openjdk.java.net/jdk/pull/6320.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/6320/head:pull/6320

PR: https://git.openjdk.java.net/jdk/pull/6320

From roland at openjdk.java.net Fri Nov 26 10:17:31 2021
From: roland at openjdk.java.net (Roland Westrelin)
Date: Fri, 26 Nov 2021 10:17:31 GMT
Subject: RFR: 8275638: GraphKit::combine_exception_states fails with "matching stack sizes" assert
Message-ID:

Root cause is identical to 8273165
AFAIU: late inline of a virtual call can throw from 2 different paths (null check and the call itself). That breaks because the logic for exceptions expects the stack for all paths that throw exceptions to have the same stack size. AFAIU, the stack doesn't matter for exception handling: either the exception is caught by an exception handler and then the stack is popped and the exception is pushed or, the exception is rethrown to the caller in which case the current stack is also popped (that is the jvm state for the current method). As a consequence the fix I propose is to ignore the stack in GraphKit::combine_exception_states(). AFAIU, the same fix would work for 8273165 but I left the current workaround as is: not sure if we want to be conservative for now or not ------------- Commit messages: - more - fix Changes: https://git.openjdk.java.net/jdk/pull/6572/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6572&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8275638 Stats: 85 lines in 2 files changed: 81 ins; 2 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/6572.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6572/head:pull/6572 PR: https://git.openjdk.java.net/jdk/pull/6572 From chagedorn at openjdk.java.net Fri Nov 26 10:27:13 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Fri, 26 Nov 2021 10:27:13 GMT Subject: RFR: JDK-8264838: IGV: enhance graph export functionality [v2] In-Reply-To: References: Message-ID: <4bXRYU3-1-T_HZRUa6boN-C5Q06htxlWDxLkwT3wGLI=.e7898afe-ff70-48f4-8502-dbbcc0c3bdc4@github.com> On Fri, 26 Nov 2021 09:43:32 GMT, Nils Eliasson wrote: >> Hi, >> >> This patch adds SVG and searchable PDF export functionality to IGV. >> >> It's originally contributed by rcastanedalo at openjdk.java.net. >> I have updated the patch with new library versions, rebased and tested it.
>> >> Please review, >> Nils Eliasson > > Nils Eliasson has updated the pull request incrementally with one additional commit since the last revision: > > Clean up Looks good to me! I've applied the changes and could successfully export a graph to PDF and SVG. ------------- Marked as reviewed by chagedorn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6564 From roland at openjdk.java.net Fri Nov 26 10:27:48 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Fri, 26 Nov 2021 10:27:48 GMT Subject: RFR: 8275638: GraphKit::combine_exception_states fails with "matching stack sizes" assert [v2] In-Reply-To: References: Message-ID: > Root cause is identical to 8273165 AFIU: late inline of a virtual call > can throw from 2 different paths (null check and the call > itself). That breaks because the logic for exceptions expects the > stack for all paths that throw exceptions to have the same stack size. > > AFAIU, the stack doesn't matter exception handling: either the > exception is caught by a exception handler and then the stack is > popped and the exception is pushed or, the exception is rethrown to > the caller in which case the current stack is also popped (that is the > jvm state for the current method). As a consequence the fix I propose > is to ignore the stack in GraphKit::combine_exception_states(). 
> > AFAIU, the same fix would work for 8273165 but I left the current work > around as is: not sure if we want to be conservative for now or not Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: make test runnable with release build ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6572/files - new: https://git.openjdk.java.net/jdk/pull/6572/files/f843e27e..e3a04acd Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6572&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6572&range=00-01 Stats: 4 lines in 1 file changed: 2 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/6572.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6572/head:pull/6572 PR: https://git.openjdk.java.net/jdk/pull/6572 From roland at openjdk.java.net Fri Nov 26 10:35:03 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Fri, 26 Nov 2021 10:35:03 GMT Subject: RFR: 8277842: IGV: Add jvms property to know where a node came from In-Reply-To: References: Message-ID: On Thu, 25 Nov 2021 14:46:36 GMT, Christian Hagedorn wrote: > When dumping a node with `node->dump()`, it also prints the JVM state which tells us to which bci and inlinee method the node belongs to: > > 38 StoreI === 5 7 37 35 [[ 15 ]] @java/lang/Class:exact+116 *, name=y, idx=5; Memory: @java/lang/Class:exact+116 *, name=y, idx=5; !jvms: Test::inlinee @ bci:1 (line 16) Test::test @ bci:4 (line 12) > > IGV only shows the line and bci information with which it is sometimes hard to tell where exactly the node came from, especially with deep inlining. This patch adds the entire JVM state as a `jvms` property field to IGV: > > ![Screenshot from 2021-11-25 15-21-53](https://user-images.githubusercontent.com/17833009/143460385-baf5ee3a-31b0-4693-bfa4-0de91c3c4822.png) > > This helps to better analyze a graph. > > Thanks, > Christian Looks good to me. ------------- Marked as reviewed by roland (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/6563 From neliasso at openjdk.java.net Fri Nov 26 10:42:05 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Fri, 26 Nov 2021 10:42:05 GMT Subject: RFR: 8277842: IGV: Add jvms property to know where a node came from In-Reply-To: References: Message-ID: On Thu, 25 Nov 2021 14:46:36 GMT, Christian Hagedorn wrote: > When dumping a node with `node->dump()`, it also prints the JVM state which tells us to which bci and inlinee method the node belongs to: > > 38 StoreI === 5 7 37 35 [[ 15 ]] @java/lang/Class:exact+116 *, name=y, idx=5; Memory: @java/lang/Class:exact+116 *, name=y, idx=5; !jvms: Test::inlinee @ bci:1 (line 16) Test::test @ bci:4 (line 12) > > IGV only shows the line and bci information with which it is sometimes hard to tell where exactly the node came from, especially with deep inlining. This patch adds the entire JVM state as a `jvms` property field to IGV: > > ![Screenshot from 2021-11-25 15-21-53](https://user-images.githubusercontent.com/17833009/143460385-baf5ee3a-31b0-4693-bfa4-0de91c3c4822.png) > > This helps to better analyze a graph. > > Thanks, > Christian Approved! ------------- Marked as reviewed by neliasso (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6563 From tobias.hartmann at oracle.com Fri Nov 26 10:45:13 2021 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 26 Nov 2021 11:45:13 +0100 Subject: Queries on AVX512 support in Hotspot In-Reply-To: References: Message-ID: <72dc262d-a01b-deba-4b16-91096e79fa02@oracle.com> Hi Rahul, On 22.11.21 09:49, Rahul wrote: > Request help with questions on AVX512 support in Hotspot. > Please note I am trying to find existing AVX512 support in hotspot. Answers to below questions really depend on what you mean by "AVX512 support". I assume you mean support for AVX512 instructions in C2's (vector) optimizations? Or do you specifically mean the Vector API (https://openjdk.java.net/jeps/417)? 
> Understood that the support started with JDK-8076276 enhancement. > When compared with instruction set manuals it seems full AVX512 > instructions are not supported for now. > (e.g.: AVX512_IFMA, AVX512_BF16 set instructions etc. seems not supported > Also though feature CPU_AVX512F, AVX512PF etc. feature set is enabled, > again it seems all instructions in the set may not be supported.) Right, C2 optimizations only require/support a subset of all the AVX512 instructions but that's the same for SSE and other vector instruction sets. It's because C2's superword optimization only supports vectorizing some operations and we therefore only need the corresponding vector instructions. Vector instructions are also used at other places. For example, for arraycopy or for zeroing (see https://bugs.openjdk.java.net/browse/JDK-8251871). > So is the existing support added so far documented somewhere? I don't think there is any documentation other than the code and the JBS entries: https://bugs.openjdk.java.net/issues/?jql=component%20%3D%20hotspot%20and%20type%20in%20(Enhancement)%20and%20(text%20~%20AVX512%20or%20text%20~%20%22AVX-512%22%20or%20text%20~%20%22AVX%20512%22) You could also include bugs: https://bugs.openjdk.java.net/issues/?jql=component%20%3D%20hotspot%20and%20type%20in%20(Enhancement%2C%20bug)%20and%20(text%20~%20AVX512%20or%20text%20~%20%22AVX-512%22%20or%20text%20~%20%22AVX%20512%22) > Also any details of any ongoing, future plans to add remaining AVX-512 > support? Again, you could look at above JBS issues that are still unresolved. > Aslo trying to check available jtreg tests, benchmarks related to AVX-512. > Is the main related tests located at - > test/hotspot/jtreg/compiler/loopopts/superword/ ? > (Also found > test/hotspot/jtreg/compiler/loopopts/superword/TestArrayCopyConjoint.java, > TestArrayCopyDisjoint.java tests) Yes, these are the tests for C2's superword optimization that vectorizes loops and makes use of AVX instructions when possible. 
But there are other tests that exercise this (for example, all the arraycopy tests). We also have many microbenchmarks that exercise vector instructions. For example, test/micro/org/openjdk/bench/java/lang/ArrayCopyObject.java. > Are there any other functional or unit tests to check exact AVX > instructions generated. I don't think we have any tests that check that/which vector instructions are emitted. > For example, to catch situations like AVX2 instructions getting wrongly > generated instead of available/expected AVX3 instructions !?. > Request guidance with existing AVX-512 support tests, benchmarks. Hope that helps. The Intel folks might be able to add more details. Best regards, Tobias From chagedorn at openjdk.java.net Fri Nov 26 10:50:03 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Fri, 26 Nov 2021 10:50:03 GMT Subject: RFR: 8277842: IGV: Add jvms property to know where a node came from In-Reply-To: References: Message-ID: On Thu, 25 Nov 2021 14:46:36 GMT, Christian Hagedorn wrote: > When dumping a node with `node->dump()`, it also prints the JVM state which tells us to which bci and inlinee method the node belongs to: > > 38 StoreI === 5 7 37 35 [[ 15 ]] @java/lang/Class:exact+116 *, name=y, idx=5; Memory: @java/lang/Class:exact+116 *, name=y, idx=5; !jvms: Test::inlinee @ bci:1 (line 16) Test::test @ bci:4 (line 12) > > IGV only shows the line and bci information with which it is sometimes hard to tell where exactly the node came from, especially with deep inlining. This patch adds the entire JVM state as a `jvms` property field to IGV: > > ![Screenshot from 2021-11-25 15-21-53](https://user-images.githubusercontent.com/17833009/143460385-baf5ee3a-31b0-4693-bfa4-0de91c3c4822.png) > > This helps to better analyze a graph. > > Thanks, > Christian Thanks Roland and Nils for your reviews! 
------------- PR: https://git.openjdk.java.net/jdk/pull/6563 From phedlin at openjdk.java.net Fri Nov 26 11:11:00 2021 From: phedlin at openjdk.java.net (Patric Hedlin) Date: Fri, 26 Nov 2021 11:11:00 GMT Subject: RFR: 8276108: Wrong instruction generation in aarch64 backend [v2] In-Reply-To: References: <9P964HSPi8geJ3GVJwbsGo_t_lASstJkxsrX76Xv8K8=.898bf726-d71b-4a90-a6b7-83895eec3494@github.com> Message-ID: On Wed, 24 Nov 2021 11:52:45 GMT, Patric Hedlin wrote: >> C1 code generation on AArch64 may produce bad LDR/STR immediate offset instructions when the actual operand (datum) size is unknown. This change will alter the code generated for the problematic immediate offset to use the register offset version (requiring additional instructions). >> >> Contributed by Nick Gasson. >> >> Added assert in Address::encode() to emphasise the use of a valid immediate (in base_plus_offset). >> >> Added clarifying comment to Address::offset_ok_for_immed() emphasising favouring of the scaled unsigned 12-bit encoding for aligned offsets. > > Patric Hedlin has updated the pull request incrementally with one additional commit since the last revision: > > Clean-up address calculation via use of legitimize_address(). As suspected, there are issues around initialisation as well as direct use of immediate offsets with scaling. I have filed the following TR: https://bugs.openjdk.java.net/browse/JDK-8277862, to track the issue. I'm letting this change stand as is. 
------------- PR: https://git.openjdk.java.net/jdk/pull/6212 From mdoerr at openjdk.java.net Fri Nov 26 11:36:06 2021 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Fri, 26 Nov 2021 11:36:06 GMT Subject: RFR: 8277860: PPC: Remove duplicate info != NULL check In-Reply-To: References: Message-ID: On Fri, 26 Nov 2021 09:52:22 GMT, Roman Kennke wrote: > I made a mistake: I should have fixed the duplicated info != NULL check in LIR_Assembler::emit_load_klass() in c1_LIRAssembler_ppc.cpp but forgot until the second I sent the integrate command. > > I have no access to PPC hardware, I am sending this blindly, relying only on GHA tests. Thanks for fixing it! I've given it a quick spin on real hardware. ------------- Marked as reviewed by mdoerr (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6571 From neliasso at openjdk.java.net Fri Nov 26 13:34:03 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Fri, 26 Nov 2021 13:34:03 GMT Subject: RFR: 8277411: C2 fast_unlock intrinsic on AArch64 has unnecessary ownership check In-Reply-To: References: Message-ID: On Mon, 22 Nov 2021 11:04:37 GMT, Erik Österlund wrote: > The AArch64 fast_unlock C2 code checks if the current thread owns the lock. This can be surprisingly expensive in workload where locking is contended. The check is however optional (helpful only for finding JNI code bugs), and indeed not emitted for x86_64. This patch removes the check on AArch64 as well. Approved! ------------- Marked as reviewed by neliasso (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6498 From jbhateja at openjdk.java.net Fri Nov 26 13:51:05 2021 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Fri, 26 Nov 2021 13:51:05 GMT Subject: RFR: 8277793: Support vector F2I and D2L cast operations for X86 In-Reply-To: References: Message-ID: On Fri, 26 Nov 2021 08:18:40 GMT, Nils Eliasson wrote: > Is this change already covered by tests and micros?
Hi @neliasso, yes, we do have existing tests and a micro that exercise this change. MICRO : test/micro/org/openjdk/bench/vm/compiler/TypeVectorOperations.java TEST: test/hotspot/jtreg/compiler/codegen/TestLongDoubleVect.java test/hotspot/jtreg/compiler/codegen/TestLongDoubleVect.java BR Jatin ------------- PR: https://git.openjdk.java.net/jdk/pull/6544 From roland at openjdk.java.net Fri Nov 26 14:42:24 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Fri, 26 Nov 2021 14:42:24 GMT Subject: RFR: 8276116: C2: optimize long range checks in int counted loops Message-ID: <8bvd-Dtu9tKQVPWEV5lo0Xa7H2X76uVsgf1l6vKm7CM=.836388f2-a117-4f5d-9385-1690c7d0fd74@github.com> Maurizio noticed that some of his panama micro benchmarks don't perform better with 8259609 (C2: optimize long range checks in long counted loops). The reason is that 8259609 optimizes long range checks in long counted loops but some of his benchmarks include long range checks in int counted loops: for (int i = start; i < stop; i += inc) { Objects.checkIndex(scale * ((long)i) + offset, length); } This change applies the transformation from 8259609 for long counted loop/long range checks to int counted loop/long range checks. That includes creating a loop nest and transforming the long range check to an int range check that's subject to range check elimination in the inner loop. The reason it's required to create a loop nest is that the long range check transformation logic depends on no overflow of scale * i for the range of values that the transformed range check is applied to. As a consequence, this change is mostly refactoring to make the loop nest creation and range check transformation parameterized by the type of the transformed loop. I think this transformation needs to be applied as late as possible but, in the case of an int counted loop, before pre/main/post loops are created. I had to move it to IdealLoopTree::iteration_split_impl() because of that.
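To make the loop nest idea above concrete, here is a minimal hand-written sketch in Java. It is not what C2 generates: the class name, the `nested` method and the `chunk` parameter are all hypothetical, and it assumes `inc > 0`, `chunk > 0`, and that `chunk * inc` fits in an int. It only shows that splitting the iteration space into an outer loop and a short inner loop keeps `scale * j` small inside the inner loop, which is the property the range check transformation depends on. `Objects.checkIndex(long, long)` requires JDK 16 or later.

```java
import java.util.Objects;

final class LongRangeCheckSketch {
    // Shape from the description above: a long range check inside an int counted loop.
    static long original(int start, int stop, int inc, long scale, long offset, long length) {
        long sum = 0;
        for (int i = start; i < stop; i += inc) {
            sum += Objects.checkIndex(scale * ((long) i) + offset, length);
        }
        return sum;
    }

    // Hand-written loop nest (illustration only, hypothetical helper).
    // Assumes inc > 0, chunk > 0 and that chunk * inc fits in an int.
    static long nested(int start, int stop, int inc, long scale, long offset,
                       long length, int chunk) {
        long sum = 0;
        long outerStride = (long) chunk * inc;
        for (long outer = start; outer < stop; outer += outerStride) {
            // Part of the index that is invariant in the inner loop.
            long base = scale * outer + offset;
            // The inner loop covers at most outerStride values of the original i.
            int innerLimit = (int) Math.min(outerStride, stop - outer);
            for (int j = 0; j < innerLimit; j += inc) {
                // j stays below chunk * inc, so scale * j covers a small, known range.
                sum += Objects.checkIndex(base + scale * j, length);
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(original(0, 10, 3, 2L, 1L, 100L));
        System.out.println(nested(0, 10, 3, 2L, 1L, 100L, 2));
    }
}
```

Both methods visit the same indices in the same order, so they return the same sum and throw at the same point when an index is out of bounds. In this sketch the inner check is still a long check; the change described above goes one step further and turns it into an int range check.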
There's an alternate shape for a long range check in an int counted loop that Maurizio insisted needs to be supported: for (int i = start; i < stop; i += inc) { Objects.checkIndex(((long)(scale * i)) + offset, length); } scale * i can overflow in that case. This is also supported but as a corner case of the previous one. The code in PhaseIdealLoop::transform_long_range_checks() has a comment about that. Note also that this transformation works best if loop strip mining is enabled (that is for G1, ZGC, Shenandoah by default). The reason is that it needs a safepoint and when loop strip mining is enabled, the outer loop contains one that's always available. A way to have this work as well for all GCs would be to always construct the loop strip mining loop nest (whether loop strip mining is enabled or not) and then only once loop opts are over remove the outer loop when loop strip mining is disabled. I'm looking for feedback on this. BTW, something doesn't seem right in IdealLoopTree::iteration_split_impl(): https://github.com/rwestrel/jdk/blob/master/src/hotspot/share/opto/loopTransform.cpp#L3475 should_peel causes transformations to be skipped but peeling is never applied AFAICT. Does it make sense to anyone? 
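As a small illustration of why the alternate shape above differs from the first one: when `scale` and `i` are both int (an assumption made here, the description does not state the type of `scale`), `(long)(scale * i)` multiplies in 32-bit arithmetic and can wrap before the widening, while `scale * ((long)i)` multiplies in 64-bit arithmetic. The class and method names below are hypothetical.

```java
final class ScaleOverflowSketch {
    // First shape: widen i to long, then multiply in 64-bit arithmetic.
    static long widenThenMultiply(int scale, int i) {
        return scale * ((long) i);
    }

    // Alternate shape: multiply in 32-bit arithmetic (may wrap), then widen.
    static long multiplyThenWiden(int scale, int i) {
        return (long) (scale * i);
    }

    public static void main(String[] args) {
        int scale = 4;
        int i = 1 << 30; // 2^30, so the mathematical product is 2^32
        System.out.println(widenThenMultiply(scale, i));
        System.out.println(multiplyThenWiden(scale, i));
    }
}
```

For small values the two shapes agree. They only diverge once the int product wraps, which is why the second shape can be handled as a corner case of the first.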
------------- Commit messages: - whitespaces - test & fix Changes: https://git.openjdk.java.net/jdk/pull/6576/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6576&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8276116 Stats: 609 lines in 7 files changed: 510 ins; 18 del; 81 mod Patch: https://git.openjdk.java.net/jdk/pull/6576.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6576/head:pull/6576 PR: https://git.openjdk.java.net/jdk/pull/6576 From eosterlund at openjdk.java.net Fri Nov 26 14:58:09 2021 From: eosterlund at openjdk.java.net (Erik Österlund) Date: Fri, 26 Nov 2021 14:58:09 GMT Subject: RFR: 8277411: C2 fast_unlock intrinsic on AArch64 has unnecessary ownership check In-Reply-To: References: Message-ID: On Fri, 26 Nov 2021 13:30:54 GMT, Nils Eliasson wrote: >> The AArch64 fast_unlock C2 code checks if the current thread owns the lock. This can be surprisingly expensive in workload where locking is contended. The check is however optional (helpful only for finding JNI code bugs), and indeed not emitted for x86_64. This patch removes the check on AArch64 as well. > > Approved! Thanks for the reviews @neliasso and @nick-arm! ------------- PR: https://git.openjdk.java.net/jdk/pull/6498 From rkennke at openjdk.java.net Fri Nov 26 14:57:11 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Fri, 26 Nov 2021 14:57:11 GMT Subject: Integrated: 8277860: PPC: Remove duplicate info != NULL check In-Reply-To: References: Message-ID: On Fri, 26 Nov 2021 09:52:22 GMT, Roman Kennke wrote: > I made a mistake: I should have fixed the duplicated info != NULL check in LIR_Assembler::emit_load_klass() in c1_LIRAssembler_ppc.cpp but forgot until the second I sent the integrate command. > > I have no access to PPC hardware, I am sending this blindly, relying only on GHA tests. This pull request has now been integrated.
Changeset: ce0234b4 Author: Roman Kennke URL: https://git.openjdk.java.net/jdk/commit/ce0234b47d5c40e74dac368396e92cdec5cc2de7 Stats: 6 lines in 1 file changed: 0 ins; 2 del; 4 mod 8277860: PPC: Remove duplicate info != NULL check Reviewed-by: chagedorn, mdoerr ------------- PR: https://git.openjdk.java.net/jdk/pull/6571 From eosterlund at openjdk.java.net Fri Nov 26 14:58:10 2021 From: eosterlund at openjdk.java.net (Erik Österlund) Date: Fri, 26 Nov 2021 14:58:10 GMT Subject: Integrated: 8277411: C2 fast_unlock intrinsic on AArch64 has unnecessary ownership check In-Reply-To: References: Message-ID: On Mon, 22 Nov 2021 11:04:37 GMT, Erik Österlund wrote: > The AArch64 fast_unlock C2 code checks if the current thread owns the lock. This can be surprisingly expensive in workload where locking is contended. The check is however optional (helpful only for finding JNI code bugs), and indeed not emitted for x86_64. This patch removes the check on AArch64 as well. This pull request has now been integrated. Changeset: 3d810ad6 Author: Erik Österlund URL: https://git.openjdk.java.net/jdk/commit/3d810ad6912b7bca03e212b604cf60412da11c18 Stats: 5 lines in 1 file changed: 0 ins; 4 del; 1 mod 8277411: C2 fast_unlock intrinsic on AArch64 has unnecessary ownership check Reviewed-by: ngasson, neliasso ------------- PR: https://git.openjdk.java.net/jdk/pull/6498 From aph at openjdk.java.net Fri Nov 26 15:56:07 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Fri, 26 Nov 2021 15:56:07 GMT Subject: RFR: 8276108: Wrong instruction generation in aarch64 backend [v2] In-Reply-To: References: <9P964HSPi8geJ3GVJwbsGo_t_lASstJkxsrX76Xv8K8=.898bf726-d71b-4a90-a6b7-83895eec3494@github.com> Message-ID: On Wed, 24 Nov 2021 11:52:45 GMT, Patric Hedlin wrote: >> C1 code generation on AArch64 may produce bad LDR/STR immediate offset instructions when the actual operand (datum) size is unknown.
This change will alter the code generated for the problematic immediate offset to use the register offset version (requiring additional instructions). >> >> Contributed by Nick Gasson. >> >> Added assert in Address::encode() to emphasise the use of a valid immediate (in base_plus_offset). >> >> Added clarifying comment to Address::offset_ok_for_immed() emphasising favouring of the scaled unsigned 12-bit encoding for aligned offsets. > > Patric Hedlin has updated the pull request incrementally with one additional commit since the last revision: > > Clean-up address calculation via use of legitimize_address(). Marked as reviewed by aph (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/6212 From duke at openjdk.java.net Sat Nov 27 03:14:06 2021 From: duke at openjdk.java.net (Mai Đặng Quân Anh) Date: Sat, 27 Nov 2021 03:14:06 GMT Subject: RFR: 8277793: Support vector F2I and D2L cast operations for X86 In-Reply-To: References: Message-ID: On Wed, 24 Nov 2021 18:50:17 GMT, Jatin Bhateja wrote: > - JDK-8275317 extended auto-vectorizer to infer Vector Cast operations if source and destination primitive type have same size. > - This patch adds the backend support for vector CastF2I and CastD2L on X86 AVX512 and legacy targets.
> > Following are the performance measurements of an existing JMH benchmark (test/micro/org/openjdk/bench/vm/compiler/TypeVectorOperations.java) > > System Configuration : Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (40C 2S Icelake Server) > > BENCHMARK | SIZE | BASELINE (AVX3) ns/op | WithOpt (AVX3) ns/op | Gain AVX3(baseline/opt) | BASELINE (AVX2) ns/op | WithOpt (AVX2) ns/op | Gain AVX2 (baseline/opt) > -- | -- | -- | -- | -- | -- | -- | -- > TypeVectorOperations.TypeVectorOperationsSuperWord.convert_d2l | 512.00 | 256.26 | 77.50 | 3.31 | 275.49 | 275.65 | 1.00 > TypeVectorOperations.TypeVectorOperationsSuperWord.convert_d2l | 1024.00 | 501.87 | 150.35 | 3.34 | 540.47 | 541.22 | 1.00 > TypeVectorOperations.TypeVectorOperationsSuperWord.convert_d2l | 2048.00 | 993.05 | 293.23 | 3.39 | 1070.56 | 1070.14 | 1.00 > TypeVectorOperations.TypeVectorOperationsSuperWord.convert_f2i | 512.00 | 227.83 | 39.36 | 5.79 | 248.25 | 45.01 | 5.52 > TypeVectorOperations.TypeVectorOperationsSuperWord.convert_f2i | 1024.00 | 449.70 | 77.88 | 5.77 | 487.33 | 86.15 | 5.66 > TypeVectorOperations.TypeVectorOperationsSuperWord.convert_f2i | 2048.00 | 884.95 | 149.58 | 5.92 | 956.58 | 152.45 | 6.27 > > Kindly review and share your feedback. > > Best Regards, > Jatin src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4086: > 4084: evblendmpd(dst, ktmp1, dst, xtmp2, true, vec_enc); > 4085: > 4086: evpcmpeqq(ktmp1, xtmp1, dst, vec_enc); Hi, some tiny suggestions here, By using an extra `KRegister`, we can xor the previous masks to obtain this, eliminate the need of an `evpcmpeqq` here. src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4088: > 4086: evpcmpeqq(ktmp1, xtmp1, dst, vec_enc); > 4087: evcmppd(ktmp1, ktmp1, src, xtmp2, Assembler::NLT_US, vec_enc); > 4088: evmovdquq(xtmp1, max_long, vec_enc, scratch); `max_long` can be obtained from `double_sign_flip` with a bitwise not, which in turn can be achieved using `vpternlog` instruction. 
src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4101: > 4099: vpcmpeqd(xtmp2, dst, xtmp1, vec_enc); > 4100: vpmovmskb(scratch, xtmp2, vec_enc); > 4101: testl(scratch, scratch); Is a single `vptest` sufficient here? ------------- PR: https://git.openjdk.java.net/jdk/pull/6544 From duke at openjdk.java.net Sat Nov 27 03:19:07 2021 From: duke at openjdk.java.net (Mai =?UTF-8?B?xJDhurduZw==?= =?UTF-8?B?IA==?= =?UTF-8?B?UXXDom4=?= Anh) Date: Sat, 27 Nov 2021 03:19:07 GMT Subject: RFR: 8277793: Support vector F2I and D2L cast operations for X86 In-Reply-To: References: Message-ID: On Wed, 24 Nov 2021 18:50:17 GMT, Jatin Bhateja wrote: > - JDK-8275317 extended auto-vectorizer to infer Vector Cast operations if source and destination primitive type have same size. > - This patch adds the backend support for vector CastF2I and CaseD2L on X86 AVX512 and legacy targets. > > Following are the performance measurements of an existing JMH benchmark (test/micro/org/openjdk/bench/vm/compiler/TypeVectorOperations.java) > > System Configuration : Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (40C 2S Icelake Server) > > BENCHMARK | SIZE | BASELINE (AVX3) ns/op | WithOpt (AVX3) ns/op | Gain AVX3(baseline/opt) | BASELINE (AVX2) ns/op | WithOpt (AVX2) ns/op | Gain AVX2 (baseline/opt) > -- | -- | -- | -- | -- | -- | -- | -- > TypeVectorOperations.TypeVectorOperationsSuperWord.convert_d2l | 512.00 | 256.26 | 77.50 | 3.31 | 275.49 | 275.65 | 1.00 > TypeVectorOperations.TypeVectorOperationsSuperWord.convert_d2l | 1024.00 | 501.87 | 150.35 | 3.34 | 540.47 | 541.22 | 1.00 > TypeVectorOperations.TypeVectorOperationsSuperWord.convert_d2l | 2048.00 | 993.05 | 293.23 | 3.39 | 1070.56 | 1070.14 | 1.00 > TypeVectorOperations.TypeVectorOperationsSuperWord.convert_f2i | 512.00 | 227.83 | 39.36 | 5.79 | 248.25 | 45.01 | 5.52 > TypeVectorOperations.TypeVectorOperationsSuperWord.convert_f2i | 1024.00 | 449.70 | 77.88 | 5.77 | 487.33 | 86.15 | 5.66 > 
TypeVectorOperations.TypeVectorOperationsSuperWord.convert_f2i | 2048.00 | 884.95 | 149.58 | 5.92 | 956.58 | 152.45 | 6.27 > > Kindly review and share your feedback. > > Best Regards, > Jatin src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4111: > 4109: vpand(xtmp2, xtmp1, src, vec_enc); > 4110: vpxor(xtmp3, xtmp2, xtmp1, vec_enc); > 4111: vpand(xtmp3, xtmp3, xtmp1, vec_enc); I think we don't need this `vpand`, since `xtmp2` is a subset of `xtmp1`, which leads to `xtmp3` being a subset of `xtmp1`. ------------- PR: https://git.openjdk.java.net/jdk/pull/6544 From simonis at openjdk.java.net Sat Nov 27 10:00:28 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Sat, 27 Nov 2021 10:00:28 GMT Subject: RFR: 8277878: Fix compiler tests after JDK-8275908 Message-ID: This is a quick fix for the two compiler tests introduced by JDK-8275908. The test explicitly added SerialGC in the parameter list which leads to a garbage collector conflict, if the tests are run with some other Garbage collector. It was suggested to fix this by adding a `@requires vm.gc.Serial` tag but this is not necessary because the tests are actually GC-agnostic so I've removed the `-XX:+UseSerialGC` parameter from the test command line instead. After the fix it was necessary to refine the check for whether the test JVM has JVMCI support built-in. Before the fix, I used `WhiteBox.getWhiteBox().isJVMCISupportedByGC()` which worked fine if only running with SerialGC. Now that we can run with GCs which don't support JVMCI we have to use the more specific `(WB.getBooleanVMFlag("EnableJVMCI") != null)`. Please let me know if you want me to push this instantly after it has been reviewed or if you first want to re-run your internal Tier3 tests before pushing. 
------------- Commit messages: - 8277878: Fix compiler tests after JDK-8275908 Changes: https://git.openjdk.java.net/jdk/pull/6581/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6581&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8277878 Stats: 4 lines in 2 files changed: 1 ins; 0 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/6581.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6581/head:pull/6581 PR: https://git.openjdk.java.net/jdk/pull/6581 From dlong at openjdk.java.net Sat Nov 27 23:36:13 2021 From: dlong at openjdk.java.net (Dean Long) Date: Sat, 27 Nov 2021 23:36:13 GMT Subject: RFR: 8275638: GraphKit::combine_exception_states fails with "matching stack sizes" assert [v2] In-Reply-To: References: Message-ID: On Fri, 26 Nov 2021 10:27:48 GMT, Roland Westrelin wrote: >> Root cause is identical to 8273165 AFIU: late inline of a virtual call >> can throw from 2 different paths (null check and the call >> itself). That breaks because the logic for exceptions expects the >> stack for all paths that throw exceptions to have the same stack size. >> >> AFAIU, the stack doesn't matter exception handling: either the >> exception is caught by a exception handler and then the stack is >> popped and the exception is pushed or, the exception is rethrown to >> the caller in which case the current stack is also popped (that is the >> jvm state for the current method). As a consequence the fix I propose >> is to ignore the stack in GraphKit::combine_exception_states(). >> >> AFAIU, the same fix would work for 8273165 but I left the current work >> around as is: not sure if we want to be conservative for now or not > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > make test runnable with release build The existing code is confusing to me. 
Are these exception states only used when we are actually throwing an exception, or can they affect the state when deoptimizing? Why does GraphKit::add_exception_state() have a comment about arguments and compare stacks? Why doesn't GraphKit::make_exception_state() just reset the stack using push_ex_oop() instead of preserving the stack with set_saved_ex_oop()? (The names for these two functions are terrible, because "push_ex_oop" resets the stack first and "set_saved_ex_oop" does not...) ------------- PR: https://git.openjdk.java.net/jdk/pull/6572 From dlong at openjdk.java.net Sun Nov 28 11:33:07 2021 From: dlong at openjdk.java.net (Dean Long) Date: Sun, 28 Nov 2021 11:33:07 GMT Subject: RFR: 8275638: GraphKit::combine_exception_states fails with "matching stack sizes" assert [v2] In-Reply-To: References: Message-ID: On Fri, 26 Nov 2021 10:27:48 GMT, Roland Westrelin wrote: >> Root cause is identical to 8273165 AFIU: late inline of a virtual call >> can throw from 2 different paths (null check and the call >> itself). That breaks because the logic for exceptions expects the >> stack for all paths that throw exceptions to have the same stack size. >> >> AFAIU, the stack doesn't matter exception handling: either the >> exception is caught by a exception handler and then the stack is >> popped and the exception is pushed or, the exception is rethrown to >> the caller in which case the current stack is also popped (that is the >> jvm state for the current method). As a consequence the fix I propose >> is to ignore the stack in GraphKit::combine_exception_states().
>> >> AFAIU, the same fix would work for 8273165 but I left the current work >> around as is: not sure if we want to be conservative for now or not > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > make test runnable with release build I tried calling set_sp(0); clean_stack(0); in GraphKit::make_exception_state() and it seems to work, but I still don't understand why the stack size mismatch only shows up with late inlining. ------------- PR: https://git.openjdk.java.net/jdk/pull/6572 From jbhateja at openjdk.java.net Sun Nov 28 18:37:40 2021 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Sun, 28 Nov 2021 18:37:40 GMT Subject: RFR: 8277793: Support vector F2I and D2L cast operations for X86 [v2] In-Reply-To: References: Message-ID: > - JDK-8275317 extended auto-vectorizer to infer Vector Cast operations if source and destination primitive type have same size. > - This patch adds the backend support for vector CastF2I and CaseD2L on X86 AVX512 and legacy targets. 
>
> Following are the performance measurements of an existing JMH benchmark (test/micro/org/openjdk/bench/vm/compiler/TypeVectorOperations.java)
>
> System Configuration : Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (40C 2S Icelake Server)
>
> BENCHMARK | SIZE | BASELINE (AVX3) ns/op | WithOpt (AVX3) ns/op | Gain AVX3 (baseline/opt) | BASELINE (AVX2) ns/op | WithOpt (AVX2) ns/op | Gain AVX2 (baseline/opt)
> -- | -- | -- | -- | -- | -- | -- | --
> TypeVectorOperations.TypeVectorOperationsSuperWord.convert_d2l | 512.00 | 256.26 | 77.50 | 3.31 | 275.49 | 275.65 | 1.00
> TypeVectorOperations.TypeVectorOperationsSuperWord.convert_d2l | 1024.00 | 501.87 | 150.35 | 3.34 | 540.47 | 541.22 | 1.00
> TypeVectorOperations.TypeVectorOperationsSuperWord.convert_d2l | 2048.00 | 993.05 | 293.23 | 3.39 | 1070.56 | 1070.14 | 1.00
> TypeVectorOperations.TypeVectorOperationsSuperWord.convert_f2i | 512.00 | 227.83 | 39.36 | 5.79 | 248.25 | 45.01 | 5.52
> TypeVectorOperations.TypeVectorOperationsSuperWord.convert_f2i | 1024.00 | 449.70 | 77.88 | 5.77 | 487.33 | 86.15 | 5.66
> TypeVectorOperations.TypeVectorOperationsSuperWord.convert_f2i | 2048.00 | 884.95 | 149.58 | 5.92 | 956.58 | 152.45 | 6.27
>
> Kindly review and share your feedback.
>
> Best Regards,
> Jatin

Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:

  8277793: Further optimizing instruction sequence.
-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6544/files
  - new: https://git.openjdk.java.net/jdk/pull/6544/files/1d264ced..d01d938a

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6544&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6544&range=00-01

  Stats: 55 lines in 3 files changed: 7 ins; 8 del; 40 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6544.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6544/head:pull/6544

PR: https://git.openjdk.java.net/jdk/pull/6544

From jbhateja at openjdk.java.net Sun Nov 28 18:37:41 2021
From: jbhateja at openjdk.java.net (Jatin Bhateja)
Date: Sun, 28 Nov 2021 18:37:41 GMT
Subject: RFR: 8277793: Support vector F2I and D2L cast operations for X86
In-Reply-To: References: Message-ID:

On Wed, 24 Nov 2021 18:50:17 GMT, Jatin Bhateja wrote:

> - JDK-8275317 extended the auto-vectorizer to infer Vector Cast operations if the source and destination primitive types have the same size.
> - This patch adds the backend support for vector CastF2I and CastD2L on X86 AVX512 and legacy targets.
>
> Following are the performance measurements of an existing JMH benchmark (test/micro/org/openjdk/bench/vm/compiler/TypeVectorOperations.java)
>
> System Configuration : Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (40C 2S Icelake Server)
>
> BENCHMARK | SIZE | BASELINE (AVX3) ns/op | WithOpt (AVX3) ns/op | Gain AVX3 (baseline/opt) | BASELINE (AVX2) ns/op | WithOpt (AVX2) ns/op | Gain AVX2 (baseline/opt)
> -- | -- | -- | -- | -- | -- | -- | --
> TypeVectorOperations.TypeVectorOperationsSuperWord.convert_d2l | 512.00 | 256.26 | 77.50 | 3.31 | 275.49 | 275.65 | 1.00
> TypeVectorOperations.TypeVectorOperationsSuperWord.convert_d2l | 1024.00 | 501.87 | 150.35 | 3.34 | 540.47 | 541.22 | 1.00
> TypeVectorOperations.TypeVectorOperationsSuperWord.convert_d2l | 2048.00 | 993.05 | 293.23 | 3.39 | 1070.56 | 1070.14 | 1.00
> TypeVectorOperations.TypeVectorOperationsSuperWord.convert_f2i | 512.00 | 227.83 | 39.36 | 5.79 | 248.25 | 45.01 | 5.52
> TypeVectorOperations.TypeVectorOperationsSuperWord.convert_f2i | 1024.00 | 449.70 | 77.88 | 5.77 | 487.33 | 86.15 | 5.66
> TypeVectorOperations.TypeVectorOperationsSuperWord.convert_f2i | 2048.00 | 884.95 | 149.58 | 5.92 | 956.58 | 152.45 | 6.27
>
> Kindly review and share your feedback.
>
> Best Regards,
> Jatin

Hi @merykitty, thanks, I have optimized the sequence further.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6544

From jbhateja at openjdk.java.net Sun Nov 28 18:44:08 2021
From: jbhateja at openjdk.java.net (Jatin Bhateja)
Date: Sun, 28 Nov 2021 18:44:08 GMT
Subject: RFR: 8277793: Support vector F2I and D2L cast operations for X86 [v2]
In-Reply-To: References: Message-ID:

On Sat, 27 Nov 2021 03:09:30 GMT, Mai Đặng Quân Anh wrote:

>> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>>
>> 8277793: Further optimizing instruction sequence.
> > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4086: > >> 4084: evblendmpd(dst, ktmp1, dst, xtmp2, true, vec_enc); >> 4085: >> 4086: evpcmpeqq(ktmp1, xtmp1, dst, vec_enc); > > Hi, some tiny suggestions here, > > By using an extra `KRegister`, we can xor the previous masks to obtain this, eliminate the need of an `evpcmpeqq` here. Yes, nice suggestions, AVX512 is a very powerful instruction set and opmask register allocation and our recent masking support contributions are major infrastructure enhancements which will enable developers to emit optimized AVX512 instruction sequence and harness its true potential. > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4101: > >> 4099: vpcmpeqd(xtmp2, dst, xtmp1, vec_enc); >> 4100: vpmovmskb(scratch, xtmp2, vec_enc); >> 4101: testl(scratch, scratch); > > Is a single `vptest` sufficient here? DONE ------------- PR: https://git.openjdk.java.net/jdk/pull/6544 From kvn at openjdk.java.net Sun Nov 28 23:30:10 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Sun, 28 Nov 2021 23:30:10 GMT Subject: RFR: 8275326: C2: assert(no_dead_loop) failed: dead loop detected [v3] In-Reply-To: References: Message-ID: On Fri, 19 Nov 2021 09:55:25 GMT, Christian Hagedorn wrote: >> In the test case, we apply the following optimization in `PhiNode::Ideal()` for the memory phi 989 that is on a dead path but still has both its inputs set to non-top nodes: >> https://github.com/openjdk/jdk/blob/3c0faa73522bd004b66cb9e477f43e15a29842e6/src/hotspot/share/opto/cfgnode.cpp#L2269-L2270 >> ![Screenshot from 2021-11-05 11-57-49](https://user-images.githubusercontent.com/17833009/140502849-9f00fd62-9714-4f54-8f98-f22f74d11430.png) >> >> In this process, we create `11853 Phi` for the new `11850 MergeMem` which is going to replace `989 Phi` (`this`). 
We then transform `11853 Phi` before returning: >> https://github.com/openjdk/jdk/blob/3c0faa73522bd004b66cb9e477f43e15a29842e6/src/hotspot/share/opto/cfgnode.cpp#L2314 >> >> During `Ideal()` for `11853 Phi`, we transform `11769 MergeMem` into top (because the base memory is top) and use this as new input instead: >> https://github.com/openjdk/jdk/blob/3c0faa73522bd004b66cb9e477f43e15a29842e6/src/hotspot/share/opto/cfgnode.cpp#L2230-L2240 >> >> But even if the `MergeMem` node would not be transformed into top, the slice itself could be top (L2237) and we would still replace the phi input with top. This replacement by top will fold the `11853 Phi` and we will build a cycle `11850 MergeMem` <-> `1064 StoreB` because `989 Phi` will be replaced by `11850 MergeMem`. This results in the assertion failure. >> >> I tried some approaches by marking `11853 Phi` and/or `989 Phi` to specially treat them during the optimizations in `Ideal()` (e.g. skipping `989 Phi` during the dead loop detection etc.) or to improve the dead loop detection before applying the `MergeMem` optimization in `Ideal()`. But that seemed rather complicated/fragile. >> >> I therefore propose to simply not transform the newly created phi nodes directly but wait instead for IGVN to revisit them again. This allows the `this` phi to be replaced with the new `MergeMem` node and the dead loop detection will work correctly when processing the new phis again later in IGVN. >> >> I could only reproduce this bug with the replay file for the attached test case in the JBS issue. The test case itself did not trigger with repeated runs with `StressIGVN` + `RepeatCompilation`. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: > > - New fix > - Merge branch 'master' into JDK-8275326 > - handle GVN > - C2: assert(no_dead_loop) failed: dead loop detected src/hotspot/share/opto/cfgnode.cpp line 2315: > 2313: } > 2314: > 2315: if (igvn) { This whole split through `mergemem` part of code is guarded by `can_reshape` check at line #2196 which is `true` only during IGVN (`PhaseIterGVN`). So you don't need to worry about executing this in parser. But checking `igvn` value is fine here since you use it as we do at line #2278. ------------- PR: https://git.openjdk.java.net/jdk/pull/6276 From kvn at openjdk.java.net Sun Nov 28 23:42:12 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Sun, 28 Nov 2021 23:42:12 GMT Subject: RFR: 8275326: C2: assert(no_dead_loop) failed: dead loop detected [v3] In-Reply-To: References: Message-ID: On Fri, 19 Nov 2021 09:55:25 GMT, Christian Hagedorn wrote: >> In the test case, we apply the following optimization in `PhiNode::Ideal()` for the memory phi 989 that is on a dead path but still has both its inputs set to non-top nodes: >> https://github.com/openjdk/jdk/blob/3c0faa73522bd004b66cb9e477f43e15a29842e6/src/hotspot/share/opto/cfgnode.cpp#L2269-L2270 >> ![Screenshot from 2021-11-05 11-57-49](https://user-images.githubusercontent.com/17833009/140502849-9f00fd62-9714-4f54-8f98-f22f74d11430.png) >> >> In this process, we create `11853 Phi` for the new `11850 MergeMem` which is going to replace `989 Phi` (`this`). 
We then transform `11853 Phi` before returning: >> https://github.com/openjdk/jdk/blob/3c0faa73522bd004b66cb9e477f43e15a29842e6/src/hotspot/share/opto/cfgnode.cpp#L2314 >> >> During `Ideal()` for `11853 Phi`, we transform `11769 MergeMem` into top (because the base memory is top) and use this as new input instead: >> https://github.com/openjdk/jdk/blob/3c0faa73522bd004b66cb9e477f43e15a29842e6/src/hotspot/share/opto/cfgnode.cpp#L2230-L2240 >> >> But even if the `MergeMem` node would not be transformed into top, the slice itself could be top (L2237) and we would still replace the phi input with top. This replacement by top will fold the `11853 Phi` and we will build a cycle `11850 MergeMem` <-> `1064 StoreB` because `989 Phi` will be replaced by `11850 MergeMem`. This results in the assertion failure. >> >> I tried some approaches by marking `11853 Phi` and/or `989 Phi` to specially treat them during the optimizations in `Ideal()` (e.g. skipping `989 Phi` during the dead loop detection etc.) or to improve the dead loop detection before applying the `MergeMem` optimization in `Ideal()`. But that seemed rather complicated/fragile. >> >> I therefore propose to simply not transform the newly created phi nodes directly but wait instead for IGVN to revisit them again. This allows the `this` phi to be replaced with the new `MergeMem` node and the dead loop detection will work correctly when processing the new phis again later in IGVN. >> >> I could only reproduce this bug with the replay file for the attached test case in the JBS issue. The test case itself did not trigger with repeated runs with `StressIGVN` + `RepeatCompilation`. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: > > - New fix > - Merge branch 'master' into JDK-8275326 > - handle GVN > - C2: assert(no_dead_loop) failed: dead loop detected src/hotspot/share/opto/cfgnode.cpp line 2321: > 2319: // visiting it in the transformations below. > 2320: igvn->replace_node(this, result); > 2321: igvn->set_type(result, result->bottom_type()); Did you consider simply cutting off the Phi's inputs with `top` by using `replace_input_of()`? ------------- PR: https://git.openjdk.java.net/jdk/pull/6276 From duke at openjdk.java.net Mon Nov 29 03:40:35 2021 From: duke at openjdk.java.net (Mai Đặng Quân Anh) Date: Mon, 29 Nov 2021 03:40:35 GMT Subject: RFR: 8277882: New subnode ideal optimization: converting "c0 - (x + c1)" into "(c0 - c1) - x" In-Reply-To: References: Message-ID: <4LMC68ZlylJEIhPe2wzMHV8c5RkQzjWBzssbippgKcc=.7254ab4b-6477-42c4-abd8-977fbde4fe5f@github.com> On Wed, 17 Nov 2021 22:50:41 GMT, Zang, Zhiqiang wrote: > Suggest two new optimizations that can be done in SubINode::Ideal. Hi, Regarding your first transformation, `x - (0 - y)` is already transformed into `(x + y) - 0` which is then simplified to `x + y`, see line 265. Consider merging your second transformation with the one in line 243, too. Cheers. You should provide a microbenchmark to prove the effectiveness of the transformations, too.
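
For readers following the thread, the rewrite named in the subject line can be checked outside the compiler. The sketch below is plain Java, not the `SubINode::Ideal` code; the class and method names are hypothetical. It only illustrates that `c0 - (x + c1)` and `(c0 - c1) - x` agree under Java's wrapping `int` arithmetic, which is what lets `c0 - c1` be folded into one constant.

```java
// Hypothetical illustration only; not taken from the HotSpot sources.
final class SubIdealIdentity {
    // Shape before the rewrite: c0 - (x + c1)
    static int original(int c0, int x, int c1) {
        return c0 - (x + c1);
    }

    // Shape after the rewrite: (c0 - c1) - x, where c0 - c1 is a foldable constant
    static int rewritten(int c0, int x, int c1) {
        return (c0 - c1) - x;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 42, Integer.MAX_VALUE, Integer.MIN_VALUE};
        // int arithmetic wraps modulo 2^32, so the two shapes must agree
        // even for operands that overflow along the way.
        for (int c0 : samples) {
            for (int x : samples) {
                for (int c1 : samples) {
                    if (original(c0, x, c1) != rewritten(c0, x, c1)) {
                        throw new AssertionError("mismatch for " + c0 + ", " + x + ", " + c1);
                    }
                }
            }
        }
        System.out.println("both shapes agree on all samples");
    }
}
```

Whether the rewritten shape is actually faster is a separate question, which is what the requested microbenchmark would have to show.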
------------- PR: https://git.openjdk.java.net/jdk/pull/6441 From neliasso at openjdk.java.net Mon Nov 29 03:40:38 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Mon, 29 Nov 2021 03:40:38 GMT Subject: RFR: 8277882: New subnode ideal optimization: converting "c0 - (x + c1)" into "(c0 - c1) - x" In-Reply-To: References: <4LMC68ZlylJEIhPe2wzMHV8c5RkQzjWBzssbippgKcc=.7254ab4b-6477-42c4-abd8-977fbde4fe5f@github.com> Message-ID: On Wed, 24 Nov 2021 23:53:14 GMT, Zang, Zhiqiang wrote: >> You should provide a microbenchmark to prove the effectiveness of the transformations, too. > > @merykitty Thank you for your comments. I made the merging and included both jtreg and microbenchmark tests. Please let me know if you have any other comments. Hi @CptGit, You need to open a enhancement-bug at https://bugs.openjdk.java.net/ for this PR. The title of this PR must then be changed to "bug-number: bug-title" where the title matches the title of the bug. Regards, Nils ------------- PR: https://git.openjdk.java.net/jdk/pull/6441 From fmatte at openjdk.java.net Mon Nov 29 03:40:39 2021 From: fmatte at openjdk.java.net (Fairoz Matte) Date: Mon, 29 Nov 2021 03:40:39 GMT Subject: RFR: 8277882: New subnode ideal optimization: converting "c0 - (x + c1)" into "(c0 - c1) - x" In-Reply-To: References: Message-ID: On Wed, 17 Nov 2021 22:50:41 GMT, Zang, Zhiqiang wrote: > Suggest two new optimizations that can be done in SubINode::Ideal. Associated JBS issue - https://bugs.openjdk.java.net/browse/JDK-8277882 ------------- PR: https://git.openjdk.java.net/jdk/pull/6441 From thartmann at openjdk.java.net Mon Nov 29 06:42:04 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 29 Nov 2021 06:42:04 GMT Subject: RFR: JDK-8277382 make c1 BlockMerger use IR::verify only when necessary [v2] In-Reply-To: References: Message-ID: On Wed, 24 Nov 2021 12:52:32 GMT, Ludvig Janiuk wrote: >> This PR removes two calls to `IR::verify` which were unnecessary. 
The reason they are unnecessary is that `try_merge` does not always take action. There is no need to verify if nothing has changed. In the cases where `try_merge` does do something, it already calls `IR::verify` afterwards. >> >> This PR also switches some deeply nested if statements in `try_merge` to early returns. > > Ludvig Janiuk has updated the pull request incrementally with one additional commit since the last revision: > > indentation error fixed Looks good. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6456 From duke at openjdk.java.net Mon Nov 29 06:45:09 2021 From: duke at openjdk.java.net (Ludvig Janiuk) Date: Mon, 29 Nov 2021 06:45:09 GMT Subject: Integrated: JDK-8277382 make c1 BlockMerger use IR::verify only when necessary In-Reply-To: References: Message-ID: On Thu, 18 Nov 2021 13:30:02 GMT, Ludvig Janiuk wrote: > This PR removes two calls to `IR::verify` which were unnecessary. The reason they are unnecessary is that `try_merge` does not always take action. There is no need to verify if nothing has changed. In the cases where `try_merge` does do something, it already calls `IR::verify` afterwards. > > This PR also switches some deeply nested if statements in `try_merge` to early returns. This pull request has now been integrated.
Changeset: c3a7f2f4 Author: Ludvig Janiuk Committer: Tobias Hartmann URL: https://git.openjdk.java.net/jdk/commit/c3a7f2f4bce9170c1630e01eebd4fcd174b44964 Stats: 140 lines in 1 file changed: 24 ins; 29 del; 87 mod 8277382: make c1 BlockMerger use IR::verify only when necessary Reviewed-by: thartmann, neliasso ------------- PR: https://git.openjdk.java.net/jdk/pull/6456 From thartmann at openjdk.java.net Mon Nov 29 06:52:04 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 29 Nov 2021 06:52:04 GMT Subject: RFR: JDK-8264838: IGV: enhance graph export functionality [v2] In-Reply-To: References: Message-ID: On Fri, 26 Nov 2021 09:43:32 GMT, Nils Eliasson wrote: >> Hi, >> >> This patch adds SVG and searchable PDF export functionality to IGV. >> >> It's originally contributed by rcastanedalo at openjdk.java.net. >> I have updated the patch with new library versions, rebased and tested it. >> >> Please review, >> Nils Eliasson > > Nils Eliasson has updated the pull request incrementally with one additional commit since the last revision: > > Clean up Looks good to me too. ------------- Marked as reviewed by thartmann (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/6564 From thartmann at openjdk.java.net Mon Nov 29 06:54:06 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 29 Nov 2021 06:54:06 GMT Subject: RFR: 8277842: IGV: Add jvms property to know where a node came from In-Reply-To: References: Message-ID: On Thu, 25 Nov 2021 14:46:36 GMT, Christian Hagedorn wrote: > When dumping a node with `node->dump()`, it also prints the JVM state which tells us to which bci and inlinee method the node belongs to: > > 38 StoreI === 5 7 37 35 [[ 15 ]] @java/lang/Class:exact+116 *, name=y, idx=5; Memory: @java/lang/Class:exact+116 *, name=y, idx=5; !jvms: Test::inlinee @ bci:1 (line 16) Test::test @ bci:4 (line 12) > > IGV only shows the line and bci information with which it is sometimes hard to tell where exactly the node came from, especially with deep inlining. This patch adds the entire JVM state as a `jvms` property field to IGV: > > ![Screenshot from 2021-11-25 15-21-53](https://user-images.githubusercontent.com/17833009/143460385-baf5ee3a-31b0-4693-bfa4-0de91c3c4822.png) > > This helps to better analyze a graph. > > Thanks, > Christian Looks good. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6563 From thartmann at openjdk.java.net Mon Nov 29 07:11:09 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 29 Nov 2021 07:11:09 GMT Subject: RFR: 8277602: Deopt code does not extend the stack enough if the caller is an optimize entry blob [v2] In-Reply-To: References: <515lMydWeiRbYVwp7x4kW33eiOBrMdoobOaO4wW79TE=.6655f6f7-16eb-413d-a958-7225b0ce0b7a@github.com> Message-ID: On Thu, 25 Nov 2021 18:48:45 GMT, Jorn Vernee wrote: >> Deoptimization code does not recreate c2i adapter 'frames'. For compiled callers this means that the stack needs to be adjusted manually to make room for the parameters when the callee is converted to an interpreter frame (essentially emulating what a c2i adapter would do). 
>> >> To check if the caller does a compiled call, the current code uses `frame::is_compiled_frame()`, which is true if the codeblob of the caller frame is an instance of `CompiledMethod`. >> >> However, optimized entry blobs also do compiled calls, are not detected by this test, and therefore don't get their stack adjusted correctly. >> >> To address this, I've added a new `frame::is_compiled_caller` function to determine if the caller is doing a compiled call, and I use that in the deopt code instead of `is_compiled_frame`. >> >> This patch also removes an old workaround that tried to fix the issue by allocating some spill space in the optimized entry blob frame, but this only accounts for the first argument. If there are more arguments we still have a problem. The suggested patch fixes this the right way I think. >> >> Thanks, >> Jorn >> >> Testing: run-test-jdk_foreign on Windows x64 and Linux x64 (afaik these are the only tests that use optimized entry blobs) > > Jorn Vernee has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Fix after merge > - Merge branch 'master' into Deopt_Stack_Fix > - Add test + asserts > - Properly handle optimized entry frame callers during deopt Looks good to me but please run this through tier4-5 as well. ------------- Marked as reviewed by thartmann (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/6522 From chagedorn at openjdk.java.net Mon Nov 29 07:50:11 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Mon, 29 Nov 2021 07:50:11 GMT Subject: Integrated: 8277842: IGV: Add jvms property to know where a node came from In-Reply-To: References: Message-ID: <7Lh5W7ccRf5IBcahGS2JoYFwDdKoRiY2zpE3KYcEBZw=.b16e64c2-7ddd-4547-858b-d3dd8603d0c2@github.com> On Thu, 25 Nov 2021 14:46:36 GMT, Christian Hagedorn wrote: > When dumping a node with `node->dump()`, it also prints the JVM state which tells us to which bci and inlinee method the node belongs to: > > 38 StoreI === 5 7 37 35 [[ 15 ]] @java/lang/Class:exact+116 *, name=y, idx=5; Memory: @java/lang/Class:exact+116 *, name=y, idx=5; !jvms: Test::inlinee @ bci:1 (line 16) Test::test @ bci:4 (line 12) > > IGV only shows the line and bci information with which it is sometimes hard to tell where exactly the node came from, especially with deep inlining. This patch adds the entire JVM state as a `jvms` property field to IGV: > > ![Screenshot from 2021-11-25 15-21-53](https://user-images.githubusercontent.com/17833009/143460385-baf5ee3a-31b0-4693-bfa4-0de91c3c4822.png) > > This helps to better analyze a graph. > > Thanks, > Christian This pull request has now been integrated. 
Changeset: 0c7a4b8a Author: Christian Hagedorn URL: https://git.openjdk.java.net/jdk/commit/0c7a4b8aa8bb672e87aae7090494719db018b9b1 Stats: 8 lines in 1 file changed: 8 ins; 0 del; 0 mod 8277842: IGV: Add jvms property to know where a node came from Reviewed-by: roland, neliasso, thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/6563 From chagedorn at openjdk.java.net Mon Nov 29 07:50:10 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Mon, 29 Nov 2021 07:50:10 GMT Subject: RFR: 8277842: IGV: Add jvms property to know where a node came from In-Reply-To: References: Message-ID: On Thu, 25 Nov 2021 14:46:36 GMT, Christian Hagedorn wrote: > When dumping a node with `node->dump()`, it also prints the JVM state which tells us to which bci and inlinee method the node belongs to: > > 38 StoreI === 5 7 37 35 [[ 15 ]] @java/lang/Class:exact+116 *, name=y, idx=5; Memory: @java/lang/Class:exact+116 *, name=y, idx=5; !jvms: Test::inlinee @ bci:1 (line 16) Test::test @ bci:4 (line 12) > > IGV only shows the line and bci information with which it is sometimes hard to tell where exactly the node came from, especially with deep inlining. This patch adds the entire JVM state as a `jvms` property field to IGV: > > ![Screenshot from 2021-11-25 15-21-53](https://user-images.githubusercontent.com/17833009/143460385-baf5ee3a-31b0-4693-bfa4-0de91c3c4822.png) > > This helps to better analyze a graph. > > Thanks, > Christian Thanks Tobias for your review! 
------------- PR: https://git.openjdk.java.net/jdk/pull/6563 From neliasso at openjdk.java.net Mon Nov 29 08:23:13 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Mon, 29 Nov 2021 08:23:13 GMT Subject: RFR: JDK-8264838: IGV: enhance graph export functionality [v2] In-Reply-To: References: Message-ID: On Mon, 29 Nov 2021 06:48:59 GMT, Tobias Hartmann wrote: >> Nils Eliasson has updated the pull request incrementally with one additional commit since the last revision: >> >> Clean up > > Looks good to me too. Thanks for the review @TobiHartmann and @chhagedorn ! ------------- PR: https://git.openjdk.java.net/jdk/pull/6564 From neliasso at openjdk.java.net Mon Nov 29 08:23:14 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Mon, 29 Nov 2021 08:23:14 GMT Subject: Integrated: JDK-8264838: IGV: enhance graph export functionality In-Reply-To: References: Message-ID: On Thu, 25 Nov 2021 15:16:10 GMT, Nils Eliasson wrote: > Hi, > > This patch adds SVG and searchable PDF export functionality to IGV. > > It's originally contributed by rcastanedalo at openjdk.java.net. > I have updated the patch with new library versions, rebased and tested it. > > Please review, > Nils Eliasson This pull request has now been integrated. 
Changeset: aed53eea Author: Nils Eliasson URL: https://git.openjdk.java.net/jdk/commit/aed53eea5ea2782c74ea05521462db2ab20b7ebd Stats: 356 lines in 12 files changed: 77 ins; 262 del; 17 mod 8264838: IGV: enhance graph export functionality Co-authored-by: Roberto Castañeda Lozano Co-authored-by: Nils Eliasson Reviewed-by: chagedorn, thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/6564 From roland at openjdk.java.net Mon Nov 29 09:30:32 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Mon, 29 Nov 2021 09:30:32 GMT Subject: RFR: 8276116: C2: optimize long range checks in int counted loops [v2] In-Reply-To: <8bvd-Dtu9tKQVPWEV5lo0Xa7H2X76uVsgf1l6vKm7CM=.836388f2-a117-4f5d-9385-1690c7d0fd74@github.com> References: <8bvd-Dtu9tKQVPWEV5lo0Xa7H2X76uVsgf1l6vKm7CM=.836388f2-a117-4f5d-9385-1690c7d0fd74@github.com> Message-ID: > Maurizio noticed that some of his Panama microbenchmarks don't > perform better with 8259609 (C2: optimize long range checks in long > counted loops). The reason is that 8259609 optimizes long range checks > in long counted loops but some of his benchmarks include long range > checks in int counted loops: > > for (int i = start; i < stop; i += inc) { > Objects.checkIndex(scale * ((long)i) + offset, length); > } > > This change applies the transformation from 8259609 for long counted > loop/long range checks to int counted loop/long range checks. That > includes creating a loop nest and transforming the long range check to > an int range check that's subject to range elimination in the inner > loop. > > The reason it's required to create a loop nest is that the long range > check transformation logic depends on no overflow of scale * i for the > range of values that the transformed range check is applied to. > > As a consequence, this change is mostly refactoring to make the loop > nest creation and range check transformation parameterized by the type > of the transformed loop.
> > I think this transformation needs to be applied as late as possible > but, in the case of an int counted loop, before pre/main/post loops > are created. I had to move it to IdealLoopTree::iteration_split_impl() > because of that. > > There's an alternate shape for a long range check in an int counted > loop that Maurizio insisted needs to be supported: > > for (int i = start; i < stop; i += inc) { > Objects.checkIndex(((long)(scale * i)) + offset, length); > } > > scale * i can overflow in that case. This is also supported but as a > corner case of the previous one. The code in > PhaseIdealLoop::transform_long_range_checks() has a comment about > that. > > Note also that this transformation works best if loop strip mining is > enabled (that is for G1, ZGC, Shenandoah by default). The reason is > that it needs a safepoint and when loop strip mining is enabled, the > outer loop contains one that's always available. A way to have this > work as well for all GCs would be to always construct the loop strip > mining loop nest (whether loop strip mining is enabled or not) and > then only once loop opts are over remove the outer loop when loop > strip mining is disabled. I'm looking for feedback on this. > > BTW, something doesn't seem right in IdealLoopTree::iteration_split_impl(): > > https://github.com/rwestrel/jdk/blob/master/src/hotspot/share/opto/loopTransform.cpp#L3475 > > should_peel causes transformations to be skipped but peeling is never > applied AFAICT. Does it make sense to anyone? 
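
The difference between the two range-check shapes quoted above comes down to where the widening to `long` happens. The following is a small plain-Java sketch, assuming `scale` is an `int` (the description does not state its type); the class and method names are hypothetical and nothing here is C2 code.

```java
// Hypothetical illustration only; assumes scale is an int.
final class RangeCheckShapes {
    // First shape: i is widened before the multiply, so scale * i cannot wrap.
    static long widenThenMultiply(int scale, int i, long offset) {
        return scale * ((long) i) + offset;
    }

    // Alternate shape: the multiply is done in int and can wrap before widening.
    static long multiplyThenWiden(int scale, int i, long offset) {
        return ((long) (scale * i)) + offset;
    }

    public static void main(String[] args) {
        int scale = 4;
        int i = 1 << 30; // 4 * 2^30 = 2^32, which does not fit in an int
        System.out.println(widenThenMultiply(scale, i, 0L)); // 4294967296
        System.out.println(multiplyThenWiden(scale, i, 0L)); // 0, the int product wrapped
    }
}
```

For small values of `i` both shapes compute the same index, which is consistent with the description treating the second shape as a corner case of the first.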
Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: x86 ad file fix ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6576/files - new: https://git.openjdk.java.net/jdk/pull/6576/files/06852bdf..d079f5b8 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6576&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6576&range=00-01 Stats: 18 lines in 1 file changed: 18 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/6576.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6576/head:pull/6576 PR: https://git.openjdk.java.net/jdk/pull/6576 From roland at openjdk.java.net Mon Nov 29 09:34:14 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Mon, 29 Nov 2021 09:34:14 GMT Subject: RFR: 8275638: GraphKit::combine_exception_states fails with "matching stack sizes" assert [v2] In-Reply-To: References: Message-ID: On Sun, 28 Nov 2021 11:30:02 GMT, Dean Long wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> make test runnable with release build > > I tried calling > > set_sp(0); > clean_stack(0); > > in GraphKit::make_exception_state() and it seems to work, but I still don't understand why the stack size mismatch only shows up with late inlining. Thanks for looking at this @dean-long > The existing code is confusing to me. Are these exception states only used when we are actually throwing an exception, or can they affect the state when deoptimizing? They are only for throwing exceptions. > Why does GraphKit::add_exception_state() have a comment about arguments and compare stacks? That makes little sense to me too. > Why doesn't GraphKit::make_exception_state() just reset the stack using push_ex_oop() instead of preserving the stack with set_saved_ex_oop()? I'm not sure about that either.
------------- PR: https://git.openjdk.java.net/jdk/pull/6572 From thartmann at openjdk.java.net Mon Nov 29 09:47:07 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 29 Nov 2021 09:47:07 GMT Subject: RFR: 8277878: Fix compiler tests after JDK-8275908 In-Reply-To: References: Message-ID: On Sat, 27 Nov 2021 09:53:26 GMT, Volker Simonis wrote: > This is a quick fix for the two compiler tests introduced by JDK-8275908. The test explicitly added SerialGC in the parameter list which leads to a garbage collector conflict, if the tests are run with some other Garbage collector. > > It was suggested to fix this by adding a `@requires vm.gc.Serial` tag but this is not necessary because the tests are actually GC-agnostic so I've removed the `-XX:+UseSerialGC` parameter from the test command line instead. > > After the fix it was necessary to refine the check for whether the test JVM has JVMCI support built-in. Before the fix, I used `WhiteBox.getWhiteBox().isJVMCISupportedByGC()` which worked fine if only running with SerialGC. Now that we can run with GCs which don't support JVMCI we have to use the more specific `(WB.getBooleanVMFlag("EnableJVMCI") != null)`. > > Please let me know if you want me to push this instantly after it has been reviewed or if you first want to re-run your internal Tier3 tests before pushing. That looks good and trivial to me. ------------- Marked as reviewed by thartmann (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/6581 From yyang at openjdk.java.net Mon Nov 29 09:47:12 2021 From: yyang at openjdk.java.net (Yi Yang) Date: Mon, 29 Nov 2021 09:47:12 GMT Subject: RFR: 8267928: Loop predicate gets inexact loop limit before PhaseIdealLoop::rc_predicate In-Reply-To: References: Message-ID: On Fri, 28 May 2021 13:29:36 GMT, Yi Yang wrote: > Loop predicate gets inexact loop limit(LoopLimitNode) from exact_limit(even if the limit is statically known) and does unnecessary overflow checking when generating lower bound test(rc_predicate). The reason is rather straightforward: exact_limit fails to see a HasExactTripCount flag since it would be set after performing loop predicate(iteration_split). Unfortunately, I can not reproduce this with the aforementioned simple case when merged with master. I think some other patches affect generated IR. But the problem is still left behind, i.e. HasExactTripCount flag since it would be set after performing loop predicate. Should we move this PR forward? ------------- PR: https://git.openjdk.java.net/jdk/pull/4247 From eliu at openjdk.java.net Mon Nov 29 09:48:30 2021 From: eliu at openjdk.java.net (Eric Liu) Date: Mon, 29 Nov 2021 09:48:30 GMT Subject: RFR: 8276985: AArch64: [vectorapi] Backend support of VectorMaskToLongNode Message-ID: The lack of codegen for VectorMaskToLong results in a regression on AArch64 for VectorMask.laneIsSet, which relies on the intrinsification of VectorMask.toLong after JDK-8273949. This patch implements bitmask extraction on AArch64 for NEON and SVE by using scalar instructions, which is equivalent to the PMOVMSK instructions on X86. The performance of VectorMask.laneIsSet improves about 10x for NEON and 2x~4x for SVE on my test machines. 
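
As a rough scalar model of what "bitmask extraction" means in the message above, the sketch below packs a mask's lanes into a `long` and then tests a single lane. It is plain Java with hypothetical names, assuming lane i maps to bit i; it is neither the AArch64 code from the patch nor the Vector API implementation.

```java
// Hypothetical scalar model; not the NEON/SVE code generated by the patch.
final class MaskBits {
    // Pack up to the first 64 lanes of a mask into a long, lane i -> bit i.
    static long toLong(boolean[] lanes) {
        long bits = 0L;
        int n = Math.min(lanes.length, 64);
        for (int i = 0; i < n; i++) {
            if (lanes[i]) {
                bits |= 1L << i;
            }
        }
        return bits;
    }

    // Once the mask is a long, testing one lane is a shift and a mask.
    static boolean laneIsSet(long bits, int lane) {
        return ((bits >>> lane) & 1L) != 0;
    }

    public static void main(String[] args) {
        long bits = toLong(new boolean[] {true, false, true, true});
        System.out.println(Long.toBinaryString(bits)); // 1101
        System.out.println(laneIsSet(bits, 1));        // false
        System.out.println(laneIsSet(bits, 2));        // true
    }
}
```

The loop in `toLong` is the part that a backend would want to replace with a handful of vector and scalar instructions.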
------------- Commit messages: - 8276985: AArch64: [vectorapi] Backend support of VectorMaskToLongNode Changes: https://git.openjdk.java.net/jdk/pull/6585/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6585&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8276985 Stats: 152 lines in 6 files changed: 152 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/6585.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6585/head:pull/6585 PR: https://git.openjdk.java.net/jdk/pull/6585 From thartmann at openjdk.java.net Mon Nov 29 09:51:06 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 29 Nov 2021 09:51:06 GMT Subject: RFR: 8277777: [Vector API] assert(r->is_XMMRegister()) failed: must be in x86_32.ad In-Reply-To: References: Message-ID: On Wed, 24 Nov 2021 11:56:44 GMT, Jie Fu wrote: > Hi all, > > The following vector api tests fail on x86_32/AVX512 with `assert(r->is_XMMRegister()) failed: must be`. > > jdk/incubator/vector/Byte64VectorLoadStoreTests.java > jdk/incubator/vector/Byte256VectorLoadStoreTests.java > jdk/incubator/vector/Byte128VectorLoadStoreTests.java > jdk/incubator/vector/ByteMaxVectorLoadStoreTests.java > jdk/incubator/vector/Double256VectorTests.java > jdk/incubator/vector/Double512VectorTests.java > jdk/incubator/vector/DoubleMaxVectorTests.java > jdk/incubator/vector/Float512VectorTests.java > jdk/incubator/vector/Float256VectorTests.java > jdk/incubator/vector/FloatMaxVectorTests.java > jdk/incubator/vector/Float128VectorTests.java > jdk/incubator/vector/Short128VectorLoadStoreTests.java > jdk/incubator/vector/Short256VectorLoadStoreTests.java > jdk/incubator/vector/Short64VectorLoadStoreTests.java > jdk/incubator/vector/ShortMaxVectorLoadStoreTests.java > > > The reason is that `static enum RC rc_class( OptoReg::Name reg )` [1] missed the case for KRegister. > And the AVX-512 opmask specific spilling code [2] should be located before the size assert [3]. > > Thanks. 
> Best regards, > Jie > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86_32.ad#L747 > [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86_32.ad#L1272 > [3] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86_32.ad#L1252 That looks good to me but @jatin-bhateja should have a look as well. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6535 From neliasso at openjdk.java.net Mon Nov 29 09:55:15 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Mon, 29 Nov 2021 09:55:15 GMT Subject: RFR: 8276108: Wrong instruction generation in aarch64 backend [v2] In-Reply-To: References: <9P964HSPi8geJ3GVJwbsGo_t_lASstJkxsrX76Xv8K8=.898bf726-d71b-4a90-a6b7-83895eec3494@github.com> Message-ID: On Wed, 24 Nov 2021 11:52:45 GMT, Patric Hedlin wrote: >> C1 code generation on AArch64 may produce bad LDR/STR immediate offset instructions when the actual operand (datum) size is unknown. This change will alter the code generated for the problematic immediate offset to use the register offset version (requiring additional instructions). >> >> Contributed by Nick Gasson. >> >> Added assert in Address::encode() to emphasise the use of a valid immediate (in base_plus_offset). >> >> Added clarifying comment to Address::offset_ok_for_immed() emphasising favouring of the scaled unsigned 12-bit encoding for aligned offsets. > > Patric Hedlin has updated the pull request incrementally with one additional commit since the last revision: > > Clean-up address calculation via use of legitimize_address(). Looks good! ------------- Marked as reviewed by neliasso (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/6212 From thartmann at openjdk.java.net Mon Nov 29 10:08:07 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 29 Nov 2021 10:08:07 GMT Subject: RFR: 8276116: C2: optimize long range checks in int counted loops [v2] In-Reply-To: References: <8bvd-Dtu9tKQVPWEV5lo0Xa7H2X76uVsgf1l6vKm7CM=.836388f2-a117-4f5d-9385-1690c7d0fd74@github.com> Message-ID: On Mon, 29 Nov 2021 09:30:32 GMT, Roland Westrelin wrote: >> Maurizio noticed that some of his panama micro benchmarks don't >> perform better with 8259609 (C2: optimize long range checks in long >> counted loops). The reason is that 8259609 optimizes long range checks >> in long counted loops but some of his benchmarks include long range >> checks in int counted loops: >> >> for (int i = start; i < stop; i += inc) { >> Objects.checkIndex(scale * ((long)i) + offset, length); >> } >> >> This change applies the transformation from 8259609 for long counted >> loop/long range checks to int counted loop/long range checks. That >> includes creating a loop nest and transforming the long range check to >> an int range check that's subject to range elimination in the inner >> loop. >> >> The reason it's required to create a loop nest is that the long range >> check transformation logic depends on no overflow of scale * i for the >> range of values that the transformed range check is applied to. >> >> As a consequence, this change is mostly refactoring to make the loop >> nest creation and range check transformation parameterized by the type >> of the transformed loop. >> >> I think this transformation needs to be applied as late as possible >> but, in the case of an int counted loop, before pre/main/post loops >> are created. I had to move it to IdealLoopTree::iteration_split_impl() >> because of that.
>> >> There's an alternate shape for a long range check in an int counted >> loop that Maurizio insisted needs to be supported: >> >> for (int i = start; i < stop; i += inc) { >> Objects.checkIndex(((long)(scale * i)) + offset, length); >> } >> >> scale * i can overflow in that case. This is also supported but as a >> corner case of the previous one. The code in >> PhaseIdealLoop::transform_long_range_checks() has a comment about >> that. >> >> Note also that this transformation works best if loop strip mining is >> enabled (that is for G1, ZGC, Shenandoah by default). The reason is >> that it needs a safepoint and when loop strip mining is enabled, the >> outer loop contains one that's always available. A way to have this >> work as well for all GCs would be to always construct the loop strip >> mining loop nest (whether loop strip mining is enabled or not) and >> then only once loop opts are over remove the outer loop when loop >> strip mining is disabled. I'm looking for feedback on this. >> >> BTW, something doesn't seem right in IdealLoopTree::iteration_split_impl(): >> >> https://github.com/rwestrel/jdk/blob/master/src/hotspot/share/opto/loopTransform.cpp#L3475 >> >> should_peel causes transformations to be skipped but peeling is never >> applied AFAICT. Does it make sense to anyone? 
> > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > x86 ad file fix No review yet, just run this through testing and TestLongRangeCheck.java fails with: java.lang.RuntimeException: should have been deoptimized at TestLongRangeCheck.assertIsNotCompiled(TestLongRangeCheck.java:60) at TestLongRangeCheck.test(TestLongRangeCheck.java:127) at TestLongRangeCheck.main(TestLongRangeCheck.java:215) at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) at java.base/java.lang.reflect.Method.invoke(Method.java:577) at com.sun.javatest.regtest.agent.MainWrapper$MainThread.run(MainWrapper.java:127) at java.base/java.lang.Thread.run(Thread.java:833) Flags are `-XX:+CreateCoredumpOnCrash -ea -esa -XX:CompileThreshold=100 -XX:+UnlockExperimentalVMOptions -server -XX:-TieredCompilation` ------------- Changes requested by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6576 From roland at openjdk.java.net Mon Nov 29 10:11:06 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Mon, 29 Nov 2021 10:11:06 GMT Subject: RFR: 8275638: GraphKit::combine_exception_states fails with "matching stack sizes" assert [v2] In-Reply-To: References: Message-ID: On Sun, 28 Nov 2021 11:30:02 GMT, Dean Long wrote: > in GraphKit::make_exception_state() and it seems to work, but I still don't understand why the stack size mismatch only shows up with late inlining. At parse time, exception throwing goes through Parse::do_exceptions() which either: - pops the current frame in Parse::throw_to_exit() so there can't be a stack size mismatch anymore if "exception states" are combined later on or - push the exception on the stack in Parse::catch_inline_exceptions() which causes the stack to be resized. 
When that code path is taken there can be 2 "exception states" for a single bci (the null check and the exception from the call in the case of a virtual call) and they are not "combined" (the explicit test in GraphKit::add_exception_state() prevents that). I agree that this all feels tortuous but not sure if cleaning it up as part of this change is the best thing to do. ------------- PR: https://git.openjdk.java.net/jdk/pull/6572 From chagedorn at openjdk.java.net Mon Nov 29 10:14:11 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Mon, 29 Nov 2021 10:14:11 GMT Subject: RFR: 8267928: Loop predicate gets inexact loop limit before PhaseIdealLoop::rc_predicate In-Reply-To: References: Message-ID: On Fri, 28 May 2021 13:29:36 GMT, Yi Yang wrote: > Loop predicate gets inexact loop limit(LoopLimitNode) from exact_limit(even if the limit is statically known) and does unnecessary overflow checking when generating lower bound test(rc_predicate). The reason is rather straightforward: exact_limit fails to see a HasExactTripCount flag since it would be set after performing loop predicate(iteration_split). I guess it is okay then to move forward with this PR without additional IR test if others also agree. ------------- PR: https://git.openjdk.java.net/jdk/pull/4247 From thartmann at openjdk.java.net Mon Nov 29 10:18:12 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 29 Nov 2021 10:18:12 GMT Subject: RFR: 8267928: Loop predicate gets inexact loop limit before PhaseIdealLoop::rc_predicate In-Reply-To: References: Message-ID: On Fri, 28 May 2021 13:29:36 GMT, Yi Yang wrote: > Loop predicate gets inexact loop limit(LoopLimitNode) from exact_limit(even if the limit is statically known) and does unnecessary overflow checking when generating lower bound test(rc_predicate). The reason is rather straightforward: exact_limit fails to see a HasExactTripCount flag since it would be set after performing loop predicate(iteration_split). 
That's fine with me. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/4247 From thartmann at openjdk.java.net Mon Nov 29 10:25:08 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 29 Nov 2021 10:25:08 GMT Subject: RFR: 8277843: [Vector API] scalar2vector generates incorrect type info for mask operations if Op_MaskAll is unavailable [v2] In-Reply-To: References: Message-ID: On Fri, 26 Nov 2021 07:44:26 GMT, Jie Fu wrote: >> Hi all, >> >> This bug was first observed on x86_32/AVX512. >> It caused 62 vector api test failures. >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >>>> jtreg:test/jdk/jdk/incubator/vector 74 12 62 0 << >> ============================== >> >> >> You can easily reproduce this bug on an AVX512 machine with x86_32. >> Or you can also reproduce it on an AVX512 machine with x86_64 if you disable `Op_MaskAll` like this. >> >> diff --git a/src/hotspot/cpu/x86/x86.ad b/src/hotspot/cpu/x86/x86.ad >> index 3f6d5a44b0d..d5a751b310d 100644 >> --- a/src/hotspot/cpu/x86/x86.ad >> +++ b/src/hotspot/cpu/x86/x86.ad >> @@ -1819,6 +1819,7 @@ const bool Matcher::match_rule_supported_vector(int opcode, int vlen, BasicType >> } >> break; >> case Op_MaskAll: >> + return false; >> if (!is_LP64 || !VM_Version::supports_evex()) { >> return false; >> } >> >> >> The failure reason is that `VectorNode::scalar2vector` generate incorrect IR for mask operations if `Op_MaskAll` is unavailable. >> So it shouldn't be used for mask operations if `Op_MaskAll` is unavailable. >> >> Testing (with two more bug fixes https://github.com/openjdk/jdk/pull/6535 and https://github.com/openjdk/jdk/pull/6533): >> - vector api tests on {x86_64, x86_32}/{AVX512, AVX256}, all passed >> - vector api tests on aarch64, all passed >> >> Thanks. 
>> Best regards, >> Jie > > Jie Fu has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments Looks reasonable to me. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6562 From roland at openjdk.java.net Mon Nov 29 10:26:40 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Mon, 29 Nov 2021 10:26:40 GMT Subject: RFR: 8276116: C2: optimize long range checks in int counted loops [v3] In-Reply-To: <8bvd-Dtu9tKQVPWEV5lo0Xa7H2X76uVsgf1l6vKm7CM=.836388f2-a117-4f5d-9385-1690c7d0fd74@github.com> References: <8bvd-Dtu9tKQVPWEV5lo0Xa7H2X76uVsgf1l6vKm7CM=.836388f2-a117-4f5d-9385-1690c7d0fd74@github.com> Message-ID: > Maurizio noticed that some of his panama micro benchmarks don't > perform better with 8259609 (C2: optimize long range checks in long > counted loops). The reason is that 8259609 optimizes long range checks > in long counted loops but some of his benchmarks include long range > checks in int counted loops: > > for (int i = start; i < stop; i += inc) { > Objects.checkIndex(scale * ((long)i) + offset, length); > } > > This change applies the transformation from 8259609 for long counted > loop/long range checks to int counted loop/long range checks. That > includes creating a loop nest and transforming the long range check to > an int range check that's subject to range elimination in the inner > loop. > > The reason it's required to create a loop nest is that the long range > check transformation logic depends on no overflow of scale * i for the > range of values that the transformed range check is applied to. > > As a consequence, this change is mostly refactoring to make the loop > nest creation and range check transformation parameterized by the type > of the transformed loop. > > I think this transformation needs to be applied as late as possible > but, in the case of an int counted loop, before pre/main/post loops > are created.
I had to move it to IdealLoopTree::iteration_split_impl() > because of that. > > There's an alternate shape for a long range check in an int counted > loop that Maurizio insisted needs to be supported: > > for (int i = start; i < stop; i += inc) { > Objects.checkIndex(((long)(scale * i)) + offset, length); > } > > scale * i can overflow in that case. This is also supported but as a > corner case of the previous one. The code in > PhaseIdealLoop::transform_long_range_checks() has a comment about > that. > > Note also that this transformation works best if loop strip mining is > enabled (that is for G1, ZGC, Shenandoah by default). The reason is > that it needs a safepoint and when loop strip mining is enabled, the > outer loop contains one that's always available. A way to have this > work as well for all GCs would be to always construct the loop strip > mining loop nest (whether loop strip mining is enabled or not) and > then only once loop opts are over remove the outer loop when loop > strip mining is disabled. I'm looking for feedback on this. > > BTW, something doesn't seem right in IdealLoopTree::iteration_split_impl(): > > https://github.com/rwestrel/jdk/blob/master/src/hotspot/share/opto/loopTransform.cpp#L3475 > > should_peel causes transformations to be skipped but peeling is never > applied AFAICT. Does it make sense to anyone? 
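To make the difference between the two loop shapes above concrete, here is a small sketch of the index arithmetic only. It assumes `scale` and `i` are both `int` (the description does not state the type of `scale`), and the class and method names are hypothetical. In the first shape the multiply is done in 64 bits; in the second it is done in 32 bits and may wrap before being widened, which is the overflow case the description mentions.

```java
import java.util.Objects;

// Index arithmetic for the two range check shapes; an illustration under the
// assumption that scale and i are ints, not code taken from the change.
final class LongRangeCheckSketch {
    // Shape 1: i is widened first, so the multiply happens in long arithmetic.
    static long widenedIndex(int scale, int i, long offset) {
        return scale * ((long) i) + offset;
    }

    // Shape 2: scale * i is an int multiply that can wrap before the cast to long.
    static long wrappedIndex(int scale, int i, long offset) {
        return ((long) (scale * i)) + offset;
    }

    // True if Objects.checkIndex(long, long) accepts the index for this length.
    static boolean inBounds(long index, long length) {
        try {
            Objects.checkIndex(index, length);
            return true;
        } catch (IndexOutOfBoundsException e) {
            return false;
        }
    }
}
```

For example, with `scale = 4` and `i = 1 << 30`, the first shape yields 2^32 and fails a check against a length of 100, whereas the second wraps to 0 and passes it. A transformed range check therefore has to treat the two shapes differently.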
Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: test fix ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6576/files - new: https://git.openjdk.java.net/jdk/pull/6576/files/d079f5b8..750b3a26 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6576&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6576&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/6576.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6576/head:pull/6576 PR: https://git.openjdk.java.net/jdk/pull/6576 From roland at openjdk.java.net Mon Nov 29 10:26:42 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Mon, 29 Nov 2021 10:26:42 GMT Subject: RFR: 8276116: C2: optimize long range checks in int counted loops [v2] In-Reply-To: References: <8bvd-Dtu9tKQVPWEV5lo0Xa7H2X76uVsgf1l6vKm7CM=.836388f2-a117-4f5d-9385-1690c7d0fd74@github.com> Message-ID: On Mon, 29 Nov 2021 10:05:16 GMT, Tobias Hartmann wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> x86 ad file fix > > No review yet, just run this through testing and TestLongRangeCheck.java fails with: > > java.lang.RuntimeException: should have been deoptimized > at TestLongRangeCheck.assertIsNotCompiled(TestLongRangeCheck.java:60) > at TestLongRangeCheck.test(TestLongRangeCheck.java:127) > at TestLongRangeCheck.main(TestLongRangeCheck.java:215) > at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) > at java.base/java.lang.reflect.Method.invoke(Method.java:577) > at com.sun.javatest.regtest.agent.MainWrapper$MainThread.run(MainWrapper.java:127) > at java.base/java.lang.Thread.run(Thread.java:833) > > > Flags are `-XX:+CreateCoredumpOnCrash -ea -esa -XX:CompileThreshold=100 -XX:+UnlockExperimentalVMOptions -server -XX:-TieredCompilation` 
@TobiHartmann thanks for running testing. That one should be fixed now. ------------- PR: https://git.openjdk.java.net/jdk/pull/6576 From chagedorn at openjdk.java.net Mon Nov 29 10:36:14 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Mon, 29 Nov 2021 10:36:14 GMT Subject: RFR: 8277878: Fix compiler tests after JDK-8275908 In-Reply-To: References: Message-ID: On Sat, 27 Nov 2021 09:53:26 GMT, Volker Simonis wrote: > This is a quick fix for the two compiler tests introduced by JDK-8275908. The test explicitly added SerialGC in the parameter list which leads to a garbage collector conflict, if the tests are run with some other Garbage collector. > > It was suggested to fix this by adding a `@requires vm.gc.Serial` tag but this is not necessary because the tests are actually GC-agnostic so I've removed the `-XX:+UseSerialGC` parameter from the test command line instead. > > After the fix it was necessary to refine the check for whether the test JVM has JVMCI support built-in. Before the fix, I used `WhiteBox.getWhiteBox().isJVMCISupportedByGC()` which worked fine if only running with SerialGC. Now that we can run with GCs which don't support JVMCI we have to use the more specific `(WB.getBooleanVMFlag("EnableJVMCI") != null)`. > > Please let me know if you want me to push this instantly after it has been reviewed or if you first want to re-run your internal Tier3 tests before pushing. Looks good! I guess it does not hurt to quickly verify it. Tier3 testing is submitted, will get back to you with the results. ------------- Marked as reviewed by chagedorn (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/6581 From phedlin at openjdk.java.net Mon Nov 29 10:37:09 2021 From: phedlin at openjdk.java.net (Patric Hedlin) Date: Mon, 29 Nov 2021 10:37:09 GMT Subject: RFR: 8276108: Wrong instruction generation in aarch64 backend [v2] In-Reply-To: References: <9P964HSPi8geJ3GVJwbsGo_t_lASstJkxsrX76Xv8K8=.898bf726-d71b-4a90-a6b7-83895eec3494@github.com> Message-ID: On Wed, 24 Nov 2021 11:52:45 GMT, Patric Hedlin wrote: >> C1 code generation on AArch64 may produce bad LDR/STR immediate offset instructions when the actual operand (datum) size is unknown. This change will alter the code generated for the problematic immediate offset to use the register offset version (requiring additional instructions). >> >> Contributed by Nick Gasson. >> >> Added assert in Address::encode() to emphasise the use of a valid immediate (in base_plus_offset). >> >> Added clarifying comment to Address::offset_ok_for_immed() emphasising favouring of the scaled unsigned 12-bit encoding for aligned offsets. > > Patric Hedlin has updated the pull request incrementally with one additional commit since the last revision: > > Clean-up address calculation via use of legitimize_address(). Thank you for reviewing @aph and @neliasso. 
------------- PR: https://git.openjdk.java.net/jdk/pull/6212 From phedlin at openjdk.java.net Mon Nov 29 10:37:10 2021 From: phedlin at openjdk.java.net (Patric Hedlin) Date: Mon, 29 Nov 2021 10:37:10 GMT Subject: Integrated: 8276108: Wrong instruction generation in aarch64 backend In-Reply-To: <9P964HSPi8geJ3GVJwbsGo_t_lASstJkxsrX76Xv8K8=.898bf726-d71b-4a90-a6b7-83895eec3494@github.com> References: <9P964HSPi8geJ3GVJwbsGo_t_lASstJkxsrX76Xv8K8=.898bf726-d71b-4a90-a6b7-83895eec3494@github.com> Message-ID: <5hyCZ6WPUXNfFRx5aNxdZr9CK80kKRvfNNVTOnHAftA=.24ab7177-b96e-4888-80f3-5c4186023946@github.com> On Tue, 2 Nov 2021 14:02:48 GMT, Patric Hedlin wrote: > C1 code generation on AArch64 may produce bad LDR/STR immediate offset instructions when the actual operand (datum) size is unknown. This change will alter the code generated for the problematic immediate offset to use the register offset version (requiring additional instructions). > > Contributed by Nick Gasson. > > Added assert in Address::encode() to emphasise the use of a valid immediate (in base_plus_offset). > > Added clarifying comment to Address::offset_ok_for_immed() emphasising favouring of the scaled unsigned 12-bit encoding for aligned offsets. This pull request has now been integrated. 
Changeset: 72bacf8d Author: Patric Hedlin URL: https://git.openjdk.java.net/jdk/commit/72bacf8d256071773d8fd9f9c2d0aebb2cb32dea Stats: 26 lines in 3 files changed: 8 ins; 2 del; 16 mod 8276108: Wrong instruction generation in aarch64 backend Co-authored-by: Nick Gasson Reviewed-by: aph, neliasso ------------- PR: https://git.openjdk.java.net/jdk/pull/6212 From simonis at openjdk.java.net Mon Nov 29 11:08:05 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Mon, 29 Nov 2021 11:08:05 GMT Subject: RFR: 8277878: Fix compiler tests after JDK-8275908 In-Reply-To: References: Message-ID: <2zBBzqJk3o4Y3WyyNuZ658h-5zTLXZPtsBr6bEin02w=.6f61d0cd-d04e-41ce-921a-f51711b86395@github.com> On Mon, 29 Nov 2021 10:33:29 GMT, Christian Hagedorn wrote: >> This is a quick fix for the two compiler tests introduced by JDK-8275908. The test explicitly added SerialGC in the parameter list which leads to a garbage collector conflict, if the tests are run with some other Garbage collector. >> >> It was suggested to fix this by adding a `@requires vm.gc.Serial` tag but this is not necessary because the tests are actually GC-agnostic so I've removed the `-XX:+UseSerialGC` parameter from the test command line instead. >> >> After the fix it was necessary to refine the check for whether the test JVM has JVMCI support built-in. Before the fix, I used `WhiteBox.getWhiteBox().isJVMCISupportedByGC()` which worked fine if only running with SerialGC. Now that we can run with GCs which don't support JVMCI we have to use the more specific `(WB.getBooleanVMFlag("EnableJVMCI") != null)`. >> >> Please let me know if you want me to push this instantly after it has been reviewed or if you first want to re-run your internal Tier3 tests before pushing. > > Looks good! I guess it does not hurt to quickly verify it. Tier3 testing is submitted, will get back to you with the results. Thanks @chhagedorn, @TobiHartmann! I'll wait with submitting until @chhagedorn reports back the Tier3 results. 
------------- PR: https://git.openjdk.java.net/jdk/pull/6581 From chagedorn at openjdk.java.net Mon Nov 29 11:23:35 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Mon, 29 Nov 2021 11:23:35 GMT Subject: RFR: 8275326: C2: assert(no_dead_loop) failed: dead loop detected [v4] In-Reply-To: References: Message-ID: <8ZBmN3KXGkCgYPCmtrkM1lYGopX_j9nc4pN21w5s4CA=.1cc50fcc-bec2-4553-8aca-7de9b04b5120@github.com> > In the test case, we apply the following optimization in `PhiNode::Ideal()` for the memory phi 989 that is on a dead path but still has both its inputs set to non-top nodes: > https://github.com/openjdk/jdk/blob/3c0faa73522bd004b66cb9e477f43e15a29842e6/src/hotspot/share/opto/cfgnode.cpp#L2269-L2270 > ![Screenshot from 2021-11-05 11-57-49](https://user-images.githubusercontent.com/17833009/140502849-9f00fd62-9714-4f54-8f98-f22f74d11430.png) > > In this process, we create `11853 Phi` for the new `11850 MergeMem` which is going to replace `989 Phi` (`this`). We then transform `11853 Phi` before returning: > https://github.com/openjdk/jdk/blob/3c0faa73522bd004b66cb9e477f43e15a29842e6/src/hotspot/share/opto/cfgnode.cpp#L2314 > > During `Ideal()` for `11853 Phi`, we transform `11769 MergeMem` into top (because the base memory is top) and use this as new input instead: > https://github.com/openjdk/jdk/blob/3c0faa73522bd004b66cb9e477f43e15a29842e6/src/hotspot/share/opto/cfgnode.cpp#L2230-L2240 > > But even if the `MergeMem` node would not be transformed into top, the slice itself could be top (L2237) and we would still replace the phi input with top. This replacement by top will fold the `11853 Phi` and we will build a cycle `11850 MergeMem` <-> `1064 StoreB` because `989 Phi` will be replaced by `11850 MergeMem`. This results in the assertion failure. > > I tried some approaches by marking `11853 Phi` and/or `989 Phi` to specially treat them during the optimizations in `Ideal()` (e.g. skipping `989 Phi` during the dead loop detection etc.) 
or to improve the dead loop detection before applying the `MergeMem` optimization in `Ideal()`. But that seemed rather complicated/fragile. > > I therefore propose to simply not transform the newly created phi nodes directly but wait instead for IGVN to revisit them again. This allows the `this` phi to be replaced with the new `MergeMem` node and the dead loop detection will work correctly when processing the new phis again later in IGVN. > > I could only reproduce this bug with the replay file for the attached test case in the JBS issue. The test case itself did not trigger with repeated runs with `StressIGVN` + `RepeatCompilation`. > > Thanks, > Christian Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: Remove igvn checks ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6276/files - new: https://git.openjdk.java.net/jdk/pull/6276/files/a1675453..4030ed2c Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6276&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6276&range=02-03 Stats: 23 lines in 1 file changed: 7 ins; 11 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/6276.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6276/head:pull/6276 PR: https://git.openjdk.java.net/jdk/pull/6276 From chagedorn at openjdk.java.net Mon Nov 29 11:23:42 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Mon, 29 Nov 2021 11:23:42 GMT Subject: RFR: 8275326: C2: assert(no_dead_loop) failed: dead loop detected [v3] In-Reply-To: References: Message-ID: On Sun, 28 Nov 2021 23:27:24 GMT, Vladimir Kozlov wrote: >> Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: >> >> - New fix >> - Merge branch 'master' into JDK-8275326 >> - handle GVN >> - C2: assert(no_dead_loop) failed: dead loop detected > > src/hotspot/share/opto/cfgnode.cpp line 2315: > >> 2313: } >> 2314: >> 2315: if (igvn) { > > This whole split through `mergemem` part of code is guarded by `can_reshape` check at line #2196 which is `true` only during IGVN (`PhaseIterGVN`). So you don't need to worry about executing this in parser. > But checking `igvn` value is fine here since you use it as we do at line #2278. Good point. I guess I could clean this up and the other `if (igvn)` checks after L2196 > src/hotspot/share/opto/cfgnode.cpp line 2321: > >> 2319: // visiting it in the transformations below. >> 2320: igvn->replace_node(this, result); >> 2321: igvn->set_type(result, result->bottom_type()); > > Did you consider simply cut off Phi's inputs with `top` by using `replace_input_of()`? Unfortunately, this does not work. When doing the dead loop check for `11853 Phi` in the `phase->transform(phi)` call below, we find the `this` phi (`989 Phi`) by following the inputs of `11853 Phi` and conclude it's dead loop safe. This means we would need to cut the outputs of `989 Phi`. The only good way I found to achieve that is to directly replace the `this` phi here with the resulting mergemem node. Otherwise, we somehow need to restore the outputs again after the transformations to correctly subsume the `this` phi later. ------------- PR: https://git.openjdk.java.net/jdk/pull/6276 From tschatzl at openjdk.java.net Mon Nov 29 11:40:23 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Mon, 29 Nov 2021 11:40:23 GMT Subject: RFR: 8277928: Fix compilation on macosx-aarch64 after 8276108 Message-ID: Hi all, can I have reviews for this change that uses a wrong format specifier for int64 which makes osx/aarch64 builds fail. 
[2021-11-29T10:42:32,638Z] .../src/hotspot/cpu/aarch64/assembler_aarch64.hpp:482:41: error: format specifies type 'long' but the argument has type 'int64_t' (aka 'long long') [-Werror,-Wformat] [2021-11-29T10:42:32,638Z] "must be, was: %ld, %d", _offset, size); [2021-11-29T10:42:32,638Z] ~~~ ^~~~~~~ [2021-11-29T10:42:32,638Z] %lld [2021-11-29T10:42:32,638Z] .../debug.hpp:65:36: note: expanded from macro 'assert' Instead of the hardcoded `%ld` the change uses `INT64_FORMAT`. Testing: tier1 running, Gha Thanks, Thomas ------------- Commit messages: - Fix format specifier Changes: https://git.openjdk.java.net/jdk/pull/6590/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6590&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8277928 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/6590.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6590/head:pull/6590 PR: https://git.openjdk.java.net/jdk/pull/6590 From shade at openjdk.java.net Mon Nov 29 11:50:10 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 29 Nov 2021 11:50:10 GMT Subject: RFR: 8277928: Fix compilation on macosx-aarch64 after 8276108 In-Reply-To: References: Message-ID: <5gQ81dYKoGehTUCvuByYU05bNkxmchkMCd2ZWE8ac6E=.d7bfce1a-7366-478d-8b75-6efb153d7224@github.com> On Mon, 29 Nov 2021 11:33:19 GMT, Thomas Schatzl wrote: > Hi all, > > can I have reviews for this change that uses a wrong format specifier for int64 which makes osx/aarch64 builds fail.
> > [2021-11-29T10:42:32,638Z] .../src/hotspot/cpu/aarch64/assembler_aarch64.hpp:482:41: error: format specifies type 'long' but the argument has type 'int64_t' (aka 'long long') [-Werror,-Wformat] > [2021-11-29T10:42:32,638Z] "must be, was: %ld, %d", _offset, size); > [2021-11-29T10:42:32,638Z] ~~~ ^~~~~~~ > [2021-11-29T10:42:32,638Z] %lld > [2021-11-29T10:42:32,638Z] .../debug.hpp:65:36: note: expanded from macro 'assert' > > Instead of the hardcoded `%ld` the change uses `INT64_FORMAT`. > > Testing: tier1 running, Gha > > Thanks, > Thomas Looks fine and trivial. ------------- Marked as reviewed by shade (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6590 From dholmes at openjdk.java.net Mon Nov 29 11:57:11 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Mon, 29 Nov 2021 11:57:11 GMT Subject: RFR: 8277928: Fix compilation on macosx-aarch64 after 8276108 In-Reply-To: References: Message-ID: On Mon, 29 Nov 2021 11:33:19 GMT, Thomas Schatzl wrote: > Hi all, > > can I have reviews for this change that uses a wrong format specifier for int64 which makes osx/aarch64 builds fail. > > [2021-11-29T10:42:32,638Z] .../src/hotspot/cpu/aarch64/assembler_aarch64.hpp:482:41: error: format specifies type 'long' but the argument has type 'int64_t' (aka 'long long') [-Werror,-Wformat] > [2021-11-29T10:42:32,638Z] "must be, was: %ld, %d", _offset, size); > [2021-11-29T10:42:32,638Z] ~~~ ^~~~~~~ > [2021-11-29T10:42:32,638Z] %lld > [2021-11-29T10:42:32,638Z] .../debug.hpp:65:36: note: expanded from macro 'assert' > > Instead of the hardcoded `%ld` the change uses `INT64_FORMAT`. > > Testing: tier1 running, Gha > > Thanks, > Thomas Thanks for fixing promptly. David ------------- Marked as reviewed by dholmes (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/6590 From tschatzl at openjdk.java.net Mon Nov 29 12:03:10 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Mon, 29 Nov 2021 12:03:10 GMT Subject: Integrated: 8277928: Fix compilation on macosx-aarch64 after 8276108 In-Reply-To: References: Message-ID: On Mon, 29 Nov 2021 11:33:19 GMT, Thomas Schatzl wrote: > Hi all, > > can I have reviews for this change that uses a wrong format specifier for int64 which makes osx/aarch64 builds fail. > > [2021-11-29T10:42:32,638Z] .../src/hotspot/cpu/aarch64/assembler_aarch64.hpp:482:41: error: format specifies type 'long' but the argument has type 'int64_t' (aka 'long long') [-Werror,-Wformat] > [2021-11-29T10:42:32,638Z] "must be, was: %ld, %d", _offset, size); > [2021-11-29T10:42:32,638Z] ~~~ ^~~~~~~ > [2021-11-29T10:42:32,638Z] %lld > [2021-11-29T10:42:32,638Z] .../debug.hpp:65:36: note: expanded from macro 'assert' > > > Instead of the hardcoded `%ld` the change uses `INT64_FORMAT`. > > Testing: tier1 running, Gha > > Thanks, > Thomas This pull request has now been integrated. Changeset: 2622ab3f Author: Thomas Schatzl URL: https://git.openjdk.java.net/jdk/commit/2622ab3fe94814fb4f7f22e4015ef1519e546905 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8277928: Fix compilation on macosx-aarch64 after 8276108 Reviewed-by: shade, dholmes ------------- PR: https://git.openjdk.java.net/jdk/pull/6590 From dholmes at openjdk.java.net Mon Nov 29 12:04:03 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Mon, 29 Nov 2021 12:04:03 GMT Subject: RFR: 8277878: Fix compiler tests after JDK-8275908 In-Reply-To: <2zBBzqJk3o4Y3WyyNuZ658h-5zTLXZPtsBr6bEin02w=.6f61d0cd-d04e-41ce-921a-f51711b86395@github.com> References: <2zBBzqJk3o4Y3WyyNuZ658h-5zTLXZPtsBr6bEin02w=.6f61d0cd-d04e-41ce-921a-f51711b86395@github.com> Message-ID: On Mon, 29 Nov 2021 11:05:15 GMT, Volker Simonis wrote: >> Looks good! I guess it does not hurt to quickly verify it. 
Tier3 testing is submitted, will get back to you with the results. > > Thanks @chhagedorn, @TobiHartmann! > > I'll wait with submitting until @chhagedorn reports back the Tier3 results. @simonis the submitted tests have passed. Thanks. ------------- PR: https://git.openjdk.java.net/jdk/pull/6581 From simonis at openjdk.java.net Mon Nov 29 12:33:09 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Mon, 29 Nov 2021 12:33:09 GMT Subject: RFR: 8277878: Fix compiler tests after JDK-8275908 In-Reply-To: References: <2zBBzqJk3o4Y3WyyNuZ658h-5zTLXZPtsBr6bEin02w=.6f61d0cd-d04e-41ce-921a-f51711b86395@github.com> Message-ID: On Mon, 29 Nov 2021 12:00:39 GMT, David Holmes wrote: >> Thanks @chhagedorn, @TobiHartmann! >> >> I'll wait with submitting until @chhagedorn reports back the Tier3 results. > > @simonis the submitted tests have passed. Thanks. Thanks @dholmes-ora. ------------- PR: https://git.openjdk.java.net/jdk/pull/6581 From simonis at openjdk.java.net Mon Nov 29 12:33:10 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Mon, 29 Nov 2021 12:33:10 GMT Subject: Integrated: 8277878: Fix compiler tests after JDK-8275908 In-Reply-To: References: Message-ID: On Sat, 27 Nov 2021 09:53:26 GMT, Volker Simonis wrote: > This is a quick fix for the two compiler tests introduced by JDK-8275908. The test explicitly added SerialGC in the parameter list which leads to a garbage collector conflict, if the tests are run with some other Garbage collector. > > It was suggested to fix this by adding a `@requires vm.gc.Serial` tag but this is not necessary because the tests are actually GC-agnostic so I've removed the `-XX:+UseSerialGC` parameter from the test command line instead. > > After the fix it was necessary to refine the check for whether the test JVM has JVMCI support built-in. Before the fix, I used `WhiteBox.getWhiteBox().isJVMCISupportedByGC()` which worked fine if only running with SerialGC. 
Now that we can run with GCs which don't support JVMCI we have to use the more specific `(WB.getBooleanVMFlag("EnableJVMCI") != null)`. > > Please let me know if you want me to push this instantly after it has been reviewed or if you first want to re-run your internal Tier3 tests before pushing. This pull request has now been integrated. Changeset: 614c6e61 Author: Volker Simonis URL: https://git.openjdk.java.net/jdk/commit/614c6e61fa3a9f094a311b12e780491c611657e6 Stats: 4 lines in 2 files changed: 1 ins; 0 del; 3 mod 8277878: Fix compiler tests after JDK-8275908 Reviewed-by: thartmann, chagedorn ------------- PR: https://git.openjdk.java.net/jdk/pull/6581 From lucy at openjdk.java.net Mon Nov 29 13:02:10 2021 From: lucy at openjdk.java.net (Lutz Schmidt) Date: Mon, 29 Nov 2021 13:02:10 GMT Subject: RFR: 8277846: Implement fast-path for ASCII-compatible CharsetEncoders on ppc64 [v2] In-Reply-To: References: Message-ID: On Thu, 25 Nov 2021 15:46:29 GMT, Martin Doerr wrote: >> PPC64 port of 8274242: Implement fast-path for ASCII-compatible CharsetEncoders on x86 >> I moved the `encode_iso_array` implementation into `C2_MacroAssembler` and reused it for the new ASCII node. The algorithm is unchanged. We only need to change the mask because (non-extended) ASCII uses 7 bit (also see x86 implementation). >> >> 'micro:CharsetEncodeDecode' benchmark results without intrinsic on Power9:
>>
>> Benchmark (size) (type) Mode Cnt Score Error Units
>> CharsetEncodeDecode.encode 16384 UTF-8 avgt 30 14.530 ± 0.012 us/op
>> CharsetEncodeDecode.encode 16384 BIG5 avgt 30 15.359 ± 0.014 us/op
>> CharsetEncodeDecode.encode 16384 ISO-8859-15 avgt 30 14.256 ± 0.019 us/op
>> CharsetEncodeDecode.encode 16384 ASCII avgt 30 14.237 ± 0.012 us/op
>>
>> With intrinsic:
>>
>> Benchmark (size) (type) Mode Cnt Score Error Units
>> CharsetEncodeDecode.encode 16384 UTF-8 avgt 30 5.085 ± 0.016 us/op
>> CharsetEncodeDecode.encode 16384 BIG5 avgt 30 5.905 ± 0.023 us/op
>> CharsetEncodeDecode.encode 16384 ISO-8859-15 avgt 30 4.795 ± 0.023 us/op
>> CharsetEncodeDecode.encode 16384 ASCII avgt 30 4.798 ± 0.013 us/op
> > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Enable new ASCII node. Changes look good to me. Thank you for this optimisation. 3x in microbench justifies the effort, I assume. ------------- Marked as reviewed by lucy (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6565 From mbaesken at openjdk.java.net Mon Nov 29 15:28:08 2021 From: mbaesken at openjdk.java.net (Matthias Baesken) Date: Mon, 29 Nov 2021 15:28:08 GMT Subject: RFR: 8277846: Implement fast-path for ASCII-compatible CharsetEncoders on ppc64 [v2] In-Reply-To: References: Message-ID: On Thu, 25 Nov 2021 15:46:29 GMT, Martin Doerr wrote: >> PPC64 port of 8274242: Implement fast-path for ASCII-compatible CharsetEncoders on x86 >> I moved the `encode_iso_array` implementation into `C2_MacroAssembler` and reused it for the new ASCII node. The algorithm is unchanged. We only need to change the mask because (non-extended) ASCII uses 7 bit (also see x86 implementation). >> >> 'micro:CharsetEncodeDecode' benchmark results without intrinsic on Power9:
>>
>> Benchmark (size) (type) Mode Cnt Score Error Units
>> CharsetEncodeDecode.encode 16384 UTF-8 avgt 30 14.530 ± 0.012 us/op
>> CharsetEncodeDecode.encode 16384 BIG5 avgt 30 15.359 ± 0.014 us/op
>> CharsetEncodeDecode.encode 16384 ISO-8859-15 avgt 30 14.256 ± 0.019 us/op
>> CharsetEncodeDecode.encode 16384 ASCII avgt 30 14.237 ± 0.012 us/op
>>
>> With intrinsic:
>>
>> Benchmark (size) (type) Mode Cnt Score Error Units
>> CharsetEncodeDecode.encode 16384 UTF-8 avgt 30 5.085 ± 0.016 us/op
>> CharsetEncodeDecode.encode 16384 BIG5 avgt 30 5.905 ± 0.023 us/op
>> CharsetEncodeDecode.encode 16384 ISO-8859-15 avgt 30 4.795 ± 0.023 us/op
>> CharsetEncodeDecode.encode 16384 ASCII avgt 30 4.798 ± 0.013 us/op
> > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Enable new ASCII node. Marked as reviewed by mbaesken (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/6565 From mdoerr at openjdk.java.net Mon Nov 29 15:28:09 2021 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Mon, 29 Nov 2021 15:28:09 GMT Subject: RFR: 8277846: Implement fast-path for ASCII-compatible CharsetEncoders on ppc64 [v2] In-Reply-To: References: Message-ID: On Thu, 25 Nov 2021 15:46:29 GMT, Martin Doerr wrote: >> PPC64 port of 8274242: Implement fast-path for ASCII-compatible CharsetEncoders on x86 >> I moved the `encode_iso_array` implementation into `C2_MacroAssembler` and reused it for the new ASCII node. The algorithm is unchanged. We only need to change the mask because (non-extended) ASCII uses 7 bit (also see x86 implementation). >> >> 'micro:CharsetEncodeDecode' benchmark results without intrinsic on Power9:
>>
>> Benchmark (size) (type) Mode Cnt Score Error Units
>> CharsetEncodeDecode.encode 16384 UTF-8 avgt 30 14.530 ± 0.012 us/op
>> CharsetEncodeDecode.encode 16384 BIG5 avgt 30 15.359 ± 0.014 us/op
>> CharsetEncodeDecode.encode 16384 ISO-8859-15 avgt 30 14.256 ± 0.019 us/op
>> CharsetEncodeDecode.encode 16384 ASCII avgt 30 14.237 ± 0.012 us/op
>>
>> With intrinsic:
>>
>> Benchmark (size) (type) Mode Cnt Score Error Units
>> CharsetEncodeDecode.encode 16384 UTF-8 avgt 30 5.085 ± 0.016 us/op
>> CharsetEncodeDecode.encode 16384 BIG5 avgt 30 5.905 ± 0.023 us/op
>> CharsetEncodeDecode.encode 16384 ISO-8859-15 avgt 30 4.795 ± 0.023 us/op
>> CharsetEncodeDecode.encode 16384 ASCII avgt 30 4.798 ± 0.013 us/op
> > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Enable new ASCII node. Thanks a lot for your reviews!
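
The mask change mentioned in the PR description ("ASCII uses 7 bit") can be pictured with a scalar loop. This is only a sketch under assumptions: the class name, the method name, and the two mask constants (0xFF00 for 8-bit ISO-8859-1, 0xFF80 for 7-bit ASCII) are illustrative and are not taken from the ppc64 stub, which is emitted by the macro assembler rather than written in Java.

```java
// Illustrative sketch only; names and constants are hypothetical, not HotSpot code.
final class EncodeMaskSketch {
    // Any bit set under this mask means the char does not fit in 8 bits (ISO-8859-1).
    static final int ISO_8859_1_MASK = 0xFF00;
    // Any bit set under this mask means the char does not fit in 7 bits (ASCII).
    static final int ASCII_MASK = 0xFF80;

    // Copies chars to bytes until the first char that has a bit set under `mask`.
    // Returns the number of chars that were encoded.
    static int encode(char[] src, byte[] dst, int len, int mask) {
        int i = 0;
        for (; i < len; i++) {
            char c = src[i];
            if ((c & mask) != 0) {
                break; // not encodable on the fast path
            }
            dst[i] = (byte) c;
        }
        return i;
    }

    public static void main(String[] args) {
        char[] src = "ab\u00e9c".toCharArray(); // 'é' is 0xE9
        byte[] dst = new byte[src.length];
        System.out.println(encode(src, dst, src.length, ISO_8859_1_MASK)); // prints 4
        System.out.println(encode(src, dst, src.length, ASCII_MASK));      // prints 2
    }
}
```

With this input the ISO-8859-1 mask lets all four chars through, while the ASCII mask stops at 0xE9, so the same loop serves both encoders and only the mask differs.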
------------- PR: https://git.openjdk.java.net/jdk/pull/6565 From kvn at openjdk.java.net Mon Nov 29 15:53:11 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 29 Nov 2021 15:53:11 GMT Subject: RFR: 8275326: C2: assert(no_dead_loop) failed: dead loop detected [v4] In-Reply-To: <8ZBmN3KXGkCgYPCmtrkM1lYGopX_j9nc4pN21w5s4CA=.1cc50fcc-bec2-4553-8aca-7de9b04b5120@github.com> References: <8ZBmN3KXGkCgYPCmtrkM1lYGopX_j9nc4pN21w5s4CA=.1cc50fcc-bec2-4553-8aca-7de9b04b5120@github.com> Message-ID: On Mon, 29 Nov 2021 11:23:35 GMT, Christian Hagedorn wrote: >> In the test case, we apply the following optimization in `PhiNode::Ideal()` for the memory phi 989 that is on a dead path but still has both its inputs set to non-top nodes: >> https://github.com/openjdk/jdk/blob/3c0faa73522bd004b66cb9e477f43e15a29842e6/src/hotspot/share/opto/cfgnode.cpp#L2269-L2270 >> ![Screenshot from 2021-11-05 11-57-49](https://user-images.githubusercontent.com/17833009/140502849-9f00fd62-9714-4f54-8f98-f22f74d11430.png) >> >> In this process, we create `11853 Phi` for the new `11850 MergeMem` which is going to replace `989 Phi` (`this`). We then transform `11853 Phi` before returning: >> https://github.com/openjdk/jdk/blob/3c0faa73522bd004b66cb9e477f43e15a29842e6/src/hotspot/share/opto/cfgnode.cpp#L2314 >> >> During `Ideal()` for `11853 Phi`, we transform `11769 MergeMem` into top (because the base memory is top) and use this as new input instead: >> https://github.com/openjdk/jdk/blob/3c0faa73522bd004b66cb9e477f43e15a29842e6/src/hotspot/share/opto/cfgnode.cpp#L2230-L2240 >> >> But even if the `MergeMem` node would not be transformed into top, the slice itself could be top (L2237) and we would still replace the phi input with top. This replacement by top will fold the `11853 Phi` and we will build a cycle `11850 MergeMem` <-> `1064 StoreB` because `989 Phi` will be replaced by `11850 MergeMem`. This results in the assertion failure. 
>> >> I tried some approaches by marking `11853 Phi` and/or `989 Phi` to specially treat them during the optimizations in `Ideal()` (e.g. skipping `989 Phi` during the dead loop detection etc.) or to improve the dead loop detection before applying the `MergeMem` optimization in `Ideal()`. But that seemed rather complicated/fragile. >> >> I therefore propose to simply not transform the newly created phi nodes directly but wait instead for IGVN to revisit them again. This allows the `this` phi to be replaced with the new `MergeMem` node and the dead loop detection will work correctly when processing the new phis again later in IGVN. >> >> I could only reproduce this bug with the replay file for the attached test case in the JBS issue. The test case itself did not trigger with repeated runs with `StressIGVN` + `RepeatCompilation`. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Remove igvn checks Good. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6276 From kvn at openjdk.java.net Mon Nov 29 15:53:12 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 29 Nov 2021 15:53:12 GMT Subject: RFR: 8275326: C2: assert(no_dead_loop) failed: dead loop detected [v3] In-Reply-To: References: Message-ID: On Mon, 29 Nov 2021 11:13:32 GMT, Christian Hagedorn wrote: >> src/hotspot/share/opto/cfgnode.cpp line 2321: >> >>> 2319: // visiting it in the transformations below. >>> 2320: igvn->replace_node(this, result); >>> 2321: igvn->set_type(result, result->bottom_type()); >> >> Did you consider simply cut off Phi's inputs with `top` by using `replace_input_of()`? > > Unfortunately, this does not work. When doing the dead loop check for `11853 Phi` in the `phase->transform(phi)` call below, we find the `this` phi (`989 Phi`) by following the inputs of `11853 Phi` and conclude it's dead loop safe. 
This means we would need to cut the outputs of `989 Phi`. The only good way I found to achieve that is to directly replace the `this` phi here with the resulting mergemem node. Otherwise, we somehow need to restore the outputs again after the transformations to correctly subsume the `this` phi later. Okay. ------------- PR: https://git.openjdk.java.net/jdk/pull/6276 From sviswanathan at openjdk.java.net Mon Nov 29 18:33:08 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Mon, 29 Nov 2021 18:33:08 GMT Subject: RFR: 8277426: Optimize mask reduction operations on x86 [v3] In-Reply-To: References: <9g2nHSo1K9pldGKqKiXBqjFZF3UFqqKHu-6GLyKbclc=.496404e9-f682-470e-a93d-c8fc9ca6f626@github.com> Message-ID: On Wed, 24 Nov 2021 14:05:34 GMT, Mai Đặng Quân Anh wrote: >> Hi, >> >> This patch improves the performance of mask reduction operations on AVX by matching the pattern `VectorMaskReduction (VectorStoreMask mask)` to eliminate the extra `VectorStoreMaskNode`. I have also done some refactoring to unify the logic of `toLong` with the other reduction operations. >> >> The patch has been discussed partially in [panama-vector repository](https://github.com/openjdk/panama-vector/pull/158). >> >> Thank you very much. > > Mai Đặng Quân Anh has updated the pull request incrementally with one additional commit since the last revision: > > add check bmi Looks good to me. ------------- Marked as reviewed by sviswanathan (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/6447 From sviswanathan at openjdk.java.net Mon Nov 29 18:36:10 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Mon, 29 Nov 2021 18:36:10 GMT Subject: RFR: 8277426: Optimize mask reduction operations on x86 [v2] In-Reply-To: References: <9g2nHSo1K9pldGKqKiXBqjFZF3UFqqKHu-6GLyKbclc=.496404e9-f682-470e-a93d-c8fc9ca6f626@github.com> Message-ID: On Tue, 23 Nov 2021 15:51:06 GMT, Paul Sandoz wrote: >> Mai Đặng Quân Anh has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - Merge branch 'master' into vectorMaskReduction >> - reduce some dependencies with spare register >> - improve mask reduction logic on AVX > > This needs another hotspot reviewer to review before integration. @PaulSandoz Do we need another test run after the changes? If so, could you please help in running through your testing. ------------- PR: https://git.openjdk.java.net/jdk/pull/6447 From psandoz at openjdk.java.net Mon Nov 29 19:44:14 2021 From: psandoz at openjdk.java.net (Paul Sandoz) Date: Mon, 29 Nov 2021 19:44:14 GMT Subject: RFR: 8277426: Optimize mask reduction operations on x86 [v2] In-Reply-To: References: <9g2nHSo1K9pldGKqKiXBqjFZF3UFqqKHu-6GLyKbclc=.496404e9-f682-470e-a93d-c8fc9ca6f626@github.com> Message-ID: On Mon, 29 Nov 2021 18:33:05 GMT, Sandhya Viswanathan wrote: >> This needs another hotspot reviewer to review before integration. > > @PaulSandoz Do we need another test run after the changes? If so, could you please help in running through your testing.
@sviswa7 re-running tests since there were some subtle changes in the code ------------- PR: https://git.openjdk.java.net/jdk/pull/6447 From psandoz at openjdk.java.net Mon Nov 29 21:10:01 2021 From: psandoz at openjdk.java.net (Paul Sandoz) Date: Mon, 29 Nov 2021 21:10:01 GMT Subject: RFR: 8277426: Optimize mask reduction operations on x86 [v3] In-Reply-To: References: <9g2nHSo1K9pldGKqKiXBqjFZF3UFqqKHu-6GLyKbclc=.496404e9-f682-470e-a93d-c8fc9ca6f626@github.com> Message-ID: On Wed, 24 Nov 2021 14:05:34 GMT, Mai Đặng Quân Anh wrote: >> Hi, >> >> This patch improves the performance of mask reduction operations on AVX by matching the pattern `VectorMaskReduction (VectorStoreMask mask)` to eliminate the extra `VectorStoreMaskNode`. I have also done some refactoring to unify the logic of `toLong` with the other reduction operations. >> >> The patch has been discussed partially in [panama-vector repository](https://github.com/openjdk/panama-vector/pull/158). >> >> Thank you very much. > > Mai Đặng Quân Anh has updated the pull request incrementally with one additional commit since the last revision: > > add check bmi Latest tests are good. ------------- PR: https://git.openjdk.java.net/jdk/pull/6447 From xliu at openjdk.java.net Mon Nov 29 21:29:43 2021 From: xliu at openjdk.java.net (Xin Liu) Date: Mon, 29 Nov 2021 21:29:43 GMT Subject: RFR: 8274983: C1 optimizes the invocation of private interface methods [v4] In-Reply-To: References: Message-ID: <8cGDsI9OKsz7fDHkeLCypBo9sJBk5NfNhTz11elo_Kk=.9c048c1a-ae6f-4ca4-a8dd-a39dab6a3314@github.com> > The root cause of the C1 regression is that some regexes generate multiple classes which all implement > an interface. In SlowStartupTest.java, the following **invokeinterface** happens frequently with different receivers. The target is a private interface method.
> > > 9: invokeinterface #25, 3 // InterfaceMethod java/util/regex/Pattern$BmpCharPredicate.lambda$union$2:(Ljava/util/regex/Pattern$CharPredicate;I)Z > > > This patch allows C1 to generate the optimized virtual call for invokeinterface > whose targets are the private interface methods. > > Before JDK-823835, LambdaMetaFactory generates invokespecial in this case. Because the private > interface methods cannot be overridden, C1 generates the optimized virtual call. After JDK-823835, > LambdaMetaFactory generates invokeinterface instead. C1 generates the regular virtual call because > it cannot recognize the new pattern. If multiple subclasses all implement the same interface, > it is possible that they thrash the IC stub using their own concrete klass at runtime. > > Optimized virtual call uses relocInfo::opt_virtual_call_type(3). It will call VM > 'resolve_opt_virtual_call_C' once and resolve the target to the VEP of the nmethod. > Therefore, this patch can prevent the callsite from thrashing. > > Before this patch, SlowStartupTest had 38770 times _resolve_invoke_virtual_cnt and 38695 _handle_wrong_method_cnt per 10k iterations. To dump `C1Statistics`, we use fastdebug build for comparison. > > > $java -XX:TieredStopAtLevel=1 -XX:+PrintC1Statistics SlowStartupTest 1 > Executed 10000 iterations in 736ms > C1 Runtime statistics: > _resolve_invoke_virtual_cnt: 38770 > _resolve_invoke_opt_virtual_cnt: 186 > _resolve_invoke_static_cnt: 44 > _handle_wrong_method_cnt: 38695 > _ic_miss_cnt: 35 > > > With this patch, only 1 _handle_wrong_method_cnt is triggered but we have 3 more `_resolve_invoke_opt_virtual_cnt` events instead. The total runtime reduces from 736ms to 9ms.
> > > $java -XX:TieredStopAtLevel=1 -XX:+PrintC1Statistics SlowStartupTest 1 > Executed 10000 iterations in 9ms > C1 Runtime statistics: > _resolve_invoke_virtual_cnt: 77 > _resolve_invoke_opt_virtual_cnt: 189 > _resolve_invoke_static_cnt: 45 > _handle_wrong_method_cnt: 1 > _ic_miss_cnt: 39 > > > Codegen-wise, before the patch, C1 generates LIR for the invokeinterface whose target is a private interface method as follows. > > __bci__use__tid____instr____________________________________ > . 1 0 v2 a1.invokeinterface() > InvokePrivateInterfaceMethod$I.bar()V > . 6 0 v3 return > > > With this patch, C1 generates LIR as follows. It first checks that a1 is a subtype of `InvokePrivateInterfaceMethod$I`. If so, an optimized virtual call is generated. The callsite will be fixed up only once at runtime. > > __bci__use__tid____instr____________________________________ > . 1 1 a2 checkcast(a1) InvokePrivateInterfaceMethod$I > stack [0:a1] > . 1 0 v3 a2.invokeinterface() > InvokePrivateInterfaceMethod$I.bar()V > . 6 0 v4 return Xin Liu has updated the pull request incrementally with one additional commit since the last revision: Add a microbenchmark for the case.
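
The call shape behind this regression can be sketched in a few lines of Java. Apart from `I` and `bar`, which echo the LIR dump above, every name here is hypothetical (`PrivateInterfaceCallSketch`, `callBar`, `run`, and the classes `A`, `B`, `C`). This is not the SlowStartupTest source, and which bytecode (invokeinterface or invokespecial) a particular javac or LambdaMetaFactory emits for the call is not asserted here. The sketch only shows one call site that sees several receiver classes while its private target cannot be overridden.

```java
// Hypothetical illustration; not taken from the JDK tests or from the patch.
final class PrivateInterfaceCallSketch {
    interface I {
        // A private interface method: implementers cannot override it,
        // so the call target is known without looking at the receiver class.
        private int bar() {
            return 1;
        }

        // Single call site; the receiver class changes from call to call.
        static int callBar(I receiver) {
            return receiver.bar();
        }
    }

    static final class A implements I {}
    static final class B implements I {}
    static final class C implements I {}

    // Rotates through three receiver classes at the same call site.
    static int run(int iterations) {
        I[] receivers = { new A(), new B(), new C() };
        int sum = 0;
        for (int i = 0; i < iterations; i++) {
            sum += I.callBar(receivers[i % receivers.length]);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(run(10_000)); // prints 10000
    }
}
```

Because `bar()` has exactly one possible target, a compiler that recognizes the pattern can bind the call once instead of re-resolving it each time the receiver class changes, which is the effect the statistics above describe.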
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6445/files - new: https://git.openjdk.java.net/jdk/pull/6445/files/6a10e772..56133c52 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6445&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6445&range=02-03 Stats: 82 lines in 1 file changed: 82 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/6445.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6445/head:pull/6445 PR: https://git.openjdk.java.net/jdk/pull/6445 From xliu at openjdk.java.net Mon Nov 29 21:29:46 2021 From: xliu at openjdk.java.net (Xin Liu) Date: Mon, 29 Nov 2021 21:29:46 GMT Subject: RFR: 8274983: C1 optimizes the invocation of private interface methods [v3] In-Reply-To: References: Message-ID: On Wed, 24 Nov 2021 23:42:27 GMT, Xin Liu wrote: >> The root cause of the C1 regression is that some regexes generate multiple classes which all implement >> an interface. In SlowStartupTest.java, the following **invokeinterface** happens frequently with different receivers. The target is a private interface method. >> >> >> 9: invokeinterface #25, 3 // InterfaceMethod java/util/regex/Pattern$BmpCharPredicate.lambda$union$2:(Ljava/util/regex/Pattern$CharPredicate;I)Z >> >> >> This patch allows C1 to generate the optimized virtual call for invokeinterface >> whose targets are the private interface methods. >> >> Before JDK-823835, LambdaMetaFactory generates invokespecial in this case. Because the private >> interface methods cannot be overridden, C1 generates the optimized virtual call. After JDK-823835, >> LambdaMetaFactory generates invokeinterface instead. C1 generates the regular virtual call because >> it cannot recognize the new pattern. If multiple subclasses all implement the same interface, >> it is possible that they thrash the IC stub using their own concrete klass at runtime.
>> >> Optimized virtual call uses relocInfo::opt_virtual_call_type(3). It will call VM >> 'resolve_opt_virtual_call_C' once and resolve the target to the VEP of the nmethod. >> Therefore, this patch can prevent the callsite from thrashing. >> >> Before this patch, SlowStartupTest had 38770 times _resolve_invoke_virtual_cnt and 38695 _handle_wrong_method_cnt per 10k iterations. To dump `C1Statistics`, we use fastdebug build for comparison. >> >> >> $java -XX:TieredStopAtLevel=1 -XX:+PrintC1Statistics SlowStartupTest 1 >> Executed 10000 iterations in 736ms >> C1 Runtime statistics: >> _resolve_invoke_virtual_cnt: 38770 >> _resolve_invoke_opt_virtual_cnt: 186 >> _resolve_invoke_static_cnt: 44 >> _handle_wrong_method_cnt: 38695 >> _ic_miss_cnt: 35 >> >> >> With this patch, only 1 _handle_wrong_method_cnt is triggered but we have 3 more `_resolve_invoke_opt_virtual_cnt` events instead. The total runtime reduces from 736ms to 9ms. >> >> >> $java -XX:TieredStopAtLevel=1 -XX:+PrintC1Statistics SlowStartupTest 1 >> Executed 10000 iterations in 9ms >> C1 Runtime statistics: >> _resolve_invoke_virtual_cnt: 77 >> _resolve_invoke_opt_virtual_cnt: 189 >> _resolve_invoke_static_cnt: 45 >> _handle_wrong_method_cnt: 1 >> _ic_miss_cnt: 39 >> >> >> Codegen-wise, before the patch, C1 generates LIR for the invokeinterface whose target is a private interface method as follows. >> >> __bci__use__tid____instr____________________________________ >> . 1 0 v2 a1.invokeinterface() >> InvokePrivateInterfaceMethod$I.bar()V >> . 6 0 v3 return >> >> >> With this patch, C1 generates LIR as follows. It first checks that a1 is a subtype of `InvokePrivateInterfaceMethod$I`. If so, an optimized virtual call is generated. The callsite will be fixed up only once at runtime. >> >> __bci__use__tid____instr____________________________________ >> . 1 1 a2 checkcast(a1) InvokePrivateInterfaceMethod$I >> stack [0:a1] >> . 1 0 v3 a2.invokeinterface() >> InvokePrivateInterfaceMethod$I.bar()V >> .
6 0 v4 return > > Xin Liu has updated the pull request incrementally with one additional commit since the last revision: > > Call set_invokespecial_receiver_check() so invokespecial throws IllegalAccessError. > > We need to checkcast for invokespecial even if target->can_be_statically_bound() is false. > eg. https://github.com/openjdk/jdk/blob/master/test/jdk/java/lang/invoke/SpecialInterfaceCallI4.jasm#L33 I added a JMH benchmark for this one.
$make test TEST='micro:org.openjdk.bench.vm.compiler.InterfacePrivateCalls' CONF=linux-x86_64-server-release
//master(7b2d823e)
Benchmark Mode Cnt Score Error Units
InterfacePrivateCalls.invokePrivateInterfaceMethodC1 avgt 5 11399.058 ± 1086.745 ns/op
InterfacePrivateCalls.invokePrivateInterfaceMethodC2 avgt 5 23.741 ± 0.189 ns/op
// with this patch
Benchmark Mode Cnt Score Error Units
InterfacePrivateCalls.invokePrivateInterfaceMethodC1 avgt 5 24.534 ± 0.130 ns/op
InterfacePrivateCalls.invokePrivateInterfaceMethodC2 avgt 5 23.800 ± 0.384 ns/op
The average cost of C1 has improved from 11399 ns/op to 24.534 ns/op, or 464x faster. Now the cost of C1 is comparable to C2. ------------- PR: https://git.openjdk.java.net/jdk/pull/6445 From xliu at openjdk.java.net Mon Nov 29 21:39:39 2021 From: xliu at openjdk.java.net (Xin Liu) Date: Mon, 29 Nov 2021 21:39:39 GMT Subject: RFR: 8274983: C1 optimizes the invocation of private interface methods [v5] In-Reply-To: References: Message-ID: > The root cause of the C1 regression is that some regexes generate multiple classes which all implement > an interface. In SlowStartupTest.java, the following **invokeinterface** happens frequently with different receivers. The target is a private interface method. > > > 9: invokeinterface #25, 3 // InterfaceMethod java/util/regex/Pattern$BmpCharPredicate.lambda$union$2:(Ljava/util/regex/Pattern$CharPredicate;I)Z > > > This patch allows C1 to generate the optimized virtual call for invokeinterface > whose targets are the private interface methods.
> > Before JDK-823835, LambdaMetaFactory generates invokespecial in this case. Because the private > interface methods cannot be overridden, C1 generates the optimized virtual call. After JDK-823835, > LambdaMetaFactory generates invokeinterface instead. C1 generates the regular virtual call because > it cannot recognize the new pattern. If multiple subclasses all implement the same interface, > it is possible that they thrash the IC stub using their own concrete klass at runtime. > > Optimized virtual call uses relocInfo::opt_virtual_call_type(3). It will call VM > 'resolve_opt_virtual_call_C' once and resolve the target to the VEP of the nmethod. > Therefore, this patch can prevent the callsite from thrashing. > > Before this patch, SlowStartupTest had 38770 times _resolve_invoke_virtual_cnt and 38695 _handle_wrong_method_cnt per 10k iterations. To dump `C1Statistics`, we use fastdebug build for comparison. > > > $java -XX:TieredStopAtLevel=1 -XX:+PrintC1Statistics SlowStartupTest 1 > Executed 10000 iterations in 736ms > C1 Runtime statistics: > _resolve_invoke_virtual_cnt: 38770 > _resolve_invoke_opt_virtual_cnt: 186 > _resolve_invoke_static_cnt: 44 > _handle_wrong_method_cnt: 38695 > _ic_miss_cnt: 35 > > > With this patch, only 1 _handle_wrong_method_cnt is triggered but we have 3 more `_resolve_invoke_opt_virtual_cnt` events instead. The total runtime reduces from 736ms to 9ms. > > > $java -XX:TieredStopAtLevel=1 -XX:+PrintC1Statistics SlowStartupTest 1 > Executed 10000 iterations in 9ms > C1 Runtime statistics: > _resolve_invoke_virtual_cnt: 77 > _resolve_invoke_opt_virtual_cnt: 189 > _resolve_invoke_static_cnt: 45 > _handle_wrong_method_cnt: 1 > _ic_miss_cnt: 39 > > > Codegen-wise, before the patch, C1 generates LIR for the invokeinterface whose target is a private interface method as follows. > > __bci__use__tid____instr____________________________________ > . 1 0 v2 a1.invokeinterface() > InvokePrivateInterfaceMethod$I.bar()V > .
6 0 v3 return > > > With this patch, C1 generates LIR as follows. It first checks that a1 is a subtype of `InvokePrivateInterfaceMethod$I`. If so, an optimized virtual call is generated. The callsite will be fixed up only once at runtime. > > __bci__use__tid____instr____________________________________ > . 1 1 a2 checkcast(a1) InvokePrivateInterfaceMethod$I > stack [0:a1] > . 1 0 v3 a2.invokeinterface() > InvokePrivateInterfaceMethod$I.bar()V > . 6 0 v4 return Xin Liu has updated the pull request incrementally with one additional commit since the last revision: Fix the whitespace issue of the microbenchmark. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6445/files - new: https://git.openjdk.java.net/jdk/pull/6445/files/56133c52..01e48da9 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6445&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6445&range=03-04 Stats: 3 lines in 1 file changed: 0 ins; 2 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/6445.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6445/head:pull/6445 PR: https://git.openjdk.java.net/jdk/pull/6445 From iveresov at openjdk.java.net Mon Nov 29 22:54:06 2021 From: iveresov at openjdk.java.net (Igor Veresov) Date: Mon, 29 Nov 2021 22:54:06 GMT Subject: RFR: 8274983: C1 optimizes the invocation of private interface methods [v5] In-Reply-To: References: Message-ID: On Mon, 29 Nov 2021 21:39:39 GMT, Xin Liu wrote: >> The root cause of the C1 regression is that some regexes generate multiple classes which all implement >> an interface. In SlowStartupTest.java, the following **invokeinterface** happens frequently with different receivers. The target is a private interface method.
>> >> >> 9: invokeinterface #25, 3 // InterfaceMethod java/util/regex/Pattern$BmpCharPredicate.lambda$union$2:(Ljava/util/regex/Pattern$CharPredicate;I)Z >> >> >> This patch allows C1 to generate the optimized virtual call for invokeinterface >> whose targets are the private interface methods. >> >> Before JDK-823835, LambdaMetaFactory generates invokespecial in this case. Because the private >> interface methods cannot be overridden, C1 generates the optimized virtual call. After JDK-823835, >> LambdaMetaFactory generates invokeinterface instead. C1 generates the regular virtual call because >> it cannot recognize the new pattern. If multiple subclasses all implement the same interface, >> it is possible that they thrash the IC stub using their own concrete klass at runtime. >> >> Optimized virtual call uses relocInfo::opt_virtual_call_type(3). It will call VM >> 'resolve_opt_virtual_call_C' once and resolve the target to the VEP of the nmethod. >> Therefore, this patch can prevent the callsite from thrashing. >> >> Before this patch, SlowStartupTest had 38770 times _resolve_invoke_virtual_cnt and 38695 _handle_wrong_method_cnt per 10k iterations. To dump `C1Statistics`, we use fastdebug build for comparison. >> >> >> $java -XX:TieredStopAtLevel=1 -XX:+PrintC1Statistics SlowStartupTest 1 >> Executed 10000 iterations in 736ms >> C1 Runtime statistics: >> _resolve_invoke_virtual_cnt: 38770 >> _resolve_invoke_opt_virtual_cnt: 186 >> _resolve_invoke_static_cnt: 44 >> _handle_wrong_method_cnt: 38695 >> _ic_miss_cnt: 35 >> >> >> With this patch, only 1 _handle_wrong_method_cnt is triggered but we have 3 more `_resolve_invoke_opt_virtual_cnt` events instead. The total runtime reduces from 736ms to 9ms.
>> >> >> $java -XX:TieredStopAtLevel=1 -XX:+PrintC1Statistics SlowStartupTest 1 >> Executed 10000 iterations in 9ms >> C1 Runtime statistics: >> _resolve_invoke_virtual_cnt: 77 >> _resolve_invoke_opt_virtual_cnt: 189 >> _resolve_invoke_static_cnt: 45 >> _handle_wrong_method_cnt: 1 >> _ic_miss_cnt: 39 >> >> >> Codegen wise, before the patch, C1 generates LIR for the invokeinterface whose target is a private interface methoda as follows. >> >> __bci__use__tid____instr____________________________________ >> . 1 0 v2 a1.invokeinterface() >> InvokePrivateInterfaceMethod$I.bar()V >> . 6 0 v3 return >> >> >> With this patch, C1 generates LIR as follows. it first check a1 is a subtype of `InvokePrivateInterfaceMethod$I`. if so, an optimized virtual call is generated. The callsite will be fixed up once and only one time in runtime. >> >> __bci__use__tid____instr____________________________________ >> . 1 1 a2 checkcast(a1) InvokePrivateInterfaceMethod$I >> stack [0:a1] >> . 1 0 v3 a2.invokeinterface() >> InvokePrivateInterfaceMethod$I.bar()V >> . 6 0 v4 return > > Xin Liu has updated the pull request incrementally with one additional commit since the last revision: > > Fix the whitespace issue of the microbenchmark. Seems reasonable. ------------- Marked as reviewed by iveresov (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6445 From jiefu at openjdk.java.net Mon Nov 29 23:26:10 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Mon, 29 Nov 2021 23:26:10 GMT Subject: RFR: 8277426: Optimize mask reduction operations on x86 [v3] In-Reply-To: References: <9g2nHSo1K9pldGKqKiXBqjFZF3UFqqKHu-6GLyKbclc=.496404e9-f682-470e-a93d-c8fc9ca6f626@github.com> Message-ID: On Mon, 29 Nov 2021 21:07:08 GMT, Paul Sandoz wrote: > Latest tests are good. So let's integrate it. Thanks. 
------------- PR: https://git.openjdk.java.net/jdk/pull/6447 From duke at openjdk.java.net Mon Nov 29 23:26:11 2021 From: duke at openjdk.java.net (Mai Đặng Quân Anh) Date: Mon, 29 Nov 2021 23:26:11 GMT Subject: Integrated: 8277426: Optimize mask reduction operations on x86 In-Reply-To: <9g2nHSo1K9pldGKqKiXBqjFZF3UFqqKHu-6GLyKbclc=.496404e9-f682-470e-a93d-c8fc9ca6f626@github.com> References: <9g2nHSo1K9pldGKqKiXBqjFZF3UFqqKHu-6GLyKbclc=.496404e9-f682-470e-a93d-c8fc9ca6f626@github.com> Message-ID: On Thu, 18 Nov 2021 08:06:48 GMT, Mai Đặng Quân Anh wrote: > Hi, > > This patch improves the performance of mask reduction operations on AVX by matching the pattern `VectorMaskReduction (VectorStoreMask mask)` to eliminate the extra `VectorStoreMaskNode`. I have also done some refactoring to unify the logic of `toLong` with the other reduction operations. > > The patch has been discussed partially in [panama-vector repository](https://github.com/openjdk/panama-vector/pull/158). > > Thank you very much. This pull request has now been integrated. Changeset: 560f9c93 Author: MeryKitty Committer: Jie Fu URL: https://git.openjdk.java.net/jdk/commit/560f9c937233d548ef8db8cd9044fdc6c4cefe41 Stats: 222 lines in 5 files changed: 142 ins; 29 del; 51 mod 8277426: Optimize mask reduction operations on x86 Reviewed-by: sviswanathan, jiefu ------------- PR: https://git.openjdk.java.net/jdk/pull/6447 From sviswanathan at openjdk.java.net Mon Nov 29 23:30:36 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Mon, 29 Nov 2021 23:30:36 GMT Subject: RFR: 8277617: Adjust AVX3Threshold for copy/fill stubs [v5] In-Reply-To: References: Message-ID: <5Dmjboa1Vh9PwwZnwfiTpmrXSm0D7sNMMm92bufbGSM=.d8494927-8a16-4efb-84b7-15086809d13f@github.com> > Currently 32-byte instructions are used for small array copy and clear. > This can be optimized by using 64-byte instructions. > > Please review.
> > Best Regards, > Sandhya Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: Implement review comments ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6512/files - new: https://git.openjdk.java.net/jdk/pull/6512/files/021bc659..b44b63ed Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6512&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6512&range=03-04 Stats: 17 lines in 3 files changed: 2 ins; 3 del; 12 mod Patch: https://git.openjdk.java.net/jdk/pull/6512.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6512/head:pull/6512 PR: https://git.openjdk.java.net/jdk/pull/6512 From jiefu at openjdk.java.net Mon Nov 29 23:31:10 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Mon, 29 Nov 2021 23:31:10 GMT Subject: RFR: 8277843: [Vector API] scalar2vector generates incorrect type info for mask operations if Op_MaskAll is unavailable [v2] In-Reply-To: References: Message-ID: On Mon, 29 Nov 2021 10:22:15 GMT, Tobias Hartmann wrote: > Looks reasonable to me. Thanks @TobiHartmann for your review. @jatin-bhateja , I had added you as one of the reviewers for this pr manually. And will push it today if there is no other comments. Thanks. ------------- PR: https://git.openjdk.java.net/jdk/pull/6562 From jiefu at openjdk.java.net Mon Nov 29 23:43:08 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Mon, 29 Nov 2021 23:43:08 GMT Subject: RFR: 8277777: [Vector API] assert(r->is_XMMRegister()) failed: must be in x86_32.ad In-Reply-To: References: Message-ID: On Mon, 29 Nov 2021 09:48:26 GMT, Tobias Hartmann wrote: > That looks good to me but @jatin-bhateja should have a look as well. Thanks @TobiHartmann . @jatin-bhateja, please kindly review it. Thanks. 
------------- PR: https://git.openjdk.java.net/jdk/pull/6535 From sviswanathan at openjdk.java.net Tue Nov 30 00:10:39 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Tue, 30 Nov 2021 00:10:39 GMT Subject: RFR: 8277617: Adjust AVX3Threshold for copy/fill stubs [v6] In-Reply-To: References: Message-ID: > Currently 32-byte instructions are used for small array copy and clear. > This can be optimized by using 64-byte instructions. > > Please review. > > Best Regards, > Sandhya Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: Fix whitespace ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6512/files - new: https://git.openjdk.java.net/jdk/pull/6512/files/b44b63ed..190f974c Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6512&range=05 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6512&range=04-05 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/6512.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6512/head:pull/6512 PR: https://git.openjdk.java.net/jdk/pull/6512 From dlong at openjdk.java.net Tue Nov 30 00:13:10 2021 From: dlong at openjdk.java.net (Dean Long) Date: Tue, 30 Nov 2021 00:13:10 GMT Subject: RFR: 8275638: GraphKit::combine_exception_states fails with "matching stack sizes" assert [v2] In-Reply-To: References: Message-ID: On Fri, 26 Nov 2021 10:27:48 GMT, Roland Westrelin wrote: >> Root cause is identical to 8273165 AFIU: late inline of a virtual call >> can throw from 2 different paths (null check and the call >> itself). That breaks because the logic for exceptions expects the >> stack for all paths that throw exceptions to have the same stack size. 
>> >> AFAIU, the stack doesn't matter exception handling: either the >> exception is caught by a exception handler and then the stack is >> popped and the exception is pushed or, the exception is rethrown to >> the caller in which case the current stack is also popped (that is the >> jvm state for the current method). As a consequence the fix I propose >> is to ignore the stack in GraphKit::combine_exception_states(). >> >> AFAIU, the same fix would work for 8273165 but I left the current work >> around as is: not sure if we want to be conservative for now or not > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > make test runnable with release build If we want to be conservative, then perhaps we should do a bailout if the stack size don't match. Ignoring the stack in one place, but being careful to preserve it in others makes me think we are missing something. For example, this troubling comment from Parse::catch_inline_exceptions(): 911 // Start executing from the given throw state. (Keep its stack, for now.) I think we need a followup RFE to clean this all up once and for all. @vnkozlov, do we have someone who really understands what this exceptions code is doing in regards to stack sizes? ------------- PR: https://git.openjdk.java.net/jdk/pull/6572 From sviswanathan at openjdk.java.net Tue Nov 30 00:43:04 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Tue, 30 Nov 2021 00:43:04 GMT Subject: RFR: 8277617: Adjust AVX3Threshold for copy/fill stubs In-Reply-To: <1KoRjoyObIS32kwNcojcLdIdUkdqpL1Pon6-IIn-H94=.a986a7bb-a14b-4df8-9ab2-9c66650e6d1b@github.com> References: <1KoRjoyObIS32kwNcojcLdIdUkdqpL1Pon6-IIn-H94=.a986a7bb-a14b-4df8-9ab2-9c66650e6d1b@github.com> Message-ID: On Tue, 23 Nov 2021 06:49:07 GMT, David Holmes wrote: >> @dholmes-ora I have implemented your review comments. 
> > Sorry @sviswa7 but could you explain in the comment why/how `avx3_threshold` reporting zero impacts the use 64-byte load/store - the connection is not at all obvious for anyone not fully conversant with AVX3 and how it is used by the code. Thanks. @dholmes-ora @jatin-bhateja I have implemented your review comments. I have used the direct formulation for avx3_threshold() method as suggested by David. Reused the avx3_threshold() computation where possible as suggested by Jatin. The tier1-tier3 testing passed on the platform where avx3_threshold() returns 0. No additional observable overhead seen in SPECjvm2008 startup benchmarks on AVX512 platform. Please let me know if the patch looks ok to you. ------------- PR: https://git.openjdk.java.net/jdk/pull/6512 From dlong at openjdk.java.net Tue Nov 30 01:31:15 2021 From: dlong at openjdk.java.net (Dean Long) Date: Tue, 30 Nov 2021 01:31:15 GMT Subject: RFR: 8277882: New subnode ideal optimization: converting "c0 - (x + c1)" into "(c0 - c1) - x" [v2] In-Reply-To: References: Message-ID: On Mon, 29 Nov 2021 03:44:46 GMT, Zang, Zhiqiang wrote: >> Suggest two new optimizations that can be done in SubINode::Ideal. > > Zang, Zhiqiang has updated the pull request incrementally with one additional commit since the last revision: > > clean. Changes requested by dlong (Reviewer). I think your tests need to check more cases, especially cases that use MIN_VALUE, MAX_VALUE, or otherwise cause underflow or overflow. src/hotspot/share/opto/subnode.cpp line 197: > 195: jint c1 = phase->type(in2->in(2))->isa_int()->get_con(); > 196: return new SubINode(phase->intcon(java_subtract(c0, c1)), in2->in(1)); > 197: } Should this code be checking for counted loops like below (see ok_to_convert)? 
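The request for MIN_VALUE/MAX_VALUE coverage can be sketched as a plain Java check. This is a minimal sketch, not part of the patch or its tests: the class `SubIdealCheck` and its methods are hypothetical names. It relies only on the fact that Java `int` arithmetic wraps modulo 2^32, so `c0 - (x + c1)` and `(c0 - c1) - x` should agree even when an intermediate result overflows.

```java
public class SubIdealCheck {
    // Original shape: c0 - (x + c1)
    static int before(int c0, int x, int c1) {
        return c0 - (x + c1);
    }

    // Transformed shape: (c0 - c1) - x
    static int after(int c0, int x, int c1) {
        return (c0 - c1) - x;
    }

    // Returns true if both shapes agree for every combination of the edge values,
    // including combinations where x + c1 or c0 - c1 wraps around.
    static boolean agreeOnEdgeCases() {
        int[] edge = {Integer.MIN_VALUE, Integer.MIN_VALUE + 1, -1, 0, 1,
                      Integer.MAX_VALUE - 1, Integer.MAX_VALUE};
        for (int c0 : edge) {
            for (int x : edge) {
                for (int c1 : edge) {
                    if (before(c0, x, c1) != after(c0, x, c1)) {
                        return false;
                    }
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("edge cases agree: " + agreeOnEdgeCases());
    }
}
```

A check like this only exercises the arithmetic identity; it says nothing about whether the transformation is safe inside counted loops, which is the separate question raised above.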
------------- PR: https://git.openjdk.java.net/jdk/pull/6441 From jbhateja at openjdk.java.net Tue Nov 30 07:40:06 2021 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Tue, 30 Nov 2021 07:40:06 GMT Subject: RFR: 8277843: [Vector API] scalar2vector generates incorrect type info for mask operations if Op_MaskAll is unavailable [v2] In-Reply-To: References: Message-ID: On Mon, 29 Nov 2021 23:28:12 GMT, Jie Fu wrote: > > Looks reasonable to me. > > Thanks @TobiHartmann for your review. > > @jatin-bhateja , I had added you as one of the reviewers for this pr manually. And will push it today if there is no other comments. Thanks. Thanks @DamonFool ------------- PR: https://git.openjdk.java.net/jdk/pull/6562 From jiefu at openjdk.java.net Tue Nov 30 08:36:09 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Tue, 30 Nov 2021 08:36:09 GMT Subject: Integrated: 8277843: [Vector API] scalar2vector generates incorrect type info for mask operations if Op_MaskAll is unavailable In-Reply-To: References: Message-ID: On Thu, 25 Nov 2021 14:41:27 GMT, Jie Fu wrote: > Hi all, > > This bug was first observed on x86_32/AVX512. > It caused 62 vector api test failures. > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR >>> jtreg:test/jdk/jdk/incubator/vector 74 12 62 0 << > ============================== > > > You can easily reproduce this bug on an AVX512 machine with x86_32. > Or you can also reproduce it on an AVX512 machine with x86_64 if you disable `Op_MaskAll` like this. 
> 
> diff --git a/src/hotspot/cpu/x86/x86.ad b/src/hotspot/cpu/x86/x86.ad
> index 3f6d5a44b0d..d5a751b310d 100644
> --- a/src/hotspot/cpu/x86/x86.ad
> +++ b/src/hotspot/cpu/x86/x86.ad
> @@ -1819,6 +1819,7 @@ const bool Matcher::match_rule_supported_vector(int opcode, int vlen, BasicType
>        }
>        break;
>      case Op_MaskAll:
> +      return false;
>        if (!is_LP64 || !VM_Version::supports_evex()) {
>          return false;
>        }
> 
> The failure reason is that `VectorNode::scalar2vector` generates incorrect IR for mask operations if `Op_MaskAll` is unavailable. > So it shouldn't be used for mask operations if `Op_MaskAll` is unavailable. > > Testing (with two more bug fixes https://github.com/openjdk/jdk/pull/6535 and https://github.com/openjdk/jdk/pull/6533): > - vector api tests on {x86_64, x86_32}/{AVX512, AVX256}, all passed > - vector api tests on aarch64, all passed > > Thanks. > Best regards, > Jie This pull request has now been integrated. Changeset: ceae380d Author: Jie Fu URL: https://git.openjdk.java.net/jdk/commit/ceae380d3a3fcef5678e3073e25eb37ca0a24c46 Stats: 6 lines in 1 file changed: 3 ins; 3 del; 0 mod 8277843: [Vector API] scalar2vector generates incorrect type info for mask operations if Op_MaskAll is unavailable Co-authored-by: Jatin Bhateja Reviewed-by: thartmann, jbhateja ------------- PR: https://git.openjdk.java.net/jdk/pull/6562 From thartmann at openjdk.java.net Tue Nov 30 08:40:05 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 30 Nov 2021 08:40:05 GMT Subject: RFR: JDK-8277178: Reduce the priority of data dependent nodes when OptoScheduling enabled In-Reply-To: References: Message-ID: On Tue, 16 Nov 2021 09:06:57 GMT, SUN Guoyun wrote: > When doing gcm/lcm, we should consider not only the height of nodes (latency), but also whether there is a data dependency between nodes.
When there is a data dependency between two nodes and the latency of the first node is large, a node that depends on neither can be scheduled between them. For example: > for this Java code: >

>     public static final double fval = 2.00;
>     public static double[] A = new double[N];
>     public static int[] B = new int[N];
> 
>     public static void testP() {
>         for (int i = 0; i < N; i++) {
>             A[i] += A[i] * fval;
>             B[i] += B[i] + 2;
>         }
>     }
> 
> > With `-XX:+OptoScheduling` on aarch64, the generated sequence is: >

> 190     B15: #	out( B15 B16 ) <- in( B14 B15 ) Loop( B15-B15 inner main of N118 strip mined) Freq: 9.9999e+11
> 190     sxtw  R13, R15	# i2l
> 194 +   add R14, R17, R13, LShiftL #3	# ptr
> 198     ldrd  V16, [R14, #16]	# double
> 19c +   fmuld   V18, V16, V17
> 1a0 +   faddd   V16, V18, V16
> 1a4     strd  V16, [R14, #16]	# double
> 1a8 +   add R13, R0, R13, LShiftL #2	# ptr
> 1ac +   ldrw  R1, [R13, #16]	# int
> 1b0 +   addw  R14, R1, R1
> 1b4 +   addw R1, R14, #2
> 1b8 +   addw R15, R15, #1
> 1bc     strw  R1, [R13, #16]	# int
> 1c0 +   cmpw  R15, R12
> 1c4     blt B15 	// counted loop end  P=1.000000 C=40960.000000
> 
> > Then a more efficient sequence should be: >

> 190     B15: #	out( B15 B16 ) <- in( B14 B15 ) Loop( B15-B15 inner main of N118 strip mined) Freq: 9.9999e+11
> 190     sxtw  R13, R14	# i2l
> 194     add R15, R17, R13, LShiftL #3	# ptr
> 198     add R13, R0, R13, LShiftL #2	# ptr
> 19c     ldrd  V16, [R15, #16]	# double
> 1a0     ldrw  R2, [R13, #16]	# int
> 1a4     fmuld   V18, V16, V17
> 1a8     addw  R1, R2, R2
> 1ac     faddd   V16, V18, V16
> 1b0     strd  V16, [R15, #16]	# double
> 1b4     addw R1, R1, #2
> 1b8     strw  R1, [R13, #16]	# int
> 1bc     addw R14, R14, #1
> 1c0     cmpw  R14, R12
> 1c4     blt B15 	// counted loop end  P=1.000000 C=40960.000000
> 
> > This problem also exists in MIPS architecture. This is a patch to fix this problem. Please help review it. > Thanks I'm not too familiar with this code but I gave it a quick run through our performance testing and all results look good except for the MonteCarlo benchmark from SPECjvm2008 with G1 which shows a 1% regression. It could just be run-to-run variance but better double check. Also, I think it would be good to add a JMH benchmark for this. ------------- PR: https://git.openjdk.java.net/jdk/pull/6407 From roland at openjdk.java.net Tue Nov 30 08:45:06 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Tue, 30 Nov 2021 08:45:06 GMT Subject: RFR: 8275638: GraphKit::combine_exception_states fails with "matching stack sizes" assert [v2] In-Reply-To: References: Message-ID: <0GbtDPLsXoyXYNJv2ZJ6vwTSdbzJOXqWUNENCTCrZmA=.d0b24bd8-be69-435d-8db1-d291afcd7f62@github.com> On Tue, 30 Nov 2021 00:10:30 GMT, Dean Long wrote: > If we want to be conservative, then perhaps we should do a bailout if the stack size don't match. Ignoring the stack in one place, but being careful to preserve it in others makes me think we are missing something. For example, this troubling comment from Parse::catch_inline_exceptions(): > > 911 // Start executing from the given throw state. (Keep its stack, for now.) If we're missing something, wouldn't testing catch it? Wouldn't running that patch through extensive testing help then? ------------- PR: https://git.openjdk.java.net/jdk/pull/6572 From roland at openjdk.java.net Tue Nov 30 08:48:20 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Tue, 30 Nov 2021 08:48:20 GMT Subject: RFR: 8262341: Refine identical code in AddI/LNode. Message-ID: <5q7zq7qSEn7iTQdlYBh3d92Qeg7oxO_b5T36iUaFSGs=.326c2ded-2a41-4589-8951-6a31a6d44257@github.com> AddINode::Ideal() and AddlNode::Ideal() are almost identical but the same logic had to be duplicated because AddINode::Ideal() tests its inputs for Op_AddI, Op_SubI etc. 
while AddLNode::Ideal() tests for Op_AddL, Op_SubL etc. This patch refactors the code so the common logic is in a single method parameterized by a BasicType argument. The way I've done this before in the context of int/long counted loops was to use and extra virtual method operates_on(). So: n->Opcode() == Op_AddI becomes n->is_Add() && n->operates_on(T_INT) Working on this change made me realize that pattern doesn't work that well: - it's quite a bit more verbose and converting existing code is not as mechanical as we would like to avoid conversion errors. - it breaks when a class has a subclass. For instance AddNode has OrINode and OrLNode as subclasses so testing for n->is_Add() returns true with an OrI node. Instead, this change introduces new functions. For instance of AddI/AddL: int Op_Add(BasicType bt) that returns either Op_AddI or Op_AddL depending on bt. This made refactoring the AddINode::Ideal() logic straightforward. I removed all use of operates_on() as well and converted existing code to the new Op_XXX() functions. 
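The difference between the two patterns can be sketched with a small Java model. HotSpot itself is C++, and every type and method below (`OpcodeByType`, `AddNode`, `OrINode`, `opAdd`, `oldIsAdd`, `newIsAdd`) is a hypothetical stand-in invented for illustration, not HotSpot code. The sketch assumes only what the description states: Or nodes are subclasses of the Add node class, and the new helper maps a basic type to an exact opcode.

```java
public class OpcodeByType {
    enum BasicType { T_INT, T_LONG }
    enum Opcode { Op_AddI, Op_AddL, Op_OrI, Op_OrL }

    // Stand-in for a node hierarchy in which Or nodes subclass the Add node class.
    static class AddNode {
        final Opcode opcode;
        final BasicType bt;
        AddNode(Opcode opcode, BasicType bt) { this.opcode = opcode; this.bt = bt; }
        boolean isAdd() { return true; }                     // inherited by subclasses
        boolean operatesOn(BasicType t) { return bt == t; }
    }

    static class OrINode extends AddNode {
        OrINode() { super(Opcode.Op_OrI, BasicType.T_INT); }
    }

    // New-style helper: pick the exact opcode for the given basic type.
    static Opcode opAdd(BasicType bt) {
        switch (bt) {
            case T_INT:  return Opcode.Op_AddI;
            case T_LONG: return Opcode.Op_AddL;
            default: throw new IllegalArgumentException("unexpected type " + bt);
        }
    }

    // Old-style check: class test plus type test; also matches subclasses.
    static boolean oldIsAdd(AddNode n, BasicType bt) {
        return n.isAdd() && n.operatesOn(bt);
    }

    // New-style check: exact opcode comparison.
    static boolean newIsAdd(AddNode n, BasicType bt) {
        return n.opcode == opAdd(bt);
    }

    public static void main(String[] args) {
        AddNode or = new OrINode();
        System.out.println("old check on OrI: " + oldIsAdd(or, BasicType.T_INT));
        System.out.println("new check on OrI: " + newIsAdd(or, BasicType.T_INT));
    }
}
```

In this model the old-style check accepts an `OrINode` as an int add, while the opcode comparison does not, which mirrors the subclass problem described above.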
------------- Commit messages: - fix Changes: https://git.openjdk.java.net/jdk/pull/6607/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6607&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8262341 Stats: 433 lines in 14 files changed: 120 ins; 249 del; 64 mod Patch: https://git.openjdk.java.net/jdk/pull/6607.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6607/head:pull/6607 PR: https://git.openjdk.java.net/jdk/pull/6607 From mdoerr at openjdk.java.net Tue Nov 30 09:25:15 2021 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Tue, 30 Nov 2021 09:25:15 GMT Subject: Integrated: 8277846: Implement fast-path for ASCII-compatible CharsetEncoders on ppc64 In-Reply-To: References: Message-ID: On Thu, 25 Nov 2021 15:33:30 GMT, Martin Doerr wrote: > PPC64 port of 8274242: Implement fast-path for ASCII-compatible CharsetEncoders on x86 > I moved the `encode_iso_array` implementation into `C2_MacroAssembler` and reused it for the new ASCII node. The algorithm is unchanged. We only need to change the mask because (non-extended) ASCII uses 7 bit (also see x86 implementation). > > 'micro:CharsetEncodeDecode' benchmark results without intrinsic on Power9: >

> Benchmark                   (size)       (type)  Mode  Cnt   Score   Error  Units
> CharsetEncodeDecode.encode   16384        UTF-8  avgt   30  14.530 ± 0.012  us/op
> CharsetEncodeDecode.encode   16384         BIG5  avgt   30  15.359 ± 0.014  us/op
> CharsetEncodeDecode.encode   16384  ISO-8859-15  avgt   30  14.256 ± 0.019  us/op
> CharsetEncodeDecode.encode   16384        ASCII  avgt   30  14.237 ± 0.012  us/op
> 
> With intrinsic: >

> Benchmark                   (size)       (type)  Mode  Cnt   Score   Error  Units
> CharsetEncodeDecode.encode   16384        UTF-8  avgt   30   5.085 ± 0.016  us/op
> CharsetEncodeDecode.encode   16384         BIG5  avgt   30   5.905 ± 0.023  us/op
> CharsetEncodeDecode.encode   16384  ISO-8859-15  avgt   30   4.795 ± 0.023  us/op
> CharsetEncodeDecode.encode   16384        ASCII  avgt   30   4.798 ± 0.013  us/op
> 
This pull request has now been integrated.
Changeset: a5f2a58b Author: Martin Doerr URL: https://git.openjdk.java.net/jdk/commit/a5f2a58ba4ac25f4bd66f1f1f4c036a4f0024229 Stats: 78 lines in 4 files changed: 46 ins; 14 del; 18 mod 8277846: Implement fast-path for ASCII-compatible CharsetEncoders on ppc64 Reviewed-by: lucy, mbaesken ------------- PR: https://git.openjdk.java.net/jdk/pull/6565 From smonteith at openjdk.java.net Tue Nov 30 11:41:14 2021 From: smonteith at openjdk.java.net (Stuart Monteith) Date: Tue, 30 Nov 2021 11:41:14 GMT Subject: RFR: 8277893: Arraycopy stress tests In-Reply-To: References: Message-ID: On Mon, 29 Nov 2021 13:28:33 GMT, Aleksey Shipilev wrote: > I would like to fork the new tests off the JDK-8150730. These tests were instrumental in capturing many bugs in my arraycopy work, and I think they are good on their own merit, because they provide a test for the current baseline and on-going minor improvements in arraycopy on all platforms, not only x86_64, and they might be cleanly backportable. > > A brief tour of these tests: > > - Tests all data types; > - Tests small arrays exhaustively, which captures conjoint/disjoint cases, errors near the edges, etc; > - Tests large arrays with fuzzing around powers of two and powers of ten, both conjoint and disjoint cases; > - Tests all available compilation modes for arraycopy stubs; for example, running on AVX-512 enabled machine runs all versions down to `-XX:UseAVX=0 -XX:UseSSE=0` cases; > - Tests with/without compressed oops mode -- theoretically only needed for `Object` copies, but Hotspot cobbles together int+coops and long+no-coops loops, so I decided to alternate coops mode for all data types; > > My previous version used individual `@run` clauses for all configurations, but I think the Java driver is cleaner and easier to maintain. 
> > Test times: > > > # x86_64 (TR 3970X) > real 9m11.037s > user 78m2.766s > sys 0m19.873s > > # x86_32 (TR 3970X) > real 13m39.054s > user 147m38.308s > sys 0m10.924s > > # x86_64 (i5-11500) > real 41m32.622s > user 447m19.986s > sys 0m21.026s > > # AArch64 (ThunderX2) > real 5m34.210s > user 45m16.015s > sys 0m24.723s > > > Since these tests are quite long, especially on small machines, I hooked them up to `hotspot:tier3`. > > Additional testing: > - [x] Linux x86_64 fastdebug `compiler/stress/arraycopy` > - [x] Linux x86_32 fastdebug `compiler/stress/arraycopy` > - [x] Linux AArch64 fastdebug `compiler/stress/arraycopy` This looks great, thanks Aleksey. This covers all of the cases I'd reasonably expect to see covered. ------------- PR: https://git.openjdk.java.net/jdk/pull/6594 From yyang at openjdk.java.net Tue Nov 30 11:54:15 2021 From: yyang at openjdk.java.net (Yi Yang) Date: Tue, 30 Nov 2021 11:54:15 GMT Subject: RFR: 8267928: Loop predicate gets inexact loop limit before PhaseIdealLoop::rc_predicate In-Reply-To: References: Message-ID: On Fri, 28 May 2021 13:29:36 GMT, Yi Yang wrote: > Loop predicate gets inexact loop limit(LoopLimitNode) from exact_limit(even if the limit is statically known) and does unnecessary overflow checking when generating lower bound test(rc_predicate). The reason is rather straightforward: exact_limit fails to see a HasExactTripCount flag since it would be set after performing loop predicate(iteration_split). Thanks all for reviewing this PR! 
------------- PR: https://git.openjdk.java.net/jdk/pull/4247 From yyang at openjdk.java.net Tue Nov 30 11:54:15 2021 From: yyang at openjdk.java.net (Yi Yang) Date: Tue, 30 Nov 2021 11:54:15 GMT Subject: Integrated: 8267928: Loop predicate gets inexact loop limit before PhaseIdealLoop::rc_predicate In-Reply-To: References: Message-ID: On Fri, 28 May 2021 13:29:36 GMT, Yi Yang wrote: > Loop predicate gets inexact loop limit(LoopLimitNode) from exact_limit(even if the limit is statically known) and does unnecessary overflow checking when generating lower bound test(rc_predicate). The reason is rather straightforward: exact_limit fails to see a HasExactTripCount flag since it would be set after performing loop predicate(iteration_split). This pull request has now been integrated. Changeset: fecf906f Author: Yi Yang URL: https://git.openjdk.java.net/jdk/commit/fecf906f0af9ddc0e83cb681845009f34555d5dc Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod 8267928: Loop predicate gets inexact loop limit before PhaseIdealLoop::rc_predicate Reviewed-by: thartmann, chagedorn, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/4247 From jvernee at openjdk.java.net Tue Nov 30 12:41:11 2021 From: jvernee at openjdk.java.net (Jorn Vernee) Date: Tue, 30 Nov 2021 12:41:11 GMT Subject: RFR: 8277602: Deopt code does not extend the stack enough if the caller is an optimize entry blob [v2] In-Reply-To: References: <515lMydWeiRbYVwp7x4kW33eiOBrMdoobOaO4wW79TE=.6655f6f7-16eb-413d-a958-7225b0ce0b7a@github.com> Message-ID: On Thu, 25 Nov 2021 18:48:45 GMT, Jorn Vernee wrote: >> Deoptimization code does not recreate c2i adapter 'frames'. For compiled callers this means that the stack needs to be adjusted manually to make room for the parameters when the callee is converted to an interpreter frame (essentially emulating what a c2i adapter would do). 
>> >> To check if the caller does a compiled call, the current code uses `frame::is_compiled_frame()`, which is true if the codeblob of the caller frame is an instance of `CompiledMethod`. >> >> However, optimized entry blobs also do compiled calls, are not detected by this test, and therefore don't get their stack adjusted correctly. >> >> To address this, I've added a new `frame::is_compiled_caller` function to determine if the caller is doing a compiled call, and I use that in the deopt code instead of `is_compiled_frame`. >> >> This patch also removes an old workaround that tried to fix the issue by allocating some spill space in the optimized entry blob frame, but this only accounts for the first argument. If there are more arguments we still have a problem. The suggested patch fixes this the right way I think. >> >> Thanks, >> Jorn >> >> Testing: run-test-jdk_foreign on Windows x64 and Linux x64 (afaik these are the only tests that use optimized entry blobs) > > Jorn Vernee has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Fix after merge > - Merge branch 'master' into Deopt_Stack_Fix > - Add test + asserts > - Properly handle optimized entry frame callers during deopt Tier4-5 came back clean, except for some failures due to known issues. Will go ahead an integrate this. ------------- PR: https://git.openjdk.java.net/jdk/pull/6522 From duke at openjdk.java.net Tue Nov 30 14:20:34 2021 From: duke at openjdk.java.net (Ludvig Janiuk) Date: Tue, 30 Nov 2021 14:20:34 GMT Subject: RFR: JDK-8277496 Remove duplication in c1 Block successor lists Message-ID: <4wdDk_HKbqPRXF47C895FeMhJpmrT6rt0aVT8q0miH0=.7b4bcdd0-671a-409b-ad6f-e433a3507e24@github.com> Remove `BlockBegin::_successors`, leaving `BlockEnd::_sux` as the SSOT for the successors of a block. 
Prior to this PR, these two lists were both tracking the same list of successors of the same block. This necessitated a lot of syncing and verification code. With this PR, as long as a block has its end pointer assigned, its successors can always be reached by querying the `BlockEnd`. `BlockEnd::_sux` becomes the single place where the list of successors is maintained. When modified, the successor list no longer needs to be synchronized in two places, reducing complexity and confusion. Asserts that the two lists correspond are no longer needed. While being created in `GraphBuilder`, `BlockBegin`s don't have a `BlockEnd` assigned yet. To temporarily track block successors in this small interval, add a lookup structure `BlockListBuilder::_bci2block_successors`. This PR affects debug printing code. If the end pointer of a `BlockBegin` is NULL for some reason, then the successor list can no longer be printed (for obvious reasons). This PR introduces an additional check to IR::verify to check that `BlockBegin::_end` is not set to null. This PR also performs some minor refactoring, polishing, inlining, and removing of dead code around the affected areas. The commit history has been polished to attempt to guide the reader through the changes. hs-tier1 and hs-tier2 tests pass. ------------- Commit messages: - extract disconnect_from_graph - rm duplicated printing - add is_successor - comment on end == null in BlockListBuilder() - CFGPrinter don't print sux when end is NULL - Remove extraneous assertions in build_hir - IR verify that ends aren't null - simplify blockmerger - remove dead code - improve set_end - ...
and 18 more: https://git.openjdk.java.net/jdk/compare/a5f2a58b...284cc91a Changes: https://git.openjdk.java.net/jdk/pull/6614/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6614&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8277496 Stats: 198 lines in 8 files changed: 94 ins; 73 del; 31 mod Patch: https://git.openjdk.java.net/jdk/pull/6614.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6614/head:pull/6614 PR: https://git.openjdk.java.net/jdk/pull/6614 From jvernee at openjdk.java.net Tue Nov 30 14:37:15 2021 From: jvernee at openjdk.java.net (Jorn Vernee) Date: Tue, 30 Nov 2021 14:37:15 GMT Subject: Integrated: 8277602: Deopt code does not extend the stack enough if the caller is an optimize entry blob In-Reply-To: <515lMydWeiRbYVwp7x4kW33eiOBrMdoobOaO4wW79TE=.6655f6f7-16eb-413d-a958-7225b0ce0b7a@github.com> References: <515lMydWeiRbYVwp7x4kW33eiOBrMdoobOaO4wW79TE=.6655f6f7-16eb-413d-a958-7225b0ce0b7a@github.com> Message-ID: On Tue, 23 Nov 2021 15:01:24 GMT, Jorn Vernee wrote: > Deoptimization code does not recreate c2i adapter 'frames'. For compiled callers this means that the stack needs to be adjusted manually to make room for the parameters when the callee is converted to an interpreter frame (essentially emulating what a c2i adapter would do). > > To check if the caller does a compiled call, the current code uses `frame::is_compiled_frame()`, which is true if the codeblob of the caller frame is an instance of `CompiledMethod`. > > However, optimized entry blobs also do compiled calls, are not detected by this test, and therefore don't get their stack adjusted correctly. > > To address this, I've added a new `frame::is_compiled_caller` function to determine if the caller is doing a compiled call, and I use that in the deopt code instead of `is_compiled_frame`. 
> > This patch also removes an old workaround that tried to fix the issue by allocating some spill space in the optimized entry blob frame, but this only accounts for the first argument. If there are more arguments we still have a problem. The suggested patch fixes this the right way I think. > > Thanks, > Jorn > > Testing: run-test-jdk_foreign on Windows x64 and Linux x64 (afaik these are the only tests that use optimized entry blobs) This pull request has now been integrated. Changeset: 98a9f037 Author: Jorn Vernee URL: https://git.openjdk.java.net/jdk/commit/98a9f037397d437d2c3221e8522ed8ab397a457a Stats: 131 lines in 6 files changed: 119 ins; 8 del; 4 mod 8277602: Deopt code does not extend the stack enough if the caller is an optimize entry blob Reviewed-by: dlong, thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/6522 From rkennke at openjdk.java.net Tue Nov 30 15:19:38 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Tue, 30 Nov 2021 15:19:38 GMT Subject: RFR: 8276901: Implement UseHeavyMonitors consistently [v7] In-Reply-To: References: Message-ID: > The flag UseHeavyMonitors seems to imply that it makes Hotspot always use inflated monitors, rather than stack locks. However, it is only implemented in the interpreter that way. When it calls into runtime, it would still happily stack-lock. Even worse, C1 uses another flag UseFastLocking to achieve something similar (with the same caveat that runtime would stack-lock anyway). C2 doesn't have any such mechanism at all. > I would like to experiment with disabling stack-locking, and thus, having this flag work as expected would seem very useful. > > The change removes the C1 flag UseFastLocking, and replaces its uses with equivalent (i.e. inverted) UseHeavyMonitors instead. I think it makes sense to make UseHeavyMonitors develop (I wouldn't want anybody to use this in production, not currently without this change, and not with this change). 
I also added a flag VerifyHeavyMonitors to be able to verify that stack-locking is really disabled. We can't currently verify this unconditionally (e.g. in debug builds) because all non-x86_64 platforms would need work. > > Testing: > - [x] tier1 > - [x] tier2 > - [x] tier3 > - [ ] tier4 Roman Kennke has updated the pull request incrementally with five additional commits since the last revision: - Implement x86_32 parts - Run UseHeavyMonitors test only on x86_64, add VerifyHeavyMonitors option - Fix indentation - Verify flags consistency - Add (Deprecated) to UseHeavyMonitors flag description ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6320/files - new: https://git.openjdk.java.net/jdk/pull/6320/files/eca00517..4cba9d10 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6320&range=06 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6320&range=05-06 Stats: 78 lines in 6 files changed: 25 ins; 0 del; 53 mod Patch: https://git.openjdk.java.net/jdk/pull/6320.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6320/head:pull/6320 PR: https://git.openjdk.java.net/jdk/pull/6320 From rkennke at openjdk.java.net Tue Nov 30 15:30:43 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Tue, 30 Nov 2021 15:30:43 GMT Subject: RFR: 8276901: Implement UseHeavyMonitors consistently [v8] In-Reply-To: References: Message-ID: > The flag UseHeavyMonitors seems to imply that it makes Hotspot always use inflated monitors, rather than stack locks. However, it is only implemented in the interpreter that way. When it calls into runtime, it would still happily stack-lock. Even worse, C1 uses another flag UseFastLocking to achieve something similar (with the same caveat that runtime would stack-lock anyway). C2 doesn't have any such mechanism at all. > I would like to experiment with disabling stack-locking, and thus, having this flag work as expected would seem very useful.
> > The change removes the C1 flag UseFastLocking, and replaces its uses with equivalent (i.e. inverted) UseHeavyMonitors instead. I think it makes sense to make UseHeavyMonitors develop (I wouldn't want anybody to use this in production, not currently without this change, and not with this change). I also added a flag VerifyHeavyMonitors to be able to verify that stack-locking is really disabled. We can't currently verify this unconditionally (e.g. in debug builds) because all non-x86_64 platforms would need work. > > Testing: > - [x] tier1 > - [x] tier2 > - [x] tier3 > - [ ] tier4 Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Check RTM/HeavyMonitors only on X86 ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6320/files - new: https://git.openjdk.java.net/jdk/pull/6320/files/4cba9d10..12250091 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6320&range=07 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6320&range=06-07 Stats: 4 lines in 1 file changed: 3 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/6320.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6320/head:pull/6320 PR: https://git.openjdk.java.net/jdk/pull/6320 From neliasso at openjdk.java.net Tue Nov 30 15:57:03 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Tue, 30 Nov 2021 15:57:03 GMT Subject: RFR: JDK-8277496 Remove duplication in c1 Block successor lists In-Reply-To: <4wdDk_HKbqPRXF47C895FeMhJpmrT6rt0aVT8q0miH0=.7b4bcdd0-671a-409b-ad6f-e433a3507e24@github.com> References: <4wdDk_HKbqPRXF47C895FeMhJpmrT6rt0aVT8q0miH0=.7b4bcdd0-671a-409b-ad6f-e433a3507e24@github.com> Message-ID: <2qmMSA8n_RWx9P5NhUOZUOzEhjJgeMgItBMMt_jC_aI=.56ae7470-a415-4506-a0dc-206ffaffb2c3@github.com> On Tue, 30 Nov 2021 14:14:36 GMT, Ludvig Janiuk wrote: > Remove `BlockBegin::_successors`, leaving `BlockEnd::_sux` as the SSOT for the successors of a block.
Prior to this PR, these two lists were both tracking the same list of successors of the same block. This necessitated a lot of syncing and verification code. > > With this PR, as long as a block has its end pointer assigned, its successors can always be reached by querying the `BlockEnd`. `BlockEnd::_sux` becomes the single place where the list of successors is maintained. When modified, the successor list no longer needs to be synchronized in two places, reducing complexity and confusion. Asserts that the two lists correspond are no longer needed. > > While being created in `GraphBuilder`, `BlockBegin`s don't have a `BlockEnd` assigned yet. To temporarily track block successors in this small interval, add a lookup structure `BlockListBuilder::_bci2block_successors`. > > This PR affects debug printing code. If the end pointer of a `BlockBegin` is NULL for some reason, then the successor list can no longer be printed (for obvious reasons). > > This PR introduces an additional check to IR::verify to check that `BlockBegin::_end` is not set to null. > > This PR also performs some minor refactoring, polishing, inlining, and removing of dead code around the affected areas. > > The commit history has been polished to attempt to guide the reader through the changes. > > hs-tier1 and hs-tier2 tests pass. A very nice clean up! src/hotspot/share/c1/c1_GraphBuilder.cpp line 3397: > 3395: #endif > 3396: > 3397: // JANIUK: If we iterate all the blocks in _blocks, some of them have end NULL. Left over comment?
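For readers following the quoted description, here is a minimal sketch of the single-source-of-truth idea, written in Java purely for illustration (the real C1 code is C++). The class names mirror `BlockBegin` and `BlockEnd`, but every field and method below is a hypothetical simplification and not HotSpot's actual API: the successor list lives only in the end node, and the begin node delegates to it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified model for illustration; not the HotSpot classes.
public class SuccessorSketch {

    static final class BlockEnd {
        // The only place where the successor list is stored.
        private final List<BlockBegin> sux = new ArrayList<>();

        void addSux(BlockBegin b) { sux.add(b); }

        // Read-only view, so callers cannot keep a diverging copy by accident.
        List<BlockBegin> sux() { return Collections.unmodifiableList(sux); }
    }

    static final class BlockBegin {
        final int bci;
        private BlockEnd end; // stays null until the block has been completed

        BlockBegin(int bci) { this.bci = bci; }

        void setEnd(BlockEnd end) { this.end = end; }

        boolean hasEnd() { return end != null; }

        // No second list here: successors are always read through the end node.
        List<BlockBegin> successors() {
            if (end == null) {
                throw new IllegalStateException("block at bci " + bci + " has no end yet");
            }
            return end.sux();
        }
    }

    public static void main(String[] args) {
        BlockBegin entry = new BlockBegin(0);
        BlockBegin thenBlock = new BlockBegin(5);
        BlockBegin elseBlock = new BlockBegin(9);

        BlockEnd branch = new BlockEnd();
        branch.addSux(thenBlock);
        branch.addSux(elseBlock);
        entry.setEnd(branch);

        System.out.println("successors of entry: " + entry.successors().size());
        System.out.println("thenBlock has end: " + thenBlock.hasEnd());
    }
}
```

In this model a block without an end cannot report successors at all, which is the same situation the description mentions for debug printing and for the new `IR::verify` check.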
------------- PR: https://git.openjdk.java.net/jdk/pull/6614 From chagedorn at openjdk.java.net Tue Nov 30 16:29:10 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Tue, 30 Nov 2021 16:29:10 GMT Subject: RFR: 8275326: C2: assert(no_dead_loop) failed: dead loop detected [v4] In-Reply-To: <8ZBmN3KXGkCgYPCmtrkM1lYGopX_j9nc4pN21w5s4CA=.1cc50fcc-bec2-4553-8aca-7de9b04b5120@github.com> References: <8ZBmN3KXGkCgYPCmtrkM1lYGopX_j9nc4pN21w5s4CA=.1cc50fcc-bec2-4553-8aca-7de9b04b5120@github.com> Message-ID: On Mon, 29 Nov 2021 11:23:35 GMT, Christian Hagedorn wrote: >> In the test case, we apply the following optimization in `PhiNode::Ideal()` for the memory phi 989 that is on a dead path but still has both its inputs set to non-top nodes: >> https://github.com/openjdk/jdk/blob/3c0faa73522bd004b66cb9e477f43e15a29842e6/src/hotspot/share/opto/cfgnode.cpp#L2269-L2270 >> ![Screenshot from 2021-11-05 11-57-49](https://user-images.githubusercontent.com/17833009/140502849-9f00fd62-9714-4f54-8f98-f22f74d11430.png) >> >> In this process, we create `11853 Phi` for the new `11850 MergeMem` which is going to replace `989 Phi` (`this`). We then transform `11853 Phi` before returning: >> https://github.com/openjdk/jdk/blob/3c0faa73522bd004b66cb9e477f43e15a29842e6/src/hotspot/share/opto/cfgnode.cpp#L2314 >> >> During `Ideal()` for `11853 Phi`, we transform `11769 MergeMem` into top (because the base memory is top) and use this as new input instead: >> https://github.com/openjdk/jdk/blob/3c0faa73522bd004b66cb9e477f43e15a29842e6/src/hotspot/share/opto/cfgnode.cpp#L2230-L2240 >> >> But even if the `MergeMem` node would not be transformed into top, the slice itself could be top (L2237) and we would still replace the phi input with top. This replacement by top will fold the `11853 Phi` and we will build a cycle `11850 MergeMem` <-> `1064 StoreB` because `989 Phi` will be replaced by `11850 MergeMem`. This results in the assertion failure. 
>> >> I tried some approaches by marking `11853 Phi` and/or `989 Phi` to specially treat them during the optimizations in `Ideal()` (e.g. skipping `989 Phi` during the dead loop detection etc.) or to improve the dead loop detection before applying the `MergeMem` optimization in `Ideal()`. But that seemed rather complicated/fragile. >> >> I therefore propose to simply not transform the newly created phi nodes directly but wait instead for IGVN to revisit them again. This allows the `this` phi to be replaced with the new `MergeMem` node and the dead loop detection will work correctly when processing the new phis again later in IGVN. >> >> I could only reproduce this bug with the replay file for the attached test case in the JBS issue. The test case itself did not trigger with repeated runs with `StressIGVN` + `RepeatCompilation`. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Remove igvn checks Thanks Vladimir for your review! ------------- PR: https://git.openjdk.java.net/jdk/pull/6276 From sviswanathan at openjdk.java.net Tue Nov 30 16:47:04 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Tue, 30 Nov 2021 16:47:04 GMT Subject: RFR: 8277617: Adjust AVX3Threshold for copy/fill stubs [v6] In-Reply-To: References: Message-ID: On Tue, 30 Nov 2021 00:10:39 GMT, Sandhya Viswanathan wrote: >> Currently 32-byte instructions are used for small array copy and clear. >> This can be optimized by using 64-byte instructions. >> >> Please review. >> >> Best Regards, >> Sandhya > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > Fix whitespace @neliasso Could you please also review this small patch. I would like to get it integrated before JDK 18 feature freeze. 
------------- PR: https://git.openjdk.java.net/jdk/pull/6512 From rkennke at openjdk.java.net Tue Nov 30 17:22:35 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Tue, 30 Nov 2021 17:22:35 GMT Subject: RFR: 8276901: Implement UseHeavyMonitors consistently [v9] In-Reply-To: References: Message-ID: > The flag UseHeavyMonitors seems to imply that it makes Hotspot always use inflated monitors, rather than stack locks. However, it is only implemented in the interpreter that way. When it calls into runtime, it would still happily stack-lock. Even worse, C1 uses another flag UseFastLocking to achieve something similar (with the same caveat that runtime would stack-lock anyway). C2 doesn't have any such mechanism at all. > I would like to experiment with disabling stack-locking, and thus, having this flag work as expected would seem very useful. > > The change removes the C1 flag UseFastLocking, and replaces its uses with equivalent (i.e. inverted) UseHeavyMonitors instead. I think it makes sense to make UseHeavyMonitors develop (I wouldn't want anybody to use this in production, not currently without this change, and not with this change). I also added a flag VerifyHeavyMonitors to be able to verify that stack-locking is really disabled. We can't currently verify this unconditionally (e.g. in debug builds) because all non-x86_64 platforms would need work.
> > Testing: > - [x] tier1 > - [x] tier2 > - [x] tier3 > - [ ] tier4 Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Implement aarch64 support ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6320/files - new: https://git.openjdk.java.net/jdk/pull/6320/files/12250091..d1ec5b65 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6320&range=08 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6320&range=07-08 Stats: 106 lines in 5 files changed: 22 ins; 2 del; 82 mod Patch: https://git.openjdk.java.net/jdk/pull/6320.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6320/head:pull/6320 PR: https://git.openjdk.java.net/jdk/pull/6320 From rkennke at openjdk.java.net Tue Nov 30 17:27:05 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Tue, 30 Nov 2021 17:27:05 GMT Subject: RFR: 8276901: Implement UseHeavyMonitors consistently [v4] In-Reply-To: <7GuyRoJ653qrQDv-vEnRU7JMcZU6qZJi0j7Ty1b5PE4=.c7d00b3d-c23b-47f9-bfb6-258623c2faae@github.com> References: <7GuyRoJ653qrQDv-vEnRU7JMcZU6qZJi0j7Ty1b5PE4=.c7d00b3d-c23b-47f9-bfb6-258623c2faae@github.com> Message-ID: On Thu, 18 Nov 2021 06:47:23 GMT, David Holmes wrote: > HI Roman, > > I have a number of initial comments/suggestions/requests - see below. > > IIUC you are only making UseHeavyMonitors work properly on x86_64, but in that case you cannot convert UseFastLocks to UseHeavyMonitors on all platforms as it won't work correctly on those other platforms. > > Cheers, David It would not break as such on other platforms. It would only be partially implemented, that is C1 would emit calls to runtime for and only use monitors while interpreter and C2 would still emit stack locks. That is ok - and that is roughly what +UseFastLocking used to do. 
------------- PR: https://git.openjdk.java.net/jdk/pull/6320 From rkennke at openjdk.java.net Tue Nov 30 17:27:06 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Tue, 30 Nov 2021 17:27:06 GMT Subject: RFR: 8276901: Implement UseHeavyMonitors consistently [v9] In-Reply-To: References: Message-ID: On Tue, 30 Nov 2021 17:22:35 GMT, Roman Kennke wrote: >> The flag UseHeavyMonitors seems to imply that it makes Hotspot always use inflated monitors, rather than stack locks. However, it is only implemented in the interpreter that way. When it calls into runtime, it would still happily stack-lock. Even worse, C1 uses another flag UseFastLocking to achieve something similar (with the same caveat that runtime would stack-lock anyway). C2 doesn't have any such mechanism at all. >> I would like to experiment with disabling stack-locking, and thus, having this flag work as expected would seem very useful. >> >> The change removes the C1 flag UseFastLocking, and replaces its uses with equivalent (i.e. inverted) UseHeavyMonitors instead. I think it makes sense to make UseHeavyMonitors develop (I wouldn't want anybody to use this in production, not currently without this change, and not with this change). I also added a flag VerifyHeavyMonitors to be able to verify that stack-locking is really disabled. We can't currently verify this unconditionally (e.g. in debug builds) because all non-x86_64 platforms would need work. >> >> Testing: >> - [x] tier1 >> - [x] tier2 >> - [x] tier3 >> - [ ] tier4 > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Implement aarch64 support I have addressed the suggested changes, and also implemented (and verified) the flag on x86_32 and aarch64. I don't have access to the remaining CPU arches. Can I please get another round of reviews?
Thanks, Roman ------------- PR: https://git.openjdk.java.net/jdk/pull/6320 From dlong at openjdk.java.net Tue Nov 30 18:52:07 2021 From: dlong at openjdk.java.net (Dean Long) Date: Tue, 30 Nov 2021 18:52:07 GMT Subject: RFR: 8275638: GraphKit::combine_exception_states fails with "matching stack sizes" assert [v2] In-Reply-To: <0GbtDPLsXoyXYNJv2ZJ6vwTSdbzJOXqWUNENCTCrZmA=.d0b24bd8-be69-435d-8db1-d291afcd7f62@github.com> References: <0GbtDPLsXoyXYNJv2ZJ6vwTSdbzJOXqWUNENCTCrZmA=.d0b24bd8-be69-435d-8db1-d291afcd7f62@github.com> Message-ID: On Tue, 30 Nov 2021 08:42:21 GMT, Roland Westrelin wrote: > If we're missing something, wouldn't testing catch it? Wouldn't running that patch through extensive testing help then? Normally, I would say yes, but my impression is that we don't have great test coverage for this issue, and I'm concerned that this close to the RDP1 date we should go with the most conservative fix. Out of curiosity, I checked when the "Keep its stack, for now" comment was added, and it was for JDK-4432078. The comment in the bug says: "Missing stack contents in graphkit-based exceptions make it impossible to re-run the trapping bytecode in the interpreter. Fix is to retain stack information a little longer." which again brings up my concern that this exception information may be used for deoptimization. Prior to JDK-4432078, it appears that we did truncate the stack size before pushing the exception object, and that apparently led to problems. Given this new information, this change makes me especially nervous: - // Skip everything in the JVMS after tos. (The ex_oop follows.) - if (i == tos) i = ex_jvms->monoff(); + // Skip everything in the JVMS after the stack (included). (The ex_oop follows.) 
+ if (i == ex_jvms->stkoff()) i = ex_jvms->monoff(); ------------- PR: https://git.openjdk.java.net/jdk/pull/6572 From xliu at openjdk.java.net Tue Nov 30 18:59:12 2021 From: xliu at openjdk.java.net (Xin Liu) Date: Tue, 30 Nov 2021 18:59:12 GMT Subject: Integrated: 8274983: C1 optimizes the invocation of private interface methods In-Reply-To: References: Message-ID: On Thu, 18 Nov 2021 06:13:31 GMT, Xin Liu wrote: > The root cause of the C1 regression is that some regexes generate multiple classes which all implement > an interface. In SlowStartupTest.java, the following **invokeinterface** happens frequently with different receivers. The target is a private interface method. > > > 9: invokeinterface #25, 3 // InterfaceMethod java/util/regex/Pattern$BmpCharPredicate.lambda$union$2:(Ljava/util/regex/Pattern$CharPredicate;I)Z > > > This patch allows C1 to generate the optimized virtual call for invokeinterface > whose targets are private interface methods. > > Before JDK-823835, LambdaMetaFactory generates invokespecial in this case. Because private > interface methods cannot be overridden, C1 generates the optimized virtual call. After JDK-823835, > LambdaMetaFactory generates invokeinterface instead. C1 generates the regular virtual call because > it cannot recognize the new pattern. If multiple subclasses all implement the same interface, > it is possible that they thrash the IC stub with their own concrete klass at runtime. > > An optimized virtual call uses relocInfo::opt_virtual_call_type(3). It will call VM > 'resolve_opt_virtual_call_C' once and resolve the target to the VEP of the nmethod. > Therefore, this patch can prevent the callsite from thrashing. > > Before this patch, SlowStartupTest had 38770 times _resolve_invoke_virtual_cnt and 38695 _handle_wrong_method_cnt per 10k iterations. To dump `C1Statistics`, we use fastdebug build for comparison.
> > > $java -XX:TieredStopAtLevel=1 -XX:+PrintC1Statistics SlowStartupTest 1 > Executed 10000 iterations in 736ms > C1 Runtime statistics: > _resolve_invoke_virtual_cnt: 38770 > _resolve_invoke_opt_virtual_cnt: 186 > _resolve_invoke_static_cnt: 44 > _handle_wrong_method_cnt: 38695 > _ic_miss_cnt: 35 > > > With this patch, only 1 _handle_wrong_method_cnt is triggered but we have 3 more `_resolve_invoke_opt_virtual_cnt` events instead. The total runtime reduces from 736ms to 9ms. > > > $java -XX:TieredStopAtLevel=1 -XX:+PrintC1Statistics SlowStartupTest 1 > Executed 10000 iterations in 9ms > C1 Runtime statistics: > _resolve_invoke_virtual_cnt: 77 > _resolve_invoke_opt_virtual_cnt: 189 > _resolve_invoke_static_cnt: 45 > _handle_wrong_method_cnt: 1 > _ic_miss_cnt: 39 > > > Codegen wise, before the patch, C1 generates LIR for the invokeinterface whose target is a private interface method as follows. > > __bci__use__tid____instr____________________________________ > . 1 0 v2 a1.invokeinterface() > InvokePrivateInterfaceMethod$I.bar()V > . 6 0 v3 return > > > With this patch, C1 generates LIR as follows. It first checks that a1 is a subtype of `InvokePrivateInterfaceMethod$I`. If so, an optimized virtual call is generated. The callsite will be fixed up exactly once at runtime. > > __bci__use__tid____instr____________________________________ > . 1 1 a2 checkcast(a1) InvokePrivateInterfaceMethod$I > stack [0:a1] > . 1 0 v3 a2.invokeinterface() > InvokePrivateInterfaceMethod$I.bar()V > . 6 0 v4 return This pull request has now been integrated.
Changeset: 21d9ca6c Author: Xin Liu URL: https://git.openjdk.java.net/jdk/commit/21d9ca6cd942ac98a3be2577ded8eaf92dac7d46 Stats: 119 lines in 2 files changed: 107 ins; 12 del; 0 mod 8274983: C1 optimizes the invocation of private interface methods Reviewed-by: dlong, iveresov ------------- PR: https://git.openjdk.java.net/jdk/pull/6445 From kvn at openjdk.java.net Tue Nov 30 19:29:07 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 30 Nov 2021 19:29:07 GMT Subject: RFR: 8277893: Arraycopy stress tests In-Reply-To: References: Message-ID: On Mon, 29 Nov 2021 13:28:33 GMT, Aleksey Shipilev wrote: > I would like to fork the new tests off the JDK-8150730. These tests were instrumental in capturing many bugs in my arraycopy work, and I think they are good on their own merit, because they provide a test for the current baseline and on-going minor improvements in arraycopy on all platforms, not only x86_64, and they might be cleanly backportable. > > A brief tour of these tests: > > - Tests all data types; > - Tests small arrays exhaustively, which captures conjoint/disjoint cases, errors near the edges, etc; > - Tests large arrays with fuzzing around powers of two and powers of ten, both conjoint and disjoint cases; > - Tests all available compilation modes for arraycopy stubs; for example, running on AVX-512 enabled machine runs all versions down to `-XX:UseAVX=0 -XX:UseSSE=0` cases; > - Tests with/without compressed oops mode -- theoretically only needed for `Object` copies, but Hotspot cobbles together int+coops and long+no-coops loops, so I decided to alternate coops mode for all data types; > > My previous version used individual `@run` clauses for all configurations, but I think the Java driver is cleaner and easier to maintain. 
> > Test times: > > > # x86_64 (TR 3970X) > real 9m11.037s > user 78m2.766s > sys 0m19.873s > > # x86_32 (TR 3970X) > real 13m39.054s > user 147m38.308s > sys 0m10.924s > > # x86_64 (i5-11500) > real 41m32.622s > user 447m19.986s > sys 0m21.026s > > # AArch64 (ThunderX2) > real 5m34.210s > user 45m16.015s > sys 0m24.723s > > > Since these tests are quite long, especially on small machines, I hooked them up to `hotspot:tier3`. > > Additional testing: > - [x] Linux x86_64 fastdebug `compiler/stress/arraycopy` > - [x] Linux x86_32 fastdebug `compiler/stress/arraycopy` > - [x] Linux AArch64 fastdebug `compiler/stress/arraycopy` I assume that `test/micro/org/openjdk/bench/java/lang` micros cover all these cases. Otherwise you may need to add some. test/hotspot/jtreg/TEST.groups line 183: > 181: > 182: tier3_compiler = \ > 183: compiler/arraycopy/stress Can you introduce separate group for this? For example `hotspot_arraycopy_stress` and use it here. I am fine with introduced `tier2|3_compiler` groups but it will help us in Oracle to have separate group for `arraycopy` so we can schedule its testing on proper machines. test/hotspot/jtreg/compiler/arraycopy/stress/AbstractStressArrayCopy.java line 32: > 30: * Max array size to test. > 31: */ > 32: static final int MAX_SIZE = 1024*1024 + 1; Do we really need such big arrays for regression testing. It may make sense for JMH but not for these tests I think. ------------- PR: https://git.openjdk.java.net/jdk/pull/6594 From shade at openjdk.java.net Tue Nov 30 20:29:05 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 30 Nov 2021 20:29:05 GMT Subject: RFR: 8277893: Arraycopy stress tests In-Reply-To: References: Message-ID: <86VRDdE8F6Q0b4CNj2otyPX2z07QA1fBlV0TN0Vn1cs=.43f1fcde-4727-4925-a7fd-51afca9d30cf@github.com> On Tue, 30 Nov 2021 19:25:41 GMT, Vladimir Kozlov wrote: > I assume that `test/micro/org/openjdk/bench/java/lang` micros cover all these cases. Otherwise you may need to add some. Yes. 
Performance tests will come separately. This PR covers purely functional tests that verify arraycopies are not foobar-ing array contents, not hitting any asserts, or otherwise crash VMs. Performance tests would run on a limited set of inputs and in `release` bits, so they are bad for verification like this :) > test/hotspot/jtreg/TEST.groups line 183: > >> 181: >> 182: tier3_compiler = \ >> 183: compiler/arraycopy/stress > > Can you introduce separate group for this? For example `hotspot_arraycopy_stress` and use it here. > I am fine with introduced `tier2|3_compiler` groups but it will help us in Oracle to have separate group for `arraycopy` so we can schedule its testing on proper machines. Yes, we can. Actually, working on #6622, I realized these test groups would be introduced anyway. So these new arraycopy tests should probably go to `hotspot_slow_compiler` group, along with other `stress` tests. This would hook arraycopy tests into `hotspot:tier3` automatically if #6622 lands. Tell me if you still want a completely separate test group, or `hotspot_slow_compiler` is enough for current Oracle testing infra. > test/hotspot/jtreg/compiler/arraycopy/stress/AbstractStressArrayCopy.java line 32: > >> 30: * Max array size to test. >> 31: */ >> 32: static final int MAX_SIZE = 1024*1024 + 1; > > Do we really need such big arrays for regression testing. It may make sense for JMH but not for these tests I think. My original intent was to make sure the tests cross all small page sizes (up to 64K) and maybe even some large page sizes (1M `long[]` is 8M, so 2*4M). The size of this array does not matter for test performance very much, since we only allocate two `MAX_SIZE`-d arrays per entire run. Driver even caps the heap size at `-Xmx256m` to block tests from using too much memory. So, I'd leave it at 1M, if you agree. 
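The sizing argument above can be checked with a little arithmetic. What follows is a minimal sketch, not part of the PR: only the `MAX_SIZE` value comes from the quoted test source, while the class name, the helper methods, and the 64K and 4M page sizes are assumptions chosen for illustration. Array headers are ignored.

```java
public class ArraySizeSketch {
    // Same value as MAX_SIZE in the quoted AbstractStressArrayCopy.java.
    static final int MAX_SIZE = 1024 * 1024 + 1;

    // Payload bytes of an array with MAX_SIZE elements (array header ignored).
    static long payloadBytes(int elementBytes) {
        return (long) MAX_SIZE * elementBytes;
    }

    // Number of whole pages of the given size that fit into the payload.
    static long wholePages(long bytes, long pageBytes) {
        return bytes / pageBytes;
    }

    public static void main(String[] args) {
        long byteArray = payloadBytes(Byte.BYTES);
        long longArray = payloadBytes(Long.BYTES);

        // Assumed page sizes for the example: 64K small pages, 4M large pages.
        long smallPage = 64L * 1024;
        long largePage = 4L * 1024 * 1024;

        System.out.println("byte[MAX_SIZE] payload bytes: " + byteArray);
        System.out.println("long[MAX_SIZE] payload bytes: " + longArray);
        System.out.println("64K pages in byte[]: " + wholePages(byteArray, smallPage));
        System.out.println("4M pages in long[]: " + wholePages(longArray, largePage));
    }
}
```

Under these assumptions even the `byte[]` case spans several 64K pages, and the `long[]` case spans two 4M pages, which matches the "1M `long[]` is 8M, so 2*4M" remark.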
------------- PR: https://git.openjdk.java.net/jdk/pull/6594 From kvn at openjdk.java.net Tue Nov 30 20:39:10 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 30 Nov 2021 20:39:10 GMT Subject: RFR: 8277893: Arraycopy stress tests In-Reply-To: <86VRDdE8F6Q0b4CNj2otyPX2z07QA1fBlV0TN0Vn1cs=.43f1fcde-4727-4925-a7fd-51afca9d30cf@github.com> References: <86VRDdE8F6Q0b4CNj2otyPX2z07QA1fBlV0TN0Vn1cs=.43f1fcde-4727-4925-a7fd-51afca9d30cf@github.com> Message-ID: On Tue, 30 Nov 2021 20:21:19 GMT, Aleksey Shipilev wrote: >> test/hotspot/jtreg/compiler/arraycopy/stress/AbstractStressArrayCopy.java line 32: >> >>> 30: * Max array size to test. >>> 31: */ >>> 32: static final int MAX_SIZE = 1024*1024 + 1; >> >> Do we really need such big arrays for regression testing. It may make sense for JMH but not for these tests I think. > > My original intent was to make sure the tests cross all small page sizes (up to 64K) and maybe even some large page sizes (1M `long[]` is 8M, so 2*4M). The size of this array does not matter for test performance very much, since we only allocate two `MAX_SIZE`-d arrays per entire run. Driver even caps the heap size at `-Xmx256m` to block tests from using too much memory. So, I'd leave it at 1M, if you agree. Okay. I was concerned because of the times you show. I am fine with running tests up to 10-15 mins but not this: # x86_64 (i5-11500) real 41m32.622s user 447m19.986s sys 0m21.026s Do you know why it takes so much time on it? ------------- PR: https://git.openjdk.java.net/jdk/pull/6594 From shade at openjdk.java.net Tue Nov 30 20:44:43 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 30 Nov 2021 20:44:43 GMT Subject: RFR: 8278016: Add compiler tests to tier{2,3} [v2] In-Reply-To: References: Message-ID: > I have been looking at `hotspot:tier4` (catch-all not in lower tiers) run logs, and realized the whole bunch of compiler tests are running there.
> > Since `hotspot:tier4` runs a lot of `vmTestbase` tests, contributors seldom run it, as it takes many hours. Which means that many compiler tests are not running regularly for many contributors. But these tests are rather fast themselves and cover important compiler features. > > We can properly add compiler tests to `tier{2,3}` to expose them on earlier tiers. The split logic between tiers is roughly: fast feature tests go into tier2, slower feature tests and debugging/printing stuff goes to tier3. > > Sample times for new subgroups (think about this as "How much time they add to existing tiers"): > > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR > jtreg:test/hotspot/jtreg:tier2_compiler 243 243 0 0 > ============================== > > real 2m16.518s > user 35m40.839s > sys 1m35.334s > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR > jtreg:test/hotspot/jtreg:tier3_compiler 132 132 0 0 > ============================== > > real 4m31.935s > user 71m54.617s > sys 2m13.073s Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Filter out tier1/2 groups too ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6622/files - new: https://git.openjdk.java.net/jdk/pull/6622/files/d027cbe0..3a15f32b Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6622&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6622&range=00-01 Stats: 4 lines in 1 file changed: 3 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/6622.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6622/head:pull/6622 PR: https://git.openjdk.java.net/jdk/pull/6622 From kvn at openjdk.java.net Tue Nov 30 20:49:03 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 30 Nov 2021 20:49:03 GMT Subject: RFR: 8277893: Arraycopy stress tests In-Reply-To: 
<86VRDdE8F6Q0b4CNj2otyPX2z07QA1fBlV0TN0Vn1cs=.43f1fcde-4727-4925-a7fd-51afca9d30cf@github.com> References: <86VRDdE8F6Q0b4CNj2otyPX2z07QA1fBlV0TN0Vn1cs=.43f1fcde-4727-4925-a7fd-51afca9d30cf@github.com> Message-ID: On Tue, 30 Nov 2021 20:23:04 GMT, Aleksey Shipilev wrote: >> test/hotspot/jtreg/TEST.groups line 183: >> >>> 181: >>> 182: tier3_compiler = \ >>> 183: compiler/arraycopy/stress >> >> Can you introduce separate group for this? For example `hotspot_arraycopy_stress` and use it here. >> I am fine with introduced `tier2|3_compiler` groups but it will help us in Oracle to have separate group for `arraycopy` so we can schedule its testing on proper machines. > > Yes, we can. Actually, working on #6622, I realized these test groups would be introduced anyway. So these new arraycopy tests should probably go to `hotspot_slow_compiler` group, along with other `stress` tests. This would hook arraycopy tests into `hotspot:tier3` automatically if #6622 lands. Tell me if you still want a completely separate test group, or `hotspot_slow_compiler` is enough for current Oracle testing infra. Please, create separate test group and add it to `hotspot_slow_compiler`. We would not need to change infra settings if more testing is added to this new group later. ------------- PR: https://git.openjdk.java.net/jdk/pull/6594 From kvn at openjdk.java.net Tue Nov 30 21:01:08 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 30 Nov 2021 21:01:08 GMT Subject: RFR: 8278016: Add compiler tests to tier{2,3} [v2] In-Reply-To: References: Message-ID: On Tue, 30 Nov 2021 20:44:43 GMT, Aleksey Shipilev wrote: >> I have been looking at `hotspot:tier4` (catch-all not in lower tiers) run logs, and realized the whole bunch of compiler tests are running there. >> >> Since `hotspot:tier4` runs a lot of `vmTestbase` tests, contributors seldom run it, as it takes many hours. Which means that many compiler tests are not running regularly for many contributors. 
But these tests are rather fast themselves and cover important compiler features. >> >> We can properly add compiler tests to `tier{2,3}` to expose them on earlier tiers. The split logic between tiers is roughly: fast feature tests go into tier2, slower feature tests and debugging/printing stuff goes to tier3. >> >> Sample times for new subgroups (think about this as "How much time they add to existing tiers"): >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg:tier2_compiler 243 243 0 0 >> ============================== >> >> real 2m16.518s >> user 35m40.839s >> sys 1m35.334s >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg:tier3_compiler 132 132 0 0 >> ============================== >> >> real 4m31.935s >> user 71m54.617s >> sys 2m13.073s > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Filter out tier1/2 groups too Looks good to me. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6622 From shade at openjdk.java.net Tue Nov 30 21:25:07 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 30 Nov 2021 21:25:07 GMT Subject: RFR: 8277893: Arraycopy stress tests In-Reply-To: References: <86VRDdE8F6Q0b4CNj2otyPX2z07QA1fBlV0TN0Vn1cs=.43f1fcde-4727-4925-a7fd-51afca9d30cf@github.com> Message-ID: <9L5CHY8n-6csbW9jfsnXt4pSqnabXH5R7dt2pZFDmdA=.e53d343e-f7fd-46b1-a8af-02dba3fad3ec@github.com> On Tue, 30 Nov 2021 20:34:46 GMT, Vladimir Kozlov wrote: >> My original intent was to make sure the tests cross all small page sizes (up to 64K) and maybe even some large page sizes (1M `long[]` is 8M, so 2*4M). The size of this array does not matter for test performance very much, since we only allocate two `MAX_SIZE`-d arrays per entire run. 
>> Driver even caps the heap size at `-Xmx256m` to block tests from using too much memory. So, I'd leave it at 1M, if you agree.
>
> Okay. I was concerned because of the times you show. I am fine with running tests up to 10-15 mins but not this:
>
> # x86_64 (i5-11500)
> real 41m32.622s
> user 447m19.986s
> sys 0m21.026s
>
>
> Do you know why it takes so much time on it?

That small machine has very slow memory compared to other ones. The parallelism in stress tests (9 types, 2 forked VMs each) brings that machine to its knees. There is a blurb about that effect here:
https://github.com/openjdk/jdk/pull/6594/files#diff-f72fee20a49daaf4e05002372e93f426407ecd429a227393e2ec79e821042c90R40-R47
-- I don't think it would matter much if we trim `MAX_SIZE`, but I'll try tomorrow.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6594

From jiefu at openjdk.java.net Tue Nov 30 23:37:52 2021
From: jiefu at openjdk.java.net (Jie Fu)
Date: Tue, 30 Nov 2021 23:37:52 GMT
Subject: RFR: 8262355: Support for AVX-512 opmask register allocation. [v20]
In-Reply-To:
References: <3NqvqAfKOiHvDo7gvwLvi5_U_9Rz8DFBijVVf1wpXWk=.90d51fb9-c6d0-45be-89b7-60851c7a6681@github.com>
Message-ID:

On Fri, 2 Apr 2021 13:16:53 GMT, Jatin Bhateja wrote:

>> AVX-512 added 8 new 64 bit opmask registers[1]. These registers allow conditional execution and efficient merging of destination operands. At present cross instruction mask propagation is being done either using a GPR (e.g. vmask_gen patterns in x86.ad) or a vector register (for propagating results of a vector comparison or vector load mask operations).
>>
>> This base patch extends the register allocator to support allocation of opmask registers. This will facilitate mask propagation across instructions and thus enable emitting efficient instruction sequence over X86 targets supporting AVX-512 feature.
>>
>> We intend to build a robust optimization framework[2] based on this patch to emit optimized instruction sequence for masked/predicated vector operation for X86 targets supporting AVX-512.
>>
>> Please review and share your feedback.
>>
>> Summary of changes:
>>
>> 1) AD side changes: New register definitions, register classes, allocation classes, operand definitions and spill code handling for opmask registers.
>>
>> 2) Runtime: Save/restoration for opmask registers in 32 and 64 bit JVM.
>> a) For 64 bit JVM we were anyway reserving the space in the frame layout but earlier were not saving and restoring at designated offset(1088), hence no extra space overhead apart from save/restore cost.
>> b) For 32 bit JVM: An additional 64 bytes are allocated apart from FXSTORE area on the lines of storage for ZMM(16-31) and YMM-Hi bank. There are a few regressions due to extra space allocation which we are investigating.
>>
>> 3) Replacing all the hard-coded opmask references from macro-assembly routines: Pulling out the opmask occurrences all the way up to instruction pattern and adding an unbounded opmask operand for them. This exposes these operands to RA and scheduler; this will automatically facilitate spilling of live opmask registers across call sites.
>>
>> 4) Register class initializations related to Op_RegVMask during matcher startup.
>>
>> 5) Handling for mask generating node: Currently VectorMaskGen node uses a GPR to propagate mask across mask generating DEF instruction to its USER instructions. There are other mask generating nodes like VectorCmpMask, VectorLoadMask which are not handled as the part of this patch. Conditional overriding of two routines, ideal_reg and bottom_type for mask generating IDEAL nodes and modifying the instruction patterns to have new opmask operands enables instruction selector to associate opmask register class with USE/DEF operands for such MachNodes.
>> This will constrain the allocation set for these operands to opmask registers(K1-K7).
>>
>> 6) Creation of a new concrete type TypeVectMask for mask generation nodes and a convenience routine Type::makemask which creates regular vector types (TypeVect[SDXYZ]) for non-AVX-512 targets and TypeVectMask for AVX-512 targets.
>>
>>
>> [1] : Section 15.1.3 : https://software.intel.com/content/www/us/en/develop/download/intel-64-and-ia-32-architectures-software-developers-manual-volume-1-basic-architecture.html
>> [2] : http://cr.openjdk.java.net/~jbhateja/avx512_masked_operation_optimization/AVX-512_RA_Opmask_Support_VectorMask_Optimizations.pdf
>
> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 23 commits:
>
> - 8262355: Rebasing patch, 32bit clean-up.
> - Merge http://github.com/openjdk/jdk into JDK-8262355
> - 8262355: Fix AARCH64 build issue
> - 8262355: Review comments resolutions.
> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8262355
> - 8262355: Updating copywriter for edited files.
> - 8262355: Adding missed safety check.
> - 8262355: Review comments resolution.
> - 8262355: Extending Type::isa_vect and Type::is_vect routines to TypeVectMask since its a valid vector type.
> - 8262355: Review comments resolution
> - ... and 13 more: https://git.openjdk.java.net/jdk/compare/7d0a0bad...b9810d20

Hi all,

There are still bugs for opmask register allocation on x86_32.
Could someone help to review it https://github.com/openjdk/jdk/pull/6535 ?

Thanks.
Best regards,
Jie

-------------

PR: https://git.openjdk.java.net/jdk/pull/2768
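
For readers following the opmask discussion quoted above, here is a minimal scalar sketch of what "conditional execution and efficient merging of destination operands" means for a masked add. This is an illustration only, not HotSpot or patch code: the class `MaskedAddSketch` and the method `maskedAdd` are hypothetical names, and the loop merely emulates in plain Java what a single merge-masked AVX-512 instruction would do per lane.

```java
import java.util.Arrays;

public class MaskedAddSketch {
    // Hypothetical helper: scalar emulation of a merge-masked vector add.
    // Lane i receives a[i] + b[i] only when bit i of mask is set;
    // every other lane keeps the value it already had in dst.
    public static int[] maskedAdd(int[] a, int[] b, int[] dst, int mask) {
        int[] result = Arrays.copyOf(dst, dst.length); // leave the caller's dst untouched
        for (int lane = 0; lane < result.length; lane++) {
            if (((mask >>> lane) & 1) != 0) {
                result[lane] = a[lane] + b[lane];
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3, 4};
        int[] b = {10, 20, 30, 40};
        int[] dst = {-1, -1, -1, -1};
        // Mask 0b0101 selects lanes 0 and 2.
        System.out.println(Arrays.toString(maskedAdd(a, b, dst, 0b0101)));
        // prints [11, -1, 33, -1]
    }
}
```

The mask here is an ordinary `int`, which roughly corresponds to the GPR-based propagation the quoted description says was used before the patch; the patch instead lets the register allocator keep such masks in the dedicated opmask registers.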