From jiefu at openjdk.java.net Thu Oct 1 00:28:24 2020
From: jiefu at openjdk.java.net (Jie Fu)
Date: Thu, 1 Oct 2020 00:28:24 GMT
Subject: RFR: 8223347: Integration of Vector API (Incubator)
In-Reply-To:
References: <-PE4TwXgvq2bemAn_8csjn4_j7zoAolnQz6QQt3z0Wk=.eaa9999f-0713-4349-b31d-934717aa37a1@github.com>
 <-JIanIoqecCK7WfRPElGiS9Gora2wP3NfpGZ3hNL_Hg=.2ccfa0e9-33c6-430b-9303-66829e97e6ff@github.com>
Message-ID:

On Wed, 30 Sep 2020 18:19:53 GMT, Paul Sandoz wrote:

>> Hi @PaulSandoz ,
>>
>> This integration seems to miss https://github.com/openjdk/panama-vector/pull/1, which had fixed crashes on AVX512
>> machines.
>> Thanks.
>
> @DamonFool we can follow up later for that fix (and others in `vectorIntrinsics`), after this PR integrates. I don't
> want to perturb the code that has already been reviewed, which requires yet more additional review.

Hi @PaulSandoz ,

I think it would be better to integrate it [1] in this MR.

I have tested this MR on our AVX512 machines and it still crashes.
Also, for the sake of maintenance, it seems NOT a good idea to push a problematic commit into the jdk main-line repo.

As for the review process, I don't think it's a problem since the fix [1] is clear and small enough.

What do you think?

Thanks.

[1] https://github.com/openjdk/panama-vector/commit/1af35c357066743935bd3f48ce3610a41761f89a

-------------

PR: https://git.openjdk.java.net/jdk/pull/367

From psandoz at openjdk.java.net Thu Oct 1 01:04:44 2020
From: psandoz at openjdk.java.net (Paul Sandoz)
Date: Thu, 1 Oct 2020 01:04:44 GMT
Subject: RFR: 8223347: Integration of Vector API (Incubator)
In-Reply-To:
References: <-PE4TwXgvq2bemAn_8csjn4_j7zoAolnQz6QQt3z0Wk=.eaa9999f-0713-4349-b31d-934717aa37a1@github.com>
 <-JIanIoqecCK7WfRPElGiS9Gora2wP3NfpGZ3hNL_Hg=.2ccfa0e9-33c6-430b-9303-66829e97e6ff@github.com>
Message-ID:

On Thu, 1 Oct 2020 00:25:46 GMT, Jie Fu wrote:

>> @DamonFool we can follow up later for that fix (and others in `vectorIntrinsics`), after this PR integrates.
>> I don't want to perturb the code that has already been reviewed, which requires yet more additional review.
>
> Hi @PaulSandoz ,
>
> I think it would be better to integrate it [1] in this MR.
>
> I have tested this MR on our AVX512 machines and it still crashes.
> Also, for the sake of maintenance, it seems NOT a good idea to push a problematic commit into the jdk main-line repo.
>
> As for the review process, I don't think it's a problem since the fix [1] is clear and small enough.
>
> What do you think?
>
> Thanks.
>
> [1] https://github.com/openjdk/panama-vector/commit/1af35c357066743935bd3f48ce3610a41761f89a

@DamonFool I appreciate your efforts on this but I want to hold back on that issue and follow up very quickly after
integration of this PR. This change has been through an extremely long and arduous review process, and I want to stick
to what was reviewed and not ask reviewers to go through further cycles on what overall is a very large change.
Unfortunately this change is in a holding pattern waiting for the CSR to be approved thereby increasing the window
where we might find further issues (that if we had already integrated may have been dealt with separately perhaps in a
less timely fashion with respect to that integration). Unless an issue is extremely severe I think we should queue them
up in `panama-vector/vectorIntrinsics` (there is at least one more for ARM SVE that is queued up). Since the issue you
describe affects one instruction, for one type, on AVX512, its impact is limited and will be mitigated by a quick
follow up.
-------------

PR: https://git.openjdk.java.net/jdk/pull/367

From jiefu at openjdk.java.net Thu Oct 1 01:27:19 2020
From: jiefu at openjdk.java.net (Jie Fu)
Date: Thu, 1 Oct 2020 01:27:19 GMT
Subject: RFR: 8223347: Integration of Vector API (Incubator)
In-Reply-To:
References: <-PE4TwXgvq2bemAn_8csjn4_j7zoAolnQz6QQt3z0Wk=.eaa9999f-0713-4349-b31d-934717aa37a1@github.com>
 <-JIanIoqecCK7WfRPElGiS9Gora2wP3NfpGZ3hNL_Hg=.2ccfa0e9-33c6-430b-9303-66829e97e6ff@github.com>
Message-ID:

On Thu, 1 Oct 2020 01:01:23 GMT, Paul Sandoz wrote:

>> Hi @PaulSandoz ,
>>
>> I think it would be better to integrate it [1] in this MR.
>>
>> I have tested this MR on our AVX512 machines and it still crashes.
>> Also, for the sake of maintenance, it seems NOT a good idea to push a problematic commit into the jdk main-line repo.
>>
>> As for the review process, I don't think it's a problem since the fix [1] is clear and small enough.
>>
>> What do you think?
>>
>> Thanks.
>>
>> [1] https://github.com/openjdk/panama-vector/commit/1af35c357066743935bd3f48ce3610a41761f89a
>
> @DamonFool I appreciate your efforts on this but I want to hold back on that issue and follow up very quickly after
> integration of this PR. This change has been through an extremely long and arduous review process, and I want to stick
> to what was reviewed and not ask reviewers to go through further cycles on what overall is a very large change.
> Unfortunately this change is in a holding pattern waiting for the CSR to be approved thereby increasing the window
> where we might find further issues (that if we had already integrated may have been dealt with separately perhaps in a
> less timely fashion with respect to that integration). Unless an issue is extremely severe I think we should queue them
> up in `panama-vector/vectorIntrinsics` (there is at least one more for ARM SVE that is queued up).
> Since the issue you describe affects one instruction, for one type, on AVX512, its impact is limited and will be
> mitigated by a quick follow up.

Okay.
I can understand it.

Vector API is very valuable to us.
Hope the follow-ups can be integrated as soon as possible.

And thank you all for your great work.
Best regards,
Jie

-------------

PR: https://git.openjdk.java.net/jdk/pull/367

From xliu at openjdk.java.net Thu Oct 1 01:28:50 2020
From: xliu at openjdk.java.net (Xin Liu)
Date: Thu, 1 Oct 2020 01:28:50 GMT
Subject: RFR: 8253757: Add LLVM-based backend for hsdis [v2]
In-Reply-To:
References: <91erxiMDb4ftvSomuJYHPi9SX-v8Z2VLD2qEwCbz5tk=.b9ed01b5-f0e0-4ed7-9c1a-b06bc0e64640@github.com>
 <7kE7rOEXkO61vLoj_GsgJVjPUe5DqluRkNyJKDUVi0o=.e55d2fd2-ed90-4be1-8e8f-540a86996d50@github.com>
 <76F67wutfIYeeKdNWfpWUd0EsZSbDDVJm-3bqTixFRE=.c4e31f33-272e-43d2-8bf2-af510634489a@github.com>
Message-ID: <6PJot0t67dbt6FBkNcvbqKefn6SNgpcj_zn6Dtn5VKo=.9363f7a9-5901-43db-ad75-1786639207ce@github.com>

On Wed, 30 Sep 2020 21:45:43 GMT, Ludovic Henry wrote:

>> src/utils/hsdis/Makefile line 198:
>>
>>> 196: $(TARGET_DIR)/libiberty/libiberty.a
>>> 197: else
>>> 198: LIBRARIES/amd64 = LLVMX86Disassembler LLVMX86AsmParser LLVMX86CodeGen LLVMCFGuard LLVMGlobalISel LLVMSelectionDAG \
>>
>> To disassemble code, I don't think we have to link so many libraries. It looks like code only explicitly depends
>> LLVMMCDisassembler and LLVMTarget here.
>> If we do need to link those libraries, how about we just use `llvm-config --libs`. If we declare so many names here,
>> the Makefile is subject to LLVM. In history, LLVM refactored a lot.
>
>>To disassemble code, I don't think we have to link so many libraries. It looks like code only explicitly depends
>>LLVMMCDisassembler and LLVMTarget here.
>
> This is the result of `llvm-config --libs x86 x86disassembler` and `llvm-config --libs aarch64 aarch64disassembler`.
>
>> If we do need to link those libraries, how about we just use llvm-config --libs.
>> If we declare so many names here, the Makefile is subject to LLVM. In history, LLVM refactored a lot.
>
> That should work in the general case. I did it this way originally because of cases where we need to use a
> cross-compiled LLVM. Then `llvm-config` and the related libraries would be compiled for the host platform. Then, the
> user can just specify the `LIBRARIES/*` variable by hand on the command line.

I saw you managed to solve this cross-compilation issue. Cool!
Please note that I am not a reviewer. Feel free to ignore what I said if it makes no sense.
I just want to chip in some constructive ideas.

-------------

PR: https://git.openjdk.java.net/jdk/pull/392

From kvn at openjdk.java.net Thu Oct 1 02:05:06 2020
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Thu, 1 Oct 2020 02:05:06 GMT
Subject: RFR: 8253822: Remove unused exception_address_is_unpack_entry
In-Reply-To:
References:
Message-ID:

On Tue, 29 Sep 2020 20:14:25 GMT, Nils Eliasson wrote:

> I have searched the code base without finding any use.
>
> Please review,
>
> Best regards,
> Nils

Marked as reviewed by kvn (Reviewer).
-------------

PR: https://git.openjdk.java.net/jdk/pull/410

From luhenry at openjdk.java.net Thu Oct 1 03:32:21 2020
From: luhenry at openjdk.java.net (Ludovic Henry)
Date: Thu, 1 Oct 2020 03:32:21 GMT
Subject: RFR: 8253757: Add LLVM-based backend for hsdis [v2]
In-Reply-To: <6PJot0t67dbt6FBkNcvbqKefn6SNgpcj_zn6Dtn5VKo=.9363f7a9-5901-43db-ad75-1786639207ce@github.com>
References: <91erxiMDb4ftvSomuJYHPi9SX-v8Z2VLD2qEwCbz5tk=.b9ed01b5-f0e0-4ed7-9c1a-b06bc0e64640@github.com>
 <7kE7rOEXkO61vLoj_GsgJVjPUe5DqluRkNyJKDUVi0o=.e55d2fd2-ed90-4be1-8e8f-540a86996d50@github.com>
 <76F67wutfIYeeKdNWfpWUd0EsZSbDDVJm-3bqTixFRE=.c4e31f33-272e-43d2-8bf2-af510634489a@github.com>
 <6PJot0t67dbt6FBkNcvbqKefn6SNgpcj_zn6Dtn5VKo=.9363f7a9-5901-43db-ad75-1786639207ce@github.com>
Message-ID:

On Thu, 1 Oct 2020 01:26:18 GMT, Xin Liu wrote:

>>>To disassemble code, I don't think we have to link so many libraries. It looks like code only explicitly depends
>>>LLVMMCDisassembler and LLVMTarget here.
>>
>> This is the result of `llvm-config --libs x86 x86disassembler` and `llvm-config --libs aarch64 aarch64disassembler`.
>>
>>> If we do need to link those libraries, how about we just use llvm-config --libs. If we declare so many names here, the
>>> Makefile is subject to LLVM. In history, LLVM refactored a lot.
>>
>> That should work in the general case. I did it this way originally because of cases where we need to use a
>> cross-compiled LLVM. Then `llvm-config` and the related libraries would be compiled for the host platform. Then, the
>> user can just specify the `LIBRARIES/*` variable by hand on the command line.
>
> I saw you managed to solve this cross-compilation issue. Cool!
> Please note that I am not a reviewer. Feel free to ignore what I said if it makes no sense.
> I just want to chip in some constructive ideas.

That isn't going to work in the case where you've compiled LLVM for a different host than the current one (build != host).
I'll add a comment to document such a case and what to do then. Thanks for the reminder.

> Please note that I am not a reviewer. Feel free to ignore what I said if it makes no sense.
> I just want to chip in some constructive ideas.

This is more than welcome! Having another pair of eyes is always welcome, especially in the open from someone I don't
have the opportunity to work with on a daily basis.

-------------

PR: https://git.openjdk.java.net/jdk/pull/392

From luhenry at openjdk.java.net Thu Oct 1 04:06:07 2020
From: luhenry at openjdk.java.net (Ludovic Henry)
Date: Thu, 1 Oct 2020 04:06:07 GMT
Subject: RFR: 8253757: Add LLVM-based backend for hsdis [v5]
In-Reply-To: <91erxiMDb4ftvSomuJYHPi9SX-v8Z2VLD2qEwCbz5tk=.b9ed01b5-f0e0-4ed7-9c1a-b06bc0e64640@github.com>
References: <91erxiMDb4ftvSomuJYHPi9SX-v8Z2VLD2qEwCbz5tk=.b9ed01b5-f0e0-4ed7-9c1a-b06bc0e64640@github.com>
Message-ID: <05TVmcN9T1wK0qnIHnMl_otDbxEH0UGrZKQKuVN6c9w=.5b4de4d6-c68d-418e-8cc9-b927a174b422@github.com>

> When bringing up Hotspot onto new platforms, it is not always possible to compile hsdis because gcc is not yet
> available. For example, for Windows-AArch64 and macOS-AArch64.
> For some such platforms, it is possible to use LLVM as an alternative backend as it also supports a disassembler
> feature.
Ludovic Henry has updated the pull request incrementally with one additional commit since the last revision:

  Document the case of cross-compiled LLVM

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/392/files
  - new: https://git.openjdk.java.net/jdk/pull/392/files/c497b30f..2a1c724c

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=392&range=04
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=392&range=03-04

  Stats: 32 lines in 2 files changed: 20 ins; 6 del; 6 mod
  Patch: https://git.openjdk.java.net/jdk/pull/392.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/392/head:pull/392

PR: https://git.openjdk.java.net/jdk/pull/392

From enikitin at openjdk.java.net Thu Oct 1 04:57:39 2020
From: enikitin at openjdk.java.net (Evgeny Nikitin)
Date: Thu, 1 Oct 2020 04:57:39 GMT
Subject: RFR: 8229186: Improve error messages for TestStringIntrinsics failures [v6]
In-Reply-To:
References:
Message-ID:

> pre-Skara RFR thread: [link](https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-May/038416.html)
>
> Error reporting was improved by writing C-style escaped string representations for the variables passed to the
> methods being tested. For array comparisons, a dedicated diff-formatter was implemented.
> Sample output for comparing byte arrays (with artificial failure):
> ----------System.err:(21/1553)----------
> Result: (false) of 'arrayEqualsB' is not equal to expected (true)
> Arrays differ starting from [index: 7]:
> ... 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, ...
> ... 5, 6, 125, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, ...
>           ^^^^
> java.lang.RuntimeException: Result: (false) of 'arrayEqualsB' is not
> equal to expected (true)
> at
> compiler.intrinsics.string.TestStringIntrinsics.invokeAndCheckArrays(TestStringIntrinsics.java:273)
> at ... stack trace continues - E.N.
> Sample output for comparing char arrays:
> ----------System.err:(21/1579)*----------
> Result: (false) of 'arrayEqualsC' is not equal to expected (true)
> Arrays differ starting from [index: 7]:
> ... \\u0005, \\u0006, \\u0007, \\u0008, \\u0009, \\n, ...
> ... \\u0005, \\u0006, }, \\u0008, \\u0009, \\n, ...
>                       ^^^^^^^
> java.lang.RuntimeException: Result: (false) of 'arrayEqualsC' is not
> equal to expected (true)
> at
> compiler.intrinsics.string.TestStringIntrinsics.invokeAndCheckArrays(TestStringIntrinsics.java:280)
> at
> ... and so on - E.N.
>
> testing: open/test/hotspot/jtreg/compiler/intrinsics/string/TestStringIntrinsics.java on linux, windows, macosx.

Evgeny Nikitin has updated the pull request incrementally with one additional commit since the last revision:

  Use SB to build TestStringIntrinsics message, avoid type guessing in ArrayDiff

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/112/files
  - new: https://git.openjdk.java.net/jdk/pull/112/files/e6fb6d04..f0b1df89

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=112&range=05
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=112&range=04-05

  Stats: 40 lines in 2 files changed: 1 ins; 30 del; 9 mod
  Patch: https://git.openjdk.java.net/jdk/pull/112.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/112/head:pull/112

PR: https://git.openjdk.java.net/jdk/pull/112

From enikitin at openjdk.java.net Thu Oct 1 04:57:39 2020
From: enikitin at openjdk.java.net (Evgeny Nikitin)
Date: Thu, 1 Oct 2020 04:57:39 GMT
Subject: RFR: 8229186: Improve error messages for TestStringIntrinsics failures [v6]
In-Reply-To: <6vRhHgd3e0gnEqpA5d_R7PMM_QK2OQicLzjRM5tQ69k=.c7d9e131-533f-4418-a101-1890ecc9c4c9@github.com>
References: <6vRhHgd3e0gnEqpA5d_R7PMM_QK2OQicLzjRM5tQ69k=.c7d9e131-533f-4418-a101-1890ecc9c4c9@github.com>
Message-ID:

On Wed, 30 Sep 2020 21:19:03 GMT, Igor Ignatyev wrote:

> I'd use `StringBuilder` to construct the message.

Fixed, please check the f0b1df8.
> wouldn't the following patch do?

Wow, that worked. That type erasure... pushed in the same commit f0b1df8.

-------------

PR: https://git.openjdk.java.net/jdk/pull/112

From kvn at openjdk.java.net Thu Oct 1 05:32:41 2020
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Thu, 1 Oct 2020 05:32:41 GMT
Subject: RFR: 8253636: C2: Adjust NodeClasses::_max_classes
In-Reply-To:
References:
Message-ID: <93qu5x5KKdAHm5cKjiUi8nC2ulgol3MX3Yx1UPoQcLM=.94361fd2-5cc0-4c5e-9b4c-124a563804f7@github.com>

On Tue, 29 Sep 2020 09:57:38 GMT, Roberto Castañeda Lozano wrote:

> Update `NodeClasses::_max_classes` to the max class id within the enumeration. Update comment and assertion to reflect
> that `NodeClasses` uses now 32 bits after the addition of `Opaque1` in JDK-8229495.

Good.

-------------

Marked as reviewed by kvn (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/397

From iignatyev at openjdk.java.net Thu Oct 1 06:21:11 2020
From: iignatyev at openjdk.java.net (Igor Ignatyev)
Date: Thu, 1 Oct 2020 06:21:11 GMT
Subject: RFR: 8229186: Improve error messages for TestStringIntrinsics failures [v6]
In-Reply-To:
References:
Message-ID: <9thYXJXttNl2VIj3C5eGuFwEIwgZZ-TeGbeX67LqQl0=.987baa49-c4c4-466a-b115-ca261bec62b7@github.com>

On Thu, 1 Oct 2020 04:57:39 GMT, Evgeny Nikitin wrote:

>> pre-Skara RFR thread: [link](https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-May/038416.html)
>>
>> Error reporting was improved by writing C-style escaped string representations for the variables passed to the
>> methods being tested. For array comparisons, a dedicated diff-formatter was implemented.
>> Sample output for comparing byte arrays (with artificial failure):
>> ----------System.err:(21/1553)----------
>> Result: (false) of 'arrayEqualsB' is not equal to expected (true)
>> Arrays differ starting from [index: 7]:
>> ... 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, ...
>> ... 5, 6, 125, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, ...
>>           ^^^^
>> java.lang.RuntimeException: Result: (false) of 'arrayEqualsB' is not
>> equal to expected (true)
>> at
>> compiler.intrinsics.string.TestStringIntrinsics.invokeAndCheckArrays(TestStringIntrinsics.java:273)
>> at ... stack trace continues - E.N.
>> Sample output for comparing char arrays:
>> ----------System.err:(21/1579)*----------
>> Result: (false) of 'arrayEqualsC' is not equal to expected (true)
>> Arrays differ starting from [index: 7]:
>> ... \\u0005, \\u0006, \\u0007, \\u0008, \\u0009, \\n, ...
>> ... \\u0005, \\u0006, }, \\u0008, \\u0009, \\n, ...
>>                       ^^^^^^^
>> java.lang.RuntimeException: Result: (false) of 'arrayEqualsC' is not
>> equal to expected (true)
>> at
>> compiler.intrinsics.string.TestStringIntrinsics.invokeAndCheckArrays(TestStringIntrinsics.java:280)
>> at
>> ... and so on - E.N.
>>
>> testing: open/test/hotspot/jtreg/compiler/intrinsics/string/TestStringIntrinsics.java on linux, windows, macosx.
>
> Evgeny Nikitin has updated the pull request incrementally with one additional commit since the last revision:
>
>   Use SB to build TestStringIntrinsics message, avoid type guessing in ArrayDiff

LGTM

-------------

Marked as reviewed by iignatyev (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/112

From github.com+8792647+robcasloz at openjdk.java.net Thu Oct 1 06:29:54 2020
From: github.com+8792647+robcasloz at openjdk.java.net (Roberto Castañeda Lozano)
Date: Thu, 1 Oct 2020 06:29:54 GMT
Subject: RFR: 8253636: C2: Adjust NodeClasses::_max_classes
In-Reply-To: <93qu5x5KKdAHm5cKjiUi8nC2ulgol3MX3Yx1UPoQcLM=.94361fd2-5cc0-4c5e-9b4c-124a563804f7@github.com>
References: <93qu5x5KKdAHm5cKjiUi8nC2ulgol3MX3Yx1UPoQcLM=.94361fd2-5cc0-4c5e-9b4c-124a563804f7@github.com>
Message-ID:

On Thu, 1 Oct 2020 05:30:14 GMT, Vladimir Kozlov wrote:

> Good.

Thanks Vladimir.
-------------

PR: https://git.openjdk.java.net/jdk/pull/397

From xxinliu at amazon.com Thu Oct 1 06:32:48 2020
From: xxinliu at amazon.com (Liu, Xin)
Date: Thu, 1 Oct 2020 06:32:48 +0000
Subject: RFR: 8251464: make Node::dump(int depth) support indent
In-Reply-To:
References: <1598731717217.87517@amazon.com>
 <44f14e58-06d8-b1d9-baa8-88edfed6dd78@oracle.com>
 <1599193987757.63967@amazon.com>
 <5fa23374-dfe0-d9ea-a6b9-d3cf432bab64@oracle.com>
 <83692f83-9de3-5b8c-7399-7da272690d6e@redhat.com>
 <1601104104594.5136@amazon.com>,
Message-ID: <1601533967694.79226@amazon.com>

Thanks, Andrew.

I didn't think of that. Indeed, no matter whether it is a du-chain or a ud-chain, there may be more than one path.
So far, I only need to know one path, because I am debugging why IterGVN and CCP wipe out my nodes.
Let me think about it more. I will find a JBS issue about it.

@Tobias,
Could you take a look at my PR? https://github.com/openjdk/jdk/pull/371
You reviewed it before. I just fixed a bug since then. Here I set indentation number when a node enqueues. This
guarantees the output can indent in BFS order.
https://github.com/openjdk/jdk/pull/371/commits/2c5376f092f50a6e918f6730a472de9a64cca6ee#diff-e1c5a771a82d5fdb7e88c5b90b444736R1891

thanks,
--lx

________________________________________
From: Andrew Dinn
Sent: Monday, September 28, 2020 2:07 AM
To: Liu, Xin; Tobias Hartmann; 'hotspot-compiler-dev at openjdk.java.net'
Subject: RE: [EXTERNAL] RFR: 8251464: make Node::dump(int depth) support indent

Hi Xin Liu,

On 26/09/2020 08:08, Liu, Xin wrote:
> This feature is intended to dump a small portion of ideal graph in
> the debugger. In that scenario, I think indentation does help my
> eyes.
Yes, I think it is most useful when dumping subgraphs at depths between 2
and, say, 5.

> Thank you to share your experiences.
> yes, I tried it (sort -n file)
> and this tip is incredibly effective! The ordered nodes can help
> people to catch what they are looking for quickly.
Good. I'm glad sharing my working practice was useful to at least one
person :-)

> I don't want to break anybody's established workflow, so I reset the
> flag PrintIdealIndentThreshold to 0. It means node.dump won't use any
> indentation until we set it.
> https://openjdk.github.io/cr/?repo=jdk&pr=371&range=00#udiff-0
That's very considerate of you. Thank you.

> You guys must have some fancy gdb scripts, emacs lisp plugin or handy
> shell scripts etc. The only downside is they are personal arsenal and
> it may be not easy to maintain them sometime. On the other side of
> spectrum, starters like me need to bootstrap in long way. Sometimes,
I'm afraid I don't have a lot more to share. Mostly, I don't write
scripts. I usually just write the dump output to a file and then process
it with bash, sed and awk code written on the command line. I have
written extensive elisp search and formatting functions in the past but
not for parsing ideal graphs.

> we need to reinvent the wheel. eg. I spent a lot of time to develop a
> function to query a node by idx. I finished it but eventually came
> across this handy function in node.cpp. that's what I did!
>
> // Call this from debugger with root node as default:
> Node* find_node(const int idx) {
>   return Compile::current()->root()->find(idx);
> }
Well, now you have taught me something I didn't know in return. Thanks
for sharing ;-)

> That's why I'd like to put some debug functionalities to c2 codebase.
> I think we can collect those handy functions in an individual file.
> What do you think? I have another 2 candidates I plan to work on. 1.
> dump all node and sorted them by indices 2. dump a path from node a
> to node b. We can have a depth-first search along du or ud chains.
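
The depth-first search along def-use edges proposed in the quoted paragraph can be sketched on a toy graph. This is only an illustration under stated assumptions, not HotSpot code: `PathNode`, `collect_paths` and `all_paths` are hypothetical names, node indices are assumed to be unique, and the search returns every simple path so that zero, one, or several paths are all reported the same way.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for an ideal-graph node; NOT HotSpot's Node class.
struct PathNode {
  int idx;                            // node index, assumed unique
  std::vector<const PathNode*> outs;  // def-use edges (users of this node)
};

// Depth-first helper: extends `path` with `cur` and records every simple
// path that ends at `target`.
static void collect_paths(const PathNode* cur, const PathNode* target,
                          std::vector<int>& path,
                          std::vector<std::vector<int>>& found) {
  // A node already on the current path would close a cycle; skip it.
  if (std::find(path.begin(), path.end(), cur->idx) != path.end()) return;
  path.push_back(cur->idx);
  if (cur == target) {
    found.push_back(path);
  } else {
    for (const PathNode* user : cur->outs) {
      collect_paths(user, target, path, found);
    }
  }
  path.pop_back();  // backtrack before trying the next edge
}

// Returns all simple def-use paths from `from` to `to`, each as a list of
// node indices, in the order the out-edges are stored.
std::vector<std::vector<int>> all_paths(const PathNode& from, const PathNode& to) {
  std::vector<std::vector<int>> found;
  std::vector<int> path;
  collect_paths(&from, &to, path, found);
  return found;
}
```

Enumerating all simple paths can be exponential in the worst case, so a real debugging helper would probably want a depth or count limit; swapping `outs` for the input edges would give the same search along use-def chains.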
Option 1 is easily achieved by writing the graph dump to file and
passing through sort -n so it's not a great step forward.

Option 2 sounds like it would be more useful. Initially I was wondering
what you would do when there are multiple paths. Then I thought perhaps
the command ought to list all paths in some well-defined order? That
would make the case where the nodes are not connected uniform with the
cases where there is one or more path i.e. print 0 paths, 1 path, 2
paths etc

regards,


Andrew Dinn
-----------

From github.com+8792647+robcasloz at openjdk.java.net Thu Oct 1 06:42:13 2020
From: github.com+8792647+robcasloz at openjdk.java.net (Roberto Castañeda Lozano)
Date: Thu, 1 Oct 2020 06:42:13 GMT
Subject: Integrated: 8253636: C2: Adjust NodeClasses::_max_classes
In-Reply-To:
References:
Message-ID:

On Tue, 29 Sep 2020 09:57:38 GMT, Roberto Castañeda Lozano wrote:

> Update `NodeClasses::_max_classes` to the max class id within the enumeration. Update comment and assertion to reflect
> that `NodeClasses` uses now 32 bits after the addition of `Opaque1` in JDK-8229495.

This pull request has now been integrated.

Changeset: 5dd9353b
Author:    Roberto Castañeda Lozano
Committer: Tobias Hartmann
URL:       https://git.openjdk.java.net/jdk/commit/5dd9353b
Stats:     3 lines in 2 files changed: 0 ins; 0 del; 3 mod

8253636: C2: Adjust NodeClasses::_max_classes

Update NodeClasses::_max_classes to the max class id within the enumeration. Update comment and assertion to reflect
that NodeClasses uses now 32 bits after the addition of Opaque1 in JDK-8229495.
Reviewed-by: neliasso, kvn

-------------

PR: https://git.openjdk.java.net/jdk/pull/397

From tobias.hartmann at oracle.com Thu Oct 1 06:51:36 2020
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Thu, 1 Oct 2020 08:51:36 +0200
Subject: RFR: 8251464: make Node::dump(int depth) support indent
In-Reply-To: <1601533967694.79226@amazon.com>
References: <1598731717217.87517@amazon.com>
 <44f14e58-06d8-b1d9-baa8-88edfed6dd78@oracle.com>
 <1599193987757.63967@amazon.com>
 <5fa23374-dfe0-d9ea-a6b9-d3cf432bab64@oracle.com>
 <83692f83-9de3-5b8c-7399-7da272690d6e@redhat.com>
 <1601104104594.5136@amazon.com>
 <1601533967694.79226@amazon.com>
Message-ID: <2a66f643-c658-89e7-7dd9-2ca58dfd16e7@oracle.com>

Hi Xin,

On 01.10.20 08:32, Liu, Xin wrote:
> @Tobias,
> Could you take a look at my PR?

Yes, it's on my list. I'll look at it.

Best regards,
Tobias

From mdoerr at openjdk.java.net Thu Oct 1 09:24:18 2020
From: mdoerr at openjdk.java.net (Martin Doerr)
Date: Thu, 1 Oct 2020 09:24:18 GMT
Subject: Integrated: 8253690: [PPC64] Use flag kind "diagnostic" for platform specific flags
In-Reply-To:
References:
Message-ID:

On Tue, 29 Sep 2020 20:49:01 GMT, Martin Doerr wrote:

> Current platform implementation (globals_ppc.hpp) uses regular product flags for almost everything.
> Most platform specific flags were never intended for official support. They are only there to diagnose issues and find
> workarounds. So flag kind "diagnostic" fits better for them.
>
> Note that I rearranged a couple of lines when looking at the diff.
> My actual change is what is described here: https://bugs.openjdk.java.net/browse/JDK-8253692

This pull request has now been integrated.
Changeset: a8242892
Author:    Martin Doerr
URL:       https://git.openjdk.java.net/jdk/commit/a8242892
Stats:     36 lines in 1 file changed: 11 ins; 10 del; 15 mod

8253690: [PPC64] Use flag kind "diagnostic" for platform specific flags

Reviewed-by: stuefe, lucy

-------------

PR: https://git.openjdk.java.net/jdk/pull/413

From mdoerr at openjdk.java.net Thu Oct 1 09:27:05 2020
From: mdoerr at openjdk.java.net (Martin Doerr)
Date: Thu, 1 Oct 2020 09:27:05 GMT
Subject: Integrated: 8253689: [s390] Use flag kind "diagnostic" for platform specific flags
In-Reply-To:
References:
Message-ID:

On Tue, 29 Sep 2020 20:52:17 GMT, Martin Doerr wrote:

> Current platform implementation (globals_s390.hpp) uses regular product flags for everything.
> These platform specific flags were never intended for official support. They are only there to diagnose issues and find
> workarounds. So flag kind "diagnostic" fits better.
>
> CSR: https://bugs.openjdk.java.net/browse/JDK-8253691

This pull request has now been integrated.

Changeset: 7779ce9f
Author:    Martin Doerr
URL:       https://git.openjdk.java.net/jdk/commit/7779ce9f
Stats:     18 lines in 1 file changed: 0 ins; 0 del; 18 mod

8253689: [s390] Use flag kind "diagnostic" for platform specific flags

Reviewed-by: stuefe, lucy

-------------

PR: https://git.openjdk.java.net/jdk/pull/414

From thartmann at openjdk.java.net Thu Oct 1 13:31:37 2020
From: thartmann at openjdk.java.net (Tobias Hartmann)
Date: Thu, 1 Oct 2020 13:31:37 GMT
Subject: RFR: 8251464: make Node::dump(int depth) support indent
In-Reply-To:
References:
Message-ID:

On Sat, 26 Sep 2020 06:22:49 GMT, Xin Liu wrote:

> Node::dump(depth) indents 2 whitespaces for each level.
> The function is not on until the depth of the ideal graph is not greater
> than PrintIdealIndentThreshold (0 by default).

This looks good to me but please make sure that there are no other tests depending on the tab character in the output.

-------------

Marked as reviewed by thartmann (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/371

From luhenry at openjdk.java.net Thu Oct 1 14:34:22 2020
From: luhenry at openjdk.java.net (Ludovic Henry)
Date: Thu, 1 Oct 2020 14:34:22 GMT
Subject: RFR: 8253757: Add LLVM-based backend for hsdis [v6]
In-Reply-To: <91erxiMDb4ftvSomuJYHPi9SX-v8Z2VLD2qEwCbz5tk=.b9ed01b5-f0e0-4ed7-9c1a-b06bc0e64640@github.com>
References: <91erxiMDb4ftvSomuJYHPi9SX-v8Z2VLD2qEwCbz5tk=.b9ed01b5-f0e0-4ed7-9c1a-b06bc0e64640@github.com>
Message-ID:

> When bringing up Hotspot onto new platforms, it is not always possible to compile hsdis because gcc is not yet
> available. For example, for Windows-AArch64 and macOS-AArch64.
> For some such platforms, it is possible to use LLVM as an alternative backend as it also supports a disassembler
> feature.

Ludovic Henry has updated the pull request incrementally with one additional commit since the last revision:

  Match assembly notation with gcc

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/392/files
  - new: https://git.openjdk.java.net/jdk/pull/392/files/2a1c724c..84c63455

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=392&range=05
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=392&range=04-05

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/392.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/392/head:pull/392

PR: https://git.openjdk.java.net/jdk/pull/392

From neliasso at openjdk.java.net Thu Oct 1 15:34:06 2020
From: neliasso at openjdk.java.net (Nils Eliasson)
Date: Thu, 1 Oct 2020 15:34:06 GMT
Subject: Integrated: 8253822: Remove unused exception_address_is_unpack_entry
In-Reply-To:
References:
Message-ID: <6i9nDQkA7bXtCZTf5fNQO1Ae7UofgJq1_tLdpeMNoZg=.a2d1d35a-3b40-449c-8eec-3fdd450edd98@github.com>

On Tue, 29 Sep 2020 20:14:25 GMT, Nils Eliasson wrote:

> I have searched the code base without finding any use.
>
> Please review,
>
> Best regards,
> Nils

This pull request has now been integrated.
Changeset: 96704253
Author:    Nils Eliasson
URL:       https://git.openjdk.java.net/jdk/commit/96704253
Stats:     4 lines in 1 file changed: 0 ins; 4 del; 0 mod

8253822: Remove unused exception_address_is_unpack_entry

Removing dead code

Reviewed-by: chagedorn, kvn

-------------

PR: https://git.openjdk.java.net/jdk/pull/410

From kvn at openjdk.java.net Thu Oct 1 16:47:05 2020
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Thu, 1 Oct 2020 16:47:05 GMT
Subject: RFR: 8253118: Avoid unnecessary deopts when OSR nmethods of the same level are present.
In-Reply-To:
References:
Message-ID:

On Fri, 25 Sep 2020 16:05:34 GMT, Igor Veresov wrote:

> When running with ```-XX:TieredStopAtLevel={2|3}``` the policy tried to switch to OSR method of the same level if those
> are present, which caused constant deopting. The fix is to consider only higher levels for OSR switches.

Changes requested by kvn (Reviewer).

src/hotspot/share/compiler/tieredThresholdPolicy.cpp line 519:

> 517: nmethod* osr_nm = inlinee->lookup_osr_nmethod_for(bci, expected_comp_level, false);
> 518: assert(osr_nm == NULL || osr_nm->comp_level() >= expected_comp_level, "lookup_osr_nmethod_for is broken");
> 519: if (osr_nm != NULL && osr_nm->comp_level() != comp_level) {

Should you check '> comp_level' here? expected_comp_level could be CompLevel_simple and as a result you can get an
osr_nm with a lower level than the current comp_level.

-------------

PR: https://git.openjdk.java.net/jdk/pull/360

From kvn at openjdk.java.net Thu Oct 1 16:55:03 2020
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Thu, 1 Oct 2020 16:55:03 GMT
Subject: RFR: 8253118: Avoid unnecessary deopts when OSR nmethods of the same level are present.
In-Reply-To:
References:
Message-ID:

On Fri, 25 Sep 2020 16:05:34 GMT, Igor Veresov wrote:

> When running with ```-XX:TieredStopAtLevel={2|3}``` the policy tried to switch to OSR method of the same level if those
> are present, which caused constant deopting.
The fix is to consider only higher levels for OSR switches. Marked as reviewed by kvn (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/360 From iveresov at openjdk.java.net Thu Oct 1 16:55:04 2020 From: iveresov at openjdk.java.net (Igor Veresov) Date: Thu, 1 Oct 2020 16:55:04 GMT Subject: RFR: 8253118: Avoid unnecessary deopts when OSR nmethods of the same level are present. In-Reply-To: References: Message-ID: On Thu, 1 Oct 2020 16:14:42 GMT, Vladimir Kozlov wrote: >> When running with ```-XX:TieredStopAtLevel={2|3}``` the policy tried to switch to OSR method of the same level if those >> are present, which caused constant deopting. The fix is to consider only higher levels for OSR switches. > > src/hotspot/share/compiler/tieredThresholdPolicy.cpp line 519: > >> 517: nmethod* osr_nm = inlinee->lookup_osr_nmethod_for(bci, expected_comp_level, false); >> 518: assert(osr_nm == NULL || osr_nm->comp_level() >= expected_comp_level, "lookup_osr_nmethod_for is broken"); >> 519: if (osr_nm != NULL && osr_nm->comp_level() != comp_level) { > > Should you check '> comp_level' here? expected_comp_level could be CompLevel_simple and as result you can get our_nm > with lower level than current comp_level. No we shouldn't. That's intentional. Sometimes we have to go to the lower level (3->1 or 2->1) when the method is not compilable at level 4. ------------- PR: https://git.openjdk.java.net/jdk/pull/360 From kvn at openjdk.java.net Thu Oct 1 16:55:04 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 1 Oct 2020 16:55:04 GMT Subject: RFR: 8253118: Avoid unnecessary deopts when OSR nmethods of the same level are present. 
In-Reply-To:
References:
Message-ID:

On Thu, 1 Oct 2020 16:50:06 GMT, Igor Veresov wrote:

>> src/hotspot/share/compiler/tieredThresholdPolicy.cpp line 519:
>>
>>> 517: nmethod* osr_nm = inlinee->lookup_osr_nmethod_for(bci, expected_comp_level, false);
>>> 518: assert(osr_nm == NULL || osr_nm->comp_level() >= expected_comp_level, "lookup_osr_nmethod_for is broken");
>>> 519: if (osr_nm != NULL && osr_nm->comp_level() != comp_level) {
>>
>> Should you check '> comp_level' here? expected_comp_level could be CompLevel_simple and as result you can get our_nm
>> with lower level than current comp_level.
>
> No we shouldn't. That's intentional. Sometimes we have to go to the lower level (3->1 or 2->1) when the method is not
> compilable at level 4.

ok

-------------

PR: https://git.openjdk.java.net/jdk/pull/360

From xliu at openjdk.java.net Thu Oct 1 19:09:06 2020
From: xliu at openjdk.java.net (Xin Liu)
Date: Thu, 1 Oct 2020 19:09:06 GMT
Subject: RFR: 8251464: make Node::dump(int depth) support indent
In-Reply-To:
References:
Message-ID: <0ocX5EGP5KWloTvLF3agvmhwZlrVA4Mo0QdP8-9kVrg=.f84b4ae5-d859-4cbd-b5b4-1a5a41250d02@github.com>

On Thu, 1 Oct 2020 13:26:12 GMT, Tobias Hartmann wrote:

>> Node::dump(depth) indents 2 whitespaces for each level.
>> The function is not on until the depth of the ideal graph is not greater
>> than PrintIdealIndentThreshold (0 by default).
>
> This looks good to me but please make sure that there are no other tests depending on the tab character in the output.

Thanks Tobias!
I ran tier1 test and it looks good. Let me verify it using submit repo before I integrate it.
------------- PR: https://git.openjdk.java.net/jdk/pull/371 From lmesnik at openjdk.java.net Thu Oct 1 19:54:05 2020 From: lmesnik at openjdk.java.net (Leonid Mesnik) Date: Thu, 1 Oct 2020 19:54:05 GMT Subject: RFR: 8229186: Improve error messages for TestStringIntrinsics failures [v6] In-Reply-To: References: Message-ID: On Thu, 1 Oct 2020 04:57:39 GMT, Evgeny Nikitin wrote: >> pre-Skara RFR thread: [link](https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-May/038416.html) >> >> Error reporting was improved by writing a C-style escaped string representations for the variables passed to the >> methods being tested. For array comparisons, a dedicated diff-formatter was implemented. >> Sample output for comparing byte arrays (with artificial failure): >> ----------System.err:(21/1553)---------- >> Result: (false) of 'arrayEqualsB' is not equal to expected (true) >> Arrays differ starting from [index: 7]: >> ... 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, ... >> ... 5, 6, 125, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, ... >> ^^^^ >> java.lang.RuntimeException: Result: (false) of 'arrayEqualsB' is not >> equal to expected (true) >> at >> compiler.intrinsics.string.TestStringIntrinsics.invokeAndCheckArrays(TestStringIntrinsics.java:273) >> at ... stack trace continues - E.N. >> Sample output for comparing char arrays: >> ----------System.err:(21/1579)*---------- >> Result: (false) of 'arrayEqualsC' is not equal to expected (true) >> Arrays differ starting from [index: 7]: >> ... \\u0005, \\u0006, \\u0007, \\u0008, \\u0009, \\n, ... >> ... \\u0005, \\u0006, }, \\u0008, \\u0009, \\n, ... >> ^^^^^^^ >> java.lang.RuntimeException: Result: (false) of 'arrayEqualsC' is not >> equal to expected (true) >> at >> compiler.intrinsics.string.TestStringIntrinsics.invokeAndCheckArrays(TestStringIntrinsics.java:280) >> at >> ... and so on - E.N. 
>> >> testing: open/test/hotspot/jtreg/compiler/intrinsics/string/TestStringIntrinsics.java on linux, windows, macosx. > > Evgeny Nikitin has updated the pull request incrementally with one additional commit since the last revision: > > Use SB to build TestStringIntrinsics message, avoid type guessing in ArrayDiff Changes requested by lmesnik (Committer). test/lib/jdk/test/lib/format/Format.java line 2: > 1: /* > 2: * Copyright (c) 2015, 2020, Oracle and/or its affiliates. All rights reserved. Shouldn't be just 2020? test/lib/jdk/test/lib/format/Diff.java line 30: > 28: > 29: /** > 30: * An abstraction representing formattabe difference between two or more objects typo in 'formattabe' test/lib/jdk/test/lib/format/ArrayDiff.java line 81: > 79: * Default limits for the formatter > 80: */ > 81: public static class Defaults { Does it make sense to move them into Diff? Seems like very generic formatter properties to me. test/lib-test/jdk/test/lib/format/ArrayDiffTest.java line 38: > 36: * @run testng jdk.test.lib.format.ArrayDiffTest > 37: */ > 38: public class ArrayDiffTest { Might be it makes sense to check null values. Also, reversed checks. For example there is first = [1, ..], second = []. But no check for first = [], second = [1, ..] for some cases, ------------- PR: https://git.openjdk.java.net/jdk/pull/112 From xliu at openjdk.java.net Thu Oct 1 23:31:21 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Thu, 1 Oct 2020 23:31:21 GMT Subject: RFR: 8251464: make Node::dump(int depth) support indent [v2] In-Reply-To: References: Message-ID: > Node::dump(depth) indents 2 whitespaces for each level. > The function isnot on until the depth of the ideal graph isnot greater > than PrintIdealIndentThreshold (0 by default). Xin Liu has updated the pull request incrementally with one additional commit since the last revision: add self-verification workflow. 
-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/371/files
  - new: https://git.openjdk.java.net/jdk/pull/371/files/2c5376f0..af23f175

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=371&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=371&range=00-01

  Stats: 861 lines in 1 file changed: 861 ins; 0 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/371.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/371/head:pull/371

PR: https://git.openjdk.java.net/jdk/pull/371

From xliu at openjdk.java.net Thu Oct 1 23:40:08 2020
From: xliu at openjdk.java.net (Xin Liu)
Date: Thu, 1 Oct 2020 23:40:08 GMT
Subject: RFR: 8251464: make Node::dump(int depth) support indent [v3]
In-Reply-To:
References:
Message-ID:

> Node::dump(depth) indents 2 whitespaces for each level.
> The function is not on until the depth of the ideal graph is not greater
> than PrintIdealIndentThreshold (0 by default).

Xin Liu has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains one commit: 8251464: make Node::dump(int depth) support indent ------------- Changes: https://git.openjdk.java.net/jdk/pull/371/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=371&range=02 Stats: 49 lines in 5 files changed: 26 ins; 0 del; 23 mod Patch: https://git.openjdk.java.net/jdk/pull/371.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/371/head:pull/371 PR: https://git.openjdk.java.net/jdk/pull/371 From xliu at openjdk.java.net Fri Oct 2 02:02:05 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Fri, 2 Oct 2020 02:02:05 GMT Subject: RFR: 8251464: make Node::dump(int depth) support indent [v3] In-Reply-To: <0ocX5EGP5KWloTvLF3agvmhwZlrVA4Mo0QdP8-9kVrg=.f84b4ae5-d859-4cbd-b5b4-1a5a41250d02@github.com> References: <0ocX5EGP5KWloTvLF3agvmhwZlrVA4Mo0QdP8-9kVrg=.f84b4ae5-d859-4cbd-b5b4-1a5a41250d02@github.com> Message-ID: <9TDlIPSotJUFUQwlPBr8DOhIZydJGzEo6Nuh38kOGzI=.82cfa7f7-b617-4632-89ef-c9cffed09521@github.com> On Thu, 1 Oct 2020 19:06:16 GMT, Xin Liu wrote: >> This looks good to me but please make sure that there are no other tests depending on the tab character in the output. > > Thanks Tobias! > I ran tier1 test and it looks good. Let me verify it using submit repo before I integrate it. pass all tests: https://github.com/navyxliu/jdk/actions/runs/283309901 ------------- PR: https://git.openjdk.java.net/jdk/pull/371 From iveresov at openjdk.java.net Fri Oct 2 02:26:06 2020 From: iveresov at openjdk.java.net (Igor Veresov) Date: Fri, 2 Oct 2020 02:26:06 GMT Subject: Integrated: 8253118: Avoid unnecessary deopts when OSR nmethods of the same level are present. In-Reply-To: References: Message-ID: On Fri, 25 Sep 2020 16:05:34 GMT, Igor Veresov wrote: > When running with ```-XX:TieredStopAtLevel={2|3}``` the policy tried to switch to OSR method of the same level if those > are present, which caused constant deopting. The fix is to consider only higher levels for OSR switches. 
This pull request has now been integrated. Changeset: b9505df3 Author: Igor Veresov URL: https://git.openjdk.java.net/jdk/commit/b9505df3 Stats: 9 lines in 1 file changed: 3 ins; 0 del; 6 mod 8253118: Avoid unnecessary deopts when OSR nmethods of the same level are present. Reviewed-by: kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/360 From yzheng at openjdk.java.net Fri Oct 2 04:48:18 2020 From: yzheng at openjdk.java.net (Yudi Zheng) Date: Fri, 2 Oct 2020 04:48:18 GMT Subject: RFR: 8253842: [JVMCI] Allow implicit exception to dispatch to other address in jvmci compilers. Message-ID: This change allows JVMCI compilers to append an entry with flexible dispatch offset to the implicit exception table. ------------- Commit messages: - Allow implicit exception to dispatch to other address in jvmci compilers. Changes: https://git.openjdk.java.net/jdk/pull/472/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=472&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8253842 Stats: 91 lines in 4 files changed: 89 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/472.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/472/head:pull/472 PR: https://git.openjdk.java.net/jdk/pull/472 From dnsimon at openjdk.java.net Fri Oct 2 08:09:40 2020 From: dnsimon at openjdk.java.net (Doug Simon) Date: Fri, 2 Oct 2020 08:09:40 GMT Subject: RFR: 8253842: [JVMCI] Allow implicit exception to dispatch to other address in jvmci compilers. In-Reply-To: References: Message-ID: On Thu, 1 Oct 2020 18:35:04 GMT, Yudi Zheng wrote: > This change allows JVMCI compilers to append an entry with flexible dispatch offset to the implicit exception table. src/jdk.internal.vm.ci/share/classes/jdk.vm.ci.code/src/jdk/vm/ci/code/site/ImplicitExceptionDispatch.java line 31: > 29: /** > 30: * Represents an implicit exception dispatch in the code. 
Implicit exception is a platform-specific
> 31: * optimization that makes use of operating system's trap mechanism, to turn specific branches into

Implicit exception _dispatch_ is a ...
... _an_ operating system

-------------

PR: https://git.openjdk.java.net/jdk/pull/472

From github.com+70893615+jasontatton-aws at openjdk.java.net Fri Oct 2 08:40:58 2020
From: github.com+70893615+jasontatton-aws at openjdk.java.net (Jason Tatton)
Date: Fri, 2 Oct 2020 08:40:58 GMT
Subject: RFR: 8173585: Intrinsify StringLatin1.indexOf(char) [v4]
In-Reply-To: <_T0873dC5tfUtGn9r1_Y21JkPKRt-za3MM9hPN2GQKQ=.b865fe53-5417-424f-81b6-1566a330640e@github.com>
References: <_T0873dC5tfUtGn9r1_Y21JkPKRt-za3MM9hPN2GQKQ=.b865fe53-5417-424f-81b6-1566a330640e@github.com>
Message-ID:

> This is an implementation of the indexOf(char) intrinsic for StringLatin1 (1 byte encoded Strings). It is provided for
> x86 and ARM64. The implementation is greatly inspired by the indexOf(char) intrinsic for StringUTF16. To incorporate it
> I had to make a small change to StringLatin1.java (refactor of functionality to intrinsified private method) as well as
> code for C2. Submitted to: hotspot-compiler-dev and core-libs-dev as this patch contains a change to hotspot and
> java/lang/StringLatin1.java https://bugs.openjdk.java.net/browse/JDK-8173585
>
> Details of testing:
> ============
> I have created a jtreg test "compiler/intrinsics/string/TestStringLatin1IndexOfChar" to cover this new intrinsic. Note
> that, particularly for the x86 implementation of the intrinsic, the code path taken is dependent upon the length of the
> input String. Hence the test has been designed to cover all these cases. In summary they are:
> - A "short" string of < 16 characters.
> - A SIMD String of 16 - 31 characters.
> - An AVX2 SIMD String of 32+ characters.
>
> Hardware used for testing:
> -----------------------------
>
> - Intel Xeon CPU E5-2680 (JVM did not recognize this as having AVX2 support)
> - Intel i7 processor (with AVX2 support).
> - AWS Graviton 2 (ARM 64 processor).
>
> I also ran "run-test-tier1" and "run-test-tier2" for x86_64 and aarch64.
>
> Possible future enhancements:
> ====================
> For the x86 implementation there may be two further improvements we can make in order to improve performance of both
> the StringUTF16 and StringLatin1 indexOf(char) intrinsics:
> 1. Make use of AVX-512 instructions.
> 2. For "short" Strings (see below), I think it may be possible to modify the existing algorithm to still use SSE SIMD
> instructions instead of a loop.
>
> Benchmark results:
> ============
> **Without** the new StringLatin1 indexOf(char) intrinsic:
>
> | Benchmark | Mode | Cnt | Score | Error | Units |
> | ------------- | ------------- | ------------- | ------------- | ------------- | ------------- |
> | IndexOfBenchmark.latin1_mixed_char | avgt | 5 | **26,389.129** | ± 182.581 | ns/op |
> | IndexOfBenchmark.utf16_mixed_char | avgt | 5 | 17,885.383 | ± 435.933 | ns/op |
>
>
> **With** the new StringLatin1 indexOf(char) intrinsic:
>
> | Benchmark | Mode | Cnt | Score | Error | Units |
> | ------------- | ------------- | ------------- | ------------- | ------------- | ------------- |
> | IndexOfBenchmark.latin1_mixed_char | avgt | 5 | **17,875.185** | ± 407.716 | ns/op |
> | IndexOfBenchmark.utf16_mixed_char | avgt | 5 | 18,292.802 | ± 167.306 | ns/op |
>
>
> The objective of the patch is to bring the performance of StringLatin1 indexOf(char) in line with StringUTF16
> indexOf(char) for x86 and ARM64. We can see above that this has been achieved. Similar results were obtained when
> running on ARM.
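
The length-dependent code paths listed in the description above (fewer than 16 characters, 16 to 31, and 32 or more) can be probed with a small standalone check. What follows is only a sketch under stated assumptions, not the jtreg test from this PR: the class `Latin1IndexOfSketch` and the helpers `indexOfChar` and `probe` are hypothetical names made up for illustration, and the scalar loop is a plain reference implementation rather than the JDK's own code. It compares the reference against `String.indexOf(int, int)` at lengths around those boundaries.

```java
import java.nio.charset.StandardCharsets;

public class Latin1IndexOfSketch {

    // Hypothetical scalar reference: index of the first byte equal to ch
    // at or after fromIndex, or -1 if there is none.
    public static int indexOfChar(byte[] value, int ch, int fromIndex) {
        if (ch < 0 || ch > 0xFF) {
            return -1; // not representable in Latin-1
        }
        for (int i = Math.max(fromIndex, 0); i < value.length; i++) {
            if ((value[i] & 0xFF) == ch) {
                return i;
            }
        }
        return -1;
    }

    // Builds a string of the given length: (len - 1) times 'a', then one 'z',
    // so a search for 'z' has to scan the whole string.
    public static String probe(int len) {
        return "a".repeat(len - 1) + "z";
    }

    public static void main(String[] args) {
        // Lengths on both sides of the 16- and 32-character boundaries.
        int[] lengths = {1, 15, 16, 17, 31, 32, 33, 64, 2048};
        for (int len : lengths) {
            String s = probe(len);
            byte[] latin1 = s.getBytes(StandardCharsets.ISO_8859_1);
            // Start at the beginning, at the match itself, and past the end.
            int[] starts = {0, len - 1, len};
            for (int from : starts) {
                int expected = s.indexOf('z', from);
                int actual = indexOfChar(latin1, 'z', from);
                if (expected != actual) {
                    throw new AssertionError("len=" + len + " from=" + from
                            + " expected=" + expected + " actual=" + actual);
                }
            }
        }
        System.out.println("all lengths checked");
    }
}
```

Whether `String.indexOf` actually reaches compiled, intrinsified code in such a run depends on the JIT warm-up and on flags such as `UseSSE` and `UseAVX`, which this sketch does not control.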
Jason Tatton has updated the pull request incrementally with one additional commit since the last revision:

  8173585: Intrinsify StringLatin1.indexOf(char)

  Rewrite of unit test and newlines added to end of files

  Changes to unit test:
  - main test adjusted such that Strings generated are much longer (up to 2048 characters) and of the form: azaza,
    aazaazaa, aaazaaazaaa, etc with 'z' being the search character. Multiple instances of the search character are
    included in the String in order to validate that the starting offset is correctly handled. Results are compared
    to the non-intrinsified version of the code. Longer strings mean that the looping functionality of the various
    paths is entered into.
  - Run configurations introduced such that it checks behaviour where use of SSE and AVX instructions are restricted.
  - Tier4InvocationThreshold adjusted so as to ensure C2 code is invoked.

  Other changes:
  - newlines added at end of files

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/71/files
  - new: https://git.openjdk.java.net/jdk/pull/71/files/c8cc441e..c8a2849e

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=71&range=03
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=71&range=02-03

  Stats: 60 lines in 3 files changed: 26 ins; 6 del; 28 mod
  Patch: https://git.openjdk.java.net/jdk/pull/71.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/71/head:pull/71

PR: https://git.openjdk.java.net/jdk/pull/71

From github.com+70893615+jasontatton-aws at openjdk.java.net Fri Oct 2 08:41:01 2020
From: github.com+70893615+jasontatton-aws at openjdk.java.net (Jason Tatton)
Date: Fri, 2 Oct 2020 08:41:01 GMT
Subject: RFR: 8173585: Intrinsify StringLatin1.indexOf(char) [v2]
In-Reply-To:
References: <_T0873dC5tfUtGn9r1_Y21JkPKRt-za3MM9hPN2GQKQ=.b865fe53-5417-424f-81b6-1566a330640e@github.com> <-DRWR4_f5u6DsSGHAuPnpHrhaG8Una8BXf4zDekQjLM=.469b08b6-a8b2-4a6b-8ab8-1a40810aede0@github.com>
Message-ID:

On Mon, 21 Sep 2020 10:11:28 GMT,
Volker Simonis wrote: >> Jason Tatton has updated the pull request with a new target base due to a merge or a rebase. The pull request now >> contains four commits: >> - Merge master >> - 8173585: further whitespace changes required by jcheck >> - JDK-8173585 - whitespace changes required by jcheck >> - JDK-8173585 > > test/hotspot/jtreg/compiler/intrinsics/string/TestStringLatin1IndexOfChar.java line 24: > >> 22: >> 23: public static void main(String[] args) throws Exception { >> 24: for (int i = 0; i < 100_0; ++i) {//repeat such that we enter into C2 code... > > The placement of the underscore looks strange to me. I'd expect it to separate thousands (like 1_000) if at all but not > sure if id use it for one thousand at all as that's really not such a big number that it is hard to read.. > Also, the Tier4InvocationThreshold is 5000 so I'm not sure youre reaching C2? I have added Tier4InvocationThreshold=200 to the unit test config in order to trigger generation earlier ------------- PR: https://git.openjdk.java.net/jdk/pull/71 From enikitin at openjdk.java.net Fri Oct 2 08:43:05 2020 From: enikitin at openjdk.java.net (Evgeny Nikitin) Date: Fri, 2 Oct 2020 08:43:05 GMT Subject: RFR: 8229186: Improve error messages for TestStringIntrinsics failures [v7] In-Reply-To: References: Message-ID: > pre-Skara RFR thread: [link](https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-May/038416.html) > > Error reporting was improved by writing a C-style escaped string representations for the variables passed to the > methods being tested. For array comparisons, a dedicated diff-formatter was implemented. > Sample output for comparing byte arrays (with artificial failure): > ----------System.err:(21/1553)---------- > Result: (false) of 'arrayEqualsB' is not equal to expected (true) > Arrays differ starting from [index: 7]: > ... 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, ... > ... 5, 6, 125, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, ... 
> ^^^^ > java.lang.RuntimeException: Result: (false) of 'arrayEqualsB' is not > equal to expected (true) > at > compiler.intrinsics.string.TestStringIntrinsics.invokeAndCheckArrays(TestStringIntrinsics.java:273) > at ... stack trace continues - E.N. > Sample output for comparing char arrays: > ----------System.err:(21/1579)*---------- > Result: (false) of 'arrayEqualsC' is not equal to expected (true) > Arrays differ starting from [index: 7]: > ... \\u0005, \\u0006, \\u0007, \\u0008, \\u0009, \\n, ... > ... \\u0005, \\u0006, }, \\u0008, \\u0009, \\n, ... > ^^^^^^^ > java.lang.RuntimeException: Result: (false) of 'arrayEqualsC' is not > equal to expected (true) > at > compiler.intrinsics.string.TestStringIntrinsics.invokeAndCheckArrays(TestStringIntrinsics.java:280) > at > ... and so on - E.N. > > testing: open/test/hotspot/jtreg/compiler/intrinsics/string/TestStringIntrinsics.java on linux, windows, macosx. Evgeny Nikitin has updated the pull request incrementally with one additional commit since the last revision: Fix typos and comments ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/112/files - new: https://git.openjdk.java.net/jdk/pull/112/files/f0b1df89..d792e6d0 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=112&range=06 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=112&range=05-06 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/112.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/112/head:pull/112 PR: https://git.openjdk.java.net/jdk/pull/112 From enikitin at openjdk.java.net Fri Oct 2 08:43:07 2020 From: enikitin at openjdk.java.net (Evgeny Nikitin) Date: Fri, 2 Oct 2020 08:43:07 GMT Subject: RFR: 8229186: Improve error messages for TestStringIntrinsics failures [v6] In-Reply-To: References: Message-ID: <2eymP8HCQa_-BOJA3Iy-rIiHxtLhQCOzrWDVJSJUrn0=.c7da1aac-e1c6-4e88-9658-b1c9deeaf2db@github.com> On Thu, 1 Oct 2020 19:39:25 GMT, Leonid Mesnik wrote: >> 
Evgeny Nikitin has updated the pull request incrementally with one additional commit since the last revision: >> >> Use SB to build TestStringIntrinsics message, avoid type guessing in ArrayDiff > > test/lib/jdk/test/lib/format/Format.java line 2: > >> 1: /* >> 2: * Copyright (c) 2015, 2020, Oracle and/or its affiliates. All rights reserved. > > Shouldn't be just 2020? Fixed in the d792e6d0ab9, thanks. > test/lib/jdk/test/lib/format/Diff.java line 30: > >> 28: >> 29: /** >> 30: * An abstraction representing formattabe difference between two or more objects > > typo in 'formattabe' Also fixed in the d792e6d0ab9, thanks. ------------- PR: https://git.openjdk.java.net/jdk/pull/112 From github.com+70893615+jasontatton-aws at openjdk.java.net Fri Oct 2 08:47:47 2020 From: github.com+70893615+jasontatton-aws at openjdk.java.net (Jason Tatton) Date: Fri, 2 Oct 2020 08:47:47 GMT Subject: RFR: 8173585: Intrinsify StringLatin1.indexOf(char) [v2] In-Reply-To: References: <_T0873dC5tfUtGn9r1_Y21JkPKRt-za3MM9hPN2GQKQ=.b865fe53-5417-424f-81b6-1566a330640e@github.com> <-DRWR4_f5u6DsSGHAuPnpHrhaG8Una8BXf4zDekQjLM=.469b08b6-a8b2-4a6b-8ab8-1a40810aede0@github.com> Message-ID: On Tue, 22 Sep 2020 15:19:37 GMT, Volker Simonis wrote: >> Jason Tatton has updated the pull request with a new target base due to a merge or a rebase. The pull request now >> contains four commits: >> - Merge master >> - 8173585: further whitespace changes required by jcheck >> - JDK-8173585 - whitespace changes required by jcheck >> - JDK-8173585 > > Hi Jason, > > thanks for bringing String.indexOf() for latin strings up to date with the Unicode version. > > Your changes look good except a few minor issues I've commented on right in the code. > > I'd only like to ask you if you could possibly improve your test a little bit. 
As far as I understand, your search text
> is a consecutive sequence of "abc" characters, so you'll always find the character you're searching for within the next
> three characters of the source text. This won't exercise the loops of your intrinsic. Maybe you can also add some test
> versions where the search character will be found beyond the first 32/64 characters after "fromIndex"?

@simonis Thank you for the corrections, I have amended them in the latest commit as follows:

Changes to unit test:
- main test adjusted such that Strings generated are much longer (up to 2048 characters) and of the form: `azaza`,
  `aazaazaa`, `aaazaaazaaa`, etc with `'z'` being the search character. Multiple instances of the search character are
  included in the String in order to validate that the starting offset is correctly handled. Results are compared to
  the non-intrinsified version of the code. Longer strings mean that the looping functionality of the various paths is
  entered into.
- Run configurations introduced such that it checks behaviour where use of SSE and AVX instructions are restricted.
- Tier4InvocationThreshold adjusted so as to ensure C2 code is invoked.

Other changes:
- newlines added at end of files

@vnkozlov here are the performance numbers as requested. I have included performance of the UTF16 version of the
intrinsic for reference:

| UseAVX= | UseSSE= | Benchmark                         | Mode | Cnt | Score           | Error       | Units |
|---------|---------|-----------------------------------|------|-----|-----------------|-------------|-------|
|         | 0       | IndexOfBenchmark.latin1_long_char | avgt | 5   | **447,493.398** | ± 4,666.386 | ns/op |
| 0       |         | IndexOfBenchmark.latin1_long_char | avgt | 5   | **104,735.941** | ± 2,484.403 | ns/op |
| 1       |         | IndexOfBenchmark.latin1_long_char | avgt | 5   | **104,342.844** | ± 2,656.343 | ns/op |
| 2       |         | IndexOfBenchmark.latin1_long_char | avgt | 5   | **61,000.418**  | ± 1,543.951 | ns/op |
| 3       |         | IndexOfBenchmark.latin1_long_char | avgt | 5   | **60,607.988**  | ± 1,466.354 | ns/op |
|         | 0       | IndexOfBenchmark.utf16_long_char  | avgt | 5   | 672,475.302     | ± 4,998.596 | ns/op |
| 0       |         | IndexOfBenchmark.utf16_long_char  | avgt | 5   | 175,521.654     | ± 7,549.094 | ns/op |
| 1       |         | IndexOfBenchmark.utf16_long_char  | avgt | 5   | 172,514.981     | ± 3,561.040 | ns/op |
| 2       |         | IndexOfBenchmark.utf16_long_char  | avgt | 5   | 120,725.748     | ± 2,004.400 | ns/op |
| 3       |         | IndexOfBenchmark.utf16_long_char  | avgt | 5   | 120,664.623     | ± 1,988.419 | ns/op |

I think the results are as expected: we see improvements in performance as the range of SSE and AVX instructions which
can be used is expanded upon. Note that no improvement is observed with UseAVX=3 because there is no AVX-512 code in
these intrinsics.

-------------

PR: https://git.openjdk.java.net/jdk/pull/71

From enikitin at openjdk.java.net Fri Oct 2 08:50:06 2020
From: enikitin at openjdk.java.net (Evgeny Nikitin)
Date: Fri, 2 Oct 2020 08:50:06 GMT
Subject: RFR: 8229186: Improve error messages for TestStringIntrinsics failures [v8]
In-Reply-To:
References:
Message-ID:

> pre-Skara RFR thread: [link](https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-May/038416.html)
>
> Error reporting was improved by writing a C-style escaped string representations for the variables passed to the
> methods being tested. For array comparisons, a dedicated diff-formatter was implemented.
> Sample output for comparing byte arrays (with artificial failure):
> ----------System.err:(21/1553)----------
> Result: (false) of 'arrayEqualsB' is not equal to expected (true)
> Arrays differ starting from [index: 7]:
> ... 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, ...
> ... 5, 6, 125, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, ...
> ^^^^
> java.lang.RuntimeException: Result: (false) of 'arrayEqualsB' is not
> equal to expected (true)
> at
> compiler.intrinsics.string.TestStringIntrinsics.invokeAndCheckArrays(TestStringIntrinsics.java:273)
> at ... stack trace continues - E.N.
> Sample output for comparing char arrays: > ----------System.err:(21/1579)*---------- > Result: (false) of 'arrayEqualsC' is not equal to expected (true) > Arrays differ starting from [index: 7]: > ... \\u0005, \\u0006, \\u0007, \\u0008, \\u0009, \\n, ... > ... \\u0005, \\u0006, }, \\u0008, \\u0009, \\n, ... > ^^^^^^^ > java.lang.RuntimeException: Result: (false) of 'arrayEqualsC' is not > equal to expected (true) > at > compiler.intrinsics.string.TestStringIntrinsics.invokeAndCheckArrays(TestStringIntrinsics.java:280) > at > ... and so on - E.N. > > testing: open/test/hotspot/jtreg/compiler/intrinsics/string/TestStringIntrinsics.java on linux, windows, macosx. Evgeny Nikitin has updated the pull request incrementally with one additional commit since the last revision: Make ArrayDiff.Defaults a generic Diff.Defaults ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/112/files - new: https://git.openjdk.java.net/jdk/pull/112/files/d792e6d0..81f7a9d9 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=112&range=07 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=112&range=06-07 Stats: 17 lines in 2 files changed: 8 ins; 8 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/112.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/112/head:pull/112 PR: https://git.openjdk.java.net/jdk/pull/112 From enikitin at openjdk.java.net Fri Oct 2 08:50:07 2020 From: enikitin at openjdk.java.net (Evgeny Nikitin) Date: Fri, 2 Oct 2020 08:50:07 GMT Subject: RFR: 8229186: Improve error messages for TestStringIntrinsics failures [v6] In-Reply-To: References: Message-ID: <7ClZi2nnkEYbwdpe8DfphsQS_wFssq_vfKoFi80U5QE=.8f724488-5788-4960-877a-ecb739ae20b6@github.com> On Thu, 1 Oct 2020 19:43:09 GMT, Leonid Mesnik wrote: >> Evgeny Nikitin has updated the pull request incrementally with one additional commit since the last revision: >> >> Use SB to build TestStringIntrinsics message, avoid type guessing in ArrayDiff > > 
test/lib/jdk/test/lib/format/ArrayDiff.java line 81:
>
>> 79: * Default limits for the formatter
>> 80: */
>> 81: public static class Defaults {
>
> Does it make sense to move them into Diff? Seems like very generic formatter properties to me.

I can imagine different defaults for different diff formatters, but they probably would be very similar. Let's try your
suggestion, the 81f7a9d9b88.

-------------

PR: https://git.openjdk.java.net/jdk/pull/112

From ihse at openjdk.java.net Fri Oct 2 11:47:42 2020
From: ihse at openjdk.java.net (Magnus Ihse Bursie)
Date: Fri, 2 Oct 2020 11:47:42 GMT
Subject: RFR: 8253757: Add LLVM-based backend for hsdis
In-Reply-To:
References: <91erxiMDb4ftvSomuJYHPi9SX-v8Z2VLD2qEwCbz5tk=.b9ed01b5-f0e0-4ed7-9c1a-b06bc0e64640@github.com>
Message-ID:

On Wed, 30 Sep 2020 00:55:23 GMT, Ludovic Henry wrote:

>> When bringing up Hotspot onto new platforms, it is not always possible to compile hsdis because gcc is not yet
>> available. For example, for Windows-AArch64 and macOS-AArch64.
>> For some such platforms, it is possible to use LLVM as an alternative backend as it also supports a disassembler
>> feature.
>
> @navyxliu I've merged the sources into `src/utils/hsdis` and added support to build it in the Makefile.

This is an interesting suggestion. There is a similar attempt at replacing binutils with capstone in
https://bugs.openjdk.java.net/browse/JDK-8188073, which unfortunately has not seen much progress due to lack of
resources; I don't know if you are aware of that? There is also an (extremely low priority) effort to rewrite the hsdis
makefile to be part of the normal build system, see e.g. https://bugs.openjdk.java.net/browse/JDK-8208495. Neither of
these should be any blocker for your change, but I think it might be good if you know about them.

I have a couple of concerns with your patch. One is the method in which LLVM is selected instead of binutils; AFAICT
this depends on having the `LLVM` variable set when executing the makefile. At the very least, this should be
documented in the README. I don't think any more complicated configuration is really necessary at this point. With full
integration with the build system, a more user-friendly way of selecting the hsdis backend should be implemented,
though.

Second, and I don't know if this is an artifact of git/github/the new skara tooling, but if you renamed hsdis.c to
hsdis.cpp, this relationship does not show up, not even in the generated webrevs. Instead they are considered a new + a
deleted file. This makes it hard to see what code changes you have done in that file.

And third: have you tested that your changes (both changing the main file from C to C++, and any code changes in it) do
not break the old binutils functionality? AFAICT there are no test suites for exercising hsdis :-( so manual ad-hoc
testing is likely needed.

-------------

PR: https://git.openjdk.java.net/jdk/pull/392

From xliu at openjdk.java.net Fri Oct 2 13:48:41 2020
From: xliu at openjdk.java.net (Xin Liu)
Date: Fri, 2 Oct 2020 13:48:41 GMT
Subject: Integrated: 8251464: make Node::dump(int depth) support indent
In-Reply-To:
References:
Message-ID:

On Sat, 26 Sep 2020 06:22:49 GMT, Xin Liu wrote:

> Node::dump(depth) indents 2 whitespaces for each level.
> The function is not on until the depth of the ideal graph is not greater
> than PrintIdealIndentThreshold (0 by default).

This pull request has now been integrated.
Changeset: ea5a2b15 Author: Xin Liu Committer: Paul Hohensee URL: https://git.openjdk.java.net/jdk/commit/ea5a2b15 Stats: 49 lines in 5 files changed: 26 ins; 0 del; 23 mod 8251464: make Node::dump(int depth) support indent Reviewed-by: thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/371 From kvn at openjdk.java.net Fri Oct 2 16:00:42 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 2 Oct 2020 16:00:42 GMT Subject: RFR: 8223347: Integration of Vector API (Incubator) In-Reply-To: <-PE4TwXgvq2bemAn_8csjn4_j7zoAolnQz6QQt3z0Wk=.eaa9999f-0713-4349-b31d-934717aa37a1@github.com> References: <-PE4TwXgvq2bemAn_8csjn4_j7zoAolnQz6QQt3z0Wk=.eaa9999f-0713-4349-b31d-934717aa37a1@github.com> Message-ID: On Fri, 25 Sep 2020 20:14:29 GMT, Paul Sandoz wrote: > This pull request is for integration of the Vector API. It was previously reviewed under conditions when mercurial was > used for the source code control system. Review threads can be found here (searching for issue number 8223347 in the > title): https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-April/thread.html > https://mail.openjdk.java.net/pipermail/core-libs-dev/2020-May/thread.html > https://mail.openjdk.java.net/pipermail/core-libs-dev/2020-July/thread.html > > If mercurial was still being used the code would be pushed directly, once the CSR is approved. However, in this case a > pull request is required and needs explicit reviewer approval. Between the final review and this pull request no code > has changed, except for that related to merging. Good ------------- Marked as reviewed by kvn (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/367 From chegar at openjdk.java.net Fri Oct 2 16:00:41 2020 From: chegar at openjdk.java.net (Chris Hegarty) Date: Fri, 2 Oct 2020 16:00:41 GMT Subject: RFR: 8223347: Integration of Vector API (Incubator) In-Reply-To: <-PE4TwXgvq2bemAn_8csjn4_j7zoAolnQz6QQt3z0Wk=.eaa9999f-0713-4349-b31d-934717aa37a1@github.com> References: <-PE4TwXgvq2bemAn_8csjn4_j7zoAolnQz6QQt3z0Wk=.eaa9999f-0713-4349-b31d-934717aa37a1@github.com> Message-ID: On Fri, 25 Sep 2020 20:14:29 GMT, Paul Sandoz wrote: > This pull request is for integration of the Vector API. It was previously reviewed under conditions when mercurial was > used for the source code control system. Review threads can be found here (searching for issue number 8223347 in the > title): https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-April/thread.html > https://mail.openjdk.java.net/pipermail/core-libs-dev/2020-May/thread.html > https://mail.openjdk.java.net/pipermail/core-libs-dev/2020-July/thread.html > > If mercurial was still being used the code would be pushed directly, once the CSR is approved. However, in this case a > pull request is required and needs explicit reviewer approval. Between the final review and this pull request no code > has changed, except for that related to merging. Marked as reviewed by chegar (Reviewer). 
------------- PR: https://git.openjdk.java.net/jdk/pull/367 From lmesnik at openjdk.java.net Fri Oct 2 18:01:40 2020 From: lmesnik at openjdk.java.net (Leonid Mesnik) Date: Fri, 2 Oct 2020 18:01:40 GMT Subject: RFR: 8229186: Improve error messages for TestStringIntrinsics failures [v8] In-Reply-To: References: Message-ID: <9mGAphCyQ34v3jOr4qiR4jkUccvdv1noUw5VrjwyBoc=.a282b928-a862-43c9-a206-41b0c6da3c7b@github.com> On Fri, 2 Oct 2020 08:50:06 GMT, Evgeny Nikitin wrote: >> pre-Skara RFR thread: [link](https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-May/038416.html) >> >> Error reporting was improved by writing a C-style escaped string representations for the variables passed to the >> methods being tested. For array comparisons, a dedicated diff-formatter was implemented. >> Sample output for comparing byte arrays (with artificial failure): >> ----------System.err:(21/1553)---------- >> Result: (false) of 'arrayEqualsB' is not equal to expected (true) >> Arrays differ starting from [index: 7]: >> ... 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, ... >> ... 5, 6, 125, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, ... >> ^^^^ >> java.lang.RuntimeException: Result: (false) of 'arrayEqualsB' is not >> equal to expected (true) >> at >> compiler.intrinsics.string.TestStringIntrinsics.invokeAndCheckArrays(TestStringIntrinsics.java:273) >> at ... stack trace continues - E.N. >> Sample output for comparing char arrays: >> ----------System.err:(21/1579)*---------- >> Result: (false) of 'arrayEqualsC' is not equal to expected (true) >> Arrays differ starting from [index: 7]: >> ... \\u0005, \\u0006, \\u0007, \\u0008, \\u0009, \\n, ... >> ... \\u0005, \\u0006, }, \\u0008, \\u0009, \\n, ... >> ^^^^^^^ >> java.lang.RuntimeException: Result: (false) of 'arrayEqualsC' is not >> equal to expected (true) >> at >> compiler.intrinsics.string.TestStringIntrinsics.invokeAndCheckArrays(TestStringIntrinsics.java:280) >> at >> ... and so on - E.N. 
>> >> testing: open/test/hotspot/jtreg/compiler/intrinsics/string/TestStringIntrinsics.java on linux, windows, macosx. > > Evgeny Nikitin has updated the pull request incrementally with one additional commit since the last revision: > > Make ArrayDiff.Defaults a generic Diff.Defaults Marked as reviewed by lmesnik (Committer). ------------- PR: https://git.openjdk.java.net/jdk/pull/112 From yzheng at openjdk.java.net Fri Oct 2 22:25:51 2020 From: yzheng at openjdk.java.net (Yudi Zheng) Date: Fri, 2 Oct 2020 22:25:51 GMT Subject: RFR: 8253842: [JVMCI] Allow implicit exception to dispatch to other address in jvmci compilers. [v2] In-Reply-To: References: Message-ID: > This change allows JVMCI compilers to append an entry with flexible dispatch offset to the implicit exception table. Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: Address comments. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/472/files - new: https://git.openjdk.java.net/jdk/pull/472/files/ea3c39e3..fe63251d Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=472&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=472&range=00-01 Stats: 5 lines in 1 file changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/472.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/472/head:pull/472 PR: https://git.openjdk.java.net/jdk/pull/472 From kvn at openjdk.java.net Sat Oct 3 00:38:38 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Sat, 3 Oct 2020 00:38:38 GMT Subject: RFR: 8253842: [JVMCI] Allow implicit exception to dispatch to other address in jvmci compilers. [v2] In-Reply-To: References: Message-ID: On Fri, 2 Oct 2020 22:25:51 GMT, Yudi Zheng wrote: >> This change allows JVMCI compilers to append an entry with flexible dispatch offset to the implicit exception table. 
> > Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: > > Address comments. Marked as reviewed by kvn (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/472 From enikitin at openjdk.java.net Sat Oct 3 19:15:00 2020 From: enikitin at openjdk.java.net (Evgeny Nikitin) Date: Sat, 3 Oct 2020 19:15:00 GMT Subject: RFR: 8229186: Improve error messages for TestStringIntrinsics failures [v9] In-Reply-To: References: Message-ID: <8MKmW-2_q3RYbIsye58L3XfOmaLItKlb8M8WB2-1cmI=.4b5a0dc0-e379-47bf-8479-10f18f0cbf0e@github.com> > pre-Skara RFR thread: [link](https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-May/038416.html) > > Error reporting was improved by writing a C-style escaped string representations for the variables passed to the > methods being tested. For array comparisons, a dedicated diff-formatter was implemented. > Sample output for comparing byte arrays (with artificial failure): > ----------System.err:(21/1553)---------- > Result: (false) of 'arrayEqualsB' is not equal to expected (true) > Arrays differ starting from [index: 7]: > ... 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, ... > ... 5, 6, 125, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, ... > ^^^^ > java.lang.RuntimeException: Result: (false) of 'arrayEqualsB' is not > equal to expected (true) > at > compiler.intrinsics.string.TestStringIntrinsics.invokeAndCheckArrays(TestStringIntrinsics.java:273) > at ... stack trace continues - E.N. > Sample output for comparing char arrays: > ----------System.err:(21/1579)*---------- > Result: (false) of 'arrayEqualsC' is not equal to expected (true) > Arrays differ starting from [index: 7]: > ... \\u0005, \\u0006, \\u0007, \\u0008, \\u0009, \\n, ... > ... \\u0005, \\u0006, }, \\u0008, \\u0009, \\n, ... 
> ^^^^^^^ > java.lang.RuntimeException: Result: (false) of 'arrayEqualsC' is not > equal to expected (true) > at > compiler.intrinsics.string.TestStringIntrinsics.invokeAndCheckArrays(TestStringIntrinsics.java:280) > at > ... and so on - E.N. > > testing: open/test/hotspot/jtreg/compiler/intrinsics/string/TestStringIntrinsics.java on linux, windows, macosx. Evgeny Nikitin has updated the pull request incrementally with one additional commit since the last revision: Add null values support and two-way testing ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/112/files - new: https://git.openjdk.java.net/jdk/pull/112/files/81f7a9d9..39992fcb Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=112&range=08 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=112&range=07-08 Stats: 323 lines in 3 files changed: 137 ins; 3 del; 183 mod Patch: https://git.openjdk.java.net/jdk/pull/112.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/112/head:pull/112 PR: https://git.openjdk.java.net/jdk/pull/112 From enikitin at openjdk.java.net Sat Oct 3 19:20:38 2020 From: enikitin at openjdk.java.net (Evgeny Nikitin) Date: Sat, 3 Oct 2020 19:20:38 GMT Subject: RFR: 8229186: Improve error messages for TestStringIntrinsics failures [v6] In-Reply-To: References: Message-ID: On Thu, 1 Oct 2020 19:51:00 GMT, Leonid Mesnik wrote: >> Evgeny Nikitin has updated the pull request incrementally with one additional commit since the last revision: >> >> Use SB to build TestStringIntrinsics message, avoid type guessing in ArrayDiff > > test/lib-test/jdk/test/lib/format/ArrayDiffTest.java line 38: > >> 36: * @run testng jdk.test.lib.format.ArrayDiffTest >> 37: */ >> 38: public class ArrayDiffTest { > > Might be it makes sense to check null values. Also, reversed checks. For example there is first = [1, ..], second = []. > But no check for first = [], second = [1, ..] for some cases, Fixed in the 39992fcba97. 
That ended in finding a minor issue, actually ([ArrayDiff.java:180](https://github.com/openjdk/jdk/pull/112/commits/39992fcba979a2f6bf7ea73aff92461b13220ee2#diff-374ead0149950fce10be8d9d37cca320L180)). Sometimes reversing can be valuable, thanks! ------------- PR: https://git.openjdk.java.net/jdk/pull/112 From roland at openjdk.java.net Mon Oct 5 06:54:45 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Mon, 5 Oct 2020 06:54:45 GMT Subject: RFR: 8253923: C2 doesn't always run loop opts for compilations that include loops Message-ID: 8253923: C2 doesn't always run loop opts for compilations that include loops ------------- Commit messages: - fix detection of compilations with loops Changes: https://git.openjdk.java.net/jdk/pull/478/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=478&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8253923 Stats: 81 lines in 9 files changed: 75 ins; 2 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/478.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/478/head:pull/478 PR: https://git.openjdk.java.net/jdk/pull/478 From neliasso at openjdk.java.net Mon Oct 5 06:54:45 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Mon, 5 Oct 2020 06:54:45 GMT Subject: RFR: 8253923: C2 doesn't always run loop opts for compilations that include loops In-Reply-To: References: Message-ID: On Fri, 2 Oct 2020 07:47:00 GMT, Roland Westrelin wrote: > 8253923: C2 doesn't always run loop opts for compilations that include loops Thanks for fixing! Extra credit for the verification that will catch this problem type in the future. Reviewed. ------------- Marked as reviewed by neliasso (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/478 From roland at openjdk.java.net Mon Oct 5 06:54:45 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Mon, 5 Oct 2020 06:54:45 GMT Subject: RFR: 8253923: C2 doesn't always run loop opts for compilations that include loops In-Reply-To: References: Message-ID: On Fri, 2 Oct 2020 07:47:00 GMT, Roland Westrelin wrote: > 8253923: C2 doesn't always run loop opts for compilations that include loops
I noticed that C2 wouldn't always run loop opts when the compilation includes loops. Compile::has_loops() is what controls whether loop opts are executed. It's initially false and then set to true as parsing finds new loops. The flag is updated with the result of ciMethod::has_loops() for inlined methods but:
1. with bimorphic inlining, the has_loops flag is not set based on the methods that are actually inlined but from the virtual method the call resolves statically to.
2. there's at least one intrinsic that's predicated and falls back to inlining of the method bytecodes, in which case has_loops is not set.
3. sometimes loops are built manually rather than by parsing bytecodes.
4. when the backedge is a branch of a switch or an exception edge, ciMethod::has_loops() doesn't return true.
I propose that 1 & 2 be fixed by setting the has_loops flag when parsing of a method starts rather than when a call is inlined. 3 is fixed by explicitly setting the flag where the loop is built, 4 by extending ciMethod::has_loops() for switches.
------------- PR: https://git.openjdk.java.net/jdk/pull/478 From roland at openjdk.java.net Mon Oct 5 06:54:45 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Mon, 5 Oct 2020 06:54:45 GMT Subject: RFR: 8253923: C2 doesn't always run loop opts for compilations that include loops In-Reply-To: References: Message-ID: On Sun, 4 Oct 2020 20:22:26 GMT, Nils Eliasson wrote: >> 8253923: C2 doesn't always run loop opts for compilations that include loops > > Thanks for fixing!
> > Extra credit for the verification that will catch this problem type in the future. > > Reviewed. Hi Nils, > Reviewed. Thanks for reviewing. I actually made this a draft PR because I realized when I created it that there was a simple fix (which is to set the has_loops flag when parsing of a method starts). Can you take a look at the new change? ------------- PR: https://git.openjdk.java.net/jdk/pull/478 From thartmann at openjdk.java.net Mon Oct 5 07:03:39 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 5 Oct 2020 07:03:39 GMT Subject: RFR: 8253566: clazz.isAssignableFrom will return false for interface implementors In-Reply-To: References: Message-ID: On Wed, 30 Sep 2020 06:39:02 GMT, Roland Westrelin wrote: > The code pattern in the test case is optimized as a trichotomy which > is wrong given SubTypeCheckNode is a special kind of CmpNode that's > not commutative. Looks good. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/422 From roland at openjdk.java.net Mon Oct 5 07:03:39 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Mon, 5 Oct 2020 07:03:39 GMT Subject: RFR: 8253566: clazz.isAssignableFrom will return false for interface implementors In-Reply-To: References: Message-ID: On Wed, 30 Sep 2020 18:03:22 GMT, Vladimir Kozlov wrote: >> The code pattern in the test case is optimized as a trichotomy which >> is wrong given SubTypeCheckNode is a special kind of CmpNode that's >> not commutative. > > Good. 
@vnkozlov @TobiHartmann thanks for the reviews ------------- PR: https://git.openjdk.java.net/jdk/pull/422 From roland at openjdk.java.net Mon Oct 5 07:15:48 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Mon, 5 Oct 2020 07:15:48 GMT Subject: RFR: 8253566: clazz.isAssignableFrom will return false for interface implementors [v2] In-Reply-To: References: Message-ID: > The code pattern in the test case is optimized as a trichotomy which > is wrong given SubTypeCheckNode is a special kind of CmpNode that's > not commutative. Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: - comment - test - trichotomy opt should not be applied to subtype check ------------- Changes: https://git.openjdk.java.net/jdk/pull/422/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=422&range=01 Stats: 71 lines in 2 files changed: 70 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/422.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/422/head:pull/422 PR: https://git.openjdk.java.net/jdk/pull/422 From neliasso at openjdk.java.net Mon Oct 5 07:33:39 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Mon, 5 Oct 2020 07:33:39 GMT Subject: RFR: 8253566: clazz.isAssignableFrom will return false for interface implementors [v2] In-Reply-To: References: Message-ID: On Mon, 5 Oct 2020 07:15:48 GMT, Roland Westrelin wrote: >> The code pattern in the test case is optimized as a trichotomy which >> is wrong given SubTypeCheckNode is a special kind of CmpNode that's >> not commutative. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now > contains three commits: > - comment > - test > - trichotomy opt should not be applied to subtype check src/hotspot/share/opto/cfgnode.cpp line 818: > 816: if (!cmp1->is_Cmp() || !cmp2->is_Cmp()) { > 817: return false; // No comparison > 818: } else if (cmp1->Opcode() == Op_CmpF || cmp1->Opcode() == Op_CmpD || This table is getting rather large. Time to add an abstraction on cmp-nodes? ------------- PR: https://git.openjdk.java.net/jdk/pull/422 From roland at openjdk.java.net Mon Oct 5 12:18:42 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Mon, 5 Oct 2020 12:18:42 GMT Subject: RFR: 8253566: clazz.isAssignableFrom will return false for interface implementors [v2] In-Reply-To: References: Message-ID: On Mon, 5 Oct 2020 07:30:47 GMT, Nils Eliasson wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now >> contains three commits: >> - comment >> - test >> - trichotomy opt should not be applied to subtype check > > src/hotspot/share/opto/cfgnode.cpp line 818: > >> 816: if (!cmp1->is_Cmp() || !cmp2->is_Cmp()) { >> 817: return false; // No comparison >> 818: } else if (cmp1->Opcode() == Op_CmpF || cmp1->Opcode() == Op_CmpD || > > This table is getting rather large. Time to add an abstraction on cmp-nodes? Thanks for reviewing this. The rationale for the existing ones is described here: https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2018-October/030928.html The reasons for excluding floating point number, pointer comparison and SubTypeCheck don't seem to all fall under a single common justification that could be abstracted away, ------------- PR: https://git.openjdk.java.net/jdk/pull/422 From yzheng at openjdk.java.net Mon Oct 5 12:25:43 2020 From: yzheng at openjdk.java.net (Yudi Zheng) Date: Mon, 5 Oct 2020 12:25:43 GMT Subject: Integrated: 8253842: [JVMCI] Allow implicit exception to dispatch to other address in jvmci compilers. 
In-Reply-To: References: Message-ID: <0xfHEGWYEWiyYjW8P7IU_Fv7NMf3k__P2JiaI_e4l18=.96e75e2a-bb4c-42d3-9af3-367e22764088@github.com> On Thu, 1 Oct 2020 18:35:04 GMT, Yudi Zheng wrote: > This change allows JVMCI compilers to append an entry with flexible dispatch offset to the implicit exception table. This pull request has now been integrated. Changeset: 5d4a1350 Author: Yudi Zheng Committer: Doug Simon URL: https://git.openjdk.java.net/jdk/commit/5d4a1350 Stats: 91 lines in 4 files changed: 89 ins; 0 del; 2 mod 8253842: [JVMCI] Allow implicit exception to dispatch to other address in jvmci compilers. Reviewed-by: kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/472 From bulasevich at openjdk.java.net Mon Oct 5 17:40:53 2020 From: bulasevich at openjdk.java.net (Boris Ulasevich) Date: Mon, 5 Oct 2020 17:40:53 GMT Subject: RFR: 8249893: AARCH64: optimize the construction of the value from the bits of the other two Message-ID: Let me revive the change request to C2 and AArch64 that applies Bitfield Insert instruction in the expression "(v1 & 0xFF) | ((v2 & 0xFF) << 8)". Compared to the last round of review [2] I updated the transformation to apply BFI in more cases and added a jtreg test. As before, compared to the original patch [1], the transformation logic is now in the common C2 code: a new BitfieldInsert node has been introduced to replace Or+Shift+And sequence when possible, on AARCH a single BFI instruction is emitted for the new node. 
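For readers following along, here is a minimal Java sketch of the source-level pattern named above. It is not taken from the patch: the class name, `pack()` and `bfi()` are made up for illustration, and `bfi()` is only a software model of what a bitfield insert does, not the new C2 node or the AArch64 instruction itself.

```java
public class BitfieldInsertSketch {
    // Source-level pattern from the RFR: low byte of v1, with the low byte of v2 placed above it.
    public static int pack(int v1, int v2) {
        return (v1 & 0xFF) | ((v2 & 0xFF) << 8);
    }

    // Hypothetical software model of a bitfield insert: copy the low `width` bits of src
    // into dst starting at bit `lsb`, leaving all other bits of dst unchanged.
    // Assumes 0 < width < 32 and lsb + width <= 32.
    public static int bfi(int dst, int src, int lsb, int width) {
        int mask = ((1 << width) - 1) << lsb;
        return (dst & ~mask) | ((src << lsb) & mask);
    }

    public static void main(String[] args) {
        int v1 = 0x1FF, v2 = 0x2AB;
        int viaPattern = pack(v1, v2);
        // The Or+Shift+And shape is equivalent to masking v1 and inserting 8 bits of v2 at bit 8.
        int viaModel = bfi(v1 & 0xFF, v2, 8, 8);
        System.out.println(Integer.toHexString(viaPattern)); // prints abff
        System.out.println(viaPattern == viaModel);          // prints true
    }
}
```

The point of the sketch is only that the Or+Shift+And sequence and a single insert-style operation compute the same value, which is the equivalence the transformation relies on.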
[1] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-July/039161.html [2] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-August/039792.html ------------- Commit messages: - 8249893: AARCH64: optimize the construction of the value from the bits of the other two Changes: https://git.openjdk.java.net/jdk/pull/511/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=511&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8249893 Stats: 329 lines in 9 files changed: 329 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/511.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/511/head:pull/511 PR: https://git.openjdk.java.net/jdk/pull/511 From dnsimon at openjdk.java.net Tue Oct 6 10:15:47 2020 From: dnsimon at openjdk.java.net (Doug Simon) Date: Tue, 6 Oct 2020 10:15:47 GMT Subject: RFR: 8253874: [JVMCI] added test omitted in 8252881 Message-ID: The commit for JDK-8252881 omitted a test that was added in labsjdk-11: https://github.com/graalvm/labs-openjdk-11/commit/2b71602e3238c8305bea42488a0eaa4b845c5907 This PR adds the missing test. 
------------- Commit messages: - 8253874: [JVMCI] added test omitted in 8252881 Changes: https://git.openjdk.java.net/jdk/pull/519/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=519&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8253874 Stats: 28 lines in 1 file changed: 22 ins; 1 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/519.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/519/head:pull/519 PR: https://git.openjdk.java.net/jdk/pull/519 From github.com+70893615+jasontatton-aws at openjdk.java.net Tue Oct 6 15:46:12 2020 From: github.com+70893615+jasontatton-aws at openjdk.java.net (Jason Tatton) Date: Tue, 6 Oct 2020 15:46:12 GMT Subject: RFR: 8173585: Intrinsify StringLatin1.indexOf(char) [v2] In-Reply-To: References: <_T0873dC5tfUtGn9r1_Y21JkPKRt-za3MM9hPN2GQKQ=.b865fe53-5417-424f-81b6-1566a330640e@github.com> <-DRWR4_f5u6DsSGHAuPnpHrhaG8Una8BXf4zDekQjLM=.469b08b6-a8b2-4a6b-8ab8-1a40810aede0@github.com> Message-ID: On Fri, 2 Oct 2020 08:44:48 GMT, Jason Tatton wrote: >> Hi Jason, >> >> thanks for bringing String.indexOf() for latin strings up to date with the Unicode version. >> >> Your changes look good except a few minor issues I've commented on right in the code. >> >> I'd only like to ask you if you could possibly improve your test a little bit. As far as I understand, your search text >> is a consecutive sequence of "abc" characters, so you'll always find the character your searching for within the next >> three characters of the source text. This won't exercise the loops of your intrinsic. Maybe you can also add some test >> versions where the search character will be found beyond the first 32/64 characters after "fromIndex"? 
>
> @simonis Thank you for the corrections, I have amended them in the latest commit as follows:
>
> Changes to unit test:
> - main test adjusted such that Strings generated are much longer (up to 2048 characters) and of the form: `azaza`,
> `aazaazaa`, `aaazaaazaaa`, etc with `'z'` being the character searched for. Multiple instances of the search
> character are included in the String in order to validate that the starting offset is correctly handled. Results are
> compared to the non-intrinsified version of the code. Longer strings mean that the looping functionality of the various
> paths is entered into.
> - Run configurations introduced such that it checks behaviour where use of SSE and AVX instructions is restricted.
> - Tier4InvocationThreshold adjusted so as to ensure C2 code is invoked.
>
> Other changes:
> - newlines added at end of files
>
> @vnkozlov here are the performance numbers as requested. I have included performance of the UTF16 version of the
> intrinsic for reference:
> | UseAVX= | UseSSE= | Benchmark | Mode | Cnt | Score | Error | Units |
> |---------|---------|-----------------------------------|------|-----|-------------|-------------|-------|
> | | 0 | IndexOfBenchmark.latin1_long_char | avgt | 5 | **447,493.398** | ± 4,666.386 | ns/op |
> | 0 | | IndexOfBenchmark.latin1_long_char | avgt | 5 | **104,735.941** | ± 2,484.403 | ns/op |
> | 1 | | IndexOfBenchmark.latin1_long_char | avgt | 5 | **104,342.844** | ± 2,656.343 | ns/op |
> | 2 | | IndexOfBenchmark.latin1_long_char | avgt | 5 | **61,000.418** | ± 1,543.951 | ns/op |
> | 3 | | IndexOfBenchmark.latin1_long_char | avgt | 5 | **60,607.988** | ± 1,466.354 | ns/op |
> | | 0 | IndexOfBenchmark.utf16_long_char | avgt | 5 | 672,475.302 | ± 4,998.596 | ns/op |
> | 0 | | IndexOfBenchmark.utf16_long_char | avgt | 5 | 175,521.654 | ± 7,549.094 | ns/op |
> | 1 | | IndexOfBenchmark.utf16_long_char | avgt | 5 | 172,514.981 | ± 3,561.040 | ns/op |
> | 2 | | IndexOfBenchmark.utf16_long_char | avgt | 5 | 120,725.748 | ± 2,004.400 | ns/op |
> | 3 | | IndexOfBenchmark.utf16_long_char | avgt | 5 | 120,664.623 | ± 1,988.419 | ns/op |
>
> I think the results are as expected, we see improvements in performance as the range of SSE and AVX instructions which
> can be used is expanded upon. Note that no improvement is observed with UseAVX=3 because there is no AVX-512 code in
> these intrinsics.
Hi All, Just wondering if there is anything you'd like me to do in order to assist with moving this patch forward? Thanks, Jason ------------- PR: https://git.openjdk.java.net/jdk/pull/71 From neliasso at openjdk.java.net Tue Oct 6 18:42:08 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Tue, 6 Oct 2020 18:42:08 GMT Subject: RFR: 8253923: C2 doesn't always run loop opts for compilations that include loops In-Reply-To: References: Message-ID: On Mon, 5 Oct 2020 06:48:47 GMT, Roland Westrelin wrote: > Thanks for reviewing. I actually made this a draft PR because I realized when I created it that there was a simpler fix > (which is to set the has_loops flag when parsing of a method starts). Can you take a look at the new change? Where do I find the new change? I see no updated files. ------------- PR: https://git.openjdk.java.net/jdk/pull/478 From neliasso at openjdk.java.net Tue Oct 6 18:55:11 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Tue, 6 Oct 2020 18:55:11 GMT Subject: RFR: 8252847: Optimize primitive arrayCopy stubs using AVX-512 masked instructions [v5] In-Reply-To: References: Message-ID: On Fri, 25 Sep 2020 20:48:10 GMT, Vladimir Kozlov wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> 8252847 : Modifying file permission to resolve jcheck failure. > > src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 1264: > >> 1262: } >> 1263: >> 1264: #ifndef PRODUCT > > macroAssembler_x86.hpp has become big.
Maybe we should start thinking about splitting the arraycopy stubs into a separate file. But let's do that in another change.
It is good that the AVX3 case is separated out in this change - makes it easy to follow. ------------- PR: https://git.openjdk.java.net/jdk/pull/61 From neliasso at openjdk.java.net Tue Oct 6 20:21:09 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Tue, 6 Oct 2020 20:21:09 GMT Subject: RFR: 8173585: Intrinsify StringLatin1.indexOf(char) [v4] In-Reply-To: References: <_T0873dC5tfUtGn9r1_Y21JkPKRt-za3MM9hPN2GQKQ=.b865fe53-5417-424f-81b6-1566a330640e@github.com> Message-ID: On Fri, 2 Oct 2020 08:40:58 GMT, Jason Tatton wrote:
>> This is an implementation of the indexOf(char) intrinsic for StringLatin1 (1 byte encoded Strings). It is provided for
>> x86 and ARM64. The implementation is greatly inspired by the indexOf(char) intrinsic for StringUTF16. To incorporate it
>> I had to make a small change to StringLatin1.java (refactor of functionality to intrinsified private method) as well as
>> code for C2. Submitted to: hotspot-compiler-dev and core-libs-dev as this patch contains a change to hotspot and
>> java/lang/StringLatin1.java https://bugs.openjdk.java.net/browse/JDK-8173585
>>
>> Details of testing:
>> ============
>> I have created a jtreg test "compiler/intrinsics/string/TestStringLatin1IndexOfChar" to cover this new intrinsic. Note
>> that, particularly for the x86 implementation of the intrinsic, the code path taken is dependent upon the length of the
>> input String. Hence the test has been designed to cover all these cases. In summary they are:
>> - A "short" string of < 16 characters.
>> - A SIMD String of 16 - 31 characters.
>> - An AVX2 SIMD String of 32+ characters.
>>
>> Hardware used for testing:
>> -----------------------------
>>
>> - Intel Xeon CPU E5-2680 (JVM did not recognize this as having AVX2 support)
>> - Intel i7 processor (with AVX2 support).
>> - AWS Graviton 2 (ARM 64 processor).
>>
>> I also ran "run-test-tier1" and "run-test-tier2" for x86_64 and aarch64.
>>
>> Possible future enhancements:
>> ====================
>> For the x86 implementation there may be two further improvements we can make in order to improve performance of both
>> the StringUTF16 and StringLatin1 indexOf(char) intrinsics:
>> 1. Make use of AVX-512 instructions.
>> 2. For "short" Strings (see below), I think it may be possible to modify the existing algorithm to still use SSE SIMD
>> instructions instead of a loop.
>> Benchmark results:
>> ============
>> **Without** the new StringLatin1 indexOf(char) intrinsic:
>>
>> | Benchmark | Mode | Cnt | Score | Error | Units |
>> | ------------- | ------------- |------------- |------------- |------------- |------------- |
>> | IndexOfBenchmark.latin1_mixed_char | avgt | 5 | **26,389.129** | ± 182.581 | ns/op |
>> | IndexOfBenchmark.utf16_mixed_char | avgt | 5 | 17,885.383 | ± 435.933 | ns/op |
>>
>>
>> **With** the new StringLatin1 indexOf(char) intrinsic:
>>
>> | Benchmark | Mode | Cnt | Score | Error | Units |
>> | ------------- | ------------- |------------- |------------- |------------- |------------- |
>> | IndexOfBenchmark.latin1_mixed_char | avgt | 5 | **17,875.185** | ± 407.716 | ns/op |
>> | IndexOfBenchmark.utf16_mixed_char | avgt | 5 | 18,292.802 | ± 167.306 | ns/op |
>>
>>
>> The objective of the patch is to bring the performance of StringLatin1 indexOf(char) in line with StringUTF16
>> indexOf(char) for x86 and ARM64. We can see above that this has been achieved. Similar results were obtained when
>> running on ARM.
> > Jason Tatton has updated the pull request incrementally with one additional commit since the last revision: > > 8173585: Intrinsify StringLatin1.indexOf(char) > > Rewrite of unit test and newlines added to end of files > > Changes to unit test: > - main test adjusted such that Strings gennerated are much longer (up to > 2048 characters) and of the form: azaza, aazaazaa, aaazaaazaaa, etc with > 'z' being the search character searched for. Multiple instances of the > search character are included in the String in order to validate that > the starting offset is correctly handleded. Results are compared to non > intrinsified version of the code. Longer strings means that the looping > functionality of the various paths is entered into. > - Run configurations introduced such that it checks behaviour where use > of SSE and AVX instructions are restricted. > - Tier4InvocationThreshold adjusted so as to ensure C2 code iis invoked. > > Other changes: > - newlines added at end of files test/hotspot/jtreg/compiler/intrinsics/string/TestStringLatin1IndexOfChar.java line 25: > 23: import jdk.test.lib.Asserts; > 24: > 25: public class TestStringLatin1IndexOfChar{ Can you please add testing for these edge cases: - when the search char is the first char - when the search char is the last char - when the string has length 1 ------------- PR: https://git.openjdk.java.net/jdk/pull/71 From ysuenaga at openjdk.java.net Wed Oct 7 00:51:06 2020 From: ysuenaga at openjdk.java.net (Yasumasa Suenaga) Date: Wed, 7 Oct 2020 00:51:06 GMT Subject: RFR: 8253757: Add LLVM-based backend for hsdis In-Reply-To: References: <91erxiMDb4ftvSomuJYHPi9SX-v8Z2VLD2qEwCbz5tk=.b9ed01b5-f0e0-4ed7-9c1a-b06bc0e64640@github.com> Message-ID: <8Eqswd7tsVaGEXHdKDncXqKpW2tBsSeuY0PV6aTB9_c=.a6cf4957-9d31-4e89-bf44-e7b7852205d5@github.com> On Fri, 2 Oct 2020 11:44:51 GMT, Magnus Ihse Bursie wrote: >> @navyxliu I've merged the sources into `src/utils/hsdis` and added support to build it in the Makefile. 
> > This is an interesting suggestion. There is a similar attempt at replacing binutils with capstone in > https://bugs.openjdk.java.net/browse/JDK-8188073, which unfortunately has not seen much progress due to lack of > resources; I don't know if you are aware of that? There is also a (extremely low priority) effort to rewrite the hsdis > makefile to be part of the normal build system, see e.g. https://bugs.openjdk.java.net/browse/JDK-8208495. Neither of > these should be any blocker for your change, but I think it might be good if you know about them. I have couple of > concerns with your patch. One is the method in which LLVM is selected instead of binutils; afaict this depends on > having the `LLVM` variable set when executing the makefile. At the very least, this should be documented in the README. > I don't think any more complicated configuration is really necessary at this point. With full integration with the > build system, a more user-friendly way of selecting hsdis backend should be implemented, though. Second, and I don't > know if this is an artifact of git/github/the new skara tooling, but if you renamed hsdis.c to hsdis.cpp, this > relationship does not show up, not even in the generated webrevs. Instead they are considered a new + a deleted file. > This makes it hard to see what code changes you have done in that file. And third; have you tested that your changes > (both changing the main file from C to C++, and any code changes in it) does not break the old binutils functionality? > Afaic there are no test suites for exercising hsdis :-( so manual ad-hoc testing is likely needed. Can you separate LLVM and binutils from hsdis.cpp? I guess you say that the problem is both GCC and binutils are not available on Windows AArch64. Is it right? 1 question: binutils seems to support Windows AArch64. Did you try recently binutils? If we can use binutils on Windows AArch64, you can fix makefile only. 
https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=binutils/dlltool.c;h=ed016b97dc38cdb1b85d2f6df676b9c9750f0d41;hb=HEAD#l248 ------------- PR: https://git.openjdk.java.net/jdk/pull/392 From shade at openjdk.java.net Wed Oct 7 07:01:05 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 7 Oct 2020 07:01:05 GMT Subject: RFR: 8253874: [JVMCI] added test omitted in 8252881 In-Reply-To: References: Message-ID: On Tue, 6 Oct 2020 07:58:23 GMT, Doug Simon wrote: > The commit for JDK-8252881 omitted a test that was added in labsjdk-11: > > https://github.com/graalvm/labs-openjdk-11/commit/2b71602e3238c8305bea42488a0eaa4b845c5907 > > This PR adds the missing test. Looks fine. I see the oracle-labs11 changeset [moved an import statement as well.](https://github.com/graalvm/labs-openjdk-11/commit/2b71602e3238c8305bea42488a0eaa4b845c5907#diff-5e1d799366d05dea6fa15a09efdbe016R64) Do you want to do the same for consistency? ------------- Marked as reviewed by shade (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/519 From roland at openjdk.java.net Wed Oct 7 07:38:09 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Wed, 7 Oct 2020 07:38:09 GMT Subject: RFR: 8253923: C2 doesn't always run loop opts for compilations that include loops In-Reply-To: References: Message-ID: On Tue, 6 Oct 2020 18:39:51 GMT, Nils Eliasson wrote: > > Thanks for reviewing. I actually made this a draft PR because I realized when I created it that there was a simpler fix > > (which is to set the has_loops flag when parsing of a method starts). Can you take a look at the new change? > > Where do I find the new change? I see no updated files. I updated the change in place (force pushed). https://openjdk.github.io/cr/?repo=jdk&pr=478&range=00 is the new change. 
------------- PR: https://git.openjdk.java.net/jdk/pull/478 From roland at openjdk.java.net Wed Oct 7 07:53:22 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Wed, 7 Oct 2020 07:53:22 GMT Subject: RFR: 8253756: C2 CompilerThread0 crash in Node::add_req(Node*) Message-ID: The outer strip mined loop is initially created as a skeleton and then expanded once loop opts are over. As long as it is a skeleton, the OuterStripMinedLoopEndNode cannot be constant folded because its input is a place holder. So OuterStripMinedLoopEndNode::Value() blocks constant folding. This bug triggers because the OuterStripMinedLoopEndNode input after expansion is a constant but the OuterStripMinedLoopEndNode is not constant folded (i.e. it's a dead loop but it stays in the graph). The fix I propose is to change the behavior OuterStripMinedLoopEndNode::Value() so it blocks constant folding only until expansion but not after. ------------- Commit messages: - OuterStripMinedLoopEndNode::Value() should behave like IfNode::Value() once the outer strip mined loop is expanded Changes: https://git.openjdk.java.net/jdk/pull/537/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=537&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8253756 Stats: 84 lines in 3 files changed: 83 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/537.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/537/head:pull/537 PR: https://git.openjdk.java.net/jdk/pull/537 From xliu at openjdk.java.net Wed Oct 7 08:06:14 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Wed, 7 Oct 2020 08:06:14 GMT Subject: RFR: 8253757: Add LLVM-based backend for hsdis In-Reply-To: <8Eqswd7tsVaGEXHdKDncXqKpW2tBsSeuY0PV6aTB9_c=.a6cf4957-9d31-4e89-bf44-e7b7852205d5@github.com> References: <91erxiMDb4ftvSomuJYHPi9SX-v8Z2VLD2qEwCbz5tk=.b9ed01b5-f0e0-4ed7-9c1a-b06bc0e64640@github.com> <8Eqswd7tsVaGEXHdKDncXqKpW2tBsSeuY0PV6aTB9_c=.a6cf4957-9d31-4e89-bf44-e7b7852205d5@github.com> Message-ID: On 
Wed, 7 Oct 2020 00:48:24 GMT, Yasumasa Suenaga wrote: >> This is an interesting suggestion. There is a similar attempt at replacing binutils with capstone in >> https://bugs.openjdk.java.net/browse/JDK-8188073, which unfortunately has not seen much progress due to lack of >> resources; I don't know if you are aware of that? There is also a (extremely low priority) effort to rewrite the hsdis >> makefile to be part of the normal build system, see e.g. https://bugs.openjdk.java.net/browse/JDK-8208495. Neither of >> these should be any blocker for your change, but I think it might be good if you know about them. I have couple of >> concerns with your patch. One is the method in which LLVM is selected instead of binutils; afaict this depends on >> having the `LLVM` variable set when executing the makefile. At the very least, this should be documented in the README. >> I don't think any more complicated configuration is really necessary at this point. With full integration with the >> build system, a more user-friendly way of selecting hsdis backend should be implemented, though. Second, and I don't >> know if this is an artifact of git/github/the new skara tooling, but if you renamed hsdis.c to hsdis.cpp, this >> relationship does not show up, not even in the generated webrevs. Instead they are considered a new + a deleted file. >> This makes it hard to see what code changes you have done in that file. And third; have you tested that your changes >> (both changing the main file from C to C++, and any code changes in it) does not break the old binutils functionality? >> Afaic there are no test suites for exercising hsdis :-( so manual ad-hoc testing is likely needed. > > Can you separate LLVM and binutils from hsdis.cpp? > > I guess you say that the problem is both GCC and binutils are not available on Windows AArch64. Is it right? > 1 question: binutils seems to support Windows AArch64. Did you try recently binutils? 
If we can use binutils on Windows > AArch64, you can fix makefile only. > https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=binutils/dlltool.c;h=ed016b97dc38cdb1b85d2f6df676b9c9750f0d41;hb=HEAD#l248 IMHO, it's great to have an alternative disassembler. I personally had a better experience using LLVM MC than BFD when decoding AArch64 and AVX instructions. Another argument is that the LLVM toolchain is supposed to provide the premium experience on non-GNU platforms such as FreeBSD. @luhenry I tried to build it with LLVM 10.0.1 on x86_64 Ubuntu and ran into a small problem. Here is how I build: `$make ARCH=amd64 CC=/opt/llvm/bin/clang CXX=/opt/llvm/bin/clang++ LLVM=/opt/llvm/` The build cannot satisfy the following condition, because the Makefile defines LIBOS_linux (lower case) rather than LIBOS_Linux: #elif defined(LIBOS_Linux) && defined(LIBARCH_amd64) return "x86_64-pc-linux-gnu"; The Makefile assigns OS one of windows/linux/aix/macosx (all lower case) and then sets `CPPFLAGS += -DLIBOS_$(OS) -DLIBOS="$(OS)" -DLIBARCH_$(LIBARCH) -DLIBARCH="$(LIBARCH)" -DLIB_EXT="$(LIB_EXT)"` In hsdis.cpp, `native_target_triple` needs to match whatever the Makefile defines. With that fix, I generated the LLVM version of hsdis-amd64.so and it works flawlessly. ------------- PR: https://git.openjdk.java.net/jdk/pull/392 From dnsimon at openjdk.java.net Wed Oct 7 08:49:22 2020 From: dnsimon at openjdk.java.net (Doug Simon) Date: Wed, 7 Oct 2020 08:49:22 GMT Subject: RFR: 8253874: [JVMCI] added test omitted in 8252881 In-Reply-To: References: Message-ID: On Wed, 7 Oct 2020 06:58:19 GMT, Aleksey Shipilev wrote: >> The commit for JDK-8252881 omitted a test that was added in labsjdk-11: >> >> https://github.com/graalvm/labs-openjdk-11/commit/2b71602e3238c8305bea42488a0eaa4b845c5907 >> >> This PR adds the missing test. > > Looks fine.
I see the oracle-labs11 changeset [moved an import statement as > well.](https://github.com/graalvm/labs-openjdk-11/commit/2b71602e3238c8305bea42488a0eaa4b845c5907#diff-5e1d799366d05dea6fa15a09efdbe016R64) > Do you want to do the same for consistency? Thanks for the recommendation @shipilev but these files are already sufficiently different in 11 and 16 that I don't think it's worth worrying about this moved import. ------------- PR: https://git.openjdk.java.net/jdk/pull/519 From dnsimon at openjdk.java.net Wed Oct 7 09:18:10 2020 From: dnsimon at openjdk.java.net (Doug Simon) Date: Wed, 7 Oct 2020 09:18:10 GMT Subject: Integrated: 8253874: [JVMCI] added test omitted in 8252881 In-Reply-To: References: Message-ID: <9dLXSSORqXSaCFqgcwkLvneMBx12oK4vGon_h8OQRaI=.1092de7e-bef4-4c0c-9a03-8d65ca1b0e33@github.com> On Tue, 6 Oct 2020 07:58:23 GMT, Doug Simon wrote: > The commit for JDK-8252881 omitted a test that was added in labsjdk-11: > > https://github.com/graalvm/labs-openjdk-11/commit/2b71602e3238c8305bea42488a0eaa4b845c5907 > > This PR adds the missing test. This pull request has now been integrated. Changeset: 04ca660e Author: Doug Simon URL: https://git.openjdk.java.net/jdk/commit/04ca660e Stats: 28 lines in 1 file changed: 22 ins; 1 del; 5 mod 8253874: [JVMCI] added test omitted in 8252881 Reviewed-by: shade ------------- PR: https://git.openjdk.java.net/jdk/pull/519 From vlivanov at openjdk.java.net Wed Oct 7 11:09:20 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Wed, 7 Oct 2020 11:09:20 GMT Subject: RFR: 8253191: C2: Masked byte comparisons with large masks produce wrong result on x86 Message-ID: `testUB_mem_imm` generates erroneous code when `mask` constant is larger than `Byte.MAX_VALUE`. 
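The signed-versus-unsigned pitfall behind this report can be reproduced in plain Java. The sketch below is only an illustration and not the C2 code: `wideTest` mimics a zero-extend, mask, and 32-bit compare, while `narrowTest` mimics a comparison that treats the masked result as a signed 8-bit value. The class and both method names are hypothetical.

```java
final class MaskedByteCompare {
    // Zero-extend the byte, apply the mask, then compare as a 32-bit int.
    static boolean wideTest(byte b, int mask) {
        return ((b & 0xFF) & mask) > 0;
    }

    // Apply the mask, then compare the result as a signed 8-bit value.
    static boolean narrowTest(byte b, int mask) {
        return (byte) (b & mask) > 0;
    }

    public static void main(String[] args) {
        byte b = (byte) 0x80;
        // With mask 0x80 the two variants disagree: bit 7 is the sign bit of a byte.
        System.out.println(wideTest(b, 0x80));   // true
        System.out.println(narrowTest(b, 0x80)); // false
        // With a mask no larger than Byte.MAX_VALUE the sign bit is never set.
        System.out.println(wideTest(b, 0x7F) == narrowTest(b, 0x7F)); // true
    }
}
```

For every mask in the range 0 to `Byte.MAX_VALUE` the two variants agree on all 256 byte values, which is consistent with limiting the accepted constants to that range.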
AD instruction in question: instruct testUB_mem_imm(rFlagsReg cr, memory mem, immU8 imm, immI0 zero) %{ match(Set cr (CmpI (AndI (LoadUB mem) imm) zero)); ins_encode %{ __ testb($mem$$Address, $imm$$constant); %} The following instruction sequence is problematic: testb $0x80,0x10(%rdi,%r9,1) jle 0x00000001168789a0 It performs *signed* byte comparison and the immediate is interpreted as a negative value. The original code shape was as follows: movzbl 0x10(%rcx,%r9,1),%r9d test $0x80,%r9d jle 0x000000010a9b6a00 The fix is to narrow the range of accepted mask constants and set the upper limit to `Byte.MAX_VALUE`. Testing: hs-precheckin-comp, hs-tier1, hs-tier2. ------------- Commit messages: - 8253191: C2: Masked byte comparisons with large masks produce wrong result Changes: https://git.openjdk.java.net/jdk/pull/538/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=538&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8253191 Stats: 78 lines in 2 files changed: 28 ins; 23 del; 27 mod Patch: https://git.openjdk.java.net/jdk/pull/538.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/538/head:pull/538 PR: https://git.openjdk.java.net/jdk/pull/538 From vlivanov at openjdk.java.net Wed Oct 7 11:34:08 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Wed, 7 Oct 2020 11:34:08 GMT Subject: RFR: 8253756: C2 CompilerThread0 crash in Node::add_req(Node*) In-Reply-To: References: Message-ID: On Wed, 7 Oct 2020 07:48:23 GMT, Roland Westrelin wrote: > The outer strip mined loop is initially created as a skeleton and then > expanded once loop opts are over. As long as it is a skeleton, the > OuterStripMinedLoopEndNode cannot be constant folded because its input > is a place holder. So OuterStripMinedLoopEndNode::Value() blocks > constant folding. This bug triggers because the > OuterStripMinedLoopEndNode input after expansion is a constant but the > OuterStripMinedLoopEndNode is not constant folded (i.e. 
it's a dead > loop but it stays in the graph). The fix I propose is to change the > behavior OuterStripMinedLoopEndNode::Value() so it blocks constant > folding only until expansion but not after. Changes requested by vlivanov (Reviewer). src/hotspot/share/opto/loopnode.cpp line 1729: > 1727: inner_cl->clear_strip_mined(); > 1728: } > 1729: igvn->C->print_method(PHASE_DEBUG, 2); Looks like a leftover from debugging. Otherwise, the fix looks good. ------------- PR: https://git.openjdk.java.net/jdk/pull/537 From github.com+8792647+robcasloz at openjdk.java.net Wed Oct 7 13:42:18 2020 From: github.com+8792647+robcasloz at openjdk.java.net (Roberto Castañeda Lozano) Date: Wed, 7 Oct 2020 13:42:18 GMT Subject: RFR: 8253404: C2: assert(C->live_nodes() <= C->max_node_limit()) failed: Live Node limit exceeded limit In-Reply-To: References: Message-ID: On Wed, 7 Oct 2020 11:41:20 GMT, Roberto Castañeda Lozano wrote: > Record nodes as dead in `Node::destruct()` if their index cannot be directly reclaimed. This prevents the "Live Node > limit exceeded limit" assertion failure by improving the accuracy of `Compile::live_nodes()` when "hook" nodes in > `ConvI2LNode::Ideal()` are created and deleted non-consecutively. This addition might result in multiple calls to > `compile::record_dead_node()` for the same node (e.g. from `PhaseIdealLoop::spinup()`), but this is safe, as > `compile::record_dead_node()` is idempotent. Record nodes as dead in Node::destruct() if their index cannot be directly reclaimed. This prevents the "Live Node limit exceeded limit" assertion failure by improving the accuracy of Compile::live_nodes() when "hook" nodes in ConvI2LNode::Ideal() are created and deleted non-consecutively. This addition might result in multiple calls to compile::record_dead_node() for the same node (e.g. from PhaseIdealLoop::spinup()), but this is safe, as compile::record_dead_node() is idempotent.
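Why repeated calls are harmless can be seen in a toy model. This is a sketch in Java, not HotSpot code: the class and method names are hypothetical, and it assumes the live count is derived as "indices handed out minus distinct indices recorded as dead".

```java
import java.util.BitSet;

final class NodeAccounting {
    private final BitSet dead = new BitSet();
    private int unique;    // number of node indices handed out so far
    private int deadCount; // number of distinct indices recorded as dead

    int newNode() {
        return unique++;
    }

    // Idempotent: recording the same index a second time changes nothing.
    void recordDead(int idx) {
        if (dead.get(idx)) {
            return;
        }
        dead.set(idx);
        deadCount++;
    }

    int liveNodes() {
        return unique - deadCount;
    }

    public static void main(String[] args) {
        NodeAccounting acc = new NodeAccounting();
        acc.newNode();
        int hook = acc.newNode();
        acc.newNode();
        acc.recordDead(hook);
        acc.recordDead(hook); // second call is a no-op
        System.out.println(acc.liveNodes()); // 2
    }
}
```

Without the early return, the second call would count the same node twice and the live count would drift below its true value; without any recording at all, it would stay too high, which is the kind of inaccuracy the assertion in the subject line trips over.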
------------- PR: https://git.openjdk.java.net/jdk/pull/540 From github.com+8792647+robcasloz at openjdk.java.net Wed Oct 7 13:42:18 2020 From: github.com+8792647+robcasloz at openjdk.java.net (Roberto Castañeda Lozano) Date: Wed, 7 Oct 2020 13:42:18 GMT Subject: RFR: 8253404: C2: assert(C->live_nodes() <= C->max_node_limit()) failed: Live Node limit exceeded limit Message-ID: Record nodes as dead in `Node::destruct()` if their index cannot be directly reclaimed. This prevents the "Live Node limit exceeded limit" assertion failure by improving the accuracy of `Compile::live_nodes()` when "hook" nodes in `ConvI2LNode::Ideal()` are created and deleted non-consecutively. This addition might result in multiple calls to `compile::record_dead_node()` for the same node (e.g. from `PhaseIdealLoop::spinup()`), but this is safe, as `compile::record_dead_node()` is idempotent. ------------- Commit messages: - 8253404: C2: assert(C->live_nodes() <= C->max_node_limit()) failed: Live Node limit exceeded limit Changes: https://git.openjdk.java.net/jdk/pull/540/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=540&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8253404 Stats: 58 lines in 2 files changed: 57 ins; 1 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/540.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/540/head:pull/540 PR: https://git.openjdk.java.net/jdk/pull/540 From github.com+8792647+robcasloz at openjdk.java.net Wed Oct 7 13:42:18 2020 From: github.com+8792647+robcasloz at openjdk.java.net (Roberto Castañeda Lozano) Date: Wed, 7 Oct 2020 13:42:18 GMT Subject: RFR: 8253404: C2: assert(C->live_nodes() <= C->max_node_limit()) failed: Live Node limit exceeded limit In-Reply-To: References: Message-ID: <3eSQHp79qK8gZnCgebcLNuXVcJQyZ8d9De1-ocfB3HY=.50ed5827-b612-4e3c-969f-2e13355cc980@github.com> On Wed, 7 Oct 2020 11:42:44 GMT, Roberto Castañeda Lozano wrote: >> Record nodes as dead in
`Node::destruct()` if their index cannot be directly reclaimed. This prevents the "Live Node >> limit exceeded limit" assertion failure by improving the accuracy of `Compile::live_nodes()` when "hook" nodes in >> `ConvI2LNode::Ideal()` are created and deleted non-consecutively. This addition might result in multiple calls to >> `compile::record_dead_node()` for the same node (e.g. from `PhaseIdealLoop::spinup()`), but this is safe, as >> `compile::record_dead_node()` is idempotent. > > Record nodes as dead in Node::destruct() if their index cannot be directly > reclaimed. This prevents the "Live Node limit exceeded limit" assertion failure > by improving the accuracy of Compile::live_nodes() when "hook" nodes in > ConvI2LNode::Ideal() are created and deleted non-consecutively. > > This addition might result in multiple calls to compile::record_dead_node() for > the same node (e.g. from PhaseIdealLoop::spinup()), but this is safe, as > compile::record_dead_node() is idempotent. There is, potentially, a more fundamental issue with the exponential C2 execution time of the `ConvI2L(AddI(x, y)) -> AddL(ConvI2L(x), ConvI2L(y))` optimization in `ConvI2LNode::Ideal()` when optimizing long chains of int add/sub nodes followed by an int-to-long conversion, but I think that can be addressed separately in a RFE. The likelihood of hitting this worst case in "real code" and the actual impact on total execution time is unclear. 
------------- PR: https://git.openjdk.java.net/jdk/pull/540 From github.com+8792647+robcasloz at openjdk.java.net Wed Oct 7 13:42:18 2020 From: github.com+8792647+robcasloz at openjdk.java.net (Roberto Castañeda Lozano) Date: Wed, 7 Oct 2020 13:42:18 GMT Subject: RFR: 8253404: C2: assert(C->live_nodes() <= C->max_node_limit()) failed: Live Node limit exceeded limit In-Reply-To: <3eSQHp79qK8gZnCgebcLNuXVcJQyZ8d9De1-ocfB3HY=.50ed5827-b612-4e3c-969f-2e13355cc980@github.com> References: <3eSQHp79qK8gZnCgebcLNuXVcJQyZ8d9De1-ocfB3HY=.50ed5827-b612-4e3c-969f-2e13355cc980@github.com> Message-ID: On Wed, 7 Oct 2020 12:37:26 GMT, Roberto Castañeda Lozano wrote: >> Record nodes as dead in Node::destruct() if their index cannot be directly >> reclaimed. This prevents the "Live Node limit exceeded limit" assertion failure >> by improving the accuracy of Compile::live_nodes() when "hook" nodes in >> ConvI2LNode::Ideal() are created and deleted non-consecutively. >> >> This addition might result in multiple calls to compile::record_dead_node() for >> the same node (e.g. from PhaseIdealLoop::spinup()), but this is safe, as >> compile::record_dead_node() is idempotent. > > There is, potentially, a more fundamental issue with the exponential C2 execution time of the `ConvI2L(AddI(x, y)) -> > AddL(ConvI2L(x), ConvI2L(y))` optimization in `ConvI2LNode::Ideal()` when optimizing long chains of int add/sub nodes > followed by an int-to-long conversion, but I think that can be addressed separately in a RFE. The likelihood of hitting > this worst case in "real code" and the actual impact on total execution time is unclear. Tested on tier1-3.
------------- PR: https://git.openjdk.java.net/jdk/pull/540 From iveresov at openjdk.java.net Wed Oct 7 15:11:17 2020 From: iveresov at openjdk.java.net (Igor Veresov) Date: Wed, 7 Oct 2020 15:11:17 GMT Subject: RFR: 8254104: MethodCounters must exist before nmethod is installed Message-ID: <8N51Wy1LCk-2MaXBiV1NlWkfjoQJUU9LpdZ6po5Ty6M=.1f75e166-78ba-4b66-8674-dc5cbbaf566d@github.com> Ensure that MethodCounters for a particular method exist during the nmethod install (for methods that were never run before). Tiered compilation policy depends on the state stored in the counters to function properly. ------------- Commit messages: - Add failure reason. - Ensure MethodCounters exist by the time nmethod is installed. Changes: https://git.openjdk.java.net/jdk/pull/543/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=543&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8254104 Stats: 24 lines in 2 files changed: 19 ins; 3 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/543.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/543/head:pull/543 PR: https://git.openjdk.java.net/jdk/pull/543 From dnsimon at openjdk.java.net Wed Oct 7 15:11:17 2020 From: dnsimon at openjdk.java.net (Doug Simon) Date: Wed, 7 Oct 2020 15:11:17 GMT Subject: RFR: 8254104: MethodCounters must exist before nmethod is installed In-Reply-To: <8N51Wy1LCk-2MaXBiV1NlWkfjoQJUU9LpdZ6po5Ty6M=.1f75e166-78ba-4b66-8674-dc5cbbaf566d@github.com> References: <8N51Wy1LCk-2MaXBiV1NlWkfjoQJUU9LpdZ6po5Ty6M=.1f75e166-78ba-4b66-8674-dc5cbbaf566d@github.com> Message-ID: On Wed, 7 Oct 2020 14:35:24 GMT, Igor Veresov wrote: > Ensure that MethodCounters for a particular method exist during the nmethod install (for methods that were never run > before). Tiered compilation policy depends on the state stored in the counters to function properly. Marked as reviewed by dnsimon (Committer). 
------------- PR: https://git.openjdk.java.net/jdk/pull/543 From roland at openjdk.java.net Wed Oct 7 15:20:26 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Wed, 7 Oct 2020 15:20:26 GMT Subject: RFR: 8253756: C2 CompilerThread0 crash in Node::add_req(Node*) [v2] In-Reply-To: References: Message-ID: <_BrWbZCEpMWqCUcKKPw0Oqyfr_eQWlSvGkw_6N9Oq6A=.298997dd-7660-4f3a-af15-eb9f82f073dc@github.com> > The outer strip mined loop is initially created as a skeleton and then > expanded once loop opts are over. As long as it is a skeleton, the > OuterStripMinedLoopEndNode cannot be constant folded because its input > is a place holder. So OuterStripMinedLoopEndNode::Value() blocks > constant folding. This bug triggers because the > OuterStripMinedLoopEndNode input after expansion is a constant but the > OuterStripMinedLoopEndNode is not constant folded (i.e. it's a dead > loop but it stays in the graph). The fix I propose is to change the > behavior OuterStripMinedLoopEndNode::Value() so it blocks constant > folding only until expansion but not after. 
Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: remove debugging code ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/537/files - new: https://git.openjdk.java.net/jdk/pull/537/files/f511b8f9..407a2fda Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=537&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=537&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/537.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/537/head:pull/537 PR: https://git.openjdk.java.net/jdk/pull/537 From roland at openjdk.java.net Wed Oct 7 15:20:27 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Wed, 7 Oct 2020 15:20:27 GMT Subject: RFR: 8253756: C2 CompilerThread0 crash in Node::add_req(Node*) [v2] In-Reply-To: References: Message-ID: On Wed, 7 Oct 2020 11:31:30 GMT, Vladimir Ivanov wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> remove debugging code > > src/hotspot/share/opto/loopnode.cpp line 1729: > >> 1727: inner_cl->clear_strip_mined(); >> 1728: } >> 1729: igvn->C->print_method(PHASE_DEBUG, 2); > > Looks like a leftover from debugging. Otherwise, the fix looks good. Good catch. I thought I removed it but obviously didn't. I updated the change. Thanks for the review. 
------------- PR: https://git.openjdk.java.net/jdk/pull/537 From kvn at openjdk.java.net Wed Oct 7 15:26:09 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 7 Oct 2020 15:26:09 GMT Subject: RFR: 8254104: MethodCounters must exist before nmethod is installed In-Reply-To: References: <8N51Wy1LCk-2MaXBiV1NlWkfjoQJUU9LpdZ6po5Ty6M=.1f75e166-78ba-4b66-8674-dc5cbbaf566d@github.com> Message-ID: On Wed, 7 Oct 2020 14:48:09 GMT, Doug Simon wrote: >> Ensure that MethodCounters for a particular method exist during the nmethod install (for methods that were never run >> before). Tiered compilation policy depends on the state stored in the counters to function properly. > > Marked as reviewed by dnsimon (Committer). Would be great if you explain in what cases you don't have MethodCounters *AFTER* compilation? Should you do that at the beginning of compilation? We do check for presence of MDO during start of compilation (C2 example): https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/compile.cpp#L599 ------------- PR: https://git.openjdk.java.net/jdk/pull/543 From vlivanov at openjdk.java.net Wed Oct 7 15:33:09 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Wed, 7 Oct 2020 15:33:09 GMT Subject: RFR: 8253756: C2 CompilerThread0 crash in Node::add_req(Node*) [v2] In-Reply-To: <_BrWbZCEpMWqCUcKKPw0Oqyfr_eQWlSvGkw_6N9Oq6A=.298997dd-7660-4f3a-af15-eb9f82f073dc@github.com> References: <_BrWbZCEpMWqCUcKKPw0Oqyfr_eQWlSvGkw_6N9Oq6A=.298997dd-7660-4f3a-af15-eb9f82f073dc@github.com> Message-ID: On Wed, 7 Oct 2020 15:20:26 GMT, Roland Westrelin wrote: >> The outer strip mined loop is initially created as a skeleton and then >> expanded once loop opts are over. As long as it is a skeleton, the >> OuterStripMinedLoopEndNode cannot be constant folded because its input >> is a place holder. So OuterStripMinedLoopEndNode::Value() blocks >> constant folding. 
This bug triggers because the >> OuterStripMinedLoopEndNode input after expansion is a constant but the >> OuterStripMinedLoopEndNode is not constant folded (i.e. it's a dead >> loop but it stays in the graph). The fix I propose is to change the >> behavior OuterStripMinedLoopEndNode::Value() so it blocks constant >> folding only until expansion but not after. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > remove debugging code Looks good. ------------- Marked as reviewed by vlivanov (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/537 From iveresov at openjdk.java.net Wed Oct 7 15:35:08 2020 From: iveresov at openjdk.java.net (Igor Veresov) Date: Wed, 7 Oct 2020 15:35:08 GMT Subject: RFR: 8254104: MethodCounters must exist before nmethod is installed In-Reply-To: References: <8N51Wy1LCk-2MaXBiV1NlWkfjoQJUU9LpdZ6po5Ty6M=.1f75e166-78ba-4b66-8674-dc5cbbaf566d@github.com> Message-ID: On Wed, 7 Oct 2020 15:23:15 GMT, Vladimir Kozlov wrote: >> Marked as reviewed by dnsimon (Committer). > > Would be great if you explain in what cases you don't have MethodCounters *AFTER* compilation? > Should you do that at the beginning of compilation? > We do check for presence of MDO during start of compilation (C2 example): > https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/compile.cpp#L599 The need for MethodCounters to be present is driven by the tiered compilation policy. It's not needed by a compiler per se. Moreover, you don't need it if you don't install the code. For example when compilation happens for native image it's not needed. Anyways, seemed like an appropriate place to do it. 
------------- PR: https://git.openjdk.java.net/jdk/pull/543 From iveresov at openjdk.java.net Wed Oct 7 16:01:14 2020 From: iveresov at openjdk.java.net (Igor Veresov) Date: Wed, 7 Oct 2020 16:01:14 GMT Subject: RFR: 8254104: MethodCounters must exist before nmethod is installed In-Reply-To: References: <8N51Wy1LCk-2MaXBiV1NlWkfjoQJUU9LpdZ6po5Ty6M=.1f75e166-78ba-4b66-8674-dc5cbbaf566d@github.com> Message-ID: On Wed, 7 Oct 2020 15:31:39 GMT, Igor Veresov wrote: >> Would be great if you explain in what cases you don't have MethodCounters *AFTER* compilation? >> Should you do that at the beginning of compilation? >> We do check for presence of MDO during start of compilation (C2 example): >> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/compile.cpp#L599 > > The need for MethodCounters to be present is driven by the tiered compilation policy. It's not needed by a compiler per > se. Moreover, you don't need it if you don't install the code. For example when compilation happens for native image > it's not needed. Anyways, seemed like an appropriate place to do it. Oh, and to answer your question about when it happens. In most Graal unit tests it compiles and installs methods that have been never executed in the interpreter, which normally lazily allocates the counters. And it doesn't pass through the broker, which would've made sure the counters are there. ------------- PR: https://git.openjdk.java.net/jdk/pull/543 From kvn at openjdk.java.net Wed Oct 7 16:07:06 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 7 Oct 2020 16:07:06 GMT Subject: RFR: 8254104: MethodCounters must exist before nmethod is installed In-Reply-To: References: <8N51Wy1LCk-2MaXBiV1NlWkfjoQJUU9LpdZ6po5Ty6M=.1f75e166-78ba-4b66-8674-dc5cbbaf566d@github.com> Message-ID: On Wed, 7 Oct 2020 15:56:16 GMT, Igor Veresov wrote: >> The need for MethodCounters to be present is driven by the tiered compilation policy. It's not needed by a compiler per >> se. 
Moreover, you don't need it if you don't install the code. For example when compilation happens for native image >> it's not needed. Anyways, seemed like an appropriate place to do it. > > Oh, and to answer your question about when it happens. In most Graal unit tests it compiles and installs methods that > have been never executed in the interpreter, which normally lazily allocates the counters. And it doesn't pass through > the broker, which would've made sure the counters are there. Okay. MethodCounters are created in the interpreter in all cases, regardless of the TieredCompilation setting: https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/templateInterpreterGenerator_x86.cpp#L386 So the check for TieredCompilation is strange. I understand that MethodCounters are not created in the AOT case. And now I get that this includes Graal unit testing, as you pointed out (which is similar to AOT), which requests compilation through Java and not the usual Compilation Broker. In that case your fix is reasonable. ------------- PR: https://git.openjdk.java.net/jdk/pull/543 From iveresov at openjdk.java.net Wed Oct 7 16:22:21 2020 From: iveresov at openjdk.java.net (Igor Veresov) Date: Wed, 7 Oct 2020 16:22:21 GMT Subject: RFR: 8254104: MethodCounters must exist before nmethod is installed [v2] In-Reply-To: <8N51Wy1LCk-2MaXBiV1NlWkfjoQJUU9LpdZ6po5Ty6M=.1f75e166-78ba-4b66-8674-dc5cbbaf566d@github.com> References: <8N51Wy1LCk-2MaXBiV1NlWkfjoQJUU9LpdZ6po5Ty6M=.1f75e166-78ba-4b66-8674-dc5cbbaf566d@github.com> Message-ID: > Ensure that MethodCounters for a particular method exist during the nmethod install (for methods that were never run > before). Tiered compilation policy depends on the state stored in the counters to function properly. Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: Allocate counters unconditionally.
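The idea discussed in this thread can be pictured with a toy model: counters are normally allocated lazily the first time a method is interpreted, so an install path that bypasses the interpreter has to allocate them itself. The Java sketch below is not HotSpot code, and every class and method name in it is hypothetical.

```java
final class CounterModel {
    static final class Counters {
        int invocations;
    }

    static final class Method {
        Counters counters; // stays null until something asks for it

        Counters ensureCounters() {
            if (counters == null) {
                counters = new Counters();
            }
            return counters;
        }
    }

    // Interpreter path: counters appear lazily on first execution.
    static void interpret(Method m) {
        m.ensureCounters().invocations++;
    }

    // Install path: allocate counters unconditionally, because the method
    // may never have been run in the interpreter.
    static void install(Method m) {
        m.ensureCounters();
    }

    public static void main(String[] args) {
        Method neverInterpreted = new Method();
        install(neverInterpreted);
        System.out.println(neverInterpreted.counters != null); // true
    }
}
```

Because `ensureCounters` only allocates when the field is still null, calling it from both paths is safe in this single-threaded model whichever one runs first.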
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/543/files - new: https://git.openjdk.java.net/jdk/pull/543/files/b626ad9c..982eb51d Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=543&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=543&range=00-01 Stats: 4 lines in 2 files changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/543.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/543/head:pull/543 PR: https://git.openjdk.java.net/jdk/pull/543 From kvn at openjdk.java.net Wed Oct 7 16:25:11 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 7 Oct 2020 16:25:11 GMT Subject: RFR: 8254104: MethodCounters must exist before nmethod is installed [v2] In-Reply-To: References: <8N51Wy1LCk-2MaXBiV1NlWkfjoQJUU9LpdZ6po5Ty6M=.1f75e166-78ba-4b66-8674-dc5cbbaf566d@github.com> Message-ID: On Wed, 7 Oct 2020 16:22:21 GMT, Igor Veresov wrote: >> Ensure that MethodCounters for a particular method exist during the nmethod install (for methods that were never run >> before). Tiered compilation policy depends on the state stored in the counters to function properly. > > Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: > > Allocate counters unconditionally. OK ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/543 From github.com+70893615+jasontatton-aws at openjdk.java.net Thu Oct 8 06:55:15 2020 From: github.com+70893615+jasontatton-aws at openjdk.java.net (Jason Tatton) Date: Thu, 8 Oct 2020 06:55:15 GMT Subject: RFR: 8173585: Intrinsify StringLatin1.indexOf(char) [v5] In-Reply-To: <_T0873dC5tfUtGn9r1_Y21JkPKRt-za3MM9hPN2GQKQ=.b865fe53-5417-424f-81b6-1566a330640e@github.com> References: <_T0873dC5tfUtGn9r1_Y21JkPKRt-za3MM9hPN2GQKQ=.b865fe53-5417-424f-81b6-1566a330640e@github.com> Message-ID: > This is an implementation of the indexOf(char) intrinsic for StringLatin1 (1 byte encoded Strings). 
It is provided for > x86 and ARM64. The implementation is greatly inspired by the indexOf(char) intrinsic for StringUTF16. To incorporate it > I had to make a small change to StringLatin1.java (refactor of functionality to intrinsified private method) as well as > code for C2. Submitted to: hotspot-compiler-dev and core-libs-dev as this patch contains a change to hotspot and > java/lang/StringLatin1.java https://bugs.openjdk.java.net/browse/JDK-8173585 > > Details of testing: > ============ > I have created a jtreg test "compiler/intrinsics/string/TestStringLatin1IndexOfChar" to cover this new intrinsic. Note > that, particularly for the x86 implementation of the intrinsic, the code path taken is dependent upon the length of the > input String. Hence the test has been designed to cover all these cases. In summary they are: > - A "short" string of < 16 characters. > - A SIMD String of 16 to 31 characters. > - An AVX2 SIMD String of 32 characters+. > > Hardware used for testing: > ----------------------------- > > - Intel Xeon CPU E5-2680 (JVM did not recognize this as having AVX2 support) and Intel i7 processor (with AVX2 support). > - AWS Graviton 2 (ARM 64 processor). > > I also ran "run-test-tier1" and "run-test-tier2" for x86_64 and aarch64. > > Possible future enhancements: > ==================== > For the x86 implementation there may be two further improvements we can make in order to improve performance of both > the StringUTF16 and StringLatin1 indexOf(char) intrinsics: > 1. Make use of AVX-512 instructions. > 2. For "short" Strings (see below), I think it may be possible to modify the existing algorithm to still use SSE SIMD > instructions instead of a loop.
> Benchmark results:
> ============
> **Without** the new StringLatin1 indexOf(char) intrinsic:
>
> | Benchmark | Mode | Cnt | Score | Error | Units |
> | ------------- | ------------- | ------------- | ------------- | ------------- | ------------- |
> | IndexOfBenchmark.latin1_mixed_char | avgt | 5 | **26,389.129** | ± 182.581 | ns/op |
> | IndexOfBenchmark.utf16_mixed_char | avgt | 5 | 17,885.383 | ± 435.933 | ns/op |
>
> **With** the new StringLatin1 indexOf(char) intrinsic:
>
> | Benchmark | Mode | Cnt | Score | Error | Units |
> | ------------- | ------------- | ------------- | ------------- | ------------- | ------------- |
> | IndexOfBenchmark.latin1_mixed_char | avgt | 5 | **17,875.185** | ± 407.716 | ns/op |
> | IndexOfBenchmark.utf16_mixed_char | avgt | 5 | 18,292.802 | ± 167.306 | ns/op |
>
> The objective of the patch is to bring the performance of StringLatin1 indexOf(char) in line with StringUTF16
> indexOf(char) for x86 and ARM64. We can see above that this has been achieved. Similar results were obtained when
> running on ARM.

Jason Tatton has updated the pull request incrementally with one additional commit since the last revision:

8173585: Intrinsify StringLatin1.indexOf(char)

Added new unit test: findOneItem. This will test strings of varying length ensuring that for all lengths one instance of the search char can be found. We check what happens when the search character is in each position of the search string (including first and last positions).
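As a rough illustration of the kind of check described above, here is a minimal sketch. It is not the code from the PR: the class name `IndexOfCharSketch`, the helper `naiveIndexOf`, and the body of `findOneItem` are assumptions made for this example, and only the test name comes from the message. It places a single search character at every position of strings of every length up to a bound and compares `String.indexOf(char)` with a plain loop.

```java
import java.util.Arrays;

// Hypothetical sketch; not the jtreg test from the pull request.
final class IndexOfCharSketch {
    // Reference answer: a plain loop that no intrinsic replaces.
    static int naiveIndexOf(String s, char ch) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == ch) {
                return i;
            }
        }
        return -1;
    }

    // For each length 1..maxLen and each position, build "aaa...z...aaa" with
    // exactly one 'z' and check that both lookups report that position.
    static boolean findOneItem(int maxLen) {
        for (int len = 1; len <= maxLen; len++) {
            for (int pos = 0; pos < len; pos++) {
                char[] chars = new char[len];
                Arrays.fill(chars, 'a');
                chars[pos] = 'z';
                String s = new String(chars);
                if (s.indexOf('z') != pos || naiveIndexOf(s, 'z') != pos) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // 64 spans the three length classes listed in the PR description:
        // fewer than 16, 16 to 31, and 32 or more characters.
        System.out.println(findOneItem(64));
    }
}
```

A bound of 64 is an arbitrary choice here; it only needs to be large enough to reach each length-dependent path, including the first, last, and length-1 cases.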
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/71/files - new: https://git.openjdk.java.net/jdk/pull/71/files/c8a2849e..8ead02ab Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=71&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=71&range=03-04 Stats: 34 lines in 1 file changed: 26 ins; 0 del; 8 mod Patch: https://git.openjdk.java.net/jdk/pull/71.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/71/head:pull/71 PR: https://git.openjdk.java.net/jdk/pull/71 From github.com+70893615+jasontatton-aws at openjdk.java.net Thu Oct 8 06:55:53 2020 From: github.com+70893615+jasontatton-aws at openjdk.java.net (Jason Tatton) Date: Thu, 8 Oct 2020 06:55:53 GMT Subject: RFR: 8173585: Intrinsify StringLatin1.indexOf(char) [v4] In-Reply-To: References: <_T0873dC5tfUtGn9r1_Y21JkPKRt-za3MM9hPN2GQKQ=.b865fe53-5417-424f-81b6-1566a330640e@github.com> Message-ID: On Tue, 6 Oct 2020 20:18:02 GMT, Nils Eliasson wrote: >> Jason Tatton has updated the pull request incrementally with one additional commit since the last revision: >> >> 8173585: Intrinsify StringLatin1.indexOf(char) >> >> Rewrite of unit test and newlines added to end of files >> >> Changes to unit test: >> - main test adjusted such that Strings generated are much longer (up to >> 2048 characters) and of the form: azaza, aazaazaa, aaazaaazaaa, etc with >> 'z' being the search character searched for. Multiple instances of the >> search character are included in the String in order to validate that >> the starting offset is correctly handled. Results are compared to non >> intrinsified version of the code. Longer strings means that the looping >> functionality of the various paths is entered into. >> - Run configurations introduced such that it checks behaviour where use >> of SSE and AVX instructions are restricted. >> - Tier4InvocationThreshold adjusted so as to ensure C2 code is invoked.
>> >> Other changes: >> - newlines added at end of files > > test/hotspot/jtreg/compiler/intrinsics/string/TestStringLatin1IndexOfChar.java line 25: > >> 23: import jdk.test.lib.Asserts; >> 24: >> 25: public class TestStringLatin1IndexOfChar{ > > Can you please add testing for these edge cases: > - when the search char is the first char > - when the search char is the last char > - when the string has length 1 Thanks for reviewing this. I have added a new test: `findOneItem` which covers these edge cases ------------- PR: https://git.openjdk.java.net/jdk/pull/71 From mdoerr at openjdk.java.net Thu Oct 8 06:58:50 2020 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Thu, 8 Oct 2020 06:58:50 GMT Subject: RFR: 8254190: [s390] interpreter misses exception check after calling moni… Message-ID: …torenter The s390 template interpreter currently uses call_VM with check_exceptions = false when calling InterpreterRuntime::monitorenter, but there's a possibility to get an Exception.
See JIRA issue for details: https://bugs.openjdk.java.net/browse/JDK-8254190 ------------- Commit messages: - 8254190: [s390] interpreter misses exception check after calling monitorenter Changes: https://git.openjdk.java.net/jdk/pull/553/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=553&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8254190 Stats: 5 lines in 1 file changed: 0 ins; 3 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/553.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/553/head:pull/553 PR: https://git.openjdk.java.net/jdk/pull/553 From shade at openjdk.java.net Thu Oct 8 06:58:55 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 8 Oct 2020 06:58:55 GMT Subject: RFR: 8254190: [s390] interpreter misses exception check after calling moni… In-Reply-To: References: Message-ID: On Wed, 7 Oct 2020 21:37:16 GMT, Martin Doerr wrote: > …torenter > > The s390 template interpreter currently uses call_VM with check_exceptions = false when calling > InterpreterRuntime::monitorenter, but there's a possibility to get an Exception. > See JIRA issue for details: > https://bugs.openjdk.java.net/browse/JDK-8254190 This makes sense. Looks good. ------------- Marked as reviewed by shade (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/553 From neliasso at openjdk.java.net Thu Oct 8 06:59:04 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Thu, 8 Oct 2020 06:59:04 GMT Subject: RFR: 8253404: C2: assert(C->live_nodes() <= C->max_node_limit()) failed: Live Node limit exceeded limit In-Reply-To: References: Message-ID: <3PZikZxQmrxNugpxr-kXlHClJ9nw9LknDzpVqhw8cuw=.7feec5e3-cfe0-41bc-a491-83be3efdf03b@github.com> On Wed, 7 Oct 2020 11:41:20 GMT, Roberto Castañeda Lozano wrote: > Record nodes as dead in `Node::destruct()` if their index cannot be directly reclaimed.
This prevents the "Live Node > limit exceeded limit" assertion failure by improving the accuracy of `Compile::live_nodes()` when "hook" nodes in > `ConvI2LNode::Ideal()` are created and deleted non-consecutively. This addition might result in multiple calls to > `compile::record_dead_node()` for the same node (e.g. from `PhaseIdealLoop::spinup()`), but this is safe, as > `compile::record_dead_node()` is idempotent. Looks good. ------------- Marked as reviewed by neliasso (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/540 From github.com+8792647+robcasloz at openjdk.java.net Thu Oct 8 06:59:08 2020 From: github.com+8792647+robcasloz at openjdk.java.net (Roberto Castañeda Lozano) Date: Thu, 8 Oct 2020 06:59:08 GMT Subject: RFR: 8253404: C2: assert(C->live_nodes() <= C->max_node_limit()) failed: Live Node limit exceeded limit In-Reply-To: <3PZikZxQmrxNugpxr-kXlHClJ9nw9LknDzpVqhw8cuw=.7feec5e3-cfe0-41bc-a491-83be3efdf03b@github.com> References: <3PZikZxQmrxNugpxr-kXlHClJ9nw9LknDzpVqhw8cuw=.7feec5e3-cfe0-41bc-a491-83be3efdf03b@github.com> Message-ID: On Wed, 7 Oct 2020 20:49:22 GMT, Nils Eliasson wrote: > Looks good. Thanks for reviewing, Nils! ------------- PR: https://git.openjdk.java.net/jdk/pull/540 From iveresov at openjdk.java.net Thu Oct 8 07:02:12 2020 From: iveresov at openjdk.java.net (Igor Veresov) Date: Thu, 8 Oct 2020 07:02:12 GMT Subject: Integrated: 8254104: MethodCounters must exist before nmethod is installed In-Reply-To: <8N51Wy1LCk-2MaXBiV1NlWkfjoQJUU9LpdZ6po5Ty6M=.1f75e166-78ba-4b66-8674-dc5cbbaf566d@github.com> References: <8N51Wy1LCk-2MaXBiV1NlWkfjoQJUU9LpdZ6po5Ty6M=.1f75e166-78ba-4b66-8674-dc5cbbaf566d@github.com> Message-ID: On Wed, 7 Oct 2020 14:35:24 GMT, Igor Veresov wrote: > Ensure that MethodCounters for a particular method exist during the nmethod install (for methods that were never run > before). Tiered compilation policy depends on the state stored in the counters to function properly.
This pull request has now been integrated. Changeset: 4e5ef303 Author: Igor Veresov URL: https://git.openjdk.java.net/jdk/commit/4e5ef303 Stats: 24 lines in 2 files changed: 19 ins; 3 del; 2 mod 8254104: MethodCounters must exist before nmethod is installed Reviewed-by: dnsimon, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/543 From thartmann at openjdk.java.net Thu Oct 8 08:10:46 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 8 Oct 2020 08:10:46 GMT Subject: RFR: 8253588: C1: assert(false) failed: unknown register on x86_32 only with -XX:+TraceLinearScanLevel=4 In-Reply-To: References: Message-ID: On Fri, 25 Sep 2020 10:07:04 GMT, Christian Hagedorn wrote: > [JDK-8251093](https://bugs.openjdk.java.net/browse/JDK-8251093) introduced some additional logging of intervals and its > registers in various places. On 32-bit only, we could have two registers for an interval. A hi-register is only used > when the interval has `_num_phys_regs` set to 2. In one such place > ([L5448](https://github.com/chhagedorn/jdk/blob/29ed779487bad3c359fb13dfad3f41832637a470/src/hotspot/share/c1/c1_LinearScan.cpp#L5448)), > we log the hi-register `hint_regHi`. On > [L5441](https://github.com/chhagedorn/jdk/blob/29ed779487bad3c359fb13dfad3f41832637a470/src/hotspot/share/c1/c1_LinearScan.cpp#L5441), > however, we can assign it an invalid register number when `_num_phys_regs` is 1. That was not a problem before > JDK-8251093 as we only used `hint_regHi` later after a `_num_phys_regs == 2` check on > [L5484](https://github.com/chhagedorn/jdk/blob/29ed779487bad3c359fb13dfad3f41832637a470/src/hotspot/share/c1/c1_LinearScan.cpp#L5484). > But the additional logging is performed earlier resulting in this assertion failure when trying to log the invalid > `hint_regHi` register. Thanks, Christian Looks reasonable to me. ------------- Marked as reviewed by thartmann (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/356 From thartmann at openjdk.java.net Thu Oct 8 08:33:42 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 8 Oct 2020 08:33:42 GMT Subject: RFR: 8253191: C2: Masked byte comparisons with large masks produce wrong result on x86 In-Reply-To: References: Message-ID: <5XuxxcWDe7w5SSqXgBf8B7RL4-BYuI0FJKki3JRR9c0=.335baf1e-1183-4370-80ce-6e12b6703723@github.com> On Wed, 7 Oct 2020 10:39:16 GMT, Vladimir Ivanov wrote: > `testUB_mem_imm` generates erroneous code when `mask` constant is larger than `Byte.MAX_VALUE`. > > AD instruction in question: > instruct testUB_mem_imm(rFlagsReg cr, memory mem, immU8 imm, immI0 zero) %{ > match(Set cr (CmpI (AndI (LoadUB mem) imm) zero)); > > ins_encode %{ __ testb($mem$$Address, $imm$$constant); %} > > The following instruction sequence is problematic: > testb $0x80,0x10(%rdi,%r9,1) > jle 0x00000001168789a0 > > It performs *signed* byte comparison and the immediate is interpreted as a negative value. > > The original code shape was as follows: > movzbl 0x10(%rcx,%r9,1),%r9d > test $0x80,%r9d > jle 0x000000010a9b6a00 > > The fix is to narrow the range of accepted mask constants and set the upper limit to `Byte.MAX_VALUE`. > > Testing: hs-precheckin-comp, hs-tier1, hs-tier2. Looks good. test/hotspot/jtreg/compiler/c2/TestUnsignedByteCompare.java line 48: > 46: @DontInline static boolean testByteLT0(byte[] val) { return (val[0] & mask()) < 0; } > 47: > 48: static void testValue(byte b) { Shouldn't you exclude that method from compilation to compare interpreted vs. C2 compiled result? ------------- Marked as reviewed by thartmann (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/538 From thartmann at openjdk.java.net Thu Oct 8 08:39:44 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 8 Oct 2020 08:39:44 GMT Subject: RFR: 8253756: C2 CompilerThread0 crash in Node::add_req(Node*) [v2] In-Reply-To: <_BrWbZCEpMWqCUcKKPw0Oqyfr_eQWlSvGkw_6N9Oq6A=.298997dd-7660-4f3a-af15-eb9f82f073dc@github.com> References: <_BrWbZCEpMWqCUcKKPw0Oqyfr_eQWlSvGkw_6N9Oq6A=.298997dd-7660-4f3a-af15-eb9f82f073dc@github.com> Message-ID: On Wed, 7 Oct 2020 15:20:26 GMT, Roland Westrelin wrote: >> The outer strip mined loop is initially created as a skeleton and then >> expanded once loop opts are over. As long as it is a skeleton, the >> OuterStripMinedLoopEndNode cannot be constant folded because its input >> is a place holder. So OuterStripMinedLoopEndNode::Value() blocks >> constant folding. This bug triggers because the >> OuterStripMinedLoopEndNode input after expansion is a constant but the >> OuterStripMinedLoopEndNode is not constant folded (i.e. it's a dead >> loop but it stays in the graph). The fix I propose is to change the >> behavior OuterStripMinedLoopEndNode::Value() so it blocks constant >> folding only until expansion but not after. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > remove debugging code Looks reasonable to me. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/537 From thartmann at openjdk.java.net Thu Oct 8 08:42:50 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 8 Oct 2020 08:42:50 GMT Subject: RFR: 8253404: C2: assert(C->live_nodes() <= C->max_node_limit()) failed: Live Node limit exceeded limit In-Reply-To: References: Message-ID: On Wed, 7 Oct 2020 11:41:20 GMT, Roberto Castañeda Lozano wrote: > Record nodes as dead in `Node::destruct()` if their index cannot be directly reclaimed.
This prevents the "Live Node > limit exceeded limit" assertion failure by improving the accuracy of `Compile::live_nodes()` when "hook" nodes in > `ConvI2LNode::Ideal()` are created and deleted non-consecutively. This addition might result in multiple calls to > `compile::record_dead_node()` for the same node (e.g. from `PhaseIdealLoop::spinup()`), but this is safe, as > `compile::record_dead_node()` is idempotent. Looks good to me too! ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/540 From roland at openjdk.java.net Thu Oct 8 08:43:46 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Thu, 8 Oct 2020 08:43:46 GMT Subject: Integrated: 8253756: C2 CompilerThread0 crash in Node::add_req(Node*) In-Reply-To: References: Message-ID: On Wed, 7 Oct 2020 07:48:23 GMT, Roland Westrelin wrote: > The outer strip mined loop is initially created as a skeleton and then > expanded once loop opts are over. As long as it is a skeleton, the > OuterStripMinedLoopEndNode cannot be constant folded because its input > is a place holder. So OuterStripMinedLoopEndNode::Value() blocks > constant folding. This bug triggers because the > OuterStripMinedLoopEndNode input after expansion is a constant but the > OuterStripMinedLoopEndNode is not constant folded (i.e. it's a dead > loop but it stays in the graph). The fix I propose is to change the > behavior OuterStripMinedLoopEndNode::Value() so it blocks constant > folding only until expansion but not after. This pull request has now been integrated. 
Changeset: 76a58527 Author: Roland Westrelin URL: https://git.openjdk.java.net/jdk/commit/76a58527 Stats: 83 lines in 3 files changed: 82 ins; 0 del; 1 mod 8253756: C2 CompilerThread0 crash in Node::add_req(Node*) Reviewed-by: vlivanov, thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/537 From github.com+8792647+robcasloz at openjdk.java.net Thu Oct 8 09:10:45 2020 From: github.com+8792647+robcasloz at openjdk.java.net (Roberto Castañeda Lozano) Date: Thu, 8 Oct 2020 09:10:45 GMT Subject: RFR: 8253404: C2: assert(C->live_nodes() <= C->max_node_limit()) failed: Live Node limit exceeded limit In-Reply-To: References: Message-ID: On Thu, 8 Oct 2020 08:39:31 GMT, Tobias Hartmann wrote: > Looks good to me too! Thanks Tobias! ------------- PR: https://git.openjdk.java.net/jdk/pull/540 From neliasso at openjdk.java.net Thu Oct 8 09:23:47 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Thu, 8 Oct 2020 09:23:47 GMT Subject: RFR: 8253566: clazz.isAssignableFrom will return false for interface implementors [v2] In-Reply-To: References: Message-ID: On Mon, 5 Oct 2020 07:15:48 GMT, Roland Westrelin wrote: >> The code pattern in the test case is optimized as a trichotomy which >> is wrong given SubTypeCheckNode is a special kind of CmpNode that's >> not commutative. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now > contains three commits: > - comment > - test > - trichotomy opt should not be applied to subtype check Looks good. ------------- Marked as reviewed by neliasso (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/422 From roland at openjdk.java.net Thu Oct 8 09:36:57 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Thu, 8 Oct 2020 09:36:57 GMT Subject: Integrated: 8253566: clazz.isAssignableFrom will return false for interface implementors In-Reply-To: References: Message-ID: On Wed, 30 Sep 2020 06:39:02 GMT, Roland Westrelin wrote: > The code pattern in the test case is optimized as a trichotomy which > is wrong given SubTypeCheckNode is a special kind of CmpNode that's > not commutative. This pull request has now been integrated. Changeset: f8603720 Author: Roland Westrelin URL: https://git.openjdk.java.net/jdk/commit/f8603720 Stats: 71 lines in 2 files changed: 70 ins; 0 del; 1 mod 8253566: clazz.isAssignableFrom will return false for interface implementors Reviewed-by: kvn, thartmann, neliasso ------------- PR: https://git.openjdk.java.net/jdk/pull/422 From neliasso at openjdk.java.net Thu Oct 8 09:40:47 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Thu, 8 Oct 2020 09:40:47 GMT Subject: RFR: 8253923: C2 doesn't always run loop opts for compilations that include loops In-Reply-To: References: Message-ID: On Wed, 7 Oct 2020 07:35:22 GMT, Roland Westrelin wrote: > I updated the change in place (force pushed). > https://openjdk.github.io/cr/?repo=jdk&pr=478&range=00 is the new change. That looks exactly like the code I reviewed. ------------- PR: https://git.openjdk.java.net/jdk/pull/478 From roland at openjdk.java.net Thu Oct 8 09:43:47 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Thu, 8 Oct 2020 09:43:47 GMT Subject: RFR: 8253923: C2 doesn't always run loop opts for compilations that include loops In-Reply-To: References: Message-ID: On Thu, 8 Oct 2020 09:37:34 GMT, Nils Eliasson wrote: > > I updated the change in place (force pushed). > > https://openjdk.github.io/cr/?repo=jdk&pr=478&range=00 is the new change. > > That looks exactly like the code I reviewed. 
The initial change would set the has_loops flag before inlining. The current one sets it when a new method is parsed (so once inlining has happened). In any case, I take it you're fine with the current change. Thanks for the review. ------------- PR: https://git.openjdk.java.net/jdk/pull/478 From vlivanov at openjdk.java.net Thu Oct 8 12:08:45 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Thu, 8 Oct 2020 12:08:45 GMT Subject: RFR: 8253191: C2: Masked byte comparisons with large masks produce wrong result on x86 In-Reply-To: References: Message-ID: On Wed, 7 Oct 2020 10:39:16 GMT, Vladimir Ivanov wrote: > `testUB_mem_imm` generates erroneous code when `mask` constant is larger than `Byte.MAX_VALUE`. > > AD instruction in question: > instruct testUB_mem_imm(rFlagsReg cr, memory mem, immU8 imm, immI0 zero) %{ > match(Set cr (CmpI (AndI (LoadUB mem) imm) zero)); > > ins_encode %{ __ testb($mem$$Address, $imm$$constant); %} > > The following instruction sequence is problematic: > testb $0x80,0x10(%rdi,%r9,1) > jle 0x00000001168789a0 > > It performs *signed* byte comparison and the immediate is interpreted as a negative value. > > The original code shape was as follows: > movzbl 0x10(%rcx,%r9,1),%r9d > test $0x80,%r9d > jle 0x000000010a9b6a00 > > The fix is to narrow the range of accepted mask constants and set the upper limit to `Byte.MAX_VALUE`. > > Testing: hs-precheckin-comp, hs-tier1, hs-tier2. Thanks for review, Tobias. 
------------- PR: https://git.openjdk.java.net/jdk/pull/538 From vlivanov at openjdk.java.net Thu Oct 8 12:08:46 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Thu, 8 Oct 2020 12:08:46 GMT Subject: RFR: 8253191: C2: Masked byte comparisons with large masks produce wrong result on x86 In-Reply-To: <5XuxxcWDe7w5SSqXgBf8B7RL4-BYuI0FJKki3JRR9c0=.335baf1e-1183-4370-80ce-6e12b6703723@github.com> References: <5XuxxcWDe7w5SSqXgBf8B7RL4-BYuI0FJKki3JRR9c0=.335baf1e-1183-4370-80ce-6e12b6703723@github.com> Message-ID: On Thu, 8 Oct 2020 08:22:48 GMT, Tobias Hartmann wrote: >> `testUB_mem_imm` generates erroneous code when `mask` constant is larger than `Byte.MAX_VALUE`. >> >> AD instruction in question: >> instruct testUB_mem_imm(rFlagsReg cr, memory mem, immU8 imm, immI0 zero) %{ >> match(Set cr (CmpI (AndI (LoadUB mem) imm) zero)); >> >> ins_encode %{ __ testb($mem$$Address, $imm$$constant); %} >> >> The following instruction sequence is problematic: >> testb $0x80,0x10(%rdi,%r9,1) >> jle 0x00000001168789a0 >> >> It performs *signed* byte comparison and the immediate is interpreted as a negative value. >> >> The original code shape was as follows: >> movzbl 0x10(%rcx,%r9,1),%r9d >> test $0x80,%r9d >> jle 0x000000010a9b6a00 >> >> The fix is to narrow the range of accepted mask constants and set the upper limit to `Byte.MAX_VALUE`. >> >> Testing: hs-precheckin-comp, hs-tier1, hs-tier2. > > test/hotspot/jtreg/compiler/c2/TestUnsignedByteCompare.java line 48: > >> 46: @DontInline static boolean testByteLT0(byte[] val) { return (val[0] & mask()) < 0; } >> 47: >> 48: static void testValue(byte b) { > > Shouldn't you exclude that method from compilation to compare interpreted vs. C2 compiled result? I could do that, but it's not required: `testUB_mem_imm` matches memory operand, but `testValue` does `(b & mask())` while `testByte...` do `val[0] & mask()`. So, `testUB_mem_imm` is used only in `testByte...` methods. 
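To make the signedness issue from the PR description concrete, here is a small Java model of the two code shapes. It is only a sketch under stated assumptions: the class and method names are hypothetical and are not taken from `TestUnsignedByteCompare.java`, and the second method merely imitates an 8-bit result being read under a signed condition rather than reproducing what C2 emits.

```java
// Hypothetical model; names are not from the PR or its test.
final class SignedByteMaskDemo {
    // Intended semantics: load the byte as unsigned, mask it, compare as an int
    // (the shape of the original movzbl + 32-bit test sequence).
    static boolean maskedGreaterThanZero(byte b, int mask) {
        return ((b & 0xFF) & mask) > 0;
    }

    // Model of the faulty shape: the masked result is viewed as a signed 8-bit
    // value, so bit 7 acts as a sign bit (as with an 8-bit testb followed by jle).
    static boolean maskedGreaterThanZeroAsSignedByte(byte b, int mask) {
        return (byte) (b & mask) > 0;
    }

    public static void main(String[] args) {
        byte b = (byte) 0x80;
        System.out.println(maskedGreaterThanZero(b, 0x80));             // true: 128 > 0
        System.out.println(maskedGreaterThanZeroAsSignedByte(b, 0x80)); // false: (byte) 0x80 is -128
    }
}
```

In this model the two methods agree for every byte value as long as the mask does not exceed `Byte.MAX_VALUE`, because the masked result then never has bit 7 set; that is consistent with the fix of narrowing the accepted mask range.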
------------- PR: https://git.openjdk.java.net/jdk/pull/538 From thartmann at openjdk.java.net Thu Oct 8 12:32:47 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 8 Oct 2020 12:32:47 GMT Subject: RFR: 8253191: C2: Masked byte comparisons with large masks produce wrong result on x86 In-Reply-To: References: Message-ID: On Thu, 8 Oct 2020 12:05:52 GMT, Vladimir Ivanov wrote: >> `testUB_mem_imm` generates erroneous code when `mask` constant is larger than `Byte.MAX_VALUE`. >> >> AD instruction in question: >> instruct testUB_mem_imm(rFlagsReg cr, memory mem, immU8 imm, immI0 zero) %{ >> match(Set cr (CmpI (AndI (LoadUB mem) imm) zero)); >> >> ins_encode %{ __ testb($mem$$Address, $imm$$constant); %} >> >> The following instruction sequence is problematic: >> testb $0x80,0x10(%rdi,%r9,1) >> jle 0x00000001168789a0 >> >> It performs *signed* byte comparison and the immediate is interpreted as a negative value. >> >> The original code shape was as follows: >> movzbl 0x10(%rcx,%r9,1),%r9d >> test $0x80,%r9d >> jle 0x000000010a9b6a00 >> >> The fix is to narrow the range of accepted mask constants and set the upper limit to `Byte.MAX_VALUE`. >> >> Testing: hs-precheckin-comp, hs-tier1, hs-tier2. > > Thanks for review, Tobias. I see, thanks for the explanation! ------------- PR: https://git.openjdk.java.net/jdk/pull/538 From burban at openjdk.java.net Thu Oct 8 12:33:46 2020 From: burban at openjdk.java.net (Bernhard Urban-Forster) Date: Thu, 8 Oct 2020 12:33:46 GMT Subject: RFR: 8253757: Add LLVM-based backend for hsdis In-Reply-To: References: <91erxiMDb4ftvSomuJYHPi9SX-v8Z2VLD2qEwCbz5tk=.b9ed01b5-f0e0-4ed7-9c1a-b06bc0e64640@github.com> <8Eqswd7tsVaGEXHdKDncXqKpW2tBsSeuY0PV6aTB9_c=.a6cf4957-9d31-4e89-bf44-e7b7852205d5@github.com> Message-ID: On Wed, 7 Oct 2020 08:02:59 GMT, Xin Liu wrote: >> Can you separate LLVM and binutils from hsdis.cpp? >> >> I guess you say that the problem is both GCC and binutils are not available on Windows AArch64. 
Is it right? >> 1 question: binutils seems to support Windows AArch64. Did you try recently binutils? If we can use binutils on Windows >> AArch64, you can fix makefile only. >> https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=binutils/dlltool.c;h=ed016b97dc38cdb1b85d2f6df676b9c9750f0d41;hb=HEAD#l248 > > IMHO, it's great to have an alternative disassembler. I personally had better experience using llvm MC when I decoded > aarch64 and AVX instructions than BFD. Another argument is that LLVM toolchain is supposed to provide the premium > experience on non-gnu platforms such as FreeBSD. @luhenry I tried to build it with LLVM10.0.1 > on my x86_64, ubuntu, I ran into a small problem. here is how I build. > `$make ARCH=amd64 CC=/opt/llvm/bin/clang CXX=/opt/llvm/bin/clang++ LLVM=/opt/llvm/` > > I can't meet this condition because Makefile defines LIBOS_linux. > #elif defined(LIBOS_Linux) && defined(LIBARCH_amd64) > return "x86_64-pc-linux-gnu"; > > Actually, Makefile assigns OS to windows/linux/aix/macosx (all lower case)and then > `CPPFLAGS += -DLIBOS_$(OS) -DLIBOS="$(OS)" -DLIBARCH_$(LIBARCH) -DLIBARCH="$(LIBARCH)" -DLIB_EXT="$(LIB_EXT)"` > > In hsdis.cpp, `native_target_triple` needs to match whatever Makefile defined. With that fix, I generate llvm version > hsdis-amd64.so and it works flawlessly > 1 question: binutils seems to support Windows AArch64. Did you try recently binutils? If we can use binutils on Windows > AArch64, you can fix makefile only. > https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=binutils/dlltool.c;h=ed016b97dc38cdb1b85d2f6df676b9c9750f0d41;hb=HEAD#l248 This is armv7, I don't see any support for armv8/AArch64 in `dlltool.c`. 
------------- PR: https://git.openjdk.java.net/jdk/pull/392 From github.com+8792647+robcasloz at openjdk.java.net Thu Oct 8 12:34:47 2020 From: github.com+8792647+robcasloz at openjdk.java.net (Roberto Castañeda Lozano) Date: Thu, 8 Oct 2020 12:34:47 GMT Subject: Integrated: 8253404: C2: assert(C->live_nodes() <= C->max_node_limit()) failed: Live Node limit exceeded limit In-Reply-To: References: Message-ID: <2Ov3KHvTlom3pzXaVJHPlyj_y-cdmHNDyIhxNPjewjc=.b84aace0-d461-4381-85d0-f10cb0518706@github.com> On Wed, 7 Oct 2020 11:41:20 GMT, Roberto Castañeda Lozano wrote: > Record nodes as dead in `Node::destruct()` if their index cannot be directly reclaimed. This prevents the "Live Node > limit exceeded limit" assertion failure by improving the accuracy of `Compile::live_nodes()` when "hook" nodes in > `ConvI2LNode::Ideal()` are created and deleted non-consecutively. This addition might result in multiple calls to > `compile::record_dead_node()` for the same node (e.g. from `PhaseIdealLoop::spinup()`), but this is safe, as > `compile::record_dead_node()` is idempotent. This pull request has now been integrated. Changeset: a191c586 Author: Roberto Castañeda Lozano Committer: Tobias Hartmann URL: https://git.openjdk.java.net/jdk/commit/a191c586 Stats: 58 lines in 2 files changed: 57 ins; 1 del; 0 mod 8253404: C2: assert(C->live_nodes() <= C->max_node_limit()) failed: Live Node limit exceeded limit Record nodes as dead in Node::destruct() if their index cannot be directly reclaimed. This prevents the "Live Node limit exceeded limit" assertion failure by improving the accuracy of Compile::live_nodes() when "hook" nodes in ConvI2LNode::Ideal() are created and deleted non-consecutively. This addition might result in multiple calls to compile::record_dead_node() for the same node (e.g. from PhaseIdealLoop::spinup()), but this is safe, as compile::record_dead_node() is idempotent.
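The reason idempotence matters for the live-node count can be shown with a toy model. This is a Java analogy only, not HotSpot code: the class `DeadNodeAccounting` and its members are hypothetical, and the sketch simply assumes that the live count is the number of indices handed out minus the number of distinct indices recorded as dead.

```java
import java.util.BitSet;

// Hypothetical toy model of dead-node bookkeeping; not the HotSpot implementation.
final class DeadNodeAccounting {
    private final BitSet dead = new BitSet(); // indices already recorded as dead
    private int uniqueNodes;                  // indices handed out so far
    private int deadCount;                    // distinct indices recorded as dead

    int newNode() {
        return uniqueNodes++;
    }

    // Idempotent: recording the same index again leaves the count unchanged.
    void recordDead(int idx) {
        if (!dead.get(idx)) {
            dead.set(idx);
            deadCount++;
        }
    }

    int liveNodes() {
        return uniqueNodes - deadCount;
    }

    public static void main(String[] args) {
        DeadNodeAccounting c = new DeadNodeAccounting();
        int hook = c.newNode();
        c.newNode();
        c.recordDead(hook);
        c.recordDead(hook); // a second call for the same node is harmless
        System.out.println(c.liveNodes()); // 1
    }
}
```

Without the membership check, the second `recordDead` call would undercount live nodes; without any recording at all, dead nodes whose index cannot be reclaimed would keep inflating the live count, which is the situation the change addresses.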
Reviewed-by: neliasso, thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/540 From vlivanov at openjdk.java.net Thu Oct 8 12:44:48 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Thu, 8 Oct 2020 12:44:48 GMT Subject: Integrated: 8253191: C2: Masked byte comparisons with large masks produce wrong result on x86 In-Reply-To: References: Message-ID: On Wed, 7 Oct 2020 10:39:16 GMT, Vladimir Ivanov wrote: > `testUB_mem_imm` generates erroneous code when `mask` constant is larger than `Byte.MAX_VALUE`. > > AD instruction in question: > instruct testUB_mem_imm(rFlagsReg cr, memory mem, immU8 imm, immI0 zero) %{ > match(Set cr (CmpI (AndI (LoadUB mem) imm) zero)); > > ins_encode %{ __ testb($mem$$Address, $imm$$constant); %} > > The following instruction sequence is problematic: > testb $0x80,0x10(%rdi,%r9,1) > jle 0x00000001168789a0 > > It performs *signed* byte comparison and the immediate is interpreted as a negative value. > > The original code shape was as follows: > movzbl 0x10(%rcx,%r9,1),%r9d > test $0x80,%r9d > jle 0x000000010a9b6a00 > > The fix is to narrow the range of accepted mask constants and set the upper limit to `Byte.MAX_VALUE`. > > Testing: hs-precheckin-comp, hs-tier1, hs-tier2. This pull request has now been integrated. 
Changeset: 6d13c766 Author: Vladimir Ivanov URL: https://git.openjdk.java.net/jdk/commit/6d13c766 Stats: 78 lines in 2 files changed: 28 ins; 23 del; 27 mod 8253191: C2: Masked byte comparisons with large masks produce wrong result on x86 Reviewed-by: thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/538 From aph at redhat.com Thu Oct 8 14:07:16 2020 From: aph at redhat.com (Andrew Haley) Date: Thu, 8 Oct 2020 15:07:16 +0100 Subject: RFR: 8249893: AARCH64: optimize the construction of the value from the bits of the other two In-Reply-To: References: Message-ID: On 05/10/2020 18:40, Boris Ulasevich wrote: > Let me revive the change request to C2 and AArch64 that applies Bitfield Insert instruction in the expression "(v1 & > 0xFF) | ((v2 & 0xFF) << 8)". > > Compared to the last round of review [2] I updated the transformation to apply BFI in more cases and added a jtreg test. I looked through the discussion and I can't find an updated benchmark which shows the speedup for the cases you now handle. Is there one? -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From redestad at openjdk.java.net Thu Oct 8 15:32:46 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Thu, 8 Oct 2020 15:32:46 GMT Subject: RFR: 8254244: Some code emitted by TemplateTable::branch is unused when running TieredCompilation Message-ID: On x86, arm, aarch64 and s390, TemplateTable::branch emits code to allocate a MethodData which is never called if running TieredCompilation. Skipping it slightly reduces interpreter code size and results in a minor startup improvement (~100k instructions less). The PPC implementation differs significantly, and is left untouched. 
------------- Commit messages: - Merge branch 'master' into template_notiered - Sync platforms with similar logic - Unnecessarily laying out code to allocate MethodData when running TieredCompilation Changes: https://git.openjdk.java.net/jdk/pull/564/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=564&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8254244 Stats: 5 lines in 4 files changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/564.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/564/head:pull/564 PR: https://git.openjdk.java.net/jdk/pull/564 From rrich at openjdk.java.net Thu Oct 8 15:57:21 2020 From: rrich at openjdk.java.net (Richard Reingruber) Date: Thu, 8 Oct 2020 15:57:21 GMT Subject: RFR: 8254190: [s390] interpreter misses exception check after calling moni… In-Reply-To: References: Message-ID: On Wed, 7 Oct 2020 21:37:16 GMT, Martin Doerr wrote: > …torenter > > The s390 template interpreter currently uses call_VM with check_exceptions = false when calling > InterpreterRuntime::monitorenter, but there's a possibility to get an Exception. > See JIRA issue for details: > https://bugs.openjdk.java.net/browse/JDK-8254190 The changes look good. Thanks! ------------- Marked as reviewed by rrich (Committer). PR: https://git.openjdk.java.net/jdk/pull/553 From jbhateja at openjdk.java.net Thu Oct 8 17:32:19 2020 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Thu, 8 Oct 2020 17:32:19 GMT Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions In-Reply-To: References: <_0-zfIDPieC0Xnc17GaSSsS7Sz9EEUrfjRyqWDtphfU=.298bacde-f330-486a-8bea-03ff1523d00c@github.com> Message-ID: On Wed, 23 Sep 2020 15:27:48 GMT, Jatin Bhateja wrote: >> Can you explain why 32 bytes are such a distinct performance cliff? >> >> Is there any performance difference between doing a single 64 bytes masked copy or two 32 bytes? 
> >> Can you explain why 32 bytes are such a distinct performance cliff? >> >> Is there any performance difference between doing a single 64 bytes masked copy or two 32 bytes? > > Hi Nils, > Copy for sizes <= 32 bytes can be done using one YMM register, AVX-512 vector length extension allows masked > instructions to operate on YMM and XMM registers. Using newly added flag -XX:ArrayCopyPartialInlineSize=64 one can > perform in-lining up to 64 bytes but since it will use a ZMM register CPU will operate at a lower frequency but it > could still give better performance depending on the application. A single 64 byte masked copy may have a performance > hit if for majority of the application runtime, CPU operates at highest frequency. There is a switchover penalty from > higher frequency level to lower frequency level along with some hysteresis which forces subsequent instructions to > operate at a lower frequency for some cycles. Current implementation has been kept simple to avoid emitting too many > instructions at call site considering arraycopy is a very high frequency operation. Hi @neliasso , @vnkozlov , kindly let me know your review comments. ------------- PR: https://git.openjdk.java.net/jdk/pull/302 From luhenry at openjdk.java.net Thu Oct 8 18:10:20 2020 From: luhenry at openjdk.java.net (Ludovic Henry) Date: Thu, 8 Oct 2020 18:10:20 GMT Subject: RFR: 8253757: Add LLVM-based backend for hsdis In-Reply-To: References: <91erxiMDb4ftvSomuJYHPi9SX-v8Z2VLD2qEwCbz5tk=.b9ed01b5-f0e0-4ed7-9c1a-b06bc0e64640@github.com> <8Eqswd7tsVaGEXHdKDncXqKpW2tBsSeuY0PV6aTB9_c=.a6cf4957-9d31-4e89-bf44-e7b7852205d5@github.com> Message-ID: On Thu, 8 Oct 2020 12:30:13 GMT, Bernhard Urban-Forster wrote: >> IMHO, it's great to have an alternative disassembler. I personally had better experience using llvm MC when I decoded >> aarch64 and AVX instructions than BFD. Another argument is that LLVM toolchain is supposed to provide the premium >> experience on non-gnu platforms such as FreeBSD. 
@luhenry I tried to build it with LLVM10.0.1 >> on my x86_64, ubuntu, I ran into a small problem. here is how I build. >> `$make ARCH=amd64 CC=/opt/llvm/bin/clang CXX=/opt/llvm/bin/clang++ LLVM=/opt/llvm/` >> >> I can't meet this condition because Makefile defines LIBOS_linux. >> #elif defined(LIBOS_Linux) && defined(LIBARCH_amd64) >> return "x86_64-pc-linux-gnu"; >> >> Actually, Makefile assigns OS to windows/linux/aix/macosx (all lower case)and then >> `CPPFLAGS += -DLIBOS_$(OS) -DLIBOS="$(OS)" -DLIBARCH_$(LIBARCH) -DLIBARCH="$(LIBARCH)" -DLIB_EXT="$(LIB_EXT)"` >> >> In hsdis.cpp, `native_target_triple` needs to match whatever Makefile defined. With that fix, I generate llvm version >> hsdis-amd64.so and it works flawlessly > >> 1 question: binutils seems to support Windows AArch64. Did you try recently binutils? If we can use binutils on Windows >> AArch64, you can fix makefile only. >> https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=binutils/dlltool.c;h=ed016b97dc38cdb1b85d2f6df676b9c9750f0d41;hb=HEAD#l248 > > This is armv7, I don't see any support for armv8/AArch64 in `dlltool.c`. @magicus > This is an interesting suggestion. There is a similar attempt at replacing binutils with capstone in > https://bugs.openjdk.java.net/browse/JDK-8188073, which unfortunately has not seen much progress due to lack of > resources; I don't know if you are aware of that? There is also a (extremely low priority) effort to rewrite the hsdis > makefile to be part of the normal build system, see e.g. https://bugs.openjdk.java.net/browse/JDK-8208495. Neither of > these should be any blocker for your change, but I think it might be good if you know about them. I was not aware of the effort to use capstone to replace/complement binutils in hsdis. I wonder how easy it is to port capstone to platforms in case it doesn't support them. > I have couple of concerns with your patch. 
One is the method in which LLVM is selected instead of binutils; afaict this > depends on having the LLVM variable set when executing the makefile. At the very least, this should be documented in > the README. I don't think any more complicated configuration is really necessary at this point. With full integration > with the build system, a more user-friendly way of selecting hsdis backend should be implemented, though. I'll add documentation to the Makefile. And I agree, I would prefer not to have to go through the whole build integration to integrate the support for LLVM. > Second, and I don't know if this is an artifact of git/github/the new skara tooling, but if you renamed hsdis.c to > hsdis.cpp, this relationship does not show up, not even in the generated webrevs. Instead they are considered a new + a > deleted file. This makes it hard to see what code changes you have done in that file. That is Git not detecting enough similarities between the two files. I could probably hack my way around and find a way to reduce the code diff if that's something you want. > And third; have you tested that your changes (both changing the main file from C to C++, and any code changes in it) > does not break the old binutils functionality? Afaic there are no test suites for exercising hsdis :-( so manual ad-hoc > testing is likely needed. I've tested on Linux-x86_64 and Linux-AArch64 on top of Windows-AArch64 and macOS-AArch64, and checked that both the binutils builds and works as previously and that the LLVM-based hsdis has an equivalent output. 
------------- PR: https://git.openjdk.java.net/jdk/pull/392 From luhenry at openjdk.java.net Thu Oct 8 18:18:25 2020 From: luhenry at openjdk.java.net (Ludovic Henry) Date: Thu, 8 Oct 2020 18:18:25 GMT Subject: RFR: 8253757: Add LLVM-based backend for hsdis In-Reply-To: References: <91erxiMDb4ftvSomuJYHPi9SX-v8Z2VLD2qEwCbz5tk=.b9ed01b5-f0e0-4ed7-9c1a-b06bc0e64640@github.com> <8Eqswd7tsVaGEXHdKDncXqKpW2tBsSeuY0PV6aTB9_c=.a6cf4957-9d31-4e89-bf44-e7b7852205d5@github.com> Message-ID: On Thu, 8 Oct 2020 18:07:59 GMT, Ludovic Henry wrote: >>> 1 question: binutils seems to support Windows AArch64. Did you try recently binutils? If we can use binutils on Windows >>> AArch64, you can fix makefile only. >>> https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=binutils/dlltool.c;h=ed016b97dc38cdb1b85d2f6df676b9c9750f0d41;hb=HEAD#l248 >> >> This is armv7, I don't see any support for armv8/AArch64 in `dlltool.c`. > > @magicus > >> This is an interesting suggestion. There is a similar attempt at replacing binutils with capstone in >> https://bugs.openjdk.java.net/browse/JDK-8188073, which unfortunately has not seen much progress due to lack of >> resources; I don't know if you are aware of that? There is also a (extremely low priority) effort to rewrite the hsdis >> makefile to be part of the normal build system, see e.g. https://bugs.openjdk.java.net/browse/JDK-8208495. Neither of >> these should be any blocker for your change, but I think it might be good if you know about them. > > I was not aware of the effort to use capstone to replace/complement binutils in hsdis. I wonder how easy it is to port > capstone to platforms in case it doesn't support them. >> I have couple of concerns with your patch. One is the method in which LLVM is selected instead of binutils; afaict this >> depends on having the LLVM variable set when executing the makefile. At the very least, this should be documented in >> the README. 
I don't think any more complicated configuration is really necessary at this point. With full integration >> with the build system, a more user-friendly way of selecting hsdis backend should be implemented, though. > > I'll add documentation to the Makefile. And I agree, I would prefer not to have to go through the whole build > integration to integrate the support for LLVM. >> Second, and I don't know if this is an artifact of git/github/the new skara tooling, but if you renamed hsdis.c to >> hsdis.cpp, this relationship does not show up, not even in the generated webrevs. Instead they are considered a new + a >> deleted file. This makes it hard to see what code changes you have done in that file. > > That is Git not detecting enough similarities between the two files. I could probably hack my way around and find a way > to reduce the code diff if that's something you want. >> And third; have you tested that your changes (both changing the main file from C to C++, and any code changes in it) >> does not break the old binutils functionality? Afaic there are no test suites for exercising hsdis :-( so manual ad-hoc >> testing is likely needed. > > I've tested on Linux-x86_64 and Linux-AArch64 on top of Windows-AArch64 and macOS-AArch64, and checked that both the > binutils builds and works as previously and that the LLVM-based hsdis has an equivalent output. @navyxliu > @luhenry I tried to build it with LLVM10.0.1 > on my x86_64, ubuntu, I ran into a small problem. here is how I build. > $make ARCH=amd64 CC=/opt/llvm/bin/clang CXX=/opt/llvm/bin/clang++ LLVM=/opt/llvm/ > > I can't meet this condition because Makefile defines LIBOS_linux. 
> > #elif defined(LIBOS_Linux) && defined(LIBARCH_amd64) > return "x86_64-pc-linux-gnu"; > > Actually, Makefile assigns OS to windows/linux/aix/macosx (all lower case)and then > CPPFLAGS += -DLIBOS_$(OS) -DLIBOS="$(OS)" -DLIBARCH_$(LIBARCH) -DLIBARCH="$(LIBARCH)" -DLIB_EXT="$(LIB_EXT)" Interestingly, I did it this way because on my machine `LIBOS_Linux` would get defined instead of `LIBOS_linux`. I tried on WSL which might explain the difference. Could you please share more details on what environment you are using? > In hsdis.cpp, native_target_triple needs to match whatever Makefile defined. With that fix, I generate llvm version > hsdis-amd64.so and it works flawlessly I'm not sure I understand what you mean. Are you saying we should define the native target triple based on the variables in the Makefile? A difficulty I ran into is that there is not always a 1-to-1 mapping between the autoconf/gcc target triple and the LLVM one. For example, you pass `x86_64-gnu-linux` to the OpenJDK's `configure` script, but the equivalent target triple for LLVM is `x86_64-pc-linux-gnu`. Since my plan isn't to use LLVM as the default for all platforms, and because there aren't that many combinations of target OS/ARCH, I am taking the approach of hardcoding the combinations we care about in `hsdis.cpp`. ------------- PR: https://git.openjdk.java.net/jdk/pull/392 From mdoerr at openjdk.java.net Thu Oct 8 19:11:23 2020 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Thu, 8 Oct 2020 19:11:23 GMT Subject: RFR: 8254244: Some code emitted by TemplateTable::branch is unused when running TieredCompilation In-Reply-To: References: Message-ID: On Thu, 8 Oct 2020 15:26:22 GMT, Claes Redestad wrote: > On x86, arm, aarch64 and s390, TemplateTable::branch emits code to allocate a MethodData which is never called if > running TieredCompilation. Skipping it slightly reduces interpreter code size and results in a minor startup > improvement (~100k instructions less). 
The PPC implementation differs significantly, and is left untouched. Hi Claes, looks good. Seems like PPC was implemented like SPARC which was a bit different and is not affected by this issue. ------------- Marked as reviewed by mdoerr (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/564 From shade at openjdk.java.net Thu Oct 8 19:13:29 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 8 Oct 2020 19:13:29 GMT Subject: RFR: 8229186: Improve error messages for TestStringIntrinsics failures [v6] In-Reply-To: <9thYXJXttNl2VIj3C5eGuFwEIwgZZ-TeGbeX67LqQl0=.987baa49-c4c4-466a-b115-ca261bec62b7@github.com> References: <9thYXJXttNl2VIj3C5eGuFwEIwgZZ-TeGbeX67LqQl0=.987baa49-c4c4-466a-b115-ca261bec62b7@github.com> Message-ID: On Thu, 1 Oct 2020 06:18:40 GMT, Igor Ignatyev wrote: >> Evgeny Nikitin has updated the pull request incrementally with one additional commit since the last revision: >> >> Use SB to build TestStringIntrinsics message, avoid type guessing in ArrayDiff > > LGTM @iignatev, @lmesnik: would you like to sponsor this? ------------- PR: https://git.openjdk.java.net/jdk/pull/112 From redestad at openjdk.java.net Thu Oct 8 19:38:21 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Thu, 8 Oct 2020 19:38:21 GMT Subject: RFR: 8254244: Some code emitted by TemplateTable::branch is unused when running TieredCompilation In-Reply-To: References: Message-ID: On Thu, 8 Oct 2020 19:08:14 GMT, Martin Doerr wrote: > Hi Claes, looks good. Seems like PPC was implemented like SPARC which was a bit different and is not affected by this > issue. Martin, thanks for reviewing and checking PPC. 
------------- PR: https://git.openjdk.java.net/jdk/pull/564 From mdoerr at openjdk.java.net Thu Oct 8 19:42:19 2020 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Thu, 8 Oct 2020 19:42:19 GMT Subject: RFR: 8254190: [s390] interpreter misses exception check after calling monitorenter In-Reply-To: References: Message-ID: On Thu, 8 Oct 2020 15:54:51 GMT, Richard Reingruber wrote: >> The s390 template interpreter currently uses call_VM with check_exceptions = false when calling >> InterpreterRuntime::monitorenter, but there's a possibility to get an Exception. >> See JIRA issue for details: >> https://bugs.openjdk.java.net/browse/JDK-8254190 > > The changes look good. Thanks! Thanks for the reviews. ------------- PR: https://git.openjdk.java.net/jdk/pull/553 From xliu at openjdk.java.net Thu Oct 8 20:43:26 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Thu, 8 Oct 2020 20:43:26 GMT Subject: RFR: 8253757: Add LLVM-based backend for hsdis In-Reply-To: References: <91erxiMDb4ftvSomuJYHPi9SX-v8Z2VLD2qEwCbz5tk=.b9ed01b5-f0e0-4ed7-9c1a-b06bc0e64640@github.com> <8Eqswd7tsVaGEXHdKDncXqKpW2tBsSeuY0PV6aTB9_c=.a6cf4957-9d31-4e89-bf44-e7b7852205d5@github.com> Message-ID: <2S00ucaPGiAQLeLOejt1kfXeYEc7ctEPeRCIcq1N0N8=.dbf1ea7a-8de4-48a5-8759-03495e3e3c08@github.com> On Thu, 8 Oct 2020 18:15:10 GMT, Ludovic Henry wrote: > @navyxliu > > > @luhenry I tried to build it with LLVM10.0.1 > > on my x86_64, ubuntu, I ran into a small problem. here is how I build. > > $make ARCH=amd64 CC=/opt/llvm/bin/clang CXX=/opt/llvm/bin/clang++ LLVM=/opt/llvm/ > > I can't meet this condition because Makefile defines LIBOS_linux. 
> > #elif defined(LIBOS_Linux) && defined(LIBARCH_amd64) > > return "x86_64-pc-linux-gnu"; > > Actually, Makefile assigns OS to windows/linux/aix/macosx (all lower case)and then > > CPPFLAGS += -DLIBOS_$(OS) -DLIBOS="$(OS)" -DLIBARCH_$(LIBARCH) -DLIBARCH="$(LIBARCH)" -DLIB_EXT="$(LIB_EXT)" > > Interestingly, I did it this way because on my machine `LIBOS_Linux` would get defined instead of `LIBOS_linux`. I > tried on WSL which might explain the difference. Could you please share more details on what environment you are using? I am using Ubuntu 18.04. `OS = $(shell uname)` does initialize OS=Linux in the first place, but later OS is set to "linux" at line 88 of https://openjdk.github.io/cr/?repo=jdk&pr=392&range=05#new-0 At line 186, the flags become -DLIBOS_linux -DLIBOS="linux" ... This doesn't match line 564 of https://openjdk.github.io/cr/?repo=jdk&pr=392&range=05#new-2 In my understanding, C/C++ macro names are case sensitive, so I got #error "unknown platform" because of the Linux/linux discrepancy. > > In hsdis.cpp, native_target_triple needs to match whatever Makefile defined. With that fix, I generate llvm version > > hsdis-amd64.so and it works flawlessly > > I'm not sure I understand what you mean. Are you saying we should define the native target triple based on the > variables in the Makefile? > A difficulty I ran into is that there is not always a 1-to-1 mapping between the autoconf/gcc target triple and the > LLVM one. For example, you pass `x86_64-gnu-linux` to the OpenJDK's `configure` script, but the equivalent target > triple for LLVM is `x86_64-pc-linux-gnu`. Since my plan isn't to use LLVM as the default for all platforms, and > because there aren't that many combinations of target OS/ARCH, I am taking the approach of hardcoding the combinations > we care about in `hsdis.cpp`. 
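The Linux/linux mismatch discussed above can be reproduced in isolation. The following is only a minimal sketch, not code from the patch: the function names `triple_strict`, `triple_tolerant` and `check_spellings` are hypothetical, and the `#define`s at the top merely simulate the `-DLIBOS_linux -DLIBARCH_amd64` flags that the Makefile is reported to pass. The only triple used is the one quoted in this thread.

```cpp
#include <cassert>
#include <cstring>

// Simulate the Makefile flags (-DLIBOS_linux -DLIBARCH_amd64) when this file
// is compiled without them. Assumption made for this sketch only.
#if !defined(LIBOS_linux) && !defined(LIBOS_Linux)
#define LIBOS_linux 1
#endif
#if !defined(LIBARCH_amd64)
#define LIBARCH_amd64 1
#endif

// Hypothetical stand-in for the check quoted in the thread: it tests only the
// capitalized macro, so -DLIBOS_linux does not satisfy it.
const char* triple_strict() {
#if defined(LIBOS_Linux) && defined(LIBARCH_amd64)
  return "x86_64-pc-linux-gnu";
#else
  return nullptr;  // stands in for: #error "unknown platform"
#endif
}

// Hypothetical tolerant variant: accepts either spelling of the OS macro.
const char* triple_tolerant() {
#if (defined(LIBOS_Linux) || defined(LIBOS_linux)) && defined(LIBARCH_amd64)
  return "x86_64-pc-linux-gnu";
#else
  return nullptr;
#endif
}

// Self-check; the first assertion holds only when LIBOS_Linux is not defined.
void check_spellings() {
  assert(triple_strict() == nullptr);
  assert(std::strcmp(triple_tolerant(), "x86_64-pc-linux-gnu") == 0);
}
```

Accepting both spellings in `hsdis.cpp` is one possible way out; normalizing `OS` in a single place in the Makefile would be another. Which of the two fits the patch better is left to the thread.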
------------- PR: https://git.openjdk.java.net/jdk/pull/392 From iignatyev at openjdk.java.net Thu Oct 8 20:48:38 2020 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Thu, 8 Oct 2020 20:48:38 GMT Subject: RFR: 8229186: Improve error messages for TestStringIntrinsics failures [v9] In-Reply-To: <8MKmW-2_q3RYbIsye58L3XfOmaLItKlb8M8WB2-1cmI=.4b5a0dc0-e379-47bf-8479-10f18f0cbf0e@github.com> References: <8MKmW-2_q3RYbIsye58L3XfOmaLItKlb8M8WB2-1cmI=.4b5a0dc0-e379-47bf-8479-10f18f0cbf0e@github.com> Message-ID: On Sat, 3 Oct 2020 19:15:00 GMT, Evgeny Nikitin wrote: >> pre-Skara RFR thread: [link](https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-May/038416.html) >> >> Error reporting was improved by writing a C-style escaped string representations for the variables passed to the >> methods being tested. For array comparisons, a dedicated diff-formatter was implemented. >> Sample output for comparing byte arrays (with artificial failure): >> ----------System.err:(21/1553)---------- >> Result: (false) of 'arrayEqualsB' is not equal to expected (true) >> Arrays differ starting from [index: 7]: >> ... 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, ... >> ... 5, 6, 125, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, ... >> ^^^^ >> java.lang.RuntimeException: Result: (false) of 'arrayEqualsB' is not >> equal to expected (true) >> at >> compiler.intrinsics.string.TestStringIntrinsics.invokeAndCheckArrays(TestStringIntrinsics.java:273) >> at ... stack trace continues - E.N. >> Sample output for comparing char arrays: >> ----------System.err:(21/1579)*---------- >> Result: (false) of 'arrayEqualsC' is not equal to expected (true) >> Arrays differ starting from [index: 7]: >> ... \\u0005, \\u0006, \\u0007, \\u0008, \\u0009, \\n, ... >> ... \\u0005, \\u0006, }, \\u0008, \\u0009, \\n, ... 
>> ^^^^^^^ >> java.lang.RuntimeException: Result: (false) of 'arrayEqualsC' is not >> equal to expected (true) >> at >> compiler.intrinsics.string.TestStringIntrinsics.invokeAndCheckArrays(TestStringIntrinsics.java:280) >> at >> ... and so on - E.N. >> >> testing: open/test/hotspot/jtreg/compiler/intrinsics/string/TestStringIntrinsics.java on linux, windows, macosx. > > Evgeny Nikitin has updated the pull request incrementally with one additional commit since the last revision: > > Add null values support and two-way testing Changes requested by iignatyev (Reviewer). test/lib/jdk/test/lib/format/ArrayDiff.java line 117: > 115: if (second == null) { > 116: throw new IllegalArgumentException("Second array argument for ArrayDiff is null"); > 117: } it's more common (and less surprising) to throw NPE in such cases, preferably by using `Objects::requireNonNull` test/lib/jdk/test/lib/format/Diff.java line 37: > 35: * Default limits for formatters > 36: */ > 37: public static class Defaults { I'd add a no-op private no-arg ctor to show that this class isn't expected to be instantiated. test/lib/jdk/test/lib/format/ArrayCodec.java line 325: > 323: } > 324: > 325: return 0; shouldn't it be -1? ------------- PR: https://git.openjdk.java.net/jdk/pull/112 From mdoerr at openjdk.java.net Thu Oct 8 20:56:21 2020 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Thu, 8 Oct 2020 20:56:21 GMT Subject: Integrated: 8254190: [s390] interpreter misses exception check after calling monitorenter In-Reply-To: References: Message-ID: On Wed, 7 Oct 2020 21:37:16 GMT, Martin Doerr wrote: > The s390 template interpreter currently uses call_VM with check_exceptions = false when calling > InterpreterRuntime::monitorenter, but there's a possibility to get an Exception. > See JIRA issue for details: > https://bugs.openjdk.java.net/browse/JDK-8254190 This pull request has now been integrated. 
Changeset: ced46b19 Author: Martin Doerr URL: https://git.openjdk.java.net/jdk/commit/ced46b19 Stats: 5 lines in 1 file changed: 0 ins; 3 del; 2 mod 8254190: [s390] interpreter misses exception check after calling monitorenter Reviewed-by: shade, rrich ------------- PR: https://git.openjdk.java.net/jdk/pull/553 From coleenp at openjdk.java.net Thu Oct 8 22:54:18 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 8 Oct 2020 22:54:18 GMT Subject: RFR: 8254244: Some code emitted by TemplateTable::branch is unused when running TieredCompilation In-Reply-To: References: Message-ID: <2cdAlTHn54yHIQHnof6uahQCXbiFSlrSd5kE08xIPKA=.b6c96518-6aa2-4fab-9a9d-2f304080db74@github.com> On Thu, 8 Oct 2020 15:26:22 GMT, Claes Redestad wrote: > On x86, arm, aarch64 and s390, TemplateTable::branch emits code to allocate a MethodData which is never called if > running TieredCompilation. Skipping it slightly reduces interpreter code size and results in a minor startup > improvement (~100k instructions less). The PPC implementation differs significantly, and is left untouched. LGTM! ------------- Marked as reviewed by coleenp (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/564 From iveresov at openjdk.java.net Fri Oct 9 01:27:19 2020 From: iveresov at openjdk.java.net (Igor Veresov) Date: Fri, 9 Oct 2020 01:27:19 GMT Subject: RFR: 8254244: Some code emitted by TemplateTable::branch is unused when running TieredCompilation In-Reply-To: References: Message-ID: On Thu, 8 Oct 2020 15:26:22 GMT, Claes Redestad wrote: > On x86, arm, aarch64 and s390, TemplateTable::branch emits code to allocate a MethodData which is never called if > running TieredCompilation. Skipping it slightly reduces interpreter code size and results in a minor startup > improvement (~100k instructions less). The PPC implementation differs significantly, and is left untouched. Marked as reviewed by iveresov (Reviewer). 
------------- PR: https://git.openjdk.java.net/jdk/pull/564 From xxinliu at amazon.com Fri Oct 9 06:20:47 2020 From: xxinliu at amazon.com (Liu, Xin) Date: Fri, 9 Oct 2020 06:20:47 +0000 Subject: Question about a few properties of ideal graph Message-ID: <171D40B4-23CB-49E0-8850-4C4EF021C972@amazon.com> Hi, Community, I feel the more I read C2's optimizations, the more questions arise. I do have some previous experience with conventional IRs, but it's a little bit hard to map it to C2. Could you help me clarify some properties of the node and graph? I think they are critical to understanding why some phases iterate over the outputs of nodes (du-chain) and some phases iterate over the inputs of nodes (ud-chain). 1. Useless. [1] defines an operation as useless if no operation uses its results, or if all uses of the results are dead (10.2). Presumably, a node is useless if it's not useful. Can I say identify_useful_nodes() is the same as the definition above? https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/compile.cpp#L302 2. Unreachable. In my understanding, a node is unreachable if control flow can never get there. I feel this definition only fits a CFG. Does it still have the same meaning in the ideal graph? According to the code here: https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/node.cpp#L744, it looks like a node is unreachable if 1) it has no use or 2) its type is TOP or 3) its control input node's is_top() is true. Is it complementary to Node::is_reachable_from_root()? To be honest, I don't understand why finding the root from a node by following uses with BFS is the same thing as control flow being able to reach it from the root. 3. Dead: dead is everywhere in C2. I feel it could refer to different things in different places. 1. useless? e.g. Compile::_dead_node_list https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/compile.cpp#L326 2. no direct use e.g. outcnt() == 0 is dead. 
https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/phaseX.hpp#L511 3. sometimes, I feel C2 removes a dead node because it's unreachable. Is there a definition of a dead node in C2? Or are dead/useless/unreachable all the same thing in the ideal graph? 4. Top. What is the semantics if a node's is_top() is true? Is it the same thing as its type being TOP? In the type lattice, TOP is vague to me too. I feel that a node's type being TOP has a slightly different meaning for different nodes. If the node is a CFG node, TOP seems to mean control flow can't reach it. If a value node's type is TOP, I guess it means the value of this node is undefined. Am I correct here? 5. Root node. Can I say the root node of each compilation unit (CU) is the beginning and the end of that CU? So far, I feel the inputs of a root node are return values and side effects. Is that correct? If I traverse the uses of nodes from root, should I return to root eventually? If yes, it means the ideal graph is a DAG. Thank you in advance! [1] Cooper, Keith, and Linda Torczon. Engineering a Compiler. Elsevier, 2011. From enikitin at openjdk.java.net Fri Oct 9 07:15:45 2020 From: enikitin at openjdk.java.net (Evgeny Nikitin) Date: Fri, 9 Oct 2020 07:15:45 GMT Subject: RFR: 8229186: Improve error messages for TestStringIntrinsics failures [v10] In-Reply-To: References: Message-ID: > pre-Skara RFR thread: [link](https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-May/038416.html) > > Error reporting was improved by writing a C-style escaped string representations for the variables passed to the > methods being tested. For array comparisons, a dedicated diff-formatter was implemented. > Sample output for comparing byte arrays (with artificial failure): > ----------System.err:(21/1553)---------- > Result: (false) of 'arrayEqualsB' is not equal to expected (true) > Arrays differ starting from [index: 7]: > ... 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, ... > ... 
5, 6, 125, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, ... > ^^^^ > java.lang.RuntimeException: Result: (false) of 'arrayEqualsB' is not > equal to expected (true) > at > compiler.intrinsics.string.TestStringIntrinsics.invokeAndCheckArrays(TestStringIntrinsics.java:273) > at ... stack trace continues - E.N. > Sample output for comparing char arrays: > ----------System.err:(21/1579)*---------- > Result: (false) of 'arrayEqualsC' is not equal to expected (true) > Arrays differ starting from [index: 7]: > ... \\u0005, \\u0006, \\u0007, \\u0008, \\u0009, \\n, ... > ... \\u0005, \\u0006, }, \\u0008, \\u0009, \\n, ... > ^^^^^^^ > java.lang.RuntimeException: Result: (false) of 'arrayEqualsC' is not > equal to expected (true) > at > compiler.intrinsics.string.TestStringIntrinsics.invokeAndCheckArrays(TestStringIntrinsics.java:280) > at > ... and so on - E.N. > > testing: open/test/hotspot/jtreg/compiler/intrinsics/string/TestStringIntrinsics.java on linux, windows, macosx. Evgeny Nikitin has updated the pull request incrementally with one additional commit since the last revision: Throw NPE instead of IAE when a null source array is provided ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/112/files - new: https://git.openjdk.java.net/jdk/pull/112/files/39992fcb..ba6e5411 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=112&range=09 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=112&range=08-09 Stats: 11 lines in 2 files changed: 2 ins; 5 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/112.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/112/head:pull/112 PR: https://git.openjdk.java.net/jdk/pull/112 From enikitin at openjdk.java.net Fri Oct 9 07:15:46 2020 From: enikitin at openjdk.java.net (Evgeny Nikitin) Date: Fri, 9 Oct 2020 07:15:46 GMT Subject: RFR: 8229186: Improve error messages for TestStringIntrinsics failures [v9] In-Reply-To: References: 
<8MKmW-2_q3RYbIsye58L3XfOmaLItKlb8M8WB2-1cmI=.4b5a0dc0-e379-47bf-8479-10f18f0cbf0e@github.com> Message-ID: On Thu, 8 Oct 2020 20:42:12 GMT, Igor Ignatyev wrote: >> Evgeny Nikitin has updated the pull request incrementally with one additional commit since the last revision: >> >> Add null values support and two-way testing > > test/lib/jdk/test/lib/format/ArrayDiff.java line 117: > >> 115: if (second == null) { >> 116: throw new IllegalArgumentException("Second array argument for ArrayDiff is null"); >> 117: } > > it's more common (and less surprising) to throw NPE in such cases, preferably by using `Objects::requireNonNull` I didn't know about `requireNonNull`, thanks. I was thinking about the NPE myself; I just wanted to show which array was null. Fixed in ba6e5411b4de5cf8b9230b7cdb7a518900eadc4b. ------------- PR: https://git.openjdk.java.net/jdk/pull/112 From enikitin at openjdk.java.net Fri Oct 9 07:25:45 2020 From: enikitin at openjdk.java.net (Evgeny Nikitin) Date: Fri, 9 Oct 2020 07:25:45 GMT Subject: RFR: 8229186: Improve error messages for TestStringIntrinsics failures [v11] In-Reply-To: References: Message-ID: > pre-Skara RFR thread: [link](https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-May/038416.html) > > Error reporting was improved by writing a C-style escaped string representations for the variables passed to the > methods being tested. For array comparisons, a dedicated diff-formatter was implemented. > Sample output for comparing byte arrays (with artificial failure): > ----------System.err:(21/1553)---------- > Result: (false) of 'arrayEqualsB' is not equal to expected (true) > Arrays differ starting from [index: 7]: > ... 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, ... > ... 5, 6, 125, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, ... 
> ^^^^ > java.lang.RuntimeException: Result: (false) of 'arrayEqualsB' is not > equal to expected (true) > at > compiler.intrinsics.string.TestStringIntrinsics.invokeAndCheckArrays(TestStringIntrinsics.java:273) > at ... stack trace continues - E.N. > Sample output for comparing char arrays: > ----------System.err:(21/1579)*---------- > Result: (false) of 'arrayEqualsC' is not equal to expected (true) > Arrays differ starting from [index: 7]: > ... \\u0005, \\u0006, \\u0007, \\u0008, \\u0009, \\n, ... > ... \\u0005, \\u0006, }, \\u0008, \\u0009, \\n, ... > ^^^^^^^ > java.lang.RuntimeException: Result: (false) of 'arrayEqualsC' is not > equal to expected (true) > at > compiler.intrinsics.string.TestStringIntrinsics.invokeAndCheckArrays(TestStringIntrinsics.java:280) > at > ... and so on - E.N. > > testing: open/test/hotspot/jtreg/compiler/intrinsics/string/TestStringIntrinsics.java on linux, windows, macosx. Evgeny Nikitin has updated the pull request incrementally with one additional commit since the last revision: Disable instantiation of format.Diff.Defaults ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/112/files - new: https://git.openjdk.java.net/jdk/pull/112/files/ba6e5411..78ed79f3 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=112&range=10 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=112&range=09-10 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/112.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/112/head:pull/112 PR: https://git.openjdk.java.net/jdk/pull/112 From enikitin at openjdk.java.net Fri Oct 9 07:25:46 2020 From: enikitin at openjdk.java.net (Evgeny Nikitin) Date: Fri, 9 Oct 2020 07:25:46 GMT Subject: RFR: 8229186: Improve error messages for TestStringIntrinsics failures [v9] In-Reply-To: References: <8MKmW-2_q3RYbIsye58L3XfOmaLItKlb8M8WB2-1cmI=.4b5a0dc0-e379-47bf-8479-10f18f0cbf0e@github.com> Message-ID: On Thu, 8 Oct 2020 20:44:00 GMT, 
Igor Ignatyev wrote: >> Evgeny Nikitin has updated the pull request incrementally with one additional commit since the last revision: >> >> Add null values support and two-way testing > > test/lib/jdk/test/lib/format/Diff.java line 37: > >> 35: * Default limits for formatters >> 36: */ >> 37: public static class Defaults { > > I'd add a no-op private no-arg ctor to show that this class isn't expected to be instantiated. Got it, fixed in the 78ed79f38262fec1df3ffb7ba407dc792666d1fd. ------------- PR: https://git.openjdk.java.net/jdk/pull/112 From enikitin at openjdk.java.net Fri Oct 9 08:00:52 2020 From: enikitin at openjdk.java.net (Evgeny Nikitin) Date: Fri, 9 Oct 2020 08:00:52 GMT Subject: RFR: 8229186: Improve error messages for TestStringIntrinsics failures [v12] In-Reply-To: References: Message-ID: > pre-Skara RFR thread: [link](https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-May/038416.html) > > Error reporting was improved by writing a C-style escaped string representations for the variables passed to the > methods being tested. For array comparisons, a dedicated diff-formatter was implemented. > Sample output for comparing byte arrays (with artificial failure): > ----------System.err:(21/1553)---------- > Result: (false) of 'arrayEqualsB' is not equal to expected (true) > Arrays differ starting from [index: 7]: > ... 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, ... > ... 5, 6, 125, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, ... > ^^^^ > java.lang.RuntimeException: Result: (false) of 'arrayEqualsB' is not > equal to expected (true) > at > compiler.intrinsics.string.TestStringIntrinsics.invokeAndCheckArrays(TestStringIntrinsics.java:273) > at ... stack trace continues - E.N. > Sample output for comparing char arrays: > ----------System.err:(21/1579)*---------- > Result: (false) of 'arrayEqualsC' is not equal to expected (true) > Arrays differ starting from [index: 7]: > ... \\u0005, \\u0006, \\u0007, \\u0008, \\u0009, \\n, ... 
> ... \\u0005, \\u0006, }, \\u0008, \\u0009, \\n, ... > ^^^^^^^ > java.lang.RuntimeException: Result: (false) of 'arrayEqualsC' is not > equal to expected (true) > at > compiler.intrinsics.string.TestStringIntrinsics.invokeAndCheckArrays(TestStringIntrinsics.java:280) > at > ... and so on - E.N. > > testing: open/test/hotspot/jtreg/compiler/intrinsics/string/TestStringIntrinsics.java on linux, windows, macosx. Evgeny Nikitin has updated the pull request incrementally with one additional commit since the last revision: Fix ArrayCodec's behaviour for preffix and identical arrays ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/112/files - new: https://git.openjdk.java.net/jdk/pull/112/files/78ed79f3..174148ab Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=112&range=11 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=112&range=10-11 Stats: 7 lines in 1 file changed: 4 ins; 0 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/112.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/112/head:pull/112 PR: https://git.openjdk.java.net/jdk/pull/112 From enikitin at openjdk.java.net Fri Oct 9 08:00:54 2020 From: enikitin at openjdk.java.net (Evgeny Nikitin) Date: Fri, 9 Oct 2020 08:00:54 GMT Subject: RFR: 8229186: Improve error messages for TestStringIntrinsics failures [v9] In-Reply-To: References: <8MKmW-2_q3RYbIsye58L3XfOmaLItKlb8M8WB2-1cmI=.4b5a0dc0-e379-47bf-8479-10f18f0cbf0e@github.com> Message-ID: <0nytWVYaAK9Jb_Nr82oNlxbz8sYBizQZ-n4aJATyPTg=.b7a0e879-86be-4584-8eef-168cb4c52abb@github.com> On Thu, 8 Oct 2020 20:45:50 GMT, Igor Ignatyev wrote: >> Evgeny Nikitin has updated the pull request incrementally with one additional commit since the last revision: >> >> Add null values support and two-way testing > > test/lib/jdk/test/lib/format/ArrayCodec.java line 325: > >> 323: } >> 324: >> 325: return 0; > > shouldn't it be -1? An interesting notice, actually. 
It should be either `-1` or `result` depending on whether arrays are equal, or one array is a prefix of another. Fixed in the 174148ab41e1c03d46dbde8862865dbaf3138843. ------------- PR: https://git.openjdk.java.net/jdk/pull/112 From github.com+8792647+robcasloz at openjdk.java.net Fri Oct 9 08:35:28 2020 From: github.com+8792647+robcasloz at openjdk.java.net (Roberto Castañeda Lozano) Date: Fri, 9 Oct 2020 08:35:28 GMT Subject: RFR: 8253765: C2: Control randomization in StressLCM and StressGCM Message-ID: Use the compilation-local seed in `StressLCM` and `StressGCM` rather than the global one. As a consequence, these options use by default a fresh seed in every compilation, unless `StressSeed=N` is specified, in which case they behave deterministically. Annotate tests that use `StressLCM` and `StressGCM` with the `stress` and `randomness` keys to reflect this change in default behavior. Tested on `tier1` and on all test cases that use `StressLCM` and `StressGCM` (10 times each).
------------- Commit messages: - 8253765: C2: Control randomization in StressLCM and StressGCM Changes: https://git.openjdk.java.net/jdk/pull/572/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=572&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8253765 Stats: 104 lines in 18 files changed: 90 ins; 0 del; 14 mod Patch: https://git.openjdk.java.net/jdk/pull/572.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/572/head:pull/572 PR: https://git.openjdk.java.net/jdk/pull/572 From github.com+8792647+robcasloz at openjdk.java.net Fri Oct 9 08:35:28 2020 From: github.com+8792647+robcasloz at openjdk.java.net (Roberto Castañeda Lozano) Date: Fri, 9 Oct 2020 08:35:28 GMT Subject: RFR: 8253765: C2: Control randomization in StressLCM and StressGCM In-Reply-To: References: Message-ID: On Fri, 9 Oct 2020 08:09:05 GMT, Roberto Castañeda Lozano wrote: > Use the compilation-local seed in `StressLCM` and `StressGCM` rather than the global one. As a consequence, these > options use by default a fresh seed in every compilation, unless `StressSeed=N` is specified, in which case they behave > deterministically. Annotate tests that use `StressLCM` and `StressGCM` with the `stress` and `randomness` keys to > reflect this change in default behavior. Tested on `tier1` and on all test cases that use `StressLCM` and `StressGCM` > (10 times each). Use the compilation-local seed in 'StressLCM' and 'StressGCM' rather than the global one. As a consequence, these options use by default a fresh seed in every compilation, unless 'StressSeed=N' is specified, in which case they behave deterministically. Annotate tests that use 'StressLCM' and 'StressGCM' with the 'stress' and 'randomness' keys to reflect this change in default behavior.
------------- PR: https://git.openjdk.java.net/jdk/pull/572 From enikitin at openjdk.java.net Fri Oct 9 10:46:23 2020 From: enikitin at openjdk.java.net (Evgeny Nikitin) Date: Fri, 9 Oct 2020 10:46:23 GMT Subject: RFR: 8253566: clazz.isAssignableFrom will return false for interface implementors [v2] In-Reply-To: References: Message-ID: On Mon, 5 Oct 2020 07:15:48 GMT, Roland Westrelin wrote: >> The code pattern in the test case is optimized as a trichotomy which >> is wrong given SubTypeCheckNode is a special kind of CmpNode that's >> not commutative. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now > contains three commits: > - comment > - test > - trichotomy opt should not be applied to subtype check test/hotspot/jtreg/compiler/types/TestSubTypeCheckMacroTrichotomy.java line 47: > 45: > 46: private static int test(Class c1, Class c2) { > 47: if (c1 == null) { I'd like to ask why we have these empty null checks? To silence some warnings? ------------- PR: https://git.openjdk.java.net/jdk/pull/422 From roland at openjdk.java.net Fri Oct 9 10:52:19 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Fri, 9 Oct 2020 10:52:19 GMT Subject: RFR: 8253566: clazz.isAssignableFrom will return false for interface implementors [v2] In-Reply-To: References: Message-ID: On Fri, 9 Oct 2020 10:43:46 GMT, Evgeny Nikitin wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now >> contains three commits: >> - comment >> - test >> - trichotomy opt should not be applied to subtype check > > test/hotspot/jtreg/compiler/types/TestSubTypeCheckMacroTrichotomy.java line 47: > >> 45: >> 46: private static int test(Class c1, Class c2) { >> 47: if (c1 == null) { > > I'd like to ask why we have these empty null checks? To silence some warnings? 
If profiling reports the branch not taken, C2 compiles:

  if (o == null) {
  }

to:

  if (o == null) {
    trap;
  }
  o = cast_to_non_null(o);

It's a way to control where in the compiled code a particular object is null checked
------------- PR: https://git.openjdk.java.net/jdk/pull/422 From redestad at openjdk.java.net Fri Oct 9 11:05:20 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Fri, 9 Oct 2020 11:05:20 GMT Subject: Integrated: 8254244: Some code emitted by TemplateTable::branch is unused when running TieredCompilation In-Reply-To: References: Message-ID: On Thu, 8 Oct 2020 15:26:22 GMT, Claes Redestad wrote: > On x86, arm, aarch64 and s390, TemplateTable::branch emits code to allocate a MethodData which is never called if > running TieredCompilation. Skipping it slightly reduces interpreter code size and results in a minor startup > improvement (~100k instructions less). The PPC implementation differs significantly, and is left untouched. This pull request has now been integrated. Changeset: 9cecc167 Author: Claes Redestad URL: https://git.openjdk.java.net/jdk/commit/9cecc167 Stats: 5 lines in 4 files changed: 0 ins; 0 del; 5 mod 8254244: Some code emitted by TemplateTable::branch is unused when running TieredCompilation Reviewed-by: mdoerr, coleenp, iveresov ------------- PR: https://git.openjdk.java.net/jdk/pull/564 From redestad at openjdk.java.net Fri Oct 9 11:05:19 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Fri, 9 Oct 2020 11:05:19 GMT Subject: RFR: 8254244: Some code emitted by TemplateTable::branch is unused when running TieredCompilation In-Reply-To: References: Message-ID: <_tZ22A8zQFoc4rzir_sU0zgwnPJZWQ1sjVAG1Zx7U1w=.08100886-d809-4f4d-a086-c136ce38ee8c@github.com> On Fri, 9 Oct 2020 01:24:41 GMT, Igor Veresov wrote: >> On x86, arm, aarch64 and s390, TemplateTable::branch emits code to allocate a MethodData which is never called if >> running TieredCompilation.
Skipping it slightly reduces interpreter code size and results in a minor startup >> improvement (~100k instructions less). The PPC implementation differs significantly, and is left untouched. > > Marked as reviewed by iveresov (Reviewer). Thanks for reviewing! ------------- PR: https://git.openjdk.java.net/jdk/pull/564 From fyang at openjdk.java.net Fri Oct 9 11:53:31 2020 From: fyang at openjdk.java.net (Fei Yang) Date: Fri, 9 Oct 2020 11:53:31 GMT Subject: RFR: 8252204: AArch64: Implement SHA3 accelerator/intrinsic [v5] In-Reply-To: References: Message-ID: > Contributed-by: ard.biesheuvel at linaro.org, dongbo4 at huawei.com > > This added an intrinsic for SHA3 using aarch64 v8.2 SHA3 Crypto Extensions. > Reference implementation for core SHA-3 transform using ARMv8.2 Crypto Extensions: > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/arch/arm64/crypto/sha3-ce-core.S?h=v5.4.52 > > Trivial adaptation in SHA3. implCompress is needed for the purpose of adding the intrinsic. > For SHA3, we need to pass one extra parameter "digestLength" to the stub for the calculation of block size. > "digestLength" is also used in for the EOR loop before keccak to differentiate different SHA3 variants. > > We added jtreg tests for SHA3 and used QEMU system emulator which supports SHA3 instructions to test the functionality. > Patch passed jtreg tier1-3 tests with QEMU system emulator. > Also verified with jtreg tier1-3 tests without SHA3 instructions on aarch64-linux-gnu and x86_64-linux-gnu, to make > sure that there's no regression. > We used one existing JMH test for performance test: test/micro/org/openjdk/bench/java/security/MessageDigests.java > We measured the performance benefit with an aarch64 cycle-accurate simulator. > Patch delivers 20% - 40% performance improvement depending on specific SHA3 digest length and size of the message. > > For now, this feature will not be enabled automatically for aarch64. 
We can auto-enable this when it is fully tested on > real hardware. But for the above testing purposes, this is auto-enabled when the corresponding hardware feature is > detected. Fei Yang has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: - Merge master - Add sha3 instructions to cpu/aarch64/aarch64-asmtest.py and regenerate the test in assembler_aarch64.cpp:asm_check - Rebase - Merge master - Fix trailing whitespace issue - 8252204: AArch64: Implement SHA3 accelerator/intrinsic Contributed-by: ard.biesheuvel at linaro.org, dongbo4 at huawei.com ------------- Changes: https://git.openjdk.java.net/jdk/pull/207/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=207&range=04 Stats: 1512 lines in 35 files changed: 1025 ins; 22 del; 465 mod Patch: https://git.openjdk.java.net/jdk/pull/207.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/207/head:pull/207 PR: https://git.openjdk.java.net/jdk/pull/207 From aph at openjdk.java.net Fri Oct 9 12:22:15 2020 From: aph at openjdk.java.net (Andrew Haley) Date: Fri, 9 Oct 2020 12:22:15 GMT Subject: RFR: 8252204: AArch64: Implement SHA3 accelerator/intrinsic [v5] In-Reply-To: References: Message-ID: On Fri, 9 Oct 2020 11:53:31 GMT, Fei Yang wrote: >> Contributed-by: ard.biesheuvel at linaro.org, dongbo4 at huawei.com >> >> This added an intrinsic for SHA3 using aarch64 v8.2 SHA3 Crypto Extensions. >> Reference implementation for core SHA-3 transform using ARMv8.2 Crypto Extensions: >> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/arch/arm64/crypto/sha3-ce-core.S?h=v5.4.52 >> >> Trivial adaptation in SHA3. implCompress is needed for the purpose of adding the intrinsic. >> For SHA3, we need to pass one extra parameter "digestLength" to the stub for the calculation of block size. >> "digestLength" is also used in for the EOR loop before keccak to differentiate different SHA3 variants. 
>> >> We added jtreg tests for SHA3 and used QEMU system emulator which supports SHA3 instructions to test the functionality. >> Patch passed jtreg tier1-3 tests with QEMU system emulator. >> Also verified with jtreg tier1-3 tests without SHA3 instructions on aarch64-linux-gnu and x86_64-linux-gnu, to make >> sure that there's no regression. >> We used one existing JMH test for performance test: test/micro/org/openjdk/bench/java/security/MessageDigests.java >> We measured the performance benefit with an aarch64 cycle-accurate simulator. >> Patch delivers 20% - 40% performance improvement depending on specific SHA3 digest length and size of the message. >> >> For now, this feature will not be enabled automatically for aarch64. We can auto-enable this when it is fully tested on >> real hardware. But for the above testing purposes, this is auto-enabled when the corresponding hardware feature is >> detected. > > Fei Yang has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains > six commits: > - Merge master > - Add sha3 instructions to cpu/aarch64/aarch64-asmtest.py and regenerate the test in assembler_aarch64.cpp:asm_check > - Rebase > - Merge master > - Fix trailing whitespace issue > - 8252204: AArch64: Implement SHA3 accelerator/intrinsic > Contributed-by: ard.biesheuvel at linaro.org, dongbo4 at huawei.com Marked as reviewed by aph (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/207 From rkennke at openjdk.java.net Fri Oct 9 14:16:17 2020 From: rkennke at openjdk.java.net (Roman Kennke) Date: Fri, 9 Oct 2020 14:16:17 GMT Subject: RFR: 8254314: Shenandoah: null checks in c2 should not skip over native load barrier In-Reply-To: References: Message-ID: On Fri, 9 Oct 2020 14:03:21 GMT, Roland Westrelin wrote: > C2 optimizes (CmpP (LoadBarrier o) NULL) as (CmpP o NULL). 
The native > load barrier is not guaranteed to return a non null oop when passed a > non null oop so this optimization could lead to a crash. It looks good to me! While this doesn't seem to reproduce with normal native-barriers yet, it's been a major headache in the ongoing work on concurrent weak reference processing, and I don't see why normal native-barriers wouldn't be affected by it. We've probably been lucky so far. ------------- Marked as reviewed by rkennke (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/576 From roland at openjdk.java.net Fri Oct 9 14:16:17 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Fri, 9 Oct 2020 14:16:17 GMT Subject: RFR: 8254314: Shenandoah: null checks in c2 should not skip over native load barrier Message-ID: C2 optimizes (CmpP (LoadBarrier o) NULL) as (CmpP o NULL). The native load barrier is not guaranteed to return a non null oop when passed a non null oop so this optimization could lead to a crash. ------------- Commit messages: - Shenandoah: null checks in c2 should not skip over native load barrier Changes: https://git.openjdk.java.net/jdk/pull/576/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=576&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8254314 Stats: 4 lines in 1 file changed: 2 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/576.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/576/head:pull/576 PR: https://git.openjdk.java.net/jdk/pull/576 From iignatyev at openjdk.java.net Fri Oct 9 14:35:15 2020 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Fri, 9 Oct 2020 14:35:15 GMT Subject: RFR: 8229186: Improve error messages for TestStringIntrinsics failures [v12] In-Reply-To: References: Message-ID: On Fri, 9 Oct 2020 08:00:52 GMT, Evgeny Nikitin wrote: >> pre-Skara RFR thread: [link](https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-May/038416.html) >> >> Error reporting was improved by writing a C-style escaped string
representations for the variables passed to the >> methods being tested. For array comparisons, a dedicated diff-formatter was implemented. >> Sample output for comparing byte arrays (with artificial failure): >> ----------System.err:(21/1553)---------- >> Result: (false) of 'arrayEqualsB' is not equal to expected (true) >> Arrays differ starting from [index: 7]: >> ... 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, ... >> ... 5, 6, 125, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, ... >> ^^^^ >> java.lang.RuntimeException: Result: (false) of 'arrayEqualsB' is not >> equal to expected (true) >> at >> compiler.intrinsics.string.TestStringIntrinsics.invokeAndCheckArrays(TestStringIntrinsics.java:273) >> at ... stack trace continues - E.N. >> Sample output for comparing char arrays: >> ----------System.err:(21/1579)*---------- >> Result: (false) of 'arrayEqualsC' is not equal to expected (true) >> Arrays differ starting from [index: 7]: >> ... \\u0005, \\u0006, \\u0007, \\u0008, \\u0009, \\n, ... >> ... \\u0005, \\u0006, }, \\u0008, \\u0009, \\n, ... >> ^^^^^^^ >> java.lang.RuntimeException: Result: (false) of 'arrayEqualsC' is not >> equal to expected (true) >> at >> compiler.intrinsics.string.TestStringIntrinsics.invokeAndCheckArrays(TestStringIntrinsics.java:280) >> at >> ... and so on - E.N. >> >> testing: open/test/hotspot/jtreg/compiler/intrinsics/string/TestStringIntrinsics.java on linux, windows, macosx. > > Evgeny Nikitin has updated the pull request incrementally with one additional commit since the last revision: > > Fix ArrayCodec's behaviour for preffix and identical arrays Marked as reviewed by iignatyev (Reviewer). 
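For readers following the thread, the behaviour that the last commit settles on (return `-1` for identical arrays, and the length of the shorter array when one array is a prefix of the other) can be sketched as below. This is a minimal illustration under stated assumptions, not the actual `ArrayCodec` code: the class name `FirstMismatchSketch` and the method `firstMismatch` are hypothetical, and the `Objects.requireNonNull` calls mirror the NPE suggestion made earlier in the review.

```java
import java.util.Objects;

// Hypothetical sketch; names are illustrative and not taken from the JDK test library.
final class FirstMismatchSketch {

    // Returns the index of the first differing element, the length of the
    // shorter array if one array is a proper prefix of the other, or -1 if
    // both arrays have the same length and content.
    static int firstMismatch(int[] first, int[] second) {
        // Throws NullPointerException with a message naming the offending argument.
        Objects.requireNonNull(first, "first array must not be null");
        Objects.requireNonNull(second, "second array must not be null");

        int common = Math.min(first.length, second.length);
        for (int i = 0; i < common; i++) {
            if (first[i] != second[i]) {
                return i;
            }
        }
        return first.length == second.length ? -1 : common;
    }

    public static void main(String[] args) {
        // Differing element, identical arrays, and prefix case, in that order.
        System.out.println(firstMismatch(new int[] {5, 6, 7, 8}, new int[] {5, 6, 125, 8}));
        System.out.println(firstMismatch(new int[] {1, 2, 3}, new int[] {1, 2, 3}));
        System.out.println(firstMismatch(new int[] {1, 2}, new int[] {1, 2, 3}));
    }
}
```

For primitive arrays, `java.util.Arrays.mismatch` (Java 9 and later) follows the same convention for the equal and prefix cases.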
------------- PR: https://git.openjdk.java.net/jdk/pull/112 From enikitin at openjdk.java.net Fri Oct 9 16:52:18 2020 From: enikitin at openjdk.java.net (Evgeny Nikitin) Date: Fri, 9 Oct 2020 16:52:18 GMT Subject: Integrated: 8229186: Improve error messages for TestStringIntrinsics failures In-Reply-To: References: Message-ID: On Thu, 10 Sep 2020 12:20:05 GMT, Evgeny Nikitin wrote: > pre-Skara RFR thread: [link](https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-May/038416.html) > > Error reporting was improved by writing a C-style escaped string representations for the variables passed to the > methods being tested. For array comparisons, a dedicated diff-formatter was implemented. > Sample output for comparing byte arrays (with artificial failure): > ----------System.err:(21/1553)---------- > Result: (false) of 'arrayEqualsB' is not equal to expected (true) > Arrays differ starting from [index: 7]: > ... 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, ... > ... 5, 6, 125, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, ... > ^^^^ > java.lang.RuntimeException: Result: (false) of 'arrayEqualsB' is not > equal to expected (true) > at > compiler.intrinsics.string.TestStringIntrinsics.invokeAndCheckArrays(TestStringIntrinsics.java:273) > at ... stack trace continues - E.N. > Sample output for comparing char arrays: > ----------System.err:(21/1579)*---------- > Result: (false) of 'arrayEqualsC' is not equal to expected (true) > Arrays differ starting from [index: 7]: > ... \\u0005, \\u0006, \\u0007, \\u0008, \\u0009, \\n, ... > ... \\u0005, \\u0006, }, \\u0008, \\u0009, \\n, ... > ^^^^^^^ > java.lang.RuntimeException: Result: (false) of 'arrayEqualsC' is not > equal to expected (true) > at > compiler.intrinsics.string.TestStringIntrinsics.invokeAndCheckArrays(TestStringIntrinsics.java:280) > at > ... and so on - E.N. > > testing: open/test/hotspot/jtreg/compiler/intrinsics/string/TestStringIntrinsics.java on linux, windows, macosx. 
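The C-style escaping visible in the sample output quoted above (a newline shown as `\n`, non-printable characters as `\uXXXX`, printable ASCII such as `}` left unchanged) could be approximated as follows. This is only a sketch inferred from that sample: `CharEscapeSketch` and `escape` are hypothetical names, and the real formatter under `test/lib/jdk/test/lib/format` may well handle more cases.

```java
// Hypothetical sketch; not the JDK test library API.
final class CharEscapeSketch {

    // Renders one char the way the sample output appears to:
    // newline as a two-character escape, printable ASCII as-is,
    // anything else as a four-digit hexadecimal escape.
    static String escape(char c) {
        if (c == '\n') {
            return "\\n";
        }
        if (c >= 0x20 && c < 0x7F) {
            return String.valueOf(c);
        }
        return String.format("\\u%04x", (int) c);
    }

    public static void main(String[] args) {
        // Same values as the second char array in the quoted sample.
        char[] sample = {5, 6, '}', 8, 9, '\n'};
        StringBuilder sb = new StringBuilder();
        for (char c : sample) {
            if (sb.length() > 0) {
                sb.append(", ");
            }
            sb.append(escape(c));
        }
        System.out.println(sb);
    }
}
```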
This pull request has now been integrated. Changeset: 52e45a36 Author: Evgeny Nikitin Committer: Igor Ignatyev URL: https://git.openjdk.java.net/jdk/commit/52e45a36 Stats: 1191 lines in 6 files changed: 1177 ins; 1 del; 13 mod 8229186: Improve error messages for TestStringIntrinsics failures Reviewed-by: iignatyev, lmesnik ------------- PR: https://git.openjdk.java.net/jdk/pull/112 From aph at openjdk.java.net Fri Oct 9 17:38:16 2020 From: aph at openjdk.java.net (Andrew Haley) Date: Fri, 9 Oct 2020 17:38:16 GMT Subject: RFR: 8252204: AArch64: Implement SHA3 accelerator/intrinsic [v5] In-Reply-To: References: Message-ID: <4NM17B6l4GvNgCbmmQTUcnfZTA6G-IEc85O8jH_q-xA=.63b10da7-bab7-44bc-a4c8-0a675aca45c0@github.com> On Fri, 9 Oct 2020 12:18:58 GMT, Andrew Haley wrote: >> Fei Yang has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains >> six commits: >> - Merge master >> - Add sha3 instructions to cpu/aarch64/aarch64-asmtest.py and regenerate the test in assembler_aarch64.cpp:asm_check >> - Rebase >> - Merge master >> - Fix trailing whitespace issue >> - 8252204: AArch64: Implement SHA3 accelerator/intrinsic >> Contributed-by: ard.biesheuvel at linaro.org, dongbo4 at huawei.com > > Marked as reviewed by aph (Reviewer). I see Linux x64 failed. However, I don't seem to be able to withdraw my patch approval. However, please consider it withdrawn. ------------- PR: https://git.openjdk.java.net/jdk/pull/207 From kvn at openjdk.java.net Fri Oct 9 17:59:12 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 9 Oct 2020 17:59:12 GMT Subject: RFR: 8252847: Optimize primitive arrayCopy stubs using AVX-512 masked instructions [v6] In-Reply-To: References: Message-ID: On Mon, 28 Sep 2020 12:21:01 GMT, Jatin Bhateja wrote: >> Summary: >> >> 1) New AVX3 optimized stubs for both conjoint and disjoint arraycopy. >> 2) Special instruction sequence blocks for copy sizes b/w 32-192 bytes. 
>> 3) Block copy operation above 192 bytes is performed using destination address aligned PRE-MAIN-POST loop. Main loop >> copies 192 bytes in one iteration and tail part fall over special instruction sequence blocks. 4) Both small copy block >> and aligned loop use 32 byte vector register to prevent any frequency penalty for copy sizes less than AVX3Threshold. >> 5) For block size above AVX3Threshold both special blocks and loop operate using 64 byte register. 6) In case user >> sets the maximum vector size to 32 bytes, forward copy (disjoint) operations are done using efficient REP MOVS for copy >> sizes above 4096 bytes. JMH Results: >> System : CascadeLake Server, Intel(R) Xeon(R) Platinum 8280L CPU @ 2.70GHz >> Micros : test/micro/org/openjdk/bench/java/lang/ArrayCopy*.java >> Baseline : [http://cr.openjdk.java.net/~jbhateja/8252847/JMH_results/ArrayCopy_AVX3_Stubs_Baseline.txt]() >> WithOpt : [http://cr.openjdk.java.net/~jbhateja/8252847/JMH_results/ArrayCopy_AVX3_Stubs_WithOpts.txt]() > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > 8252847 : Review comments resolution Yes, this looks better. Reviewed. Before pushing let me test it. I will let you know results. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/61 From kvn at openjdk.java.net Fri Oct 9 17:59:13 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 9 Oct 2020 17:59:13 GMT Subject: RFR: 8252847: Optimize primitive arrayCopy stubs using AVX-512 masked instructions [v5] In-Reply-To: References: Message-ID: On Tue, 6 Oct 2020 18:51:52 GMT, Nils Eliasson wrote: >> src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 1264: >> >>> 1262: } >>> 1263: >>> 1264: #ifndef PRODUCT >> >> macroAssembler_x86.hpp has become big. Maybe we should start thinking about splitting arraycopy stubs into a separate file. > > But let's do that in another change.
It is good that the AVX3 case is separated out in this change - makes it easy to > follow. agree ------------- PR: https://git.openjdk.java.net/jdk/pull/61 From boris.ulasevich at bell-sw.com Fri Oct 9 19:01:26 2020 From: boris.ulasevich at bell-sw.com (Boris Ulasevich) Date: Fri, 9 Oct 2020 22:01:26 +0300 Subject: RFR: 8249893: AARCH64: optimize the construction of the value from the bits of the other two In-Reply-To: References: Message-ID: Hi Andrew, Many thanks for looking at this again! Benchmark link: [1]. Measurements on Cortex A73, Cortex A53 and Neoverse N1 show [2] a 6-15% performance improvement for bench1 and 18-29% for bench2. thanks, Boris [1] http://cr.openjdk.java.net/~bulasevich/8249893/webrev.02/ConstructFF.java [2] http://cr.openjdk.java.net/~bulasevich/8249893/webrev.02/ConstructFF.txt On Thu, Oct 8, 2020 at 5:07 PM Andrew Haley wrote: > > On 05/10/2020 18:40, Boris Ulasevich wrote: > > Let me revive the change request to C2 and AArch64 that applies Bitfield Insert instruction in the expression "(v1 & > > 0xFF) | ((v2 & 0xFF) << 8)". > > > > Compared to the last round of review [2] I updated the transformation to apply BFI in more cases and added a jtreg test. > > I looked through the discussion and I can't find an updated benchmark which > shows the speedup for the cases you now handle. Is there one? > > -- > Andrew Haley (he/him) > Java Platform Lead Engineer > Red Hat UK Ltd.
> https://keybase.io/andrewhaley > EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 > From kvn at openjdk.java.net Fri Oct 9 20:33:16 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 9 Oct 2020 20:33:16 GMT Subject: RFR: 8252847: Optimize primitive arrayCopy stubs using AVX-512 masked instructions [v6] In-Reply-To: References: Message-ID: On Fri, 9 Oct 2020 17:56:51 GMT, Vladimir Kozlov wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> 8252847 : Review comments resolution > > Yes, this looks better. Reviewed. Before pushing let me test it. I will let you know results. hs-tier1-3 testing passed on x86 (all OSs). ------------- PR: https://git.openjdk.java.net/jdk/pull/61 From kvn at openjdk.java.net Fri Oct 9 23:41:12 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 9 Oct 2020 23:41:12 GMT Subject: RFR: 8253765: C2: Control randomization in StressLCM and StressGCM In-Reply-To: References: Message-ID: <4JvN7jqgQ20BeeU45CQ8yt_S79pbWyT6t-7cv_GOHJA=.beb5effa-d7bf-4031-8ebe-5f6a06e3903d@github.com> On Fri, 9 Oct 2020 08:09:05 GMT, Roberto Castañeda Lozano wrote: > Use the compilation-local seed in `StressLCM` and `StressGCM` rather than the global one. As a consequence, these > options use by default a fresh seed in every compilation, unless `StressSeed=N` is specified, in which case they behave > deterministically. Annotate tests that use `StressLCM` and `StressGCM` with the `stress` and `randomness` keys to > reflect this change in default behavior. Tested on `tier1` and on all test cases that use `StressLCM` and `StressGCM` > (10 times each). Good. ------------- Marked as reviewed by kvn (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/572 From kvn at openjdk.java.net Sat Oct 10 00:15:12 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Sat, 10 Oct 2020 00:15:12 GMT Subject: RFR: 8173585: Intrinsify StringLatin1.indexOf(char) [v3] In-Reply-To: References: <_T0873dC5tfUtGn9r1_Y21JkPKRt-za3MM9hPN2GQKQ=.b865fe53-5417-424f-81b6-1566a330640e@github.com> Message-ID: <0_4KDtvn6WhHRCxqupbUQjLauR5DMQWJluogsJ7m_KA=.697e6e15-5c91-4339-abab-1dff51871e8d@github.com> On Mon, 21 Sep 2020 12:45:55 GMT, Jason Tatton wrote:
>> This is an implementation of the indexOf(char) intrinsic for StringLatin1 (1 byte encoded Strings). It is provided for
>> x86 and ARM64. The implementation is greatly inspired by the indexOf(char) intrinsic for StringUTF16. To incorporate it
>> I had to make a small change to StringLatin1.java (refactor of functionality to intrinsified private method) as well as
>> code for C2. Submitted to: hotspot-compiler-dev and core-libs-dev as this patch contains a change to hotspot and
>> java/lang/StringLatin1.java https://bugs.openjdk.java.net/browse/JDK-8173585
>>
>> Details of testing:
>> ============
>> I have created a jtreg test "compiler/intrinsics/string/TestStringLatin1IndexOfChar" to cover this new intrinsic. Note
>> that, particularly for the x86 implementation of the intrinsic, the code path taken is dependent upon the length of the
>> input String. Hence the test has been designed to cover all these cases. In summary they are:
>> - A "short" string of < 16 characters.
>> - A SIMD String of 16 - 31 characters.
>> - An AVX2 SIMD String of 32 characters+.
>>
>> Hardware used for testing:
>> -----------------------------
>>
>> - Intel Xeon CPU E5-2680 (JVM did not recognize this as having AVX2 support)
>> - Intel i7 processor (with AVX2 support).
>> - AWS Graviton 2 (ARM 64 processor).
>>
>> I also ran; "run-test-tier1" and "run-test-tier2" for: x86_64 and aarch64.
>>
>> Possible future enhancements:
>> ====================
>> For the x86 implementation there may be two further improvements we can make in order to improve performance of both
>> the StringUTF16 and StringLatin1 indexOf(char) intrinsics:
>> 1. Make use of AVX-512 instructions.
>> 2. For "short" Strings (see below), I think it may be possible to modify the existing algorithm to still use SSE SIMD
>> instructions instead of a loop.
>> Benchmark results:
>> ============
>> **Without** the new StringLatin1 indexOf(char) intrinsic:
>>
>> | Benchmark | Mode | Cnt | Score | Error | Units |
>> | ------------- | ------------- |------------- |------------- |------------- |------------- |
>> | IndexOfBenchmark.latin1_mixed_char | avgt | 5 | **26,389.129** | ± 182.581 | ns/op |
>> | IndexOfBenchmark.utf16_mixed_char | avgt | 5 | 17,885.383 | ± 435.933 | ns/op |
>>
>>
>> **With** the new StringLatin1 indexOf(char) intrinsic:
>>
>> | Benchmark | Mode | Cnt | Score | Error | Units |
>> | ------------- | ------------- |------------- |------------- |------------- |------------- |
>> | IndexOfBenchmark.latin1_mixed_char | avgt | 5 | **17,875.185** | ± 407.716 | ns/op |
>> | IndexOfBenchmark.utf16_mixed_char | avgt | 5 | 18,292.802 | ± 167.306 | ns/op |
>>
>>
>> The objective of the patch is to bring the performance of StringLatin1 indexOf(char) in line with StringUTF16
>> indexOf(char) for x86 and ARM64. We can see above that this has been achieved. Similar results were obtained when
>> running on ARM.
>
> Jason Tatton has updated the pull request incrementally with one additional commit since the last revision:
>
>   Add missing newline to end of vmSymbols.cpp

Changes seem fine, but you are missing the Copyright + GPL header in new files.

test/hotspot/jtreg/compiler/intrinsics/string/TestStringLatin1IndexOfChar.java line 1:

> 1: /*

Missing copyright+GPL header in new test. See other tests for example.
test/micro/org/openjdk/bench/java/lang/StringIndexOfChar.java line 1: > 1: package org.openjdk.bench.java.lang; Again missing Copyright. ------------- Changes requested by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/71 From kvn at openjdk.java.net Sat Oct 10 00:16:12 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Sat, 10 Oct 2020 00:16:12 GMT Subject: RFR: 8253588: C1: assert(false) failed: unknown register on x86_32 only with -XX:+TraceLinearScanLevel=4 In-Reply-To: References: Message-ID: On Fri, 25 Sep 2020 10:07:04 GMT, Christian Hagedorn wrote: > [JDK-8251093](https://bugs.openjdk.java.net/browse/JDK-8251093) introduced some additional logging of intervals and its > registers in various places. On 32-bit only, we could have two registers for an interval. A hi-register is only used > when the interval has `_num_phys_regs` set to 2. In one such place > ([L5448](https://github.com/chhagedorn/jdk/blob/29ed779487bad3c359fb13dfad3f41832637a470/src/hotspot/share/c1/c1_LinearScan.cpp#L5448)), > we log the hi-register `hint_regHi`. On > [L5441](https://github.com/chhagedorn/jdk/blob/29ed779487bad3c359fb13dfad3f41832637a470/src/hotspot/share/c1/c1_LinearScan.cpp#L5441), > however, we can assign it an invalid register number when `_num_phys_regs` is 1. That was not a problem before > JDK-8251093 as we only used `hint_regHi` later after a `_num_phys_regs == 2` check on > [L5484](https://github.com/chhagedorn/jdk/blob/29ed779487bad3c359fb13dfad3f41832637a470/src/hotspot/share/c1/c1_LinearScan.cpp#L5484). > But the additional logging is performed earlier resulting in this assertion failure when trying to log the invalid > `hint_regHi` register. Thanks, Christian Good. ------------- Marked as reviewed by kvn (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/356 From kvn at openjdk.java.net Sat Oct 10 01:19:07 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Sat, 10 Oct 2020 01:19:07 GMT Subject: RFR: 8251544: CTW: C2 fails with assert(no_dead_loop) failed: dead loop detected In-Reply-To: <8EdEziBkTtIKsise_9EP7uwgBoOfO-9IBAe3UQ3ydsc=.d01f5f9d-f05c-4a4b-b715-6ea8135ed1a3@github.com> References: <8EdEziBkTtIKsise_9EP7uwgBoOfO-9IBAe3UQ3ydsc=.d01f5f9d-f05c-4a4b-b715-6ea8135ed1a3@github.com> Message-ID: On Wed, 30 Sep 2020 07:44:06 GMT, Christian Hagedorn wrote: > In the testcase, we hit a dead data loop while dead nodes are being removed on a dead control path. A region node, lets > say `r`, that represents a loop (has three inputs: self, a loop entry and backedge) but is not a loop node, yet, > becomes dead when its entry control is replaced by top in the first IGVN run after parsing. All its phi nodes also > become dead by replacing the corresponding entry control input by top. The problem is now that some phi nodes of `r` > are processed by IGVN before the corresponding (dead) region node `r`. In `PhiNode::Ideal`, we actually check if there > is a dead loop. But after some of the phis of `r` were already removed, `is_unsafe_data_reference()` on > [L1939](https://github.com/openjdk/jdk/compare/master...chhagedorn:JDK-8251544#diff-efe6b3bde157b833249cd9a8d8b6645bL1939) > returns false. As a result, we do not realize in `PhiNode::Ideal` for one of the remaining phis that it is actually > dead and we apply the normal propagation in `PhiNode::Identity` which replaces the phi by its only non-top input. We > later apply an additional optimization for a `LoadNode` input of an `AddINode` in which we replace the `LoadNode` by > the `AddINode` itself (because of the data loop) and we end up with a dead data loop and fail with the dead loop > assertion. The order in which the nodes are processed in IGVN is crucial. 
I could only reproduce this bug with a very
> specific CTW-like testcase which makes this quite an edge case. I can think of two ways to fix this:
> 1. Delay phi nodes which have only one non-top input left and whose region node is not a loop node, has three inputs from which
>    the entry control is top and the region has not been processed by IGVN, yet.
> 2. Extend the dead data loop check in `PhiNode::Ideal()` to already do an unreachable region check as done in
>    `RegionNode::Ideal()`. The result can be cached as a region should not become reachable anymore once we figured out it is dead.
> I chose the second approach because I
> think it is preferable as we are not delaying IGVN and all other phi nodes can already use the information of a dead
> region before it is processed. This avoids any further unwanted optimizations on dead nodes.
>
> Thanks,
> Christian

Seems reasonable fix.

-------------

Marked as reviewed by kvn (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/425

From fyang at openjdk.java.net  Sat Oct 10 02:56:17 2020
From: fyang at openjdk.java.net (Fei Yang)
Date: Sat, 10 Oct 2020 02:56:17 GMT
Subject: RFR: 8252204: AArch64: Implement SHA3 accelerator/intrinsic [v6]
In-Reply-To:
References:
Message-ID:

> Contributed-by: ard.biesheuvel at linaro.org, dongbo4 at huawei.com
>
> This added an intrinsic for SHA3 using aarch64 v8.2 SHA3 Crypto Extensions.
> Reference implementation for core SHA-3 transform using ARMv8.2 Crypto Extensions:
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/arch/arm64/crypto/sha3-ce-core.S?h=v5.4.52
>
> Trivial adaptation in SHA3. implCompress is needed for the purpose of adding the intrinsic.
> For SHA3, we need to pass one extra parameter "digestLength" to the stub for the calculation of block size.
> "digestLength" is also used in the EOR loop before keccak to differentiate different SHA3 variants.
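
For readers wondering why the digest length alone is enough to derive the block size: in SHA-3 the Keccak state is 1600 bits (200 bytes), the capacity is twice the digest length, and the block size (the "rate") is whatever remains. The sketch below shows only that arithmetic. The class `Sha3BlockSizeSketch` and its method are hypothetical names, and this is not a claim about how the stub or `SHA3.java` in the patch actually computes the value.

```java
// Hypothetical illustration of the SHA-3 rate arithmetic; not code from the patch.
final class Sha3BlockSizeSketch {

    // Keccak-f[1600] state size in bytes.
    static final int STATE_BYTES = 200;

    // Block size (rate) in bytes for a SHA-3 variant with the given digest length in bytes.
    // The capacity is 2 * digestLength, and rate = state - capacity.
    static int blockSize(int digestLength) {
        return STATE_BYTES - 2 * digestLength;
    }

    public static void main(String[] args) {
        int[] digestLengths = {28, 32, 48, 64}; // SHA3-224, -256, -384, -512
        for (int d : digestLengths) {
            System.out.println("SHA3-" + (d * 8) + ": block size " + blockSize(d) + " bytes");
        }
    }
}
```

Because the four standard variants have four distinct block sizes, a single compress routine can tell them apart from this one parameter, which is consistent with the description above of `digestLength` driving the loop that absorbs the input before the permutation.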
> > We added jtreg tests for SHA3 and used QEMU system emulator which supports SHA3 instructions to test the functionality. > Patch passed jtreg tier1-3 tests with QEMU system emulator. > Also verified with jtreg tier1-3 tests without SHA3 instructions on aarch64-linux-gnu and x86_64-linux-gnu, to make > sure that there's no regression. > We used one existing JMH test for performance test: test/micro/org/openjdk/bench/java/security/MessageDigests.java > We measured the performance benefit with an aarch64 cycle-accurate simulator. > Patch delivers 20% - 40% performance improvement depending on specific SHA3 digest length and size of the message. > > For now, this feature will not be enabled automatically for aarch64. We can auto-enable this when it is fully tested on > real hardware. But for the above testing purposes, this is auto-enabled when the corresponding hardware feature is > detected. Fei Yang has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: - Merge master - Merge master - Add sha3 instructions to cpu/aarch64/aarch64-asmtest.py and regenerate the test in assembler_aarch64.cpp:asm_check - Rebase - Merge master - Fix trailing whitespace issue - 8252204: AArch64: Implement SHA3 accelerator/intrinsic Contributed-by: ard.biesheuvel at linaro.org, dongbo4 at huawei.com ------------- Changes: https://git.openjdk.java.net/jdk/pull/207/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=207&range=05 Stats: 1512 lines in 35 files changed: 1025 ins; 22 del; 465 mod Patch: https://git.openjdk.java.net/jdk/pull/207.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/207/head:pull/207 PR: https://git.openjdk.java.net/jdk/pull/207 From fyang at openjdk.java.net Sat Oct 10 06:16:17 2020 From: fyang at openjdk.java.net (Fei Yang) Date: Sat, 10 Oct 2020 06:16:17 GMT Subject: RFR: 8252204: AArch64: Implement SHA3 accelerator/intrinsic [v7] In-Reply-To: References: Message-ID: > 
Contributed-by: ard.biesheuvel at linaro.org, dongbo4 at huawei.com > > This added an intrinsic for SHA3 using aarch64 v8.2 SHA3 Crypto Extensions. > Reference implementation for core SHA-3 transform using ARMv8.2 Crypto Extensions: > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/arch/arm64/crypto/sha3-ce-core.S?h=v5.4.52 > > Trivial adaptation in SHA3. implCompress is needed for the purpose of adding the intrinsic. > For SHA3, we need to pass one extra parameter "digestLength" to the stub for the calculation of block size. > "digestLength" is also used in for the EOR loop before keccak to differentiate different SHA3 variants. > > We added jtreg tests for SHA3 and used QEMU system emulator which supports SHA3 instructions to test the functionality. > Patch passed jtreg tier1-3 tests with QEMU system emulator. > Also verified with jtreg tier1-3 tests without SHA3 instructions on aarch64-linux-gnu and x86_64-linux-gnu, to make > sure that there's no regression. > We used one existing JMH test for performance test: test/micro/org/openjdk/bench/java/security/MessageDigests.java > We measured the performance benefit with an aarch64 cycle-accurate simulator. > Patch delivers 20% - 40% performance improvement depending on specific SHA3 digest length and size of the message. > > For now, this feature will not be enabled automatically for aarch64. We can auto-enable this when it is fully tested on > real hardware. But for the above testing purposes, this is auto-enabled when the corresponding hardware feature is > detected. Fei Yang has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains eight commits:

 - Merge master
 - Merge master
 - Merge master
 - Add sha3 instructions to cpu/aarch64/aarch64-asmtest.py and regenerate the test in assembler_aarch64.cpp:asm_check
 - Rebase
 - Merge master
 - Fix trailing whitespace issue
 - 8252204: AArch64: Implement SHA3 accelerator/intrinsic
   Contributed-by: ard.biesheuvel at linaro.org, dongbo4 at huawei.com

-------------

Changes: https://git.openjdk.java.net/jdk/pull/207/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=207&range=06
  Stats: 1512 lines in 35 files changed: 1025 ins; 22 del; 465 mod
  Patch: https://git.openjdk.java.net/jdk/pull/207.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/207/head:pull/207

PR: https://git.openjdk.java.net/jdk/pull/207

From jbhateja at openjdk.java.net  Sat Oct 10 06:32:12 2020
From: jbhateja at openjdk.java.net (Jatin Bhateja)
Date: Sat, 10 Oct 2020 06:32:12 GMT
Subject: Integrated: 8252847: Optimize primitive arrayCopy stubs using AVX-512 masked instructions
In-Reply-To:
References:
Message-ID: <_mGWUOZH852Ad58rrDEF6citxojOAY1Q-NU0BOWgVME=.2fcb9893-8f93-4f3b-a21b-f2f98069a971@github.com>

On Mon, 7 Sep 2020 14:28:18 GMT, Jatin Bhateja wrote:

> Summary:
>
> 1) New AVX3 optimized stubs for both conjoint and disjoint arraycopy.
> 2) Special instruction sequence blocks for copy sizes between 32 and 192 bytes.
> 3) Block copy operation above 192 bytes is performed using destination address aligned PRE-MAIN-POST loop. Main loop
>    copies 192 bytes in one iteration and the tail part falls over to special instruction sequence blocks.
> 4) Both small copy block and aligned loop use 32 byte vector register to prevent any frequency penalty for copy sizes
>    less than AVX3Threshold.
> 5) For block size above AVX3Threshold both special blocks and loop operate using 64 byte register.
> 6) In case user sets the maximum vector size to 32 bytes, forward copy (disjoint) operations are done using efficient
>    REP MOVS for copy sizes above 4096 bytes.
JMH Results: > System : CascadeLake Server, Intel(R) Xeon(R) Platinum 8280L CPU @ 2.70GHz > Micros : test/micro/org/openjdk/bench/java/lang/ArrayCopy*.java > Baseline : [http://cr.openjdk.java.net/~jbhateja/8252847/JMH_results/ArrayCopy_AVX3_Stubs_Baseline.txt]() > WithOpt : [http://cr.openjdk.java.net/~jbhateja/8252847/JMH_results/ArrayCopy_AVX3_Stubs_WithOpts.txt]() This pull request has now been integrated. Changeset: 4b5ac3ab Author: Jatin Bhateja URL: https://git.openjdk.java.net/jdk/commit/4b5ac3ab Stats: 1517 lines in 11 files changed: 1419 ins; 69 del; 29 mod 8252847: Optimize primitive arrayCopy stubs using AVX-512 masked instructions Reviewed-by: neliasso, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/61 From fyang at openjdk.java.net Sat Oct 10 13:09:11 2020 From: fyang at openjdk.java.net (Fei Yang) Date: Sat, 10 Oct 2020 13:09:11 GMT Subject: RFR: 8252204: AArch64: Implement SHA3 accelerator/intrinsic [v5] In-Reply-To: <4NM17B6l4GvNgCbmmQTUcnfZTA6G-IEc85O8jH_q-xA=.63b10da7-bab7-44bc-a4c8-0a675aca45c0@github.com> References: <4NM17B6l4GvNgCbmmQTUcnfZTA6G-IEc85O8jH_q-xA=.63b10da7-bab7-44bc-a4c8-0a675aca45c0@github.com> Message-ID: <7CXYOoHPTvfS6YvwFjdlO27rQKRbDu3_QSGP7vDuyDs=.41789630-e05f-4f8a-8562-ad8bb74e12aa@github.com> On Fri, 9 Oct 2020 17:35:22 GMT, Andrew Haley wrote: > I see Linux x64 failed. However, I don't seem to be able to withdraw my patch approval. > However, please consider it withdrawn. Thanks for approving this patch. I checked the error messages and I think the failures were not caused by this patch. The failures has been fixed by the following two commits: commit ec41046c5ce7077eebf4a3c265f79c7fba33d916 8254348: Build fails when cds is disabled after JDK-8247536 commit aaa0a2a04792d7c84150e9d972790978ffcc6890 8254297: Zero and Minimal VMs are broken with undeclared identifier 'DerivedPointerTable' after JDK-8253180 The testing was triggered again automatically after I merge master and I see it passed now. 
Do you have any comments for the discussion here? https://github.com/openjdk/jdk/pull/207#issuecomment-701243662 Valerie Peng has checked the java security changes, i.e. src/java.base/share/classes/sun/security/provider/SHA3.java. Do you think we need another reviewer for this patch? ------------- PR: https://git.openjdk.java.net/jdk/pull/207 From jiefu at openjdk.java.net Sat Oct 10 13:14:15 2020 From: jiefu at openjdk.java.net (Jie Fu) Date: Sat, 10 Oct 2020 13:14:15 GMT Subject: RFR: 8254351: Minimal VM build fails with undeclared identifier 'MaxVectorSize' after JDK-8252847 Message-ID: <1lHQQwe0kmdKfzjQ93ab4Q7ZMYBLxYluFnpkKjOrtdo=.fd176565-637d-467c-bb28-6ca2812881ee@github.com> To fix the bug, #if COMPILER2_OR_JVMCI was added. Please review it. Thanks. Testing: - Minimal and server-fastdebug builds on x86 ------------- Commit messages: - 8254351: Minimal VM build fails with undeclared identifier 'MaxVectorSize' after JDK-8252847 Changes: https://git.openjdk.java.net/jdk/pull/588/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=588&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8254351 Stats: 19 lines in 2 files changed: 18 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/588.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/588/head:pull/588 PR: https://git.openjdk.java.net/jdk/pull/588 From jiefu at openjdk.java.net Sat Oct 10 13:14:15 2020 From: jiefu at openjdk.java.net (Jie Fu) Date: Sat, 10 Oct 2020 13:14:15 GMT Subject: RFR: 8254351: Minimal VM build fails with undeclared identifier 'MaxVectorSize' after JDK-8252847 In-Reply-To: <1lHQQwe0kmdKfzjQ93ab4Q7ZMYBLxYluFnpkKjOrtdo=.fd176565-637d-467c-bb28-6ca2812881ee@github.com> References: <1lHQQwe0kmdKfzjQ93ab4Q7ZMYBLxYluFnpkKjOrtdo=.fd176565-637d-467c-bb28-6ca2812881ee@github.com> Message-ID: On Sat, 10 Oct 2020 13:09:20 GMT, Jie Fu wrote: > To fix the bug, #if COMPILER2_OR_JVMCI was added. > Please review it. Thanks. 
> > Testing: > - Minimal and server-fastdebug builds on x86 ------------- PR: https://git.openjdk.java.net/jdk/pull/588 From kcr at openjdk.java.net Sat Oct 10 13:19:09 2020 From: kcr at openjdk.java.net (Kevin Rushforth) Date: Sat, 10 Oct 2020 13:19:09 GMT Subject: RFR: 8252204: AArch64: Implement SHA3 accelerator/intrinsic [v5] In-Reply-To: <4NM17B6l4GvNgCbmmQTUcnfZTA6G-IEc85O8jH_q-xA=.63b10da7-bab7-44bc-a4c8-0a675aca45c0@github.com> References: <4NM17B6l4GvNgCbmmQTUcnfZTA6G-IEc85O8jH_q-xA=.63b10da7-bab7-44bc-a4c8-0a675aca45c0@github.com> Message-ID: On Fri, 9 Oct 2020 17:35:22 GMT, Andrew Haley wrote: >> Marked as reviewed by aph (Reviewer). > > I see Linux x64 failed. However, I don't seem to be able to withdraw my patch approval. > However, please consider it withdrawn. @theRealAph if you still need to, you can withdraw your approval by reviewing it again and selecting "Request changes". ------------- PR: https://git.openjdk.java.net/jdk/pull/207 From jiefu at openjdk.java.net Sat Oct 10 14:23:19 2020 From: jiefu at openjdk.java.net (Jie Fu) Date: Sat, 10 Oct 2020 14:23:19 GMT Subject: RFR: 8254351: Minimal VM build fails with undeclared identifier 'MaxVectorSize' after JDK-8252847 [v2] In-Reply-To: <1lHQQwe0kmdKfzjQ93ab4Q7ZMYBLxYluFnpkKjOrtdo=.fd176565-637d-467c-bb28-6ca2812881ee@github.com> References: <1lHQQwe0kmdKfzjQ93ab4Q7ZMYBLxYluFnpkKjOrtdo=.fd176565-637d-467c-bb28-6ca2812881ee@github.com> Message-ID: > To fix the bug, #if COMPILER2_OR_JVMCI was added. > Please review it. Thanks. 
> > Testing: > - Minimal and server-fastdebug builds on x86 Jie Fu has updated the pull request incrementally with one additional commit since the last revision: Fix minimal-debug build failure ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/588/files - new: https://git.openjdk.java.net/jdk/pull/588/files/6aa5f9f1..4bb6e2a0 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=588&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=588&range=00-01 Stats: 29 lines in 2 files changed: 16 ins; 12 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/588.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/588/head:pull/588 PR: https://git.openjdk.java.net/jdk/pull/588 From xliu at openjdk.java.net Sat Oct 10 14:59:13 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Sat, 10 Oct 2020 14:59:13 GMT Subject: RFR: 8254269: simplify Node::disconnect_inputs Message-ID: 8254269: simplify Node::disconnect_inputs ------------- Commit messages: - 8254269: simplify Node::disconnect_inputs Changes: https://git.openjdk.java.net/jdk/pull/589/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=589&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8254269 Stats: 49 lines in 11 files changed: 1 ins; 9 del; 39 mod Patch: https://git.openjdk.java.net/jdk/pull/589.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/589/head:pull/589 PR: https://git.openjdk.java.net/jdk/pull/589 From redestad at openjdk.java.net Sat Oct 10 15:18:09 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Sat, 10 Oct 2020 15:18:09 GMT Subject: RFR: 8254269: simplify Node::disconnect_inputs In-Reply-To: References: Message-ID: <5a2T8EHC6civgnwqGIPzSXxzzC1qanfEVP4hdZRst9A=.c326734c-a9f4-4c9f-925c-e8425b677f8c@github.com> On Sat, 10 Oct 2020 14:53:23 GMT, Xin Liu wrote: > 8254269: simplify Node::disconnect_inputs Nice cleanup! A few suggestions inline, but looks good as-is, too. 
src/hotspot/share/opto/node.cpp line 901: > 899: > 900: for( uint i = 0; i < cnt; ++i ) { > 901: if( in(i) == nullptr ) continue; Perhaps a matter of preference, but this could now be simplified to: if (in(i) != nullptr) { set_req(i, nullptr); } src/hotspot/share/opto/node.cpp line 909: > 907: uint max = len(); > 908: for( uint i = 0; i < max; ++i ) { > 909: if( in(i) == nullptr ) continue; Same here ------------- Marked as reviewed by redestad (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/589 From kvn at openjdk.java.net Sun Oct 11 00:37:08 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Sun, 11 Oct 2020 00:37:08 GMT Subject: RFR: 8254351: Minimal VM build fails with undeclared identifier 'MaxVectorSize' after JDK-8252847 [v2] In-Reply-To: References: <1lHQQwe0kmdKfzjQ93ab4Q7ZMYBLxYluFnpkKjOrtdo=.fd176565-637d-467c-bb28-6ca2812881ee@github.com> Message-ID: <4_CzBvVuO2GUeLxnXmREkBuOBLf-1qew7OCQOsf6fqE=.d510e631-4095-4cf3-a9bd-d6b56de14271@github.com> On Sat, 10 Oct 2020 14:23:19 GMT, Jie Fu wrote: >> To fix the bug, #if COMPILER2_OR_JVMCI was added. >> Please review it. Thanks. >> >> Testing: >> - Minimal and server-fastdebug builds on x86 > > Jie Fu has updated the pull request incrementally with one additional commit since the last revision: > > Fix minimal-debug build failure Fix is good and trivial. ------------- Marked as reviewed by kvn (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/588 From jiefu at openjdk.java.net Sun Oct 11 00:44:09 2020 From: jiefu at openjdk.java.net (Jie Fu) Date: Sun, 11 Oct 2020 00:44:09 GMT Subject: RFR: 8254351: Minimal VM build fails with undeclared identifier 'MaxVectorSize' after JDK-8252847 [v2] In-Reply-To: <4_CzBvVuO2GUeLxnXmREkBuOBLf-1qew7OCQOsf6fqE=.d510e631-4095-4cf3-a9bd-d6b56de14271@github.com> References: <1lHQQwe0kmdKfzjQ93ab4Q7ZMYBLxYluFnpkKjOrtdo=.fd176565-637d-467c-bb28-6ca2812881ee@github.com> <4_CzBvVuO2GUeLxnXmREkBuOBLf-1qew7OCQOsf6fqE=.d510e631-4095-4cf3-a9bd-d6b56de14271@github.com> Message-ID: On Sun, 11 Oct 2020 00:33:57 GMT, Vladimir Kozlov wrote: >> Jie Fu has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix minimal-debug build failure > > Fix is good and trivial. Thanks @vnkozlov for your review. ------------- PR: https://git.openjdk.java.net/jdk/pull/588 From jiefu at openjdk.java.net Sun Oct 11 00:44:10 2020 From: jiefu at openjdk.java.net (Jie Fu) Date: Sun, 11 Oct 2020 00:44:10 GMT Subject: Integrated: 8254351: Minimal VM build fails with undeclared identifier 'MaxVectorSize' after JDK-8252847 In-Reply-To: <1lHQQwe0kmdKfzjQ93ab4Q7ZMYBLxYluFnpkKjOrtdo=.fd176565-637d-467c-bb28-6ca2812881ee@github.com> References: <1lHQQwe0kmdKfzjQ93ab4Q7ZMYBLxYluFnpkKjOrtdo=.fd176565-637d-467c-bb28-6ca2812881ee@github.com> Message-ID: On Sat, 10 Oct 2020 13:09:20 GMT, Jie Fu wrote: > To fix the bug, #if COMPILER2_OR_JVMCI was added. > Please review it. Thanks. > > Testing: > - Minimal and server-fastdebug builds on x86 This pull request has now been integrated. 
Changeset: d43f1416 Author: Jie Fu URL: https://git.openjdk.java.net/jdk/commit/d43f1416 Stats: 46 lines in 3 files changed: 34 ins; 12 del; 0 mod 8254351: Minimal VM build fails with undeclared identifier 'MaxVectorSize' after JDK-8252847 Reviewed-by: kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/588 From xliu at openjdk.java.net Sun Oct 11 08:08:18 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Sun, 11 Oct 2020 08:08:18 GMT Subject: RFR: 8254269: simplify Node::disconnect_inputs [v2] In-Reply-To: References: Message-ID: > 8254269: simplify Node::disconnect_inputs Xin Liu has updated the pull request incrementally with one additional commit since the last revision: 8254269: simplify Node::disconnect_inputs Optimize it for the precedence loop. because there's no null between 2 non-null precedences, disconnect_inputs can break at a null value. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/589/files - new: https://git.openjdk.java.net/jdk/pull/589/files/4a9d8030..b8a72755 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=589&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=589&range=00-01 Stats: 17 lines in 1 file changed: 7 ins; 3 del; 7 mod Patch: https://git.openjdk.java.net/jdk/pull/589.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/589/head:pull/589 PR: https://git.openjdk.java.net/jdk/pull/589 From aph at redhat.com Sun Oct 11 13:00:04 2020 From: aph at redhat.com (Andrew Haley) Date: Sun, 11 Oct 2020 14:00:04 +0100 Subject: RFR: 8252204: AArch64: Implement SHA3 accelerator/intrinsic [v5] In-Reply-To: <7CXYOoHPTvfS6YvwFjdlO27rQKRbDu3_QSGP7vDuyDs=.41789630-e05f-4f8a-8562-ad8bb74e12aa@github.com> References: <4NM17B6l4GvNgCbmmQTUcnfZTA6G-IEc85O8jH_q-xA=.63b10da7-bab7-44bc-a4c8-0a675aca45c0@github.com> <7CXYOoHPTvfS6YvwFjdlO27rQKRbDu3_QSGP7vDuyDs=.41789630-e05f-4f8a-8562-ad8bb74e12aa@github.com> Message-ID: <22d2eac0-a043-152d-5308-899b8ae7bfe6@redhat.com> On 10/10/2020 14:09, Fei 
Yang wrote: > On Fri, 9 Oct 2020 17:35:22 GMT, Andrew Haley wrote: > >> I see Linux x64 failed. However, I don't seem to be able to withdraw my patch approval. >> However, please consider it withdrawn. > > Thanks for approving this patch. > I checked the error messages and I think the failures were not caused by this patch. > The failures has been fixed by the following two commits: > commit ec41046c5ce7077eebf4a3c265f79c7fba33d916 > 8254348: Build fails when cds is disabled after JDK-8247536 > commit aaa0a2a04792d7c84150e9d972790978ffcc6890 > 8254297: Zero and Minimal VMs are broken with undeclared identifier 'DerivedPointerTable' after JDK-8253180 > > The testing was triggered again automatically after I merge master and I see it passed now. OK, so we're good. > Do you have any comments for the discussion here? > https://github.com/openjdk/jdk/pull/207#issuecomment-701243662 > > Valerie Peng has checked the java security changes, i.e. src/java.base/share/classes/sun/security/provider/SHA3.java. > Do you think we need another reviewer for this patch? If we've had a security engineer look at the shared code, we're good to go. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From redestad at openjdk.java.net Sun Oct 11 13:10:08 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Sun, 11 Oct 2020 13:10:08 GMT Subject: RFR: 8254269: simplify Node::disconnect_inputs [v2] In-Reply-To: <5a2T8EHC6civgnwqGIPzSXxzzC1qanfEVP4hdZRst9A=.c326734c-a9f4-4c9f-925c-e8425b677f8c@github.com> References: <5a2T8EHC6civgnwqGIPzSXxzzC1qanfEVP4hdZRst9A=.c326734c-a9f4-4c9f-925c-e8425b677f8c@github.com> Message-ID: On Sat, 10 Oct 2020 15:15:05 GMT, Claes Redestad wrote: >> Xin Liu has updated the pull request incrementally with one additional commit since the last revision: >> >> 8254269: simplify Node::disconnect_inputs >> >> Optimize it for the precedence loop. 
because there's no null between >> 2 non-null precedences, disconnect_inputs can break at a null value. > > Nice cleanup! A few suggestions inline, but looks good as-is, too. I think the changes in b8a7275 might be fine, but they are more subtle than the cleanup in the preceding version and I'm not comfortable reviewing them. I think it needs a more thorough examination of if those optimizations always hold. Could probably use some more asserts, a few tests and lift the documented assumption of the structure of the input to a more visible place. My suggestion is to push this ahead with the straightforward cleanup you had in 4a9d803 and file a follow-up RFE for the other optimization. ------------- PR: https://git.openjdk.java.net/jdk/pull/589 From aph at openjdk.java.net Sun Oct 11 14:30:09 2020 From: aph at openjdk.java.net (Andrew Haley) Date: Sun, 11 Oct 2020 14:30:09 GMT Subject: RFR: 8249893: AARCH64: optimize the construction of the value from the bits of the other two In-Reply-To: References: Message-ID: <5n3SJE02oD_SW_psT84VEJh22lomGJfJtARdyjf0Kcw=.acff1dc7-3dbd-4c8d-8889-f434570e6da2@github.com> On Mon, 5 Oct 2020 17:36:14 GMT, Boris Ulasevich wrote: > Let me revive the change request [3] to C2 and AArch64 that applies Bitfield Insert instruction in the expression "(v1 > & 0xFF) | ((v2 & 0xFF) << 8)". > > Compared to the last round of review [2] I updated the transformation to apply BFI in more cases and added a jtreg test. > > As before, compared to the original patch [1], the transformation logic is now in the common C2 code: a new > BitfieldInsert node has been introduced to replace Or+Shift+And sequence when possible, on AARCH a single BFI > instruction is emitted for the new node. 
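
To make the quoted expression concrete: `(v1 & 0xFF) | ((v2 & 0xFF) << 8)` takes the low byte of `v1` and places the low byte of `v2` in bits 8 to 15, which is the shape a bitfield insert handles in one step. The sketch below models that in plain Java. The class `BitfieldInsertSketch` and the `bfi` helper are hypothetical: `bfi` only mimics the documented behaviour of a bitfield insert (copy the low `width` bits of the source into the destination at `lsb`, leave the other destination bits alone) and says nothing about how C2 matches or emits the new node.

```java
// Hypothetical model of the transformation's effect; not code from the patch.
final class BitfieldInsertSketch {

    // The Or+Shift+And form from the description.
    static int combine(int v1, int v2) {
        return (v1 & 0xFF) | ((v2 & 0xFF) << 8);
    }

    // Software model of a bitfield insert: copy the low `width` bits of src
    // into dst starting at bit `lsb`; all other bits of dst are preserved.
    static int bfi(int dst, int src, int lsb, int width) {
        int mask = (width == 32) ? -1 : ((1 << width) - 1);
        return (dst & ~(mask << lsb)) | ((src & mask) << lsb);
    }

    // Same value expressed as one mask of v1 followed by one insert of v2.
    static int combineWithBfi(int v1, int v2) {
        return bfi(v1 & 0xFF, v2, 8, 8);
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 0x34, 0xFF, 0x1234, 0xABCD, -1, Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int v1 : samples) {
            for (int v2 : samples) {
                if (combine(v1, v2) != combineWithBfi(v1, v2)) {
                    throw new AssertionError("mismatch for v1=" + v1 + " v2=" + v2);
                }
            }
        }
        System.out.println("Or+Shift+And and the bitfield-insert model agree on all samples");
    }
}
```

In this model the mask and shift applied to `v2` are absorbed into the insert, which is where the saving in the generated code would come from; the mask on `v1` is still needed.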
>
> [1] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-July/039161.html
> [2] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-August/039653.html
> [3] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-August/039792.html

Please add your microbenchmark to this pull request.

-------------

PR: https://git.openjdk.java.net/jdk/pull/511

From aph at redhat.com  Sun Oct 11 15:04:03 2020
From: aph at redhat.com (Andrew Haley)
Date: Sun, 11 Oct 2020 16:04:03 +0100
Subject: RFR: 8249893: AARCH64: optimize the construction of the value from the bits of the other two
In-Reply-To:
References:
Message-ID: <24c0915e-581e-5c0c-5ffb-c7c2789f7e58@redhat.com>

Hi,

On 09/10/2020 20:01, Boris Ulasevich wrote:
>
> Many thanks for looking at this again!
>
> Benchmark link: [1]. Measurements on Cortex A73, Cortex A53 and Neoverse N1
> shows [2] 6-15% performance improvement for bench1 and 18-29% for bench2.

For me on ThunderX 2,

Before:

Benchmark           Mode  Cnt   Score   Error  Units
ConstructFF.bench1  avgt   10  15.170 ± 0.975  ns/op
ConstructFF.bench4  avgt   10  39.391 ± 2.617  ns/op

After:

ConstructFF.bench1  avgt   10  12.349 ± 2.535  ns/op
ConstructFF.bench4  avgt   10  24.353 ± 0.443  ns/op

So for this carefully-constructed benchmark, it looks like there's a
useful gain.

--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd.
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From kvn at openjdk.java.net  Sun Oct 11 15:06:18 2020
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Sun, 11 Oct 2020 15:06:18 GMT
Subject: RFR: 8254352: 3 compiler tests failed with "assert(allocates2(pc)) failed"
Message-ID: <0YJBML9WPoAwBeUfmZtMGsByzCNbIFoXN2t5IXC0GY0=.0663bbcb-987a-4c82-b6e6-6635922b2882@github.com>

JDK-8252847 changes added new AVX3 specific stubs for arraycopy and increased code buffer size but it was not enough.
On Windows we need 3500 bytes more because additional code is used to preserve registers.
Tested hs tier1-3 and ran bug's tests on machine where they failed. ------------- Commit messages: - 8254352: 3 compiler tests failed with "assert(allocates2(pc)) failed: not in CodeBuffer memory" Changes: https://git.openjdk.java.net/jdk/pull/592/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=592&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8254352 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/592.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/592/head:pull/592 PR: https://git.openjdk.java.net/jdk/pull/592 From aph at openjdk.java.net Sun Oct 11 15:10:08 2020 From: aph at openjdk.java.net (Andrew Haley) Date: Sun, 11 Oct 2020 15:10:08 GMT Subject: RFR: 8249893: AARCH64: optimize the construction of the value from the bits of the other two In-Reply-To: <5n3SJE02oD_SW_psT84VEJh22lomGJfJtARdyjf0Kcw=.acff1dc7-3dbd-4c8d-8889-f434570e6da2@github.com> References: <5n3SJE02oD_SW_psT84VEJh22lomGJfJtARdyjf0Kcw=.acff1dc7-3dbd-4c8d-8889-f434570e6da2@github.com> Message-ID: On Sun, 11 Oct 2020 14:27:47 GMT, Andrew Haley wrote: >> Let me revive the change request [3] to C2 and AArch64 that applies Bitfield Insert instruction in the expression "(v1 >> & 0xFF) | ((v2 & 0xFF) << 8)". >> >> Compared to the last round of review [2] I updated the transformation to apply BFI in more cases and added a jtreg test. >> >> As before, compared to the original patch [1], the transformation logic is now in the common C2 code: a new >> BitfieldInsert node has been introduced to replace Or+Shift+And sequence when possible, on AARCH a single BFI >> instruction is emitted for the new node. 
>> >> [1] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-July/039161.html >> [2] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-August/039653.html >> [3] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-August/039792.html > > Please add your microbenchmark to this pull request. One other thing: AFAICS this change works with Or but not with Add. Looking at the logic in addnode.cpp I don't think there's any reason for not doing that, is there? I've often seen ((a & 0xff) << 8) + (b & 0xff). Or is more common than Add, I think, but we want to get as much leverage out of this work as we can. ------------- PR: https://git.openjdk.java.net/jdk/pull/511 From shade at openjdk.java.net Sun Oct 11 17:52:08 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Sun, 11 Oct 2020 17:52:08 GMT Subject: RFR: 8254352: 3 compiler tests failed with "assert(allocates2(pc)) failed" In-Reply-To: <0YJBML9WPoAwBeUfmZtMGsByzCNbIFoXN2t5IXC0GY0=.0663bbcb-987a-4c82-b6e6-6635922b2882@github.com> References: <0YJBML9WPoAwBeUfmZtMGsByzCNbIFoXN2t5IXC0GY0=.0663bbcb-987a-4c82-b6e6-6635922b2882@github.com> Message-ID: On Sun, 11 Oct 2020 14:59:07 GMT, Vladimir Kozlov wrote: > JDK-8252847 changes added new AVX3 specific stubs for arraycopy and increased code buffer size but it was not enough. > On Windows we need 3500 bytes more because additional code is used to preserve registers. > > Tested hs tier1-3 and ran bug's tests on machine where they failed. Marked as reviewed by shade (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/592 From shade at openjdk.java.net Sun Oct 11 17:57:15 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Sun, 11 Oct 2020 17:57:15 GMT Subject: RFR: 8254362: x86_32 builds fail after JDK-8253180 Message-ID: <5XhQh464_8PL9H2BVJWnPHUpEzJihSrThSyuuW4Syto=.ec530f60-a3d8-4904-92dd-f70c3491b208@github.com> `r15_thread` is not available on x86_32.
I noticed that JDK-8253180 introduces: void C1SafepointPollStub::emit_code(LIR_Assembler* ce) { #ifdef _LP64 ... __ movptr(Address(r15_thread, JavaThread::saved_exception_pc_offset()), rscratch1); ... #else ShouldNotReachHere(); #endif /* _LP64 */ } ...and we should do the same in `C2SafepointPollStubTable` to unbreak x86_32. Testing: - [x] x86_32 build - [x] x86_32 hotspot:tier1 (lots of unrelated failures that look like JDK-8254125, never a `ShouldNotReachHere`) - [x] x86_64 build - [x] x86_64 hotspot:tier1 ------------- Commit messages: - 8254362: x86_32 builds fail after JDK-8253180 Changes: https://git.openjdk.java.net/jdk/pull/593/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=593&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8254362 Stats: 4 lines in 1 file changed: 4 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/593.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/593/head:pull/593 PR: https://git.openjdk.java.net/jdk/pull/593 From shade at openjdk.java.net Sun Oct 11 18:00:08 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Sun, 11 Oct 2020 18:00:08 GMT Subject: RFR: 8254352: 3 compiler tests failed with "assert(allocates2(pc)) failed" In-Reply-To: References: <0YJBML9WPoAwBeUfmZtMGsByzCNbIFoXN2t5IXC0GY0=.0663bbcb-987a-4c82-b6e6-6635922b2882@github.com> Message-ID: <53qWSWT-wCQrAQ5E11mIOXES2WP1W-MXkzp1ZkVeRR8=.15ce68fb-784a-4947-b22f-d8616b3970f3@github.com> On Sun, 11 Oct 2020 17:49:20 GMT, Aleksey Shipilev wrote: >> JDK-8252847 changes added new AVX3 specific stubs for arraycopy and increased code buffer size but it was not enough. >> On Windows we need 3500 bytes more because additional code is used to preserve registers. >> >> Tested hs tier1-3 and ran bug's tests on machine where they failed. > > Marked as reviewed by shade (Reviewer). Note: you would not be able to integrate until PR and issue title match, see "Integration Blocker" notice in the PR body. 
------------- PR: https://git.openjdk.java.net/jdk/pull/592 From kvn at openjdk.java.net Sun Oct 11 19:41:10 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Sun, 11 Oct 2020 19:41:10 GMT Subject: Integrated: 8254352: 3 compiler tests failed with "assert(allocates2(pc)) failed: not in CodeBuffer memory" In-Reply-To: <0YJBML9WPoAwBeUfmZtMGsByzCNbIFoXN2t5IXC0GY0=.0663bbcb-987a-4c82-b6e6-6635922b2882@github.com> References: <0YJBML9WPoAwBeUfmZtMGsByzCNbIFoXN2t5IXC0GY0=.0663bbcb-987a-4c82-b6e6-6635922b2882@github.com> Message-ID: On Sun, 11 Oct 2020 14:59:07 GMT, Vladimir Kozlov wrote: > JDK-8252847 changes added new AVX3 specific stubs for arraycopy and increased code buffer size but it was not enough. > On Windows we need 3500 bytes more because additional code is used to preserve registers. > > Tested hs tier1-3 and ran bug's tests on machine where they failed. This pull request has now been integrated. Changeset: 25001c50 Author: Vladimir Kozlov URL: https://git.openjdk.java.net/jdk/commit/25001c50 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8254352: 3 compiler tests failed with "assert(allocates2(pc)) failed: not in CodeBuffer memory" Reviewed-by: shade ------------- PR: https://git.openjdk.java.net/jdk/pull/592 From kvn at openjdk.java.net Sun Oct 11 19:47:12 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Sun, 11 Oct 2020 19:47:12 GMT Subject: RFR: 8254269: simplify Node::disconnect_inputs [v2] In-Reply-To: <5a2T8EHC6civgnwqGIPzSXxzzC1qanfEVP4hdZRst9A=.c326734c-a9f4-4c9f-925c-e8425b677f8c@github.com> References: <5a2T8EHC6civgnwqGIPzSXxzzC1qanfEVP4hdZRst9A=.c326734c-a9f4-4c9f-925c-e8425b677f8c@github.com> Message-ID: On Sat, 10 Oct 2020 15:12:09 GMT, Claes Redestad wrote: >> Xin Liu has updated the pull request incrementally with one additional commit since the last revision: >> >> 8254269: simplify Node::disconnect_inputs >> >> Optimize it for the precedence loop. 
because there's no null between >> 2 non-null precedences, disconnect_inputs can break at a null value. > > src/hotspot/share/opto/node.cpp line 901: > >> 899: >> 900: for( uint i = 0; i < cnt; ++i ) { >> 901: if( in(i) == nullptr ) continue; > > Perhaps a matter of preference, but this could now be simplified to: > if (in(i) != nullptr) { > set_req(i, nullptr); > } Agree with suggestion. It is HS code style requirements. ------------- PR: https://git.openjdk.java.net/jdk/pull/589 From kvn at openjdk.java.net Sun Oct 11 20:10:09 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Sun, 11 Oct 2020 20:10:09 GMT Subject: RFR: 8254269: simplify Node::disconnect_inputs [v2] In-Reply-To: References: Message-ID: On Sun, 11 Oct 2020 08:08:18 GMT, Xin Liu wrote: >> 8254269: simplify Node::disconnect_inputs > > Xin Liu has updated the pull request incrementally with one additional commit since the last revision: > > 8254269: simplify Node::disconnect_inputs > > Optimize it for the precedence loop. because there's no null between > 2 non-null precedences, disconnect_inputs can break at a null value. Changes requested by kvn (Reviewer). src/hotspot/share/opto/node.cpp line 910: > 908: set_req(i, nullptr); > 909: } > 910: As Claes suggested: for (uint i = 0; i < req(); ++i) { if (in(i) != nullptr) { set_req(i, nullptr); } } src/hotspot/share/opto/node.cpp line 914: > 912: // Note: Safepoints may have precedence edges, even during parsing > 913: // Precedences have no embedded NULL > 914: for (; i < len() && in(i) != nullptr; ++i) { I don't think it is correct. 
src/hotspot/share/opto/node.cpp line 916: > 914: for (; i < len() && in(i) != nullptr; ++i) { > 915: set_prec(i, nullptr); > 916: } Something like next: for (uint i = req(); i < len(); ++i) { if (in(i) != nullptr) { set_prec(i, nullptr); } } ------------- PR: https://git.openjdk.java.net/jdk/pull/589 From kvn at openjdk.java.net Sun Oct 11 20:28:08 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Sun, 11 Oct 2020 20:28:08 GMT Subject: RFR: 8254362: x86_32 builds fail after JDK-8253180 In-Reply-To: <5XhQh464_8PL9H2BVJWnPHUpEzJihSrThSyuuW4Syto=.ec530f60-a3d8-4904-92dd-f70c3491b208@github.com> References: <5XhQh464_8PL9H2BVJWnPHUpEzJihSrThSyuuW4Syto=.ec530f60-a3d8-4904-92dd-f70c3491b208@github.com> Message-ID: On Sun, 11 Oct 2020 17:52:05 GMT, Aleksey Shipilev wrote: > `r15_thread` is not available on x86_32. I noticed that JDK-8253180 introduces: > > void C1SafepointPollStub::emit_code(LIR_Assembler* ce) { > #ifdef _LP64 > ... > __ movptr(Address(r15_thread, JavaThread::saved_exception_pc_offset()), rscratch1); > ... > #else > ShouldNotReachHere(); > #endif /* _LP64 */ > } > > ...and we should do the same in `C2SafepointPollStubTable` to unbreak x86_32. > > Testing: > - [x] x86_32 build > - [x] x86_32 hotspot:tier1 (lots of unrelated failures that look like JDK-8254125, never a `ShouldNotReachHere`) > - [x] x86_64 build > - [x] x86_64 hotspot:tier1 Good. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/593 From xliu at openjdk.java.net Sun Oct 11 20:57:09 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Sun, 11 Oct 2020 20:57:09 GMT Subject: RFR: 8254269: simplify Node::disconnect_inputs [v2] In-Reply-To: References: Message-ID: On Sun, 11 Oct 2020 20:03:58 GMT, Vladimir Kozlov wrote: >> Xin Liu has updated the pull request incrementally with one additional commit since the last revision: >> >> 8254269: simplify Node::disconnect_inputs >> >> Optimize it for the precedence loop. 
because there's no null between >> 2 non-null precedences, disconnect_inputs can break at a null value. > > src/hotspot/share/opto/node.cpp line 914: > >> 912: // Note: Safepoints may have precedence edges, even during parsing >> 913: // Precedences have no embedded NULL >> 914: for (; i < len() && in(i) != nullptr; ++i) { > > I don't think it is correct. hi, @cl4es @vnkozlov, Thank you to review this patch. The reason I changed to this because I read this comment yesterday. https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/node.hpp#L280 It suggests that `Node::_in` has an inherent order, which I depict here: https://github.com/navyxliu/jdk/commit/b8a72755a29d2fcbe36c9dcaf7f696634f813b4f#diff-e1c5a771a82d5fdb7e88c5b90b444736R898 If C2 sees a nullptr in precedence section, it can assume the rest of precedences are all nullptr. I have 2 evidence to prove it holds. 1. rm_prec() does relocate null value to the end. https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/node.cpp#L1025 2. I added an assert after the precedence loop and verified it using hotspot-tier1. `for(; i References: Message-ID: On Sun, 11 Oct 2020 20:06:58 GMT, Vladimir Kozlov wrote: >> Xin Liu has updated the pull request incrementally with one additional commit since the last revision: >> >> 8254269: simplify Node::disconnect_inputs >> >> Optimize it for the precedence loop. because there's no null between >> 2 non-null precedences, disconnect_inputs can break at a null value. > > Changes requested by kvn (Reviewer). > I think the changes in [b8a7275](https://github.com/openjdk/jdk/commit/b8a72755a29d2fcbe36c9dcaf7f696634f813b4f) might > be fine, but they are more subtle than the cleanup in the preceding version and I'm not comfortable reviewing them. I > think it needs a more thorough examination of if those optimizations always hold. 
Could probably use some more asserts, > a few tests and lift the documented assumption of the structure of the input to a more visible place. My suggestion is > to push this ahead with the straightforward cleanup you had in > [4a9d803](https://github.com/openjdk/jdk/commit/4a9d80306960a1c8da2f95f8578006249acb0a6c) and file a follow-up RFE for > the other optimization. yes, thank for the advice. ------------- PR: https://git.openjdk.java.net/jdk/pull/589 From shade at openjdk.java.net Sun Oct 11 21:11:09 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Sun, 11 Oct 2020 21:11:09 GMT Subject: Integrated: 8254362: x86_32 builds fail after JDK-8253180 In-Reply-To: <5XhQh464_8PL9H2BVJWnPHUpEzJihSrThSyuuW4Syto=.ec530f60-a3d8-4904-92dd-f70c3491b208@github.com> References: <5XhQh464_8PL9H2BVJWnPHUpEzJihSrThSyuuW4Syto=.ec530f60-a3d8-4904-92dd-f70c3491b208@github.com> Message-ID: <2rRLEGH-HsqX-dFGShGqQHgHwjjWB88voDN4VmukNpA=.00a928b3-c922-4673-ae9f-cbcfc9296c3f@github.com> On Sun, 11 Oct 2020 17:52:05 GMT, Aleksey Shipilev wrote: > `r15_thread` is not available on x86_32. I noticed that JDK-8253180 introduces: > > void C1SafepointPollStub::emit_code(LIR_Assembler* ce) { > #ifdef _LP64 > ... > __ movptr(Address(r15_thread, JavaThread::saved_exception_pc_offset()), rscratch1); > ... > #else > ShouldNotReachHere(); > #endif /* _LP64 */ > } > > ...and we should do the same in `C2SafepointPollStubTable` to unbreak x86_32. > > Testing: > - [x] x86_32 build > - [x] x86_32 hotspot:tier1 (lots of unrelated failures that look like JDK-8254125, never a `ShouldNotReachHere`) > - [x] x86_64 build > - [x] x86_64 hotspot:tier1 This pull request has now been integrated. 
Changeset: d3069ac9 Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/d3069ac9 Stats: 4 lines in 1 file changed: 4 ins; 0 del; 0 mod 8254362: x86_32 builds fail after JDK-8253180 Reviewed-by: kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/593 From kvn at openjdk.java.net Sun Oct 11 21:52:10 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Sun, 11 Oct 2020 21:52:10 GMT Subject: RFR: 8254269: simplify Node::disconnect_inputs [v2] In-Reply-To: References: Message-ID: On Sun, 11 Oct 2020 08:08:18 GMT, Xin Liu wrote: >> 8254269: simplify Node::disconnect_inputs > > Xin Liu has updated the pull request incrementally with one additional commit since the last revision: > > 8254269: simplify Node::disconnect_inputs > > Optimize it for the precedence loop. because there's no null between > 2 non-null precedences, disconnect_inputs can break at a null value. Changes requested by kvn (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/589 From kvn at openjdk.java.net Sun Oct 11 21:52:11 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Sun, 11 Oct 2020 21:52:11 GMT Subject: RFR: 8254269: simplify Node::disconnect_inputs [v2] In-Reply-To: References: Message-ID: <7WDpg6dY9upaoHd_9RsDQwqqXEGbldzzjpHkFPMrebM=.182de6f9-9306-465d-8e28-74d3796e5a48@github.com> On Sun, 11 Oct 2020 20:54:00 GMT, Xin Liu wrote: >> src/hotspot/share/opto/node.cpp line 914: >> >>> 912: // Note: Safepoints may have precedence edges, even during parsing >>> 913: // Precedences have no embedded NULL >>> 914: for (; i < len() && in(i) != nullptr; ++i) { >> >> I don't think it is correct. > > hi, @cl4es @vnkozlov, > Thank you to review this patch. > > The reason I changed to this because I read this comment yesterday. 
> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/node.hpp#L280 > It suggests that `Node::_in` has an inherent order, which I depict here: > https://github.com/navyxliu/jdk/commit/b8a72755a29d2fcbe36c9dcaf7f696634f813b4f#diff-e1c5a771a82d5fdb7e88c5b90b444736R898 > > If C2 sees a nullptr in precedence section, it can assume the rest of precedences are all nullptr. > I have 2 evidence to prove it holds. > 1. rm_prec() does relocate null value to the end. > https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/node.cpp#L1025 2. I added an assert after the > precedence loop and verified it using hotspot-tier1. `for(; i ` > > With this assumption, c2 can leave loop a little bit early. > > How about this? I withdraw this change and just make a pure clean-up. > I admit that the potential gain is very very small. it's not worth it. You are right. There is already existing assert in find_prec_edge() called from set_prec(): https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/node.hpp#L436 I still want to see explicit start from req(): for (uint i = req(); (i < len()) && (in(i) != nullptr); ++i) { set_prec(i, nullptr); } ------------- PR: https://git.openjdk.java.net/jdk/pull/589 From xliu at openjdk.java.net Mon Oct 12 00:55:10 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Mon, 12 Oct 2020 00:55:10 GMT Subject: RFR: 8254269: simplify Node::disconnect_inputs [v2] In-Reply-To: References: Message-ID: On Sun, 11 Oct 2020 20:06:49 GMT, Vladimir Kozlov wrote: >> Xin Liu has updated the pull request incrementally with one additional commit since the last revision: >> >> 8254269: simplify Node::disconnect_inputs >> >> Optimize it for the precedence loop. because there's no null between >> 2 non-null precedences, disconnect_inputs can break at a null value. 
> > src/hotspot/share/opto/node.cpp line 916: > >> 914: for (; i < len() && in(i) != nullptr; ++i) { >> 915: set_prec(i, nullptr); >> 916: } > > Something like next: > for (uint i = req(); i < len(); ++i) { > if (in(i) != nullptr) { > set_prec(i, nullptr); > } > } This is trickier than I thought. `set_prec(i, nullptr)` will refill _in[i] with the last non-null value, or NULL. https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/node.cpp#L1030 It means iteration from req() to len() skip a node if there are more than one non-null precedences. ------------- PR: https://git.openjdk.java.net/jdk/pull/589 From xliu at openjdk.java.net Mon Oct 12 01:14:21 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Mon, 12 Oct 2020 01:14:21 GMT Subject: RFR: 8254269: simplify Node::disconnect_inputs [v3] In-Reply-To: References: Message-ID: <3SliY5Au71QxorULw7ntHhcGIIG0LA8B7wy08ekfSWI=.65cf0bb7-3290-4e6a-97fe-fd0dacb075b3@github.com> > 8254269: simplify Node::disconnect_inputs Xin Liu has updated the pull request incrementally with one additional commit since the last revision: 8254269: simplify Node::disconnect_inputs ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/589/files - new: https://git.openjdk.java.net/jdk/pull/589/files/b8a72755..5c6bb10a Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=589&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=589&range=01-02 Stats: 13 lines in 1 file changed: 0 ins; 8 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/589.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/589/head:pull/589 PR: https://git.openjdk.java.net/jdk/pull/589 From darcy at openjdk.java.net Mon Oct 12 03:00:14 2020 From: darcy at openjdk.java.net (Joe Darcy) Date: Mon, 12 Oct 2020 03:00:14 GMT Subject: RFR: 8223347: Integration of Vector API (Incubator) In-Reply-To: <-PE4TwXgvq2bemAn_8csjn4_j7zoAolnQz6QQt3z0Wk=.eaa9999f-0713-4349-b31d-934717aa37a1@github.com> References: 
<-PE4TwXgvq2bemAn_8csjn4_j7zoAolnQz6QQt3z0Wk=.eaa9999f-0713-4349-b31d-934717aa37a1@github.com> Message-ID: On Fri, 25 Sep 2020 20:14:29 GMT, Paul Sandoz wrote: > This pull request is for integration of the Vector API. It was previously reviewed under conditions when mercurial was > used for the source code control system. Review threads can be found here (searching for issue number 8223347 in the > title): https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-April/thread.html > https://mail.openjdk.java.net/pipermail/core-libs-dev/2020-May/thread.html > https://mail.openjdk.java.net/pipermail/core-libs-dev/2020-July/thread.html > > If mercurial was still being used the code would be pushed directly, once the CSR is approved. However, in this case a > pull request is required and needs explicit reviewer approval. Between the final review and this pull request no code > has changed, except for that related to merging. Marked as reviewed by darcy (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/367 From xliu at openjdk.java.net Mon Oct 12 04:58:13 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Mon, 12 Oct 2020 04:58:13 GMT Subject: RFR: 8254269: simplify Node::disconnect_inputs [v2] In-Reply-To: References: Message-ID: <9xcw21XLZb9cLCNBigStcxpug_stOa5IAoDPrrbnfJg=.bbc661db-6dba-4f1d-8379-aa26bccd5f70@github.com> On Sun, 11 Oct 2020 21:49:37 GMT, Vladimir Kozlov wrote: >> Xin Liu has updated the pull request incrementally with one additional commit since the last revision: >> >> 8254269: simplify Node::disconnect_inputs >> >> Optimize it for the precedence loop. because there's no null between >> 2 non-null precedences, disconnect_inputs can break at a null value. > > Changes requested by kvn (Reviewer). I filed JDK-8254369 for the bug. Let's keep this PR as a clean-up. 
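For readers following the precedence-edge discussion in this thread, here is a toy C++ model of the hazard described for `set_prec(i, nullptr)`. It is not HotSpot code: `PrecList`, `clear_slot`, `forward_sweep` and `drain_sweep` are made-up names, `0` stands in for `nullptr`, and the "refill the freed slot with the last non-null entry" behaviour is assumed from the description given earlier in the thread rather than taken from `node.cpp`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for the precedence section of a node's input array.
// 0 plays the role of nullptr; non-null entries are kept packed at the front.
struct PrecList {
  std::vector<int> slots;

  // Hypothetical analogue of set_prec(i, nullptr) as described in the thread:
  // the freed slot is refilled with the last non-null entry, or left null.
  void clear_slot(std::size_t i) {
    assert(i < slots.size());
    std::size_t last = slots.size();
    while (last > 0 && slots[last - 1] == 0) --last;  // one past the last non-null entry
    if (i >= last) return;                            // slot is already null
    slots[i] = slots[last - 1];                       // self-copy when i is the last entry
    slots[last - 1] = 0;
  }

  // Number of non-null entries still present.
  std::size_t live() const {
    std::size_t n = 0;
    for (int s : slots) {
      if (s != 0) ++n;
    }
    return n;
  }
};

// Advances after every removal, so a slot that was just refilled is never revisited.
std::size_t forward_sweep(PrecList p) {
  for (std::size_t i = 0; i < p.slots.size(); ++i) {
    if (p.slots[i] != 0) p.clear_slot(i);
  }
  return p.live();
}

// Stays on the same slot until it reads null, then advances.
std::size_t drain_sweep(PrecList p) {
  for (std::size_t i = 0; i < p.slots.size(); ++i) {
    while (p.slots[i] != 0) p.clear_slot(i);
  }
  return p.live();
}
```

In this model, a forward sweep over three packed entries leaves one behind, while the draining loop removes all of them. Whether the real fix should drain the first slot, iterate backwards, or do something else is left to the follow-up issue.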
------------- PR: https://git.openjdk.java.net/jdk/pull/589 From fyang at openjdk.java.net Mon Oct 12 07:05:16 2020 From: fyang at openjdk.java.net (Fei Yang) Date: Mon, 12 Oct 2020 07:05:16 GMT Subject: RFR: 8252204: AArch64: Implement SHA3 accelerator/intrinsic [v5] In-Reply-To: References: <4NM17B6l4GvNgCbmmQTUcnfZTA6G-IEc85O8jH_q-xA=.63b10da7-bab7-44bc-a4c8-0a675aca45c0@github.com> Message-ID: On Sat, 10 Oct 2020 13:15:51 GMT, Kevin Rushforth wrote: >> I see Linux x64 failed. However, I don't seem to be able to withdraw my patch approval. >> However, please consider it withdrawn. > > @theRealAph if you still need to, you can withdraw your approval by reviewing it again and selecting "Request changes". > I have looked at the java security changes, i.e. src/java.base/share/classes/sun/security/provider/SHA3.java. It looks > fine. @valeriepeng : I see you are not listed under "Reviewers" commit message part, could you please press the magic button(s)(approve?) so you get the credit? Thanks. ------------- PR: https://git.openjdk.java.net/jdk/pull/207 From roland at openjdk.java.net Mon Oct 12 07:25:10 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Mon, 12 Oct 2020 07:25:10 GMT Subject: Integrated: 8254314: Shenandoah: null checks in c2 should not skip over native load barrier In-Reply-To: References: Message-ID: On Fri, 9 Oct 2020 14:03:21 GMT, Roland Westrelin wrote: > C2 optimizes (CmpP (LoadBarrier o) NULL) as (CmpP o NULL). The native > load barrier is not guaranteed to return a non-null oop when passed a > non-null oop so this optimization could lead to a crash. This pull request has now been integrated.
Changeset: a2bb4c60 Author: Roland Westrelin URL: https://git.openjdk.java.net/jdk/commit/a2bb4c60 Stats: 4 lines in 1 file changed: 2 ins; 0 del; 2 mod 8254314: Shenandoah: null checks in c2 should not skip over native load barrier Reviewed-by: rkennke ------------- PR: https://git.openjdk.java.net/jdk/pull/576 From roland at openjdk.java.net Mon Oct 12 07:59:11 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Mon, 12 Oct 2020 07:59:11 GMT Subject: RFR: 8251544: CTW: C2 fails with assert(no_dead_loop) failed: dead loop detected In-Reply-To: <8EdEziBkTtIKsise_9EP7uwgBoOfO-9IBAe3UQ3ydsc=.d01f5f9d-f05c-4a4b-b715-6ea8135ed1a3@github.com> References: <8EdEziBkTtIKsise_9EP7uwgBoOfO-9IBAe3UQ3ydsc=.d01f5f9d-f05c-4a4b-b715-6ea8135ed1a3@github.com> Message-ID: On Wed, 30 Sep 2020 07:44:06 GMT, Christian Hagedorn wrote: > In the testcase, we hit a dead data loop while dead nodes are being removed on a dead control path. A region node, lets > say `r`, that represents a loop (has three inputs: self, a loop entry and backedge) but is not a loop node, yet, > becomes dead when its entry control is replaced by top in the first IGVN run after parsing. All its phi nodes also > become dead by replacing the corresponding entry control input by top. The problem is now that some phi nodes of `r` > are processed by IGVN before the corresponding (dead) region node `r`. In `PhiNode::Ideal`, we actually check if there > is a dead loop. But after some of the phis of `r` were already removed, `is_unsafe_data_reference()` on > [L1939](https://github.com/openjdk/jdk/compare/master...chhagedorn:JDK-8251544#diff-efe6b3bde157b833249cd9a8d8b6645bL1939) > returns false. As a result, we do not realize in `PhiNode::Ideal` for one of the remaining phis that it is actually > dead and we apply the normal propagation in `PhiNode::Identity` which replaces the phi by its only non-top input. 
We > later apply an additional optimization for a `LoadNode` input of an `AddINode` in which we replace the `LoadNode` by > the `AddINode` itself (because of the data loop) and we end up with a dead data loop and fail with the dead loop > assertion. The order in which the nodes are processed in IGVN is crucial. I could only reproduce this bug with a very > specific CTW-like testcase which makes this quite an edge case. I can think of two ways to fix this: 1. Delay > phi nodes which have only one non-top input left and whose region node is not a loop node, has three inputs from which > the entry control is top and the region has not been processed by IGVN, yet. 2. Extend the dead data loop check in > `PhiNode::Ideal()` to already do an unreachable region check as done in `RegionNode::Ideal()`. The result can be cached > as a region should not become reachable anymore once we figured out it is dead. I chose the second approach because I > think it is preferable as we are not delaying IGVN and all other phi nodes can already use the information of a dead > region before it is processed. This avoids any further unwanted optimizations on dead nodes. Thanks, Christian Looks good to me ------------- Marked as reviewed by roland (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/425 From chagedorn at openjdk.java.net Mon Oct 12 08:19:11 2020 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Mon, 12 Oct 2020 08:19:11 GMT Subject: RFR: 8253588: C1: assert(false) failed: unknown register on x86_32 only with -XX:+TraceLinearScanLevel=4 In-Reply-To: References: Message-ID: On Sat, 10 Oct 2020 00:12:57 GMT, Vladimir Kozlov wrote: >> [JDK-8251093](https://bugs.openjdk.java.net/browse/JDK-8251093) introduced some additional logging of intervals and its >> registers in various places. On 32-bit only, we could have two registers for an interval. A hi-register is only used >> when the interval has `_num_phys_regs` set to 2.
In one such place >> ([L5448](https://github.com/chhagedorn/jdk/blob/29ed779487bad3c359fb13dfad3f41832637a470/src/hotspot/share/c1/c1_LinearScan.cpp#L5448)), >> we log the hi-register `hint_regHi`. On >> [L5441](https://github.com/chhagedorn/jdk/blob/29ed779487bad3c359fb13dfad3f41832637a470/src/hotspot/share/c1/c1_LinearScan.cpp#L5441), >> however, we can assign it an invalid register number when `_num_phys_regs` is 1. That was not a problem before >> JDK-8251093 as we only used `hint_regHi` later after a `_num_phys_regs == 2` check on >> [L5484](https://github.com/chhagedorn/jdk/blob/29ed779487bad3c359fb13dfad3f41832637a470/src/hotspot/share/c1/c1_LinearScan.cpp#L5484). >> But the additional logging is performed earlier resulting in this assertion failure when trying to log the invalid >> `hint_regHi` register. Thanks, Christian > > Good. Thank you Tobias and Vladimir for your reviews! ------------- PR: https://git.openjdk.java.net/jdk/pull/356 From chagedorn at openjdk.java.net Mon Oct 12 08:19:12 2020 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Mon, 12 Oct 2020 08:19:12 GMT Subject: Integrated: 8253588: C1: assert(false) failed: unknown register on x86_32 only with -XX:+TraceLinearScanLevel=4 In-Reply-To: References: Message-ID: On Fri, 25 Sep 2020 10:07:04 GMT, Christian Hagedorn wrote: > [JDK-8251093](https://bugs.openjdk.java.net/browse/JDK-8251093) introduced some additional logging of intervals and its > registers in various places. On 32-bit only, we could have two registers for an interval. A hi-register is only used > when the interval has `_num_phys_regs` set to 2. In one such place > ([L5448](https://github.com/chhagedorn/jdk/blob/29ed779487bad3c359fb13dfad3f41832637a470/src/hotspot/share/c1/c1_LinearScan.cpp#L5448)), > we log the hi-register `hint_regHi`. 
On > [L5441](https://github.com/chhagedorn/jdk/blob/29ed779487bad3c359fb13dfad3f41832637a470/src/hotspot/share/c1/c1_LinearScan.cpp#L5441), > however, we can assign it an invalid register number when `_num_phys_regs` is 1. That was not a problem before > JDK-8251093 as we only used `hint_regHi` later after a `_num_phys_regs == 2` check on > [L5484](https://github.com/chhagedorn/jdk/blob/29ed779487bad3c359fb13dfad3f41832637a470/src/hotspot/share/c1/c1_LinearScan.cpp#L5484). > But the additional logging is performed earlier resulting in this assertion failure when trying to log the invalid > `hint_regHi` register. Thanks, Christian This pull request has now been integrated. Changeset: 13fe054c Author: Christian Hagedorn URL: https://git.openjdk.java.net/jdk/commit/13fe054c Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8253588: C1: assert(false) failed: unknown register on x86_32 only with -XX:+TraceLinearScanLevel=4 Reviewed-by: thartmann, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/356 From chagedorn at openjdk.java.net Mon Oct 12 08:21:11 2020 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Mon, 12 Oct 2020 08:21:11 GMT Subject: RFR: 8251544: CTW: C2 fails with assert(no_dead_loop) failed: dead loop detected In-Reply-To: References: <8EdEziBkTtIKsise_9EP7uwgBoOfO-9IBAe3UQ3ydsc=.d01f5f9d-f05c-4a4b-b715-6ea8135ed1a3@github.com> Message-ID: On Mon, 12 Oct 2020 07:56:43 GMT, Roland Westrelin wrote: >> In the testcase, we hit a dead data loop while dead nodes are being removed on a dead control path. A region node, lets >> say `r`, that represents a loop (has three inputs: self, a loop entry and backedge) but is not a loop node, yet, >> becomes dead when its entry control is replaced by top in the first IGVN run after parsing. All its phi nodes also >> become dead by replacing the corresponding entry control input by top. 
The problem is now that some phi nodes of `r` >> are processed by IGVN before the corresponding (dead) region node `r`. In `PhiNode::Ideal`, we actually check if there >> is a dead loop. But after some of the phis of `r` were already removed, `is_unsafe_data_reference()` on >> [L1939](https://github.com/openjdk/jdk/compare/master...chhagedorn:JDK-8251544#diff-efe6b3bde157b833249cd9a8d8b6645bL1939) >> returns false. As a result, we do not realize in `PhiNode::Ideal` for one of the remaining phis that it is actually >> dead and we apply the normal propagation in `PhiNode::Identity` which replaces the phi by its only non-top input. We >> later apply an additional optimization for a `LoadNode` input of an `AddINode` in which we replace the `LoadNode` by >> the `AddINode` itself (because of the data loop) and we end up with a dead data loop and fail with the dead loop >> assertion. The order in which the nodes are processed in IGVN is crucial. I could only reproduce this bug with a very >> specific CTW-like testcase which makes this quite an edge case. I can think of two ways to fix this: 1. Delay >> phi nodes which have only one non-top input left and whose region node is not a loop node, has three inputs from which >> the entry control is top and the region has not been processed by IGVN, yet. 2. Extend the dead data loop check in >> `PhiNode::Ideal()` to already do an unreachable region check as done in `RegionNode::Ideal()`. The result can be cached >> as a region should not become reachable anymore once we figured out it is dead. I chose the second approach because I >> think it is preferable as we are not delaying IGVN and all other phi nodes can already use the information of a dead >> region before it is processed. This avoids any further unwanted optimizations on dead nodes. Thanks, Christian > > Looks good to me Thank you Vladimir and Roland for your reviews!
------------- PR: https://git.openjdk.java.net/jdk/pull/425 From chagedorn at openjdk.java.net Mon Oct 12 08:21:13 2020 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Mon, 12 Oct 2020 08:21:13 GMT Subject: Integrated: 8251544: CTW: C2 fails with assert(no_dead_loop) failed: dead loop detected In-Reply-To: <8EdEziBkTtIKsise_9EP7uwgBoOfO-9IBAe3UQ3ydsc=.d01f5f9d-f05c-4a4b-b715-6ea8135ed1a3@github.com> References: <8EdEziBkTtIKsise_9EP7uwgBoOfO-9IBAe3UQ3ydsc=.d01f5f9d-f05c-4a4b-b715-6ea8135ed1a3@github.com> Message-ID: On Wed, 30 Sep 2020 07:44:06 GMT, Christian Hagedorn wrote: > In the testcase, we hit a dead data loop while dead nodes are being removed on a dead control path. A region node, lets > say `r`, that represents a loop (has three inputs: self, a loop entry and backedge) but is not a loop node, yet, > becomes dead when its entry control is replaced by top in the first IGVN run after parsing. All its phi nodes also > become dead by replacing the corresponding entry control input by top. The problem is now that some phi nodes of `r` > are processed by IGVN before the corresponding (dead) region node `r`. In `PhiNode::Ideal`, we actually check if there > is a dead loop. But after some of the phis of `r` were already removed, `is_unsafe_data_reference()` on > [L1939](https://github.com/openjdk/jdk/compare/master...chhagedorn:JDK-8251544#diff-efe6b3bde157b833249cd9a8d8b6645bL1939) > returns false. As a result, we do not realize in `PhiNode::Ideal` for one of the remaining phis that it is actually > dead and we apply the normal propagation in `PhiNode::Identity` which replaces the phi by its only non-top input. We > later apply an additional optimization for a `LoadNode` input of an `AddINode` in which we replace the `LoadNode` by > the `AddINode` itself (because of the data loop) and we end up with a dead data loop and fail with the dead loop > assertion. The order in which the nodes are processed in IGVN is crucial. 
I could only reproduce this bug with a very > specific CTW-like testcase which makes this quite an edge case. I can think of two ways to fix this: 1. Delay > phi nodes which have only one non-top input left and whose region node is not a loop node, has three inputs from which > the entry control is top and the region has not been processed by IGVN yet. 2. Extend the dead data loop check in > `PhiNode::Ideal()` to already do an unreachable region check as done in `RegionNode::Ideal()`. The result can be cached > as a region should not become reachable anymore once we figured out it is dead. I chose the second approach because I > think it is preferable as we are not delaying IGVN and all other phi nodes can already use the information of a dead > region before it is processed. This avoids any further unwanted optimizations on dead nodes. Thanks, Christian This pull request has now been integrated. Changeset: 54bbe76e Author: Christian Hagedorn URL: https://git.openjdk.java.net/jdk/commit/54bbe76e Stats: 264 lines in 3 files changed: 233 ins; 11 del; 20 mod 8251544: CTW: C2 fails with assert(no_dead_loop) failed: dead loop detected Reviewed-by: kvn, roland ------------- PR: https://git.openjdk.java.net/jdk/pull/425 From github.com+8792647+robcasloz at openjdk.java.net Mon Oct 12 08:38:12 2020 From: github.com+8792647+robcasloz at openjdk.java.net (Roberto Castañeda Lozano) Date: Mon, 12 Oct 2020 08:38:12 GMT Subject: RFR: 8253765: C2: Control randomization in StressLCM and StressGCM In-Reply-To: <4JvN7jqgQ20BeeU45CQ8yt_S79pbWyT6t-7cv_GOHJA=.beb5effa-d7bf-4031-8ebe-5f6a06e3903d@github.com> References: <4JvN7jqgQ20BeeU45CQ8yt_S79pbWyT6t-7cv_GOHJA=.beb5effa-d7bf-4031-8ebe-5f6a06e3903d@github.com> Message-ID: On Fri, 9 Oct 2020 23:38:32 GMT, Vladimir Kozlov wrote: > Good. Thanks for reviewing, Vladimir!
------------- PR: https://git.openjdk.java.net/jdk/pull/572 From thartmann at openjdk.java.net Mon Oct 12 10:37:18 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 12 Oct 2020 10:37:18 GMT Subject: RFR: 8254252: Generic arraycopy stub overwrites callee-save rdi register on 64-bit Windows Message-ID: Since [JDK-8241825](https://bugs.openjdk.java.net/browse/JDK-8241825), MacroAssembler::load_klass requires a temporary register to decode the klass pointer. In the generic arraycopy stub, rdi is used for that on 64-bit Windows because r9 is already used as an argument register: https://hg.openjdk.java.net/jdk/jdk/rev/0bb101fbeb10#l17.32 The problem is that rdi is callee-save [1] but not restored when returning from the stub. This leads to register corruption and more or less random crashes in the caller. Although JDK-8241825 is part of JDK 15, this was never a problem because we did not set the _WIN64 macro in adlc and as a result accidentally treated rdi (and rsi) as caller-save: https://github.com/openjdk/jdk/blob/b9873e18330b7e43ca47bc1c0655e7ab20828f7a/src/hotspot/cpu/x86/x86_64.ad#L89 Now that this got fixed as part of [JDK-8248238](https://bugs.openjdk.java.net/browse/JDK-8248238) [2], we hit the bug. Unfortunately, we are running out of caller-save registers on Windows. Since rcx, rdx, r8 and r9 are used for arguments, only rax, r10 and r11 remain which are already used as temporary registers and live throughout the stub code. I've decided to free up r11 by postponing the load of the array length which is only needed on Windows anyway (because only Windows passes the 5th argument on stack). 
Thanks, Tobias [1] https://docs.microsoft.com/en-us/cpp/build/x64-calling-convention?view=vs-2019 [2] https://openjdk.github.io/cr/?repo=jdk&pr=212&range=11#sdiff-8 ------------- Commit messages: - 8254252: Generic arraycopy stub overwrites callee-save rdi register on 64-bit Windows Changes: https://git.openjdk.java.net/jdk/pull/603/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=603&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8254252 Stats: 28 lines in 1 file changed: 2 ins; 5 del; 21 mod Patch: https://git.openjdk.java.net/jdk/pull/603.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/603/head:pull/603 PR: https://git.openjdk.java.net/jdk/pull/603 From thartmann at openjdk.java.net Mon Oct 12 10:55:13 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 12 Oct 2020 10:55:13 GMT Subject: RFR: 8253923: C2 doesn't always run loop opts for compilations that include loops In-Reply-To: References: Message-ID: On Fri, 2 Oct 2020 07:47:00 GMT, Roland Westrelin wrote: > 8253923: C2 doesn't always run loop opts for compilations that include loops Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/478 From roland at openjdk.java.net Mon Oct 12 10:55:13 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Mon, 12 Oct 2020 10:55:13 GMT Subject: RFR: 8253923: C2 doesn't always run loop opts for compilations that include loops In-Reply-To: References: Message-ID: On Mon, 12 Oct 2020 10:50:19 GMT, Tobias Hartmann wrote: > Looks good to me. Thanks for the review. 
------------- PR: https://git.openjdk.java.net/jdk/pull/478 From roland at openjdk.java.net Mon Oct 12 10:58:13 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Mon, 12 Oct 2020 10:58:13 GMT Subject: Integrated: 8253923: C2 doesn't always run loop opts for compilations that include loops In-Reply-To: References: Message-ID: On Fri, 2 Oct 2020 07:47:00 GMT, Roland Westrelin wrote: > 8253923: C2 doesn't always run loop opts for compilations that include loops This pull request has now been integrated. Changeset: a6c23b77 Author: Roland Westrelin URL: https://git.openjdk.java.net/jdk/commit/a6c23b77 Stats: 81 lines in 9 files changed: 75 ins; 2 del; 4 mod 8253923: C2 doesn't always run loop opts for compilations that include loops Reviewed-by: neliasso, thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/478 From thartmann at openjdk.java.net Mon Oct 12 10:59:17 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 12 Oct 2020 10:59:17 GMT Subject: RFR: 8253765: C2: Control randomization in StressLCM and StressGCM In-Reply-To: References: Message-ID: On Fri, 9 Oct 2020 08:09:05 GMT, Roberto Casta?eda Lozano wrote: > Use the compilation-local seed in `StressLCM` and `StressGCM` rather than the global one. As a consequence, these > options use by default a fresh seed in every compilation, unless `StressSeed=N` is specified, in which case they behave > deterministically. Annotate tests that use `StressLCM` and `StressGCM` with the `stress` and `randomness` keys to > reflect this change in default behavior. Tested on `tier1` and on all test cases that use `StressLCM` and `StressGCM` > (10 times each). Otherwise looks good. src/hotspot/share/opto/c2_globals.hpp line 56: > 54: \ > 55: product(uint, StressSeed, 0, DIAGNOSTIC, \ > 56: "Seed for randomized stress testing (if unset, a random one is " \ The description string misses a `)` ------------- Marked as reviewed by thartmann (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/572 From github.com+8792647+robcasloz at openjdk.java.net Mon Oct 12 11:14:14 2020 From: github.com+8792647+robcasloz at openjdk.java.net (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 12 Oct 2020 11:14:14 GMT Subject: RFR: 8253765: C2: Control randomization in StressLCM and StressGCM In-Reply-To: References: Message-ID: On Mon, 12 Oct 2020 10:53:52 GMT, Tobias Hartmann wrote: >> Use the compilation-local seed in `StressLCM` and `StressGCM` rather than the global one. As a consequence, these >> options use by default a fresh seed in every compilation, unless `StressSeed=N` is specified, in which case they behave >> deterministically. Annotate tests that use `StressLCM` and `StressGCM` with the `stress` and `randomness` keys to >> reflect this change in default behavior. Tested on `tier1` and on all test cases that use `StressLCM` and `StressGCM` >> (10 times each). > > src/hotspot/share/opto/c2_globals.hpp line 56: > >> 54: \ >> 55: product(uint, StressSeed, 0, DIAGNOSTIC, \ >> 56: "Seed for randomized stress testing (if unset, a random one is " \ > > The description string misses a `)` Thanks, will fix! ------------- PR: https://git.openjdk.java.net/jdk/pull/572 From github.com+70893615+jasontatton-aws at openjdk.java.net Mon Oct 12 11:17:25 2020 From: github.com+70893615+jasontatton-aws at openjdk.java.net (Jason Tatton) Date: Mon, 12 Oct 2020 11:17:25 GMT Subject: RFR: 8173585: Intrinsify StringLatin1.indexOf(char) [v6] In-Reply-To: <_T0873dC5tfUtGn9r1_Y21JkPKRt-za3MM9hPN2GQKQ=.b865fe53-5417-424f-81b6-1566a330640e@github.com> References: <_T0873dC5tfUtGn9r1_Y21JkPKRt-za3MM9hPN2GQKQ=.b865fe53-5417-424f-81b6-1566a330640e@github.com> Message-ID: > This is an implementation of the indexOf(char) intrinsic for StringLatin1 (1 byte encoded Strings). It is provided for > x86 and ARM64. The implementation is greatly inspired by the indexOf(char) intrinsic for StringUTF16. 
To incorporate it > I had to make a small change to StringLatin1.java (refactor of functionality to an intrinsified private method) as well as > code for C2. Submitted to: hotspot-compiler-dev and core-libs-dev as this patch contains a change to hotspot and > java/lang/StringLatin1.java https://bugs.openjdk.java.net/browse/JDK-8173585 > > Details of testing: > ============ > I have created a jtreg test "compiler/intrinsics/string/TestStringLatin1IndexOfChar" to cover this new intrinsic. Note > that, particularly for the x86 implementation of the intrinsic, the code path taken is dependent upon the length of the > input String. Hence the test has been designed to cover all these cases. In summary they are: > - A "short" string of < 16 characters. > - A SIMD String of 16 to 31 characters. > - An AVX2 SIMD String of 32 characters+. > > Hardware used for testing: > ----------------------------- > > - Intel Xeon CPU E5-2680 (JVM did not recognize this as having AVX2 support) > - Intel i7 processor (with AVX2 support). > - AWS Graviton 2 (ARM 64 processor). > > I also ran "run-test-tier1" and "run-test-tier2" for: x86_64 and aarch64. > > Possible future enhancements: > ==================== > For the x86 implementation there may be two further improvements we can make in order to improve performance of both > the StringUTF16 and StringLatin1 indexOf(char) intrinsics: > 1. Make use of AVX-512 instructions. > 2. For "short" Strings (see below), I think it may be possible to modify the existing algorithm to still use SSE SIMD > instructions instead of a loop. > Benchmark results: > ============ > **Without** the new StringLatin1 indexOf(char) intrinsic: > > | Benchmark | Mode | Cnt | Score | Error | Units | > | ------------- | ------------- |------------- |------------- |------------- |------------- | > | IndexOfBenchmark.latin1_mixed_char | avgt | 5 | **26,389.129** | ± 182.581 | ns/op | > | IndexOfBenchmark.utf16_mixed_char | avgt | 5 | 17,885.383 |
± 435.933 | ns/op | > > > **With** the new StringLatin1 indexOf(char) intrinsic: > > | Benchmark | Mode | Cnt | Score | Error | Units | > | ------------- | ------------- |------------- |------------- |------------- |------------- | > | IndexOfBenchmark.latin1_mixed_char | avgt | 5 | **17,875.185** | ± 407.716 | ns/op | > | IndexOfBenchmark.utf16_mixed_char | avgt | 5 | 18,292.802 | ± 167.306 | ns/op | > > > The objective of the patch is to bring the performance of StringLatin1 indexOf(char) in line with StringUTF16 > indexOf(char) for x86 and ARM64. We can see above that this has been achieved. Similar results were obtained when > running on ARM. Jason Tatton has updated the pull request incrementally with one additional commit since the last revision: Added missing copyright notices ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/71/files - new: https://git.openjdk.java.net/jdk/pull/71/files/8ead02ab..3ae1d92d Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=71&range=05 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=71&range=04-05 Stats: 45 lines in 2 files changed: 45 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/71.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/71/head:pull/71 PR: https://git.openjdk.java.net/jdk/pull/71 From github.com+70893615+jasontatton-aws at openjdk.java.net Mon Oct 12 11:17:27 2020 From: github.com+70893615+jasontatton-aws at openjdk.java.net (Jason Tatton) Date: Mon, 12 Oct 2020 11:17:27 GMT Subject: RFR: 8173585: Intrinsify StringLatin1.indexOf(char) [v3] In-Reply-To: <0_4KDtvn6WhHRCxqupbUQjLauR5DMQWJluogsJ7m_KA=.697e6e15-5c91-4339-abab-1dff51871e8d@github.com> References: <_T0873dC5tfUtGn9r1_Y21JkPKRt-za3MM9hPN2GQKQ=.b865fe53-5417-424f-81b6-1566a330640e@github.com> <0_4KDtvn6WhHRCxqupbUQjLauR5DMQWJluogsJ7m_KA=.697e6e15-5c91-4339-abab-1dff51871e8d@github.com> Message-ID: <5sQCV-tDrsi1Ivud3DaxJq5vfJMQDJoZ4tPfKj_JI60=.82ae2d6d-c9b1-46e3-ac2b-284d6ea8878e@github.com> On
Sat, 10 Oct 2020 00:10:54 GMT, Vladimir Kozlov wrote: >> Jason Tatton has updated the pull request incrementally with one additional commit since the last revision: >> >> Add missing newline to end of vmSymbols.cpp > > test/hotspot/jtreg/compiler/intrinsics/string/TestStringLatin1IndexOfChar.java line 1: > >> 1: /* > > Missing copyright+GPL header in new test. See other tests for example. Thanks, I have added this now. > test/micro/org/openjdk/bench/java/lang/StringIndexOfChar.java line 1: > >> 1: package org.openjdk.bench.java.lang; > > Again missing Copyright. Thanks, I have added this now. ------------- PR: https://git.openjdk.java.net/jdk/pull/71 From github.com+8792647+robcasloz at openjdk.java.net Mon Oct 12 11:23:28 2020 From: github.com+8792647+robcasloz at openjdk.java.net (Roberto Castañeda Lozano) Date: Mon, 12 Oct 2020 11:23:28 GMT Subject: RFR: 8253765: C2: Control randomization in StressLCM and StressGCM [v2] In-Reply-To: References: Message-ID: > Use the compilation-local seed in `StressLCM` and `StressGCM` rather than the global one. As a consequence, these > options use by default a fresh seed in every compilation, unless `StressSeed=N` is specified, in which case they behave > deterministically. Annotate tests that use `StressLCM` and `StressGCM` with the `stress` and `randomness` keys to > reflect this change in default behavior. Tested on `tier1` and on all test cases that use `StressLCM` and `StressGCM` > (10 times each).
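The seeding policy described in the quoted summary (a fresh seed per compilation unless `StressSeed=N` is given) can be pictured with a small Java sketch. This is only an illustration of the policy, not the HotSpot code, which is C++ and is not reproduced here; the class `StressSeedSketch` and the method `decisions` are hypothetical names, and `java.util.Random` stands in for whatever generator C2 actually uses.

```java
import java.util.Arrays;
import java.util.OptionalLong;
import java.util.Random;

public class StressSeedSketch {
    // Hypothetical stand-in for the randomized choices made during one compilation.
    static int[] decisions(OptionalLong fixedSeed, int n) {
        // No seed given: draw a fresh seed for this "compilation".
        // Seed given: reuse it, so the sequence of choices is reproducible.
        long seed = fixedSeed.isPresent() ? fixedSeed.getAsLong() : new Random().nextLong();
        Random rng = new Random(seed);
        int[] out = new int[n];
        for (int i = 0; i < n; i++) {
            out[i] = rng.nextInt(100);
        }
        return out;
    }

    public static void main(String[] args) {
        // Two runs with the same explicit seed make identical choices.
        int[] a = decisions(OptionalLong.of(42L), 8);
        int[] b = decisions(OptionalLong.of(42L), 8);
        System.out.println(Arrays.equals(a, b));
        // Without a seed, each run draws its own and will usually differ.
        System.out.println(Arrays.toString(decisions(OptionalLong.empty(), 8)));
    }
}
```

The point of the sketch is the reproducibility trade-off: an unseeded run explores more schedules over time, while a fixed seed lets a failure be replayed.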
Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: Close parenthesis in option description ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/572/files - new: https://git.openjdk.java.net/jdk/pull/572/files/64c65388..85dc52bf Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=572&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=572&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/572.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/572/head:pull/572 PR: https://git.openjdk.java.net/jdk/pull/572 From bulasevich at openjdk.java.net Mon Oct 12 11:30:35 2020 From: bulasevich at openjdk.java.net (Boris Ulasevich) Date: Mon, 12 Oct 2020 11:30:35 GMT Subject: RFR: 8249893: AARCH64: optimize the construction of the value from the bits of the other two [v2] In-Reply-To: References: Message-ID: > Let me revive the change request [3] to C2 and AArch64 that applies the Bitfield Insert instruction in the expression > "(v1 & 0xFF) | ((v2 & 0xFF) << 8)". > > Compared to the last round of review [2] I updated the transformation to apply BFI in more cases and added a jtreg test. > > As before, compared to the original patch [1], the transformation logic is now in the common C2 code: a new > BitfieldInsert node has been introduced to replace the Or+Shift+And sequence when possible; on AArch64 a single BFI > instruction is emitted for the new node.
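For readers following along, here is a minimal Java sketch of the source-level pattern quoted in the message above. The class `BitfieldInsertSketch` and the helper `insertBits` are hypothetical names introduced for illustration; whether C2 emits a BFI for a particular shape depends on the patch under review and is not demonstrated here.

```java
public class BitfieldInsertSketch {
    // The expression quoted in the RFR: low byte taken from v1, next byte from v2.
    static int pack(int v1, int v2) {
        return (v1 & 0xFF) | ((v2 & 0xFF) << 8);
    }

    // Hand-written generic bitfield insert (hypothetical helper):
    // copy the low `width` bits of src into dst, starting at bit `lsb`.
    // Assumes 0 < width < 32 and lsb + width <= 32.
    static int insertBits(int dst, int src, int lsb, int width) {
        int mask = ((1 << width) - 1) << lsb;
        return (dst & ~mask) | ((src << lsb) & mask);
    }

    public static void main(String[] args) {
        int v1 = 0x1234;
        int v2 = 0xABCD;
        int packed = pack(v1, v2);
        // Same value expressed as "insert the low 8 bits of v2 at bit 8".
        int viaInsert = insertBits(v1 & 0xFF, v2, 8, 8);
        System.out.println(Integer.toHexString(packed));
        System.out.println(packed == viaInsert);
    }
}
```

Writing the pattern as an explicit insert makes it easier to see why an Or+Shift+And sequence is a candidate for a single bitfield-insert operation.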
> > [1] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-July/039161.html > [2] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-August/039653.html > [3] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-August/039792.html Boris Ulasevich has updated the pull request incrementally with one additional commit since the last revision: BitfieldInsert microbenchmark added to show the performance improvement ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/511/files - new: https://git.openjdk.java.net/jdk/pull/511/files/8a4c6a90..42b81afc Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=511&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=511&range=00-01 Stats: 94 lines in 1 file changed: 94 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/511.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/511/head:pull/511 PR: https://git.openjdk.java.net/jdk/pull/511 From bulasevich at openjdk.java.net Mon Oct 12 11:35:33 2020 From: bulasevich at openjdk.java.net (Boris Ulasevich) Date: Mon, 12 Oct 2020 11:35:33 GMT Subject: RFR: 8249893: AARCH64: optimize the construction of the value from the bits of the other two [v3] In-Reply-To: References: Message-ID: <2f3DqPP9JunqqxK9guYieCgu3FijfoIt7R5yQNar-3g=.000d3fe3-42dd-4b69-8c2e-f52e2cf6a690@github.com> > Let me revive the change request [3] to C2 and AArch64 that applies the Bitfield Insert instruction in the expression > "(v1 & 0xFF) | ((v2 & 0xFF) << 8)". > > Compared to the last round of review [2] I updated the transformation to apply BFI in more cases and added a jtreg test. > > As before, compared to the original patch [1], the transformation logic is now in the common C2 code: a new > BitfieldInsert node has been introduced to replace the Or+Shift+And sequence when possible; on AArch64 a single BFI > instruction is emitted for the new node.
> > [1] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-July/039161.html > [2] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-August/039653.html > [3] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-August/039792.html Boris Ulasevich has updated the pull request incrementally with one additional commit since the last revision: minor change (jcheck: trailing whitespace, tab) ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/511/files - new: https://git.openjdk.java.net/jdk/pull/511/files/42b81afc..e5833fec Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=511&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=511&range=01-02 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/511.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/511/head:pull/511 PR: https://git.openjdk.java.net/jdk/pull/511 From roland at openjdk.java.net Mon Oct 12 11:38:18 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Mon, 12 Oct 2020 11:38:18 GMT Subject: RFR: 8223051: support loops with long (64b) trip counts [v2] In-Reply-To: References: Message-ID: > Last webrev was: > > https://cr.openjdk.java.net/~roland/8223051/webrev.03/ > > This PR includes a few minor changes: > > - The change in callnode.cpp that Vladimir requested in: > https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-August/039551.html > > - Extra comments that John requested in: > https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-August/039621.html > > - A couple extra counters to collect more detailed statistics > > 8252696 (Loop unswitching may cause out of bound array load to be > executed) was the only bug that was uncovered by extended testing and > it's fixed now. > > This was previously reviewed by Tobias, Vladimir and John. 
Given the > last changes were either requested by reviewers or a straightforward > improvement to statistics, and unless someone objects, I intend to > push this in the next few days with the reviewer list I just > mentioned. Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: - trailing whitespaces - long counted loops ------------- Changes: https://git.openjdk.java.net/jdk/pull/318/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=318&range=01 Stats: 902 lines in 11 files changed: 823 ins; 63 del; 16 mod Patch: https://git.openjdk.java.net/jdk/pull/318.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/318/head:pull/318 PR: https://git.openjdk.java.net/jdk/pull/318 From github.com+8792647+robcasloz at openjdk.java.net Mon Oct 12 11:44:17 2020 From: github.com+8792647+robcasloz at openjdk.java.net (Roberto Castañeda Lozano) Date: Mon, 12 Oct 2020 11:44:17 GMT Subject: Integrated: 8253765: C2: Control randomization in StressLCM and StressGCM In-Reply-To: References: Message-ID: <519SOycf58jTUBDNTGREBMrDHLbyxg_qv3LLEVv02_g=.0a09a427-7a96-4754-9160-71fd6bd7a213@github.com> On Fri, 9 Oct 2020 08:09:05 GMT, Roberto Castañeda Lozano wrote: > Use the compilation-local seed in `StressLCM` and `StressGCM` rather than the global one. As a consequence, these > options use by default a fresh seed in every compilation, unless `StressSeed=N` is specified, in which case they behave > deterministically. Annotate tests that use `StressLCM` and `StressGCM` with the `stress` and `randomness` keys to > reflect this change in default behavior. Tested on `tier1` and on all test cases that use `StressLCM` and `StressGCM` > (10 times each). This pull request has now been integrated.
Changeset: 05459df0 Author: Roberto Casta?eda Lozano Committer: Tobias Hartmann URL: https://git.openjdk.java.net/jdk/commit/05459df0 Stats: 105 lines in 18 files changed: 90 ins; 0 del; 15 mod 8253765: C2: Control randomization in StressLCM and StressGCM Use the compilation-local seed in 'StressLCM' and 'StressGCM' rather than the global one. As a consequence, these options use by default a fresh seed in every compilation, unless 'StressSeed=N' is specified, in which case they behave deterministically. Annotate tests that use 'StressLCM' and 'StressGCM' with the 'stress' and 'randomness' keys to reflect this change in default behavior. Reviewed-by: kvn, thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/572 From erikj at openjdk.java.net Mon Oct 12 12:59:23 2020 From: erikj at openjdk.java.net (Erik Joelsson) Date: Mon, 12 Oct 2020 12:59:23 GMT Subject: RFR: 8223347: Integration of Vector API (Incubator) In-Reply-To: <-PE4TwXgvq2bemAn_8csjn4_j7zoAolnQz6QQt3z0Wk=.eaa9999f-0713-4349-b31d-934717aa37a1@github.com> References: <-PE4TwXgvq2bemAn_8csjn4_j7zoAolnQz6QQt3z0Wk=.eaa9999f-0713-4349-b31d-934717aa37a1@github.com> Message-ID: On Fri, 25 Sep 2020 20:14:29 GMT, Paul Sandoz wrote: > This pull request is for integration of the Vector API. It was previously reviewed under conditions when mercurial was > used for the source code control system. Review threads can be found here (searching for issue number 8223347 in the > title): https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-April/thread.html > https://mail.openjdk.java.net/pipermail/core-libs-dev/2020-May/thread.html > https://mail.openjdk.java.net/pipermail/core-libs-dev/2020-July/thread.html > > If mercurial was still being used the code would be pushed directly, once the CSR is approved. However, in this case a > pull request is required and needs explicit reviewer approval. Between the final review and this pull request no code > has changed, except for that related to merging. 
Build changes look good. ------------- Marked as reviewed by erikj (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/367 From fyang at openjdk.java.net Mon Oct 12 14:47:32 2020 From: fyang at openjdk.java.net (Fei Yang) Date: Mon, 12 Oct 2020 14:47:32 GMT Subject: RFR: 8252204: AArch64: Implement SHA3 accelerator/intrinsic [v8] In-Reply-To: References: Message-ID: <76qazRT5aX06rurPVGQmtfH2af9_l7DEdy_mGyF7BQQ=.4dd45e3f-b7a4-44c9-9824-4daab9b7dc3b@github.com> > Contributed-by: ard.biesheuvel at linaro.org, dongbo4 at huawei.com > > This added an intrinsic for SHA3 using aarch64 v8.2 SHA3 Crypto Extensions. > Reference implementation for core SHA-3 transform using ARMv8.2 Crypto Extensions: > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/arch/arm64/crypto/sha3-ce-core.S?h=v5.4.52 > > Trivial adaptation in SHA3. implCompress is needed for the purpose of adding the intrinsic. > For SHA3, we need to pass one extra parameter "digestLength" to the stub for the calculation of block size. > "digestLength" is also used in for the EOR loop before keccak to differentiate different SHA3 variants. > > We added jtreg tests for SHA3 and used QEMU system emulator which supports SHA3 instructions to test the functionality. > Patch passed jtreg tier1-3 tests with QEMU system emulator. > Also verified with jtreg tier1-3 tests without SHA3 instructions on aarch64-linux-gnu and x86_64-linux-gnu, to make > sure that there's no regression. > We used one existing JMH test for performance test: test/micro/org/openjdk/bench/java/security/MessageDigests.java > We measured the performance benefit with an aarch64 cycle-accurate simulator. > Patch delivers 20% - 40% performance improvement depending on specific SHA3 digest length and size of the message. > > For now, this feature will not be enabled automatically for aarch64. We can auto-enable this when it is fully tested on > real hardware. 
But for the above testing purposes, this is auto-enabled when the corresponding hardware feature is > detected. Fei Yang has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 10 commits: - Remove unnecessary code changes in vm_version_aarch64.cpp - Merge master - Merge master - Merge master - Merge master - Add sha3 instructions to cpu/aarch64/aarch64-asmtest.py and regenerate the test in assembler_aarch64.cpp:asm_check - Rebase - Merge master - Fix trailing whitespace issue - 8252204: AArch64: Implement SHA3 accelerator/intrinsic Contributed-by: ard.biesheuvel at linaro.org, dongbo4 at huawei.com ------------- Changes: https://git.openjdk.java.net/jdk/pull/207/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=207&range=07 Stats: 1498 lines in 35 files changed: 1011 ins; 22 del; 465 mod Patch: https://git.openjdk.java.net/jdk/pull/207.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/207/head:pull/207 PR: https://git.openjdk.java.net/jdk/pull/207 From felix.yang at huawei.com Mon Oct 12 15:07:55 2020 From: felix.yang at huawei.com (Yangfei (Felix)) Date: Mon, 12 Oct 2020 15:07:55 +0000 Subject: RFR: 8252204: AArch64: Implement SHA3 accelerator/intrinsic [v5] Message-ID: Hi Andrew, > > On 10/10/2020 14:09, Fei Yang wrote: > > On Fri, 9 Oct 2020 17:35:22 GMT, Andrew Haley wrote: > > > >> I see Linux x64 failed. However, I don't seem to be able to withdraw my > patch approval. > >> However, please consider it withdrawn. > > > > Thanks for approving this patch. > > I checked the error messages and I think the failures were not caused by > this patch. 
> > The failures have been fixed by the following two commits: > > commit ec41046c5ce7077eebf4a3c265f79c7fba33d916 > > 8254348: Build fails when cds is disabled after JDK-8247536 > > commit aaa0a2a04792d7c84150e9d972790978ffcc6890 > > 8254297: Zero and Minimal VMs are broken with undeclared > > identifier 'DerivedPointerTable' after JDK-8253180 > > > > The testing was triggered again automatically after I merged master and I see > it passed now. > > OK, so we're good. I just eyeballed the latest version and noticed two unnecessary code changes in vm_version_aarch64.cpp. However, these do not affect functionality and I have made one extra commit to remove them. https://github.com/openjdk/jdk/pull/207/commits/6cd27beee5262fe86b8b4be1e0bede0fbdcb8fbc > > Do you have any comments for the discussion here? > > https://github.com/openjdk/jdk/pull/207#issuecomment-701243662 > > > > Valerie Peng has checked the java security changes, i.e. > src/java.base/share/classes/sun/security/provider/SHA3.java. > > Do you think we need another reviewer for this patch? > > If we've had a security engineer look at the shared code, we're good to go. I think Valerie Peng only reviewed the changes for SHA3.java. Have you checked the hotspot shared code changes? If not, we may need someone reviewing that part. Thanks, Felix From xliu at openjdk.java.net Mon Oct 12 16:54:13 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Mon, 12 Oct 2020 16:54:13 GMT Subject: RFR: 8254269: simplify Node::disconnect_inputs [v2] In-Reply-To: References: Message-ID: On Mon, 12 Oct 2020 00:52:04 GMT, Xin Liu wrote: >> src/hotspot/share/opto/node.cpp line 916: >> >>> 914: for (; i < len() && in(i) != nullptr; ++i) { >>> 915: set_prec(i, nullptr); >>> 916: } >> >> Something like next: >> for (uint i = req(); i < len(); ++i) { >> if (in(i) != nullptr) { >> set_prec(i, nullptr); >> } >> } > > This is trickier than I thought. `set_prec(i, nullptr)` will refill _in[i] with the last non-null value, or NULL.
> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/node.cpp#L1030 > > It means that iterating from req() to len() skips a node if there is more than one non-null precedence edge. I will fix it in this issue. https://bugs.openjdk.java.net/browse/JDK-8254369 ------------- PR: https://git.openjdk.java.net/jdk/pull/589 From xliu at openjdk.java.net Mon Oct 12 16:54:16 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Mon, 12 Oct 2020 16:54:16 GMT Subject: RFR: 8254269: simplify Node::disconnect_inputs [v3] In-Reply-To: <5a2T8EHC6civgnwqGIPzSXxzzC1qanfEVP4hdZRst9A=.c326734c-a9f4-4c9f-925c-e8425b677f8c@github.com> References: <5a2T8EHC6civgnwqGIPzSXxzzC1qanfEVP4hdZRst9A=.c326734c-a9f4-4c9f-925c-e8425b677f8c@github.com> Message-ID: On Sat, 10 Oct 2020 15:12:25 GMT, Claes Redestad wrote: >> Xin Liu has updated the pull request incrementally with one additional commit since the last revision: >> >> 8254269: simplify Node::disconnect_inputs > > src/hotspot/share/opto/node.cpp line 909: > >> 907: uint max = len(); >> 908: for( uint i = 0; i < max; ++i ) { >> 909: if( in(i) == nullptr ) continue; > > Same here I have updated it. Thanks. ------------- PR: https://git.openjdk.java.net/jdk/pull/589 From redestad at openjdk.java.net Mon Oct 12 17:07:22 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Mon, 12 Oct 2020 17:07:22 GMT Subject: RFR: 8254269: simplify Node::disconnect_inputs [v3] In-Reply-To: <3SliY5Au71QxorULw7ntHhcGIIG0LA8B7wy08ekfSWI=.65cf0bb7-3290-4e6a-97fe-fd0dacb075b3@github.com> References: <3SliY5Au71QxorULw7ntHhcGIIG0LA8B7wy08ekfSWI=.65cf0bb7-3290-4e6a-97fe-fd0dacb075b3@github.com> Message-ID: On Mon, 12 Oct 2020 01:14:21 GMT, Xin Liu wrote: >> 8254269: simplify Node::disconnect_inputs > > Xin Liu has updated the pull request incrementally with one additional commit since the last revision: > > 8254269: simplify Node::disconnect_inputs Marked as reviewed by redestad (Reviewer).
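The iteration pitfall Xin Liu describes in this thread (a slot being refilled with the last non-null entry when it is cleared, so a plain forward loop steps over the moved entry) can be illustrated with a simplified Java model. This is an assumption-laden sketch and not HotSpot code: `SwapRemoveSketch`, `removeAt`, `clearNaive` and `clearSafe` are hypothetical names, a `String[]` stands in for the precedence-edge slots, and the "safe" loop is just one possible way to avoid the skip, not necessarily what the actual fix does.

```java
public class SwapRemoveSketch {
    // Clear slot i by moving the last non-null entry into it, or leaving it null
    // when no later entry exists. This models the "refill" behavior described above.
    static void removeAt(String[] slots, int i) {
        int last = slots.length - 1;
        while (last > i && slots[last] == null) {
            last--;
        }
        if (last > i) {
            slots[i] = slots[last];
            slots[last] = null;
        } else {
            slots[i] = null;
        }
    }

    // Naive forward loop: advances past slot i even though it was just refilled.
    static int clearNaive(String[] slots) {
        int removed = 0;
        for (int i = 0; i < slots.length; i++) {
            if (slots[i] != null) {
                removeAt(slots, i);
                removed++;
            }
        }
        return removed;
    }

    // Re-checks the same slot until it stays empty, so moved-in entries are not skipped.
    static int clearSafe(String[] slots) {
        int removed = 0;
        for (int i = 0; i < slots.length; i++) {
            while (slots[i] != null) {
                removeAt(slots, i);
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        String[] a = {"a", "b", "c"};
        System.out.println(clearNaive(a) + " removed, slot 0 still holds " + a[0]);
        String[] b = {"a", "b", "c"};
        System.out.println(clearSafe(b) + " removed, slot 0 holds " + b[0]);
    }
}
```

With three entries, the naive loop leaves one behind because the entry moved into slot 0 is never revisited, while the re-checking loop empties every slot.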
------------- PR: https://git.openjdk.java.net/jdk/pull/589 From kvn at openjdk.java.net Mon Oct 12 18:46:18 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 12 Oct 2020 18:46:18 GMT Subject: RFR: 8254269: simplify Node::disconnect_inputs [v3] In-Reply-To: <3SliY5Au71QxorULw7ntHhcGIIG0LA8B7wy08ekfSWI=.65cf0bb7-3290-4e6a-97fe-fd0dacb075b3@github.com> References: <3SliY5Au71QxorULw7ntHhcGIIG0LA8B7wy08ekfSWI=.65cf0bb7-3290-4e6a-97fe-fd0dacb075b3@github.com> Message-ID: On Mon, 12 Oct 2020 01:14:21 GMT, Xin Liu wrote: >> 8254269: simplify Node::disconnect_inputs > > Xin Liu has updated the pull request incrementally with one additional commit since the last revision: > > 8254269: simplify Node::disconnect_inputs Marked as reviewed by kvn (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/589 From kvn at openjdk.java.net Mon Oct 12 19:31:14 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 12 Oct 2020 19:31:14 GMT Subject: RFR: 8254252: Generic arraycopy stub overwrites callee-save rdi register on 64-bit Windows In-Reply-To: References: Message-ID: On Mon, 12 Oct 2020 10:32:48 GMT, Tobias Hartmann wrote: > Since [JDK-8241825](https://bugs.openjdk.java.net/browse/JDK-8241825), MacroAssembler::load_klass requires a temporary > register to decode the klass pointer. In the generic arraycopy stub, rdi is used for that on 64-bit Windows because r9 > is already used as an argument register: https://hg.openjdk.java.net/jdk/jdk/rev/0bb101fbeb10#l17.32 > The problem is that rdi is callee-save [1] but not restored when returning from the stub. This leads to register > corruption and more or less random crashes in the caller. 
> Although JDK-8241825 is part of JDK 15, this was never a problem because we did not set the _WIN64 macro in adlc and as > a result accidentally treated rdi (and rsi) as caller-save: > https://github.com/openjdk/jdk/blob/b9873e18330b7e43ca47bc1c0655e7ab20828f7a/src/hotspot/cpu/x86/x86_64.ad#L89 > Now that this got fixed as part of [JDK-8248238](https://bugs.openjdk.java.net/browse/JDK-8248238) [2], we hit the bug. > > Unfortunately, we are running out of caller-save registers on Windows. Since rcx, rdx, r8 and r9 are used for > arguments, only rax, r10 and r11 remain which are already used as temporary registers and live throughout the stub > code. I've decided to free up r11 by postponing the load of the array length which is only needed on Windows anyway > (because only Windows passes the 5th argument on stack). Thanks, Tobias > > [1] https://docs.microsoft.com/en-us/cpp/build/x64-calling-convention?view=vs-2019 > [2] https://openjdk.github.io/cr/?repo=jdk&pr=212&range=11#sdiff-8 @TobiHartmann It is okay to use push/pop to preserve registers in stubs if the code becomes cleaner: https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/stubGenerator_x86_64.cpp#L3647 Did you consider it? ------------- PR: https://git.openjdk.java.net/jdk/pull/603 From shade at openjdk.java.net Mon Oct 12 19:34:16 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 12 Oct 2020 19:34:16 GMT Subject: RFR: 8254611: x86_32: Call to IRT::at_unwind clobbers rthread after JDK-8253180 Message-ID: There are massive x86_32 tier1 failures; bisection points to JDK-8253180. I think I know why this happens: in `InterpreterMacroAssembler::remove_activation`, there is a new call_VM to `InterpreterRuntime::at_unwind`, which broke the `rthread` (`rcx`) that x86_32 needs later. x86_64 is not affected, because it carries it in `r15_thread`. Attention @fisk.
Additional testing: - [ ] x86_32 tier1 ------------- Commit messages: - 8254611: x86_32: Call to IRT::at_unwind clobbers rthread after JDK-8253180 Changes: https://git.openjdk.java.net/jdk/pull/615/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=615&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8254611 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/615.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/615/head:pull/615 PR: https://git.openjdk.java.net/jdk/pull/615 From xliu at openjdk.java.net Mon Oct 12 19:57:13 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Mon, 12 Oct 2020 19:57:13 GMT Subject: Integrated: 8254269: simplify Node::disconnect_inputs In-Reply-To: References: Message-ID: <8MumOAE5zZi30x4CeQzYblCGv_DxJ6ElILsfZmjAr28=.345e9076-8139-4751-b299-a63ab81bb3f5@github.com> On Sat, 10 Oct 2020 14:53:23 GMT, Xin Liu wrote: > 8254269: simplify Node::disconnect_inputs This pull request has now been integrated. Changeset: bff586f0 Author: Xin Liu Committer: Vladimir Kozlov URL: https://git.openjdk.java.net/jdk/commit/bff586f0 Stats: 54 lines in 11 files changed: 1 ins; 13 del; 40 mod 8254269: simplify Node::disconnect_inputs Node::disconnect_inputs cuts off all input edges without exception. Reviewed-by: redestad, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/589 From eosterlund at openjdk.java.net Mon Oct 12 20:13:10 2020 From: eosterlund at openjdk.java.net (Erik Österlund) Date: Mon, 12 Oct 2020 20:13:10 GMT Subject: RFR: 8254611: x86_32: Call to IRT::at_unwind clobbers rthread after JDK-8253180 In-Reply-To: References: Message-ID: On Mon, 12 Oct 2020 19:23:20 GMT, Aleksey Shipilev wrote: > There are massive x86_32 tier1 failures, bisection points to JDK-8253180.
I think I know why this happens: in > `InterpreterMacroAssembler::remove_activation`, there is a new call_VM to `InterpreterRuntime::at_unwind`, which broke > the `rthread` (`rcx`) that x86_32 needs later. x86_64 is not affected, because it carries it in `r15_thread`. > Attention @fisk. > Additional testing: > - [ ] x86_32 tier1 Oops. Looks good Aleksey - thanks for fixing. Obviously another solution is to not perform the call on 32-bit x86, as the stack watermark barriers are not supported yet there. But perhaps if someone wants to have a stab at that, it could be good to do roughly the same thing now. With that said - ship it! ------------- Marked as reviewed by eosterlund (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/615 From valeriep at openjdk.java.net Mon Oct 12 23:00:12 2020 From: valeriep at openjdk.java.net (Valerie Peng) Date: Mon, 12 Oct 2020 23:00:12 GMT Subject: RFR: 8252204: AArch64: Implement SHA3 accelerator/intrinsic [v5] In-Reply-To: References: <4NM17B6l4GvNgCbmmQTUcnfZTA6G-IEc85O8jH_q-xA=.63b10da7-bab7-44bc-a4c8-0a675aca45c0@github.com> Message-ID: On Mon, 12 Oct 2020 07:02:05 GMT, Fei Yang wrote: > > > > I have looked at the java security changes, i.e. src/java.base/share/classes/sun/security/provider/SHA3.java. It looks > > fine. > > @valeriepeng : I see you are not listed under "Reviewers" commit message part, could you please press the magic > button(s)(approve?) so you get the credit? Thanks. It's fine, the part I reviewed is only a small part of the changes, so I will leave the reviewer approval up to the hotspot team.
Thanks, Valerie ------------- PR: https://git.openjdk.java.net/jdk/pull/207 From psandoz at openjdk.java.net Mon Oct 12 23:12:36 2020 From: psandoz at openjdk.java.net (Paul Sandoz) Date: Mon, 12 Oct 2020 23:12:36 GMT Subject: RFR: 8223347: Integration of Vector API (Incubator) [v2] In-Reply-To: <-PE4TwXgvq2bemAn_8csjn4_j7zoAolnQz6QQt3z0Wk=.eaa9999f-0713-4349-b31d-934717aa37a1@github.com> References: <-PE4TwXgvq2bemAn_8csjn4_j7zoAolnQz6QQt3z0Wk=.eaa9999f-0713-4349-b31d-934717aa37a1@github.com> Message-ID: > This pull request is for integration of the Vector API. It was previously reviewed under conditions when mercurial was > used for the source code control system. Review threads can be found here (searching for issue number 8223347 in the > title): https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-April/thread.html > https://mail.openjdk.java.net/pipermail/core-libs-dev/2020-May/thread.html > https://mail.openjdk.java.net/pipermail/core-libs-dev/2020-July/thread.html > > If mercurial was still being used the code would be pushed directly, once the CSR is approved. However, in this case a > pull request is required and needs explicit reviewer approval. Between the final review and this pull request no code > has changed, except for that related to merging. Paul Sandoz has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains seven commits: - HotspotIntrinsicCandidate to IntrinsicCandidate - Merge master - Fix permissions - Fix permissions - Merge master - Vector API new files - Integration of Vector API (Incubator) ------------- Changes: https://git.openjdk.java.net/jdk/pull/367/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=367&range=01 Stats: 295150 lines in 336 files changed: 292957 ins; 1062 del; 1131 mod Patch: https://git.openjdk.java.net/jdk/pull/367.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/367/head:pull/367 PR: https://git.openjdk.java.net/jdk/pull/367 From shade at openjdk.java.net Tue Oct 13 05:50:10 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 13 Oct 2020 05:50:10 GMT Subject: Integrated: 8254611: x86_32: Call to IRT::at_unwind clobbers rthread after JDK-8253180 In-Reply-To: References: Message-ID: On Mon, 12 Oct 2020 19:23:20 GMT, Aleksey Shipilev wrote: > There are massive x86_32 tier1 failures, bisection points to JDK-8253180. I think I know why this happens: in > `InterpreterMacroAssembler::remove_activation`, there is a new call_VM to `InterpreterRuntime::at_unwind`, which broke > the `rthread` (`rcx`) that x86_32 needs later. x86_64 is not affected, because it carries it in `r15_thread`. > Attention @fisk. > Additional testing: > - [x] x86_32 tier1 This pull request has now been integrated. 
Changeset: 90de2894 Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/90de2894 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod 8254611: x86_32: Call to IRT::at_unwind clobbers rthread after JDK-8253180 Reviewed-by: eosterlund ------------- PR: https://git.openjdk.java.net/jdk/pull/615 From thartmann at openjdk.java.net Tue Oct 13 08:25:24 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 13 Oct 2020 08:25:24 GMT Subject: RFR: 8254252: Generic arraycopy stub overwrites callee-save rdi register on 64-bit Windows [v2] In-Reply-To: References: Message-ID: > Since [JDK-8241825](https://bugs.openjdk.java.net/browse/JDK-8241825), MacroAssembler::load_klass requires a temporary > register to decode the klass pointer. In the generic arraycopy stub, rdi is used for that on 64-bit Windows because r9 > is already used as an argument register: https://hg.openjdk.java.net/jdk/jdk/rev/0bb101fbeb10#l17.32 > The problem is that rdi is callee-save [1] but not restored when returning from the stub. This leads to register > corruption and more or less random crashes in the caller. > Although JDK-8241825 is part of JDK 15, this was never a problem because we did not set the _WIN64 macro in adlc and as > a result accidentally treated rdi (and rsi) as caller-save: > https://github.com/openjdk/jdk/blob/b9873e18330b7e43ca47bc1c0655e7ab20828f7a/src/hotspot/cpu/x86/x86_64.ad#L89 > Now that this got fixed as part of [JDK-8248238](https://bugs.openjdk.java.net/browse/JDK-8248238) [2], we hit the bug. > > Unfortunately, we are running out of caller-save registers on Windows. Since rcx, rdx, r8 and r9 are used for > arguments, only rax, r10 and r11 remain which are already used as temporary registers and live throughout the stub > code. I've decided to free up r11 by postponing the load of the array length which is only needed on Windows anyway > (because only Windows passes the 5th argument on stack). 
Thanks, Tobias > > [1] https://docs.microsoft.com/en-us/cpp/build/x64-calling-convention?view=vs-2019 > [2] https://openjdk.github.io/cr/?repo=jdk&pr=212&range=11#sdiff-8 Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: Push/pop version ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/603/files - new: https://git.openjdk.java.net/jdk/pull/603/files/29f92cd9..f52fcaec Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=603&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=603&range=00-01 Stats: 45 lines in 1 file changed: 22 ins; 2 del; 21 mod Patch: https://git.openjdk.java.net/jdk/pull/603.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/603/head:pull/603 PR: https://git.openjdk.java.net/jdk/pull/603 From thartmann at openjdk.java.net Tue Oct 13 08:25:25 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 13 Oct 2020 08:25:25 GMT Subject: RFR: 8254252: Generic arraycopy stub overwrites callee-save rdi register on 64-bit Windows In-Reply-To: References: Message-ID: On Mon, 12 Oct 2020 19:27:59 GMT, Vladimir Kozlov wrote: >> Since [JDK-8241825](https://bugs.openjdk.java.net/browse/JDK-8241825), MacroAssembler::load_klass requires a temporary >> register to decode the klass pointer. In the generic arraycopy stub, rdi is used for that on 64-bit Windows because r9 >> is already used as an argument register: https://hg.openjdk.java.net/jdk/jdk/rev/0bb101fbeb10#l17.32 >> The problem is that rdi is callee-save [1] but not restored when returning from the stub. This leads to register >> corruption and more or less random crashes in the caller. 
>> Although JDK-8241825 is part of JDK 15, this was never a problem because we did not set the _WIN64 macro in adlc and as >> a result accidentally treated rdi (and rsi) as caller-save: >> https://github.com/openjdk/jdk/blob/b9873e18330b7e43ca47bc1c0655e7ab20828f7a/src/hotspot/cpu/x86/x86_64.ad#L89 >> Now that this got fixed as part of [JDK-8248238](https://bugs.openjdk.java.net/browse/JDK-8248238) [2], we hit the bug. >> >> Unfortunately, we are running out of caller-save registers on Windows. Since rcx, rdx, r8 and r9 are used for >> arguments, only rax, r10 and r11 remain which are already used as temporary registers and live throughout the stub >> code. I've decided to free up r11 by postponing the load of the array length which is only needed on Windows anyway >> (because only Windows passes the 5th argument on stack). Thanks, Tobias >> >> [1] https://docs.microsoft.com/en-us/cpp/build/x64-calling-convention?view=vs-2019 >> [2] https://openjdk.github.io/cr/?repo=jdk&pr=212&range=11#sdiff-8 > > @TobiHartmann It is okay to use push/pop to preserve registers in stubs if the code becomes cleaner: > https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/stubGenerator_x86_64.cpp#L3647 > Did you consider it? @vnkozlov Yes, I did consider pushing/popping rdi on Windows but the `generate_generic_copy` method jumps into other stubs like `entry_checkcast_arraycopy` that then exit the frame. We would therefore need to make sure to pop before jumping to another stub and be careful not to pop twice when exiting through `L_failed` or falling through. I've updated the PR with a push/pop version of the fix. Please let me know which version you prefer.
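The constraint described above (a saved rdi must be restored exactly once on every exit, including jumps into other stubs and the `L_failed` path) can be illustrated with a toy model. This is only a sketch in plain Java under stated assumptions, not HotSpot code: `StubExitSketch`, `Machine`, `Exit`, `genericCopy` and `otherStub` are hypothetical names, and the real stub is emitted through the x86 macro assembler.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model (hypothetical names, not HotSpot code): a stub that borrows a
// callee-saved register must restore it exactly once on every exit path.
public class StubExitSketch {
    public enum Exit { TAIL_JUMP_TO_OTHER_STUB, FAILED, FALL_THROUGH }

    public static final class Machine {
        public long rdi;                                     // models the callee-saved register
        public final Deque<Long> stack = new ArrayDeque<>(); // models the machine stack
    }

    public static void genericCopy(Machine m, Exit exit) {
        m.stack.push(m.rdi);   // models "push rdi" on entry
        m.rdi = 0xBADL;        // rdi is now used as a temporary

        switch (exit) {
            case TAIL_JUMP_TO_OTHER_STUB:
                // Restore before leaving: the other stub exits the frame itself
                // and knows nothing about the saved value.
                m.rdi = m.stack.pop();
                otherStub(m);
                return;
            case FAILED:
            case FALL_THROUGH:
            default:
                // Both remaining exits share a single restore, so it cannot run twice.
                m.rdi = m.stack.pop();
                return;
        }
    }

    // Stand-in for a stub that is jumped to; it assumes a balanced stack.
    public static void otherStub(Machine m) { }

    public static void main(String[] args) {
        for (Exit exit : Exit.values()) {
            Machine m = new Machine();
            m.rdi = 42L;
            genericCopy(m, exit);
            System.out.println(exit + ": rdi preserved = " + (m.rdi == 42L)
                    + ", stack balanced = " + m.stack.isEmpty());
        }
    }
}
```

Forgetting the restore on one path leaves the caller with a clobbered register, and restoring twice pops a value that was never pushed; both show up in the model as a changed `rdi` or a non-empty (or underflowing) stack.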
------------- PR: https://git.openjdk.java.net/jdk/pull/603 From github.com+8792647+robcasloz at openjdk.java.net Tue Oct 13 08:38:15 2020 From: github.com+8792647+robcasloz at openjdk.java.net (Roberto Castañeda Lozano) Date: Tue, 13 Oct 2020 08:38:15 GMT Subject: RFR: 8254602: compiler/debug/TestStressCM.java failed with "RuntimeException: got the same optimization stats for different seeds: expected 45" Message-ID: Remove test assertion checking that different random seeds lead to different code motion decisions. This was the case for the specific pair of random seeds, IR fed to code motion, and target platforms tested originally; but does not need to hold in general. Remove similar test assertion in IGVN randomization test case. Re-enable the test case. ------------- Commit messages: - 8254602: compiler/debug/TestStressCM.java failed with "RuntimeException: got the same optimization stats for different seeds: expected 45" Changes: https://git.openjdk.java.net/jdk/pull/626/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=626&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8254602 Stats: 14 lines in 3 files changed: 0 ins; 9 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/626.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/626/head:pull/626 PR: https://git.openjdk.java.net/jdk/pull/626 From github.com+8792647+robcasloz at openjdk.java.net Tue Oct 13 08:38:15 2020 From: github.com+8792647+robcasloz at openjdk.java.net (Roberto Castañeda Lozano) Date: Tue, 13 Oct 2020 08:38:15 GMT Subject: RFR: 8254602: compiler/debug/TestStressCM.java failed with "RuntimeException: got the same optimization stats for different seeds: expected 45" In-Reply-To: References: Message-ID: On Tue, 13 Oct 2020 08:23:42 GMT, Roberto Castañeda Lozano wrote: > Remove test assertion checking that different random seeds lead to different code motion decisions.
This was the case > for the specific pair of random seeds, IR fed to code motion, and target platforms tested originally; but does not need > to hold in general. Remove similar test assertion in IGVN randomization test case. Re-enable the test case. Remove test assertion checking that different random seeds lead to different code motion decisions. This was the case for the specific pair of random seeds, IR fed to code motion, and target platforms tested originally; but does not need to hold in general. Remove similar test assertion in IGVN randomization test case. Re-enable the test case. ------------- PR: https://git.openjdk.java.net/jdk/pull/626 From github.com+8792647+robcasloz at openjdk.java.net Tue Oct 13 08:38:15 2020 From: github.com+8792647+robcasloz at openjdk.java.net (Roberto Castañeda Lozano) Date: Tue, 13 Oct 2020 08:38:15 GMT Subject: RFR: 8254602: compiler/debug/TestStressCM.java failed with "RuntimeException: got the same optimization stats for different seeds: expected 45" In-Reply-To: References: Message-ID: On Tue, 13 Oct 2020 08:24:34 GMT, Roberto Castañeda Lozano wrote: >> Remove test assertion checking that different random seeds lead to different code motion decisions. This was the case >> for the specific pair of random seeds, IR fed to code motion, and target platforms tested originally; but does not need >> to hold in general. Remove similar test assertion in IGVN randomization test case. Re-enable the test case. > > Remove test assertion checking that different random seeds lead to different > code motion decisions. This was the case for the specific pair of random seeds, > IR fed to code motion, and target platforms tested originally; but does not need > to hold in general. Remove similar test assertion in IGVN randomization test > case. Re-enable the test case. The fixed test cases pass on windows-x64, linux-x64, macosx-x64, and linux-aarch64 (all in both release and debug).
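Why the removed assertion need not hold in general can be seen with a pigeonhole argument: a randomized choice over a finite set of outcomes must repeat once there are more seeds than outcomes. The sketch below is plain Java and only an illustration under stated assumptions; it uses neither the test library nor any VM stress flag. Shuffling a three-element list stands in for the code motion decisions, and `SeedCollisionSketch`, `outcomeFor` and `distinctOutcomes` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Illustration only (hypothetical names): different seeds are not
// guaranteed to produce different randomized decisions.
public class SeedCollisionSketch {
    // Stand-in for "optimization stats": the order chosen for three items.
    public static List<Integer> outcomeFor(long seed) {
        List<Integer> order = new ArrayList<>(Arrays.asList(0, 1, 2));
        Collections.shuffle(order, new Random(seed));
        return order;
    }

    // Number of different outcomes seen for seeds 0 .. seedCount-1.
    public static int distinctOutcomes(int seedCount) {
        Set<List<Integer>> seen = new HashSet<>();
        for (long seed = 0; seed < seedCount; seed++) {
            seen.add(outcomeFor(seed));
        }
        return seen.size();
    }

    public static void main(String[] args) {
        // Three items have only 3! = 6 orderings, so 7 seeds cannot all
        // differ: at least two seeds must yield the same outcome.
        System.out.println("distinct outcomes for 7 seeds: " + distinctOutcomes(7));
    }
}
```

A test can still rely on the weaker property that the same seed reproduces the same outcome, which is what makes a stress seed useful for reproducing failures.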
------------- PR: https://git.openjdk.java.net/jdk/pull/626 From github.com+8792647+robcasloz at openjdk.java.net Tue Oct 13 08:45:21 2020 From: github.com+8792647+robcasloz at openjdk.java.net (Roberto Castañeda Lozano) Date: Tue, 13 Oct 2020 08:45:21 GMT Subject: RFR: 8254575: C2: Clean up unused TRACK_PHI_INPUTS assertion code Message-ID: Remove assertion code that was disabled in all build configurations. ------------- Commit messages: - 8254575: C2: Clean up unused TRACK_PHI_INPUTS assertion code Changes: https://git.openjdk.java.net/jdk/pull/606/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=606&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8254575 Stats: 23 lines in 1 file changed: 0 ins; 23 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/606.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/606/head:pull/606 PR: https://git.openjdk.java.net/jdk/pull/606 From github.com+8792647+robcasloz at openjdk.java.net Tue Oct 13 08:45:21 2020 From: github.com+8792647+robcasloz at openjdk.java.net (Roberto Castañeda Lozano) Date: Tue, 13 Oct 2020 08:45:21 GMT Subject: RFR: 8254575: C2: Clean up unused TRACK_PHI_INPUTS assertion code In-Reply-To: References: Message-ID: <36VUtpNHmnBt8X4AQgRhi1Y3BA_-Rib9u8ThvXViY0I=.29fb76c1-1b7f-4e1b-8091-5c421f6558fa@github.com> On Mon, 12 Oct 2020 11:28:00 GMT, Roberto Castañeda Lozano wrote: > Remove assertion code that was disabled in all build configurations. Remove assertion code that was disabled in all build configurations.
------------- PR: https://git.openjdk.java.net/jdk/pull/606 From github.com+8792647+robcasloz at openjdk.java.net Tue Oct 13 08:45:21 2020 From: github.com+8792647+robcasloz at openjdk.java.net (Roberto Castañeda Lozano) Date: Tue, 13 Oct 2020 08:45:21 GMT Subject: RFR: 8254575: C2: Clean up unused TRACK_PHI_INPUTS assertion code In-Reply-To: <36VUtpNHmnBt8X4AQgRhi1Y3BA_-Rib9u8ThvXViY0I=.29fb76c1-1b7f-4e1b-8091-5c421f6558fa@github.com> References: <36VUtpNHmnBt8X4AQgRhi1Y3BA_-Rib9u8ThvXViY0I=.29fb76c1-1b7f-4e1b-8091-5c421f6558fa@github.com> Message-ID: On Mon, 12 Oct 2020 11:29:15 GMT, Roberto Castañeda Lozano wrote: >> Remove assertion code that was disabled in all build configurations. > > Remove assertion code that was disabled in all build configurations. Tested by building on windows-x64, linux-x64, and macosx-x64 (all in both debug and release mode). ------------- PR: https://git.openjdk.java.net/jdk/pull/606 From vlivanov at openjdk.java.net Tue Oct 13 12:31:13 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Tue, 13 Oct 2020 12:31:13 GMT Subject: RFR: 8254575: C2: Clean up unused TRACK_PHI_INPUTS assertion code In-Reply-To: References: Message-ID: On Mon, 12 Oct 2020 11:28:00 GMT, Roberto Castañeda Lozano wrote: > Remove assertion code that was disabled in all build configurations. Looks good. ------------- Marked as reviewed by vlivanov (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/606 From shade at openjdk.java.net Tue Oct 13 12:55:15 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 13 Oct 2020 12:55:15 GMT Subject: RFR: 8254602: compiler/debug/TestStressCM.java failed with "RuntimeException: got the same optimization stats for different seeds: expected 45" In-Reply-To: References: Message-ID: On Tue, 13 Oct 2020 08:23:42 GMT, Roberto Castañeda Lozano wrote: > Remove test assertion checking that different random seeds lead to different code motion decisions.
This was the case > for the specific pair of random seeds, IR fed to code motion, and target platforms tested originally; but does not need > to hold in general. Remove similar test assertion in IGVN randomization test case. Re-enable the test case. Makes sense to me. There is no guarantee different seeds yield different compilations. ------------- Marked as reviewed by shade (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/626 From joserz at linux.ibm.com Tue Oct 13 13:33:31 2020 From: joserz at linux.ibm.com (joserz at linux.ibm.com) Date: Tue, 13 Oct 2020 10:33:31 -0300 Subject: [11u] RFR(M): 8248190: PPC: Enable Power10 system and use new byte-reverse instructions In-Reply-To: References: <20200909163733.GA422344@pacoca> <20200911012040.GA518622@pacoca> Message-ID: <20201013133331.GA30104@pacoca> Hello Martin!! Ticket is updated: https://bugs.openjdk.java.net/browse/JDK-8248190 Webrev: https://cr.openjdk.java.net/~mhorie/8248190/webrev.02/ Thank you!! On Fri, Sep 11, 2020 at 12:26:09PM +0000, Doerr, Martin wrote: > Hi Jose, > > looks good. Thanks for backporting. > > Best regards, > Martin > > > > -----Original Message----- > > From: joserz at linux.ibm.com > > Sent: Freitag, 11. September 2020 03:21 > > To: Doerr, Martin > > Cc: hotspot-compiler-dev at openjdk.java.net; jdk-updates- > > dev at openjdk.java.net; HORIE at jp.ibm.com > > Subject: Re: [11u] RFR(M): 8248190: PPC: Enable Power10 system and use > > new byte-reverse instructions > > > > Hello Martin, > > > > Here is my new webrev. 
> > https://cr.openjdk.java.net/~mhorie/8248190/jdk11u/webrev.00/ > > (Thanks again, Michi) > > > > 8<---------------------------------------------------------------------- > > Some evidence from a Power10 emulator: > > $ openjdk/jdk11u-dev/build/jdk/bin/java -Xcomp -XX:CompileThreshold=1 > > -XX:-TieredCompilation -XX:+UnlockDiagnosticVMOptions > > -XX:+PrintOptoAssembly -XX:-UseSIGTRAP -XX:+UseByteReverseInstructions > > ReverseBytes | grep 'BRD\|BRH\|BRW\|EXTSH' > > ... > > 054 BRW R17, R18 > > 070 BRD R14, R14 > > 080 BRH R14, R15 > > EXTSH R14, R14 > > 0a0 BRH R17, R15 > > > > $ openjdk/jdk11u-dev/build/jdk/bin/java -XX:+Verbose > > -XX:PowerArchitecturePPC64=10 -version > > dscr value was 0x10 > > Version: ppc64 fsqrt isel lxarxeh cmpb popcntb popcntw fcfids vand lqarx aes > > vpmsumb mfdscr vsx ldbrx stdbrx sha darn brw > > L1_data_cache_line_size=128 > > > > ContendedPaddingWidth 128 > > > > openjdk version "11.0.10-internal" 2021-01-19 > > OpenJDK Runtime Environment (slowdebug build > > 11.0.10-internal+0-adhoc.ziviani.jdk11u-dev) > > OpenJDK 64-Bit Server VM (slowdebug build > > 11.0.10-internal+0-adhoc.ziviani.jdk11u-dev, mixed mode) > > 8<---------------------------------------------------------------------- > > > > Thank you very much! > > > > Jose > > > > On Thu, Sep 10, 2020 at 09:54:04AM +0000, Doerr, Martin wrote: > > > Hi Jose, > > > > > > if manual adaptation/integration is needed we also need to review the > > backport webrev (except trivial differences like in copyright headers). > > > > > > Best regards, > > > Martin > > > > > > > > > > -----Original Message----- > > > > From: jdk-updates-dev On > > > > Behalf Of joserz at linux.ibm.com > > > > Sent: Mittwoch, 9. September 2020 18:38 > > > > To: hotspot-compiler-dev at openjdk.java.net; jdk-updates- > > > > dev at openjdk.java.net > > > > Subject: [11u] RFR(M): 8248190: PPC: Enable Power10 system and use > > new > > > > byte-reverse instructions > > > > > > > > Hello team!
> > > > > > > > I'd like to backport the following patchset to 11u. It doesn't apply > > perfectly > > > > due to some positional changes and a copyright update. > > > > Please, let me know if you prefer another webrev addressing this > > backport. > > > > > > > > Webrev: https://cr.openjdk.java.net/~mhorie/8248190/webrev.02/ > > > > Bug: https://bugs.openjdk.java.net/browse/JDK-8248190 > > > > > > > > Thank you very much, > > > > > > > > Jose R Ziviani From kvn at openjdk.java.net Tue Oct 13 16:05:10 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 13 Oct 2020 16:05:10 GMT Subject: RFR: 8254252: Generic arraycopy stub overwrites callee-save rdi register on 64-bit Windows [v2] In-Reply-To: References: Message-ID: On Tue, 13 Oct 2020 08:25:24 GMT, Tobias Hartmann wrote: >> Since [JDK-8241825](https://bugs.openjdk.java.net/browse/JDK-8241825), MacroAssembler::load_klass requires a temporary >> register to decode the klass pointer. In the generic arraycopy stub, rdi is used for that on 64-bit Windows because r9 >> is already used as an argument register: https://hg.openjdk.java.net/jdk/jdk/rev/0bb101fbeb10#l17.32 >> The problem is that rdi is callee-save [1] but not restored when returning from the stub. This leads to register >> corruption and more or less random crashes in the caller. >> Although JDK-8241825 is part of JDK 15, this was never a problem because we did not set the _WIN64 macro in adlc and as >> a result accidentally treated rdi (and rsi) as caller-save: >> https://github.com/openjdk/jdk/blob/b9873e18330b7e43ca47bc1c0655e7ab20828f7a/src/hotspot/cpu/x86/x86_64.ad#L89 >> Now that this got fixed as part of [JDK-8248238](https://bugs.openjdk.java.net/browse/JDK-8248238) [2], we hit the bug. >> >> Unfortunately, we are running out of caller-save registers on Windows. Since rcx, rdx, r8 and r9 are used for >> arguments, only rax, r10 and r11 remain which are already used as temporary registers and live throughout the stub >> code. 
I've decided to free up r11 by postponing the load of the array length which is only needed on Windows anyway >> (because only Windows passes the 5th argument on stack). Thanks, Tobias >> >> [1] https://docs.microsoft.com/en-us/cpp/build/x64-calling-convention?view=vs-2019 >> [2] https://openjdk.github.io/cr/?repo=jdk&pr=212&range=11#sdiff-8 > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Push/pop version I like this version. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/603 From thartmann at openjdk.java.net Tue Oct 13 16:08:10 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 13 Oct 2020 16:08:10 GMT Subject: RFR: 8254252: Generic arraycopy stub overwrites callee-save rdi register on 64-bit Windows [v2] In-Reply-To: References: Message-ID: On Tue, 13 Oct 2020 16:02:07 GMT, Vladimir Kozlov wrote: >> Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: >> >> Push/pop version > > I like this version. Okay, thanks for the review Vladimir! ------------- PR: https://git.openjdk.java.net/jdk/pull/603 From thartmann at openjdk.java.net Tue Oct 13 16:11:31 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 13 Oct 2020 16:11:31 GMT Subject: RFR: 8254252: Generic arraycopy stub overwrites callee-save rdi register on 64-bit Windows [v3] In-Reply-To: References: Message-ID: > Since [JDK-8241825](https://bugs.openjdk.java.net/browse/JDK-8241825), MacroAssembler::load_klass requires a temporary > register to decode the klass pointer. In the generic arraycopy stub, rdi is used for that on 64-bit Windows because r9 > is already used as an argument register: https://hg.openjdk.java.net/jdk/jdk/rev/0bb101fbeb10#l17.32 > The problem is that rdi is callee-save [1] but not restored when returning from the stub. 
This leads to register > corruption and more or less random crashes in the caller. > Although JDK-8241825 is part of JDK 15, this was never a problem because we did not set the _WIN64 macro in adlc and as > a result accidentally treated rdi (and rsi) as caller-save: > https://github.com/openjdk/jdk/blob/b9873e18330b7e43ca47bc1c0655e7ab20828f7a/src/hotspot/cpu/x86/x86_64.ad#L89 > Now that this got fixed as part of [JDK-8248238](https://bugs.openjdk.java.net/browse/JDK-8248238) [2], we hit the bug. > > Unfortunately, we are running out of caller-save registers on Windows. Since rcx, rdx, r8 and r9 are used for > arguments, only rax, r10 and r11 remain which are already used as temporary registers and live throughout the stub > code. I've decided to free up r11 by postponing the load of the array length which is only needed on Windows anyway > (because only Windows passes the 5th argument on stack). Thanks, Tobias > > [1] https://docs.microsoft.com/en-us/cpp/build/x64-calling-convention?view=vs-2019 > [2] https://openjdk.github.io/cr/?repo=jdk&pr=212&range=11#sdiff-8 Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: Fixed indentation ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/603/files - new: https://git.openjdk.java.net/jdk/pull/603/files/f52fcaec..b8b186c4 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=603&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=603&range=01-02 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/603.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/603/head:pull/603 PR: https://git.openjdk.java.net/jdk/pull/603 From psandoz at openjdk.java.net Tue Oct 13 16:14:40 2020 From: psandoz at openjdk.java.net (Paul Sandoz) Date: Tue, 13 Oct 2020 16:14:40 GMT Subject: RFR: 8223347: Integration of Vector API (Incubator) [v3] In-Reply-To: 
<-PE4TwXgvq2bemAn_8csjn4_j7zoAolnQz6QQt3z0Wk=.eaa9999f-0713-4349-b31d-934717aa37a1@github.com> References: <-PE4TwXgvq2bemAn_8csjn4_j7zoAolnQz6QQt3z0Wk=.eaa9999f-0713-4349-b31d-934717aa37a1@github.com> Message-ID: <-wiRtZZKucOjqFnqeDjVm3B8BaThwGyDdt4aFo9t2-g=.2b4350f4-4704-4857-82e4-7e014898b2da@github.com> > This pull request is for integration of the Vector API. It was previously reviewed under conditions when mercurial was > used for the source code control system. Review threads can be found here (searching for issue number 8223347 in the > title): https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-April/thread.html > https://mail.openjdk.java.net/pipermail/core-libs-dev/2020-May/thread.html > https://mail.openjdk.java.net/pipermail/core-libs-dev/2020-July/thread.html > > If mercurial was still being used the code would be pushed directly, once the CSR is approved. However, in this case a > pull request is required and needs explicit reviewer approval. Between the final review and this pull request no code > has changed, except for that related to merging. 
Paul Sandoz has updated the pull request incrementally with one additional commit since the last revision: Fix related to merge ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/367/files - new: https://git.openjdk.java.net/jdk/pull/367/files/9cca17b8..d5acb4ff Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=367&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=367&range=01-02 Stats: 76 lines in 1 file changed: 0 ins; 0 del; 76 mod Patch: https://git.openjdk.java.net/jdk/pull/367.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/367/head:pull/367 PR: https://git.openjdk.java.net/jdk/pull/367 From martin.doerr at sap.com Tue Oct 13 16:17:04 2020 From: martin.doerr at sap.com (Doerr, Martin) Date: Tue, 13 Oct 2020 16:17:04 +0000 Subject: [11u] RFR(M): 8248190: PPC: Enable Power10 system and use new byte-reverse instructions In-Reply-To: <20201013133331.GA30104@pacoca> References: <20200909163733.GA422344@pacoca> <20200911012040.GA518622@pacoca> <20201013133331.GA30104@pacoca> Message-ID: Hi Jose, pushed to jdk11u-dev. Best regards, Martin > -----Original Message----- > From: joserz at linux.ibm.com > Sent: Dienstag, 13. Oktober 2020 15:34 > To: Doerr, Martin > Cc: hotspot-compiler-dev at openjdk.java.net; jdk-updates- > dev at openjdk.java.net; HORIE at jp.ibm.com > Subject: Re: [11u] RFR(M): 8248190: PPC: Enable Power10 system and use > new byte-reverse instructions > > Hello Martin!! > > Ticket is updated: > https://bugs.openjdk.java.net/browse/JDK-8248190 > Webrev: https://cr.openjdk.java.net/~mhorie/8248190/webrev.02/ > > Thank you!! > > On Fri, Sep 11, 2020 at 12:26:09PM +0000, Doerr, Martin wrote: > > Hi Jose, > > > > looks good. Thanks for backporting. > > > > Best regards, > > Martin > > > > > > > -----Original Message----- > > > From: joserz at linux.ibm.com > > > Sent: Freitag, 11. 
September 2020 03:21 > > > To: Doerr, Martin > > > Cc: hotspot-compiler-dev at openjdk.java.net; jdk-updates-dev at openjdk.java.net; HORIE at jp.ibm.com > > > Subject: Re: [11u] RFR(M): 8248190: PPC: Enable Power10 system and use new byte-reverse instructions > > > > > > Hello Martin, > > > > > > Here is my new webrev. > > > https://cr.openjdk.java.net/~mhorie/8248190/jdk11u/webrev.00/ > > > (Thanks again, Michi) > > >
> > > 8<----------------------------------------------------------------------
> > > Some evidence from a Power10 emulator:
> > > $ openjdk/jdk11u-dev/build/jdk/bin/java -Xcomp -XX:CompileThreshold=1 -XX:-TieredCompilation -XX:+UnlockDiagnosticVMOptions -XX:+PrintOptoAssembly -XX:-UseSIGTRAP -XX:+UseByteReverseInstructions ReverseBytes | grep 'BRD\|BRH\|BRW\|EXTSH'
> > > ...
> > > 054 BRW R17, R18
> > > 070 BRD R14, R14
> > > 080 BRH R14, R15
> > >     EXTSH R14, R14
> > > 0a0 BRH R17, R15
> > >
> > > $ openjdk/jdk11u-dev/build/jdk/bin/java -XX:+Verbose -XX:PowerArchitecturePPC64=10 -version
> > > dscr value was 0x10
> > > Version: ppc64 fsqrt isel lxarxeh cmpb popcntb popcntw fcfids vand lqarx aes vpmsumb mfdscr vsx ldbrx stdbrx sha darn brw
> > > L1_data_cache_line_size=128
> > >
> > > ContendedPaddingWidth 128
> > >
> > > openjdk version "11.0.10-internal" 2021-01-19
> > > OpenJDK Runtime Environment (slowdebug build 11.0.10-internal+0-adhoc.ziviani.jdk11u-dev)
> > > OpenJDK 64-Bit Server VM (slowdebug build 11.0.10-internal+0-adhoc.ziviani.jdk11u-dev, mixed mode)
> > > 8<----------------------------------------------------------------------
> > > > > > Thank you very much! > > > > > > Jose > > > > > > On Thu, Sep 10, 2020 at 09:54:04AM +0000, Doerr, Martin wrote: > > > > Hi Jose, > > > > > > > > if manual adaptation/integration is needed we also need to review the backport webrev (except trivial differences like in copyright headers).
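For readers who want to repeat the check quoted above, here is a minimal Java sketch of a driver that exercises the four byte-reversal library methods. The class name `ReverseBytesSketch` is hypothetical; the `ReverseBytes` class used in the quoted command is not shown in this thread. The instruction named in each comment (BRW, BRD, BRH, and BRH followed by EXTSH) is an assumption inferred from the operand widths in the listing, not something the thread states.

```java
// Hypothetical stand-in for the ReverseBytes class named in the quoted command line.
final class ReverseBytesSketch {
    // 32-bit reversal; assumed to correspond to BRW in the listing.
    static int reverseInt(int v) { return Integer.reverseBytes(v); }

    // 64-bit reversal; assumed to correspond to BRD.
    static long reverseLong(long v) { return Long.reverseBytes(v); }

    // Signed 16-bit reversal; assumed to correspond to BRH plus EXTSH (sign extension).
    static short reverseShort(short v) { return Short.reverseBytes(v); }

    // Unsigned 16-bit reversal; assumed to correspond to BRH alone.
    static char reverseChar(char v) { return Character.reverseBytes(v); }

    public static void main(String[] args) {
        // Call each method often enough that the JIT has a reason to compile it.
        long sink = 0;
        for (int i = 0; i < 200_000; i++) {
            sink += reverseInt(i) + reverseLong(i) + reverseShort((short) i) + reverseChar((char) i);
        }
        System.out.println("sink = " + sink);
        System.out.printf("%08x%n", reverseInt(0x11223344));   // prints 44332211
        System.out.println(reverseShort((short) 0x1280));      // prints -32750 (0x8012, sign-extended)
        System.out.println((int) reverseChar((char) 0x1280));  // prints 32786 (0x8012, zero-extended)
    }
}
```

The short and char cases differ only in how the 16-bit result is widened, which is why the listing would be expected to show an extra sign-extension step for one of them.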
> > > > > > > > Best regards, > > > > Martin > > > > > > > > > > > > > -----Original Message----- > > > > > From: jdk-updates-dev > On > > > > > Behalf Of joserz at linux.ibm.com > > > > > Sent: Mittwoch, 9. September 2020 18:38 > > > > > To: hotspot-compiler-dev at openjdk.java.net; jdk-updates- > > > > > dev at openjdk.java.net > > > > > Subject: [11u] RFR(M): 8248190: PPC: Enable Power10 system and use > > > new > > > > > byte-reverse instructions > > > > > > > > > > Hello team! > > > > > > > > > > I'd like to backport the following patchset to 11u. It doesn't apply > > > perfectly > > > > > due to some positional changes and a copyright update. > > > > > Please, let me know if you prefer another webrev addressing this > > > backport. > > > > > > > > > > Webrev: https://cr.openjdk.java.net/~mhorie/8248190/webrev.02/ > > > > > Bug: https://bugs.openjdk.java.net/browse/JDK-8248190 > > > > > > > > > > Thank you very much, > > > > > > > > > > Jose R Ziviani From kvn at openjdk.java.net Tue Oct 13 16:21:15 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 13 Oct 2020 16:21:15 GMT Subject: RFR: 8254602: compiler/debug/TestStressCM.java failed with "RuntimeException: got the same optimization stats for different seeds: expected 45" In-Reply-To: References: Message-ID: On Tue, 13 Oct 2020 08:23:42 GMT, Roberto Castañeda Lozano wrote: > Remove test assertion checking that different random seeds lead to different code motion decisions. This was the case > for the specific pair of random seeds, IR fed to code motion, and target platforms tested originally; but does not need > to hold in general. Remove similar test assertion in IGVN randomization test case. Re-enable the test case. Good. ------------- Marked as reviewed by kvn (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/626 From kvn at openjdk.java.net Tue Oct 13 16:22:09 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 13 Oct 2020 16:22:09 GMT Subject: RFR: 8254575: C2: Clean up unused TRACK_PHI_INPUTS assertion code In-Reply-To: References: Message-ID: On Mon, 12 Oct 2020 11:28:00 GMT, Roberto Castañeda Lozano wrote: > Remove assertion code that was disabled in all build configurations. Good. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/606 From psandoz at openjdk.java.net Tue Oct 13 17:34:37 2020 From: psandoz at openjdk.java.net (Paul Sandoz) Date: Tue, 13 Oct 2020 17:34:37 GMT Subject: RFR: 8223347: Integration of Vector API (Incubator) [v4] In-Reply-To: <-PE4TwXgvq2bemAn_8csjn4_j7zoAolnQz6QQt3z0Wk=.eaa9999f-0713-4349-b31d-934717aa37a1@github.com> References: <-PE4TwXgvq2bemAn_8csjn4_j7zoAolnQz6QQt3z0Wk=.eaa9999f-0713-4349-b31d-934717aa37a1@github.com> Message-ID: > This pull request is for integration of the Vector API. It was previously reviewed under conditions when mercurial was > used for the source code control system. Review threads can be found here (searching for issue number 8223347 in the > title): https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-April/thread.html > https://mail.openjdk.java.net/pipermail/core-libs-dev/2020-May/thread.html > https://mail.openjdk.java.net/pipermail/core-libs-dev/2020-July/thread.html > > If mercurial was still being used the code would be pushed directly, once the CSR is approved. However, in this case a > pull request is required and needs explicit reviewer approval. Between the final review and this pull request no code > has changed, except for that related to merging. Paul Sandoz has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains ten commits: - Merge master - Fix related to merge - HotspotIntrinsicCandidate to IntrinsicCandidate - Merge master - Fix permissions - Fix permissions - Merge master - Vector API new files - Integration of Vector API (Incubator) ------------- Changes: https://git.openjdk.java.net/jdk/pull/367/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=367&range=03 Stats: 295107 lines in 336 files changed: 292957 ins; 1062 del; 1088 mod Patch: https://git.openjdk.java.net/jdk/pull/367.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/367/head:pull/367 PR: https://git.openjdk.java.net/jdk/pull/367 From jbhateja at openjdk.java.net Tue Oct 13 17:54:22 2020 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Tue, 13 Oct 2020 17:54:22 GMT Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v2] In-Reply-To: References: Message-ID: > Summary: > > 1) Partial in-lining technique avoids call overhead penalty for small array copy operations with size less than 32 > bytes. 2) At runtime, a conditional check based on copy length either calls an array-copy stub or executes an optimized > instruction sequence using AVX-512 masked instructions emitted at the call site. 3) New runtime flag > ArrayCopyPartialInlineSize=0/32(default)/64 bytes determines the maximum size for partial in-lining. 4) Based on the > perf results seen in benchmarks currently partial in-lining is performed only for arraycopy involving sub-word types > (bool/byte/char/short). Once PR-61 gets integrated we can extend this patch to cover all the primitive types. 
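The length-based dispatch described in points 1 to 3 of the summary can be illustrated at the Java level with the sketch below. It is only a conceptual model: the real change makes C2 emit AVX-512 masked loads and stores at the call site, which plain Java cannot express. The class name, the method name and the constant are hypothetical; the constant mirrors the default `ArrayCopyPartialInlineSize=32`, and treating the threshold as inclusive is an assumption.

```java
import java.util.Arrays;
import java.util.Objects;

// Conceptual model only; not the code generated by the patch.
final class PartialInlineCopySketch {
    // Hypothetical constant mirroring the default ArrayCopyPartialInlineSize=32 (bytes).
    static final int PARTIAL_INLINE_SIZE_BYTES = 32;

    static void copyBytes(byte[] src, int srcPos, byte[] dst, int dstPos, int length) {
        Objects.checkFromIndexSize(srcPos, length, src.length);
        Objects.checkFromIndexSize(dstPos, length, dst.length);
        if (length <= PARTIAL_INLINE_SIZE_BYTES) {
            // Short path: stands in for one masked vector load followed by one masked
            // vector store. Loading everything first keeps overlapping copies correct.
            byte[] lanes = Arrays.copyOfRange(src, srcPos, srcPos + length);
            for (int i = 0; i < length; i++) {
                dst[dstPos + i] = lanes[i];
            }
        } else {
            // Long path: stands in for the call to the array-copy stub.
            System.arraycopy(src, srcPos, dst, dstPos, length);
        }
    }

    public static void main(String[] args) {
        byte[] src = new byte[128];
        for (int i = 0; i < src.length; i++) {
            src[i] = (byte) i;
        }
        byte[] dst = new byte[128];
        copyBytes(src, 3, dst, 5, 20);   // takes the short path
        copyBytes(src, 40, dst, 40, 70); // takes the stub path
        System.out.println(Arrays.equals(Arrays.copyOfRange(src, 3, 23), Arrays.copyOfRange(dst, 5, 25)));
    }
}
```

The point of the design is that the branch on `length` replaces a call for small copies, which is where the call overhead dominates the cost of the copy itself.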
> Performance Results:
> System : CascadeLake Server, Intel(R) Xeon(R) Platinum 8280L CPU @ 2.70GHz
> Micros : test/micro/org/openjdk/bench/java/lang/ArrayCopy*.java
> ArrayCopyPartialInlineSize : 32
>
> JMH | Block Size | Baseline (ns/op) | Partial Inlining (ns/op) | Gain
> -- | -- | -- | -- | --
> ArrayCopyAligned.testByte | 1 | 5.417 | 2.696 | 2.009272997
> ArrayCopyAligned.testByte | 3 | 5.494 | 2.702 | 2.03330866
> ArrayCopyAligned.testByte | 5 | 5.417 | 2.637 | 2.05422829
> ArrayCopyAligned.testByte | 10 | 5.343 | 2.703 | 1.976692564
> ArrayCopyAligned.testByte | 20 | 5.837 | 2.636 | 2.214339909
> ArrayCopyAligned.testByte | 70 | 5.86 | 6 | 0.976666667
> ArrayCopyAligned.testByte | 150 | 6.766 | 6.906 | 0.979727773
> ArrayCopyAligned.testByte | 300 | 7.605 | 7.952 | 0.956363179
> ArrayCopyAligned.testByte | 600 | 11.989 | 12.007 | 0.998500874
> ArrayCopyAligned.testByte | 1200 | 16.447 | 16.585 | 0.991679228
> ArrayCopyAligned.testChar | 1 | 5.02 | 2.828 | 1.775106082
> ArrayCopyAligned.testChar | 3 | 5.129 | 2.762 | 1.85698769
> ArrayCopyAligned.testChar | 5 | 5.041 | 2.762 | 1.82512672
> ArrayCopyAligned.testChar | 10 | 5.716 | 2.762 | 2.069514844
> ArrayCopyAligned.testChar | 20 | 5.111 | 5.399 | 0.946656788
> ArrayCopyAligned.testChar | 70 | 6.271 | 6.242 | 1.004645947
> ArrayCopyAligned.testChar | 150 | 7.45 | 7.599 | 0.980392157
> ArrayCopyAligned.testChar | 300 | 9.904 | 10.112 | 0.97943038
> ArrayCopyAligned.testChar | 600 | 17.131 | 17.167 | 0.997902953
> ArrayCopyAligned.testChar | 1200 | 29.556 | 29.851 | 0.990117584
> ArrayCopyUnalignedBoth.testByte | 1 | 5.419 | 2.702 | 2.005551443
> ArrayCopyUnalignedBoth.testByte | 3 | 5.558 | 2.636 | 2.108497724
> ArrayCopyUnalignedBoth.testByte | 5 | 5.43 | 2.636 | 2.059939302
> ArrayCopyUnalignedBoth.testByte | 10 | 5.378 | 2.637 | 2.039438756
> ArrayCopyUnalignedBoth.testByte | 20 | 5.914 | 2.636 | 2.243550836
> ArrayCopyUnalignedBoth.testByte | 70 | 5.882 | 5.954 | 0.987907289
> ArrayCopyUnalignedBoth.testByte | 150 | 6.784 | 6.88 | 0.986046512
> ArrayCopyUnalignedBoth.testByte | 300 | 7.635 | 7.968 | 0.958207831
> ArrayCopyUnalignedBoth.testByte | 600 | 12.226 | 12.129 | 1.007997362
> ArrayCopyUnalignedBoth.testByte | 1200 | 16.992 | 20.717 | 0.820195974
> ArrayCopyUnalignedBoth.testChar | 1 | 5.019 | 2.828 | 1.774752475
> ArrayCopyUnalignedBoth.testChar | 3 | 5.163 | 2.763 | 1.868621064
> ArrayCopyUnalignedBoth.testChar | 5 | 5.042 | 2.827 | 1.783516095
> ArrayCopyUnalignedBoth.testChar | 10 | 5.718 | 2.828 | 2.021923621
> ArrayCopyUnalignedBoth.testChar | 20 | 5.111 | 5.404 | 0.945780903
> ArrayCopyUnalignedBoth.testChar | 70 | 6.367 | 6.235 | 1.02117081
> ArrayCopyUnalignedBoth.testChar | 150 | 7.367 | 8.269 | 0.890917886
> ArrayCopyUnalignedBoth.testChar | 300 | 10.358 | 10.642 | 0.973313287
> ArrayCopyUnalignedBoth.testChar | 600 | 20.84 | 17.522 | 1.189361945
> ArrayCopyUnalignedBoth.testChar | 1200 | 31.895 | 31.892 | 1.000094067
> ArrayCopyUnalignedDst.testByte | 1 | 5.455 | 2.637 | 2.068638604
> ArrayCopyUnalignedDst.testByte | 3 | 5.562 | 2.702 | 2.058475204
> ArrayCopyUnalignedDst.testByte | 5 | 5.427 | 2.702 | 2.008512213
> ArrayCopyUnalignedDst.testByte | 10 | 5.367 | 2.696 | 1.990727003
> ArrayCopyUnalignedDst.testByte | 20 | 5.839 | 2.637 | 2.214258627
> ArrayCopyUnalignedDst.testByte | 70 | 5.888 | 5.968 | 0.986595174
> ArrayCopyUnalignedDst.testByte | 150 | 6.785 | 6.773 | 1.001771741
> ArrayCopyUnalignedDst.testByte | 300 | 7.606 | 7.972 | 0.954089313
> ArrayCopyUnalignedDst.testByte | 600 | 11.986 | 21.195 | 0.565510734
> ArrayCopyUnalignedDst.testByte | 1200 | 16.54 | 16.784 | 0.985462345
> ArrayCopyUnalignedDst.testChar | 1 | 5.02 | 2.827 | 1.775733994
> ArrayCopyUnalignedDst.testChar | 3 | 5.131 | 2.762 | 1.857711803
> ArrayCopyUnalignedDst.testChar | 5 | 5.038 | 2.762 | 1.82404055
> ArrayCopyUnalignedDst.testChar | 10 | 5.718 | 2.762 | 2.070238957
> ArrayCopyUnalignedDst.testChar | 20 | 5.113 | 5.401 | 0.946676541
> ArrayCopyUnalignedDst.testChar | 70 | 6.222 | 6.214 | 1.001287416
> ArrayCopyUnalignedDst.testChar | 150 | 7.367 | 8.125 | 0.906707692
> ArrayCopyUnalignedDst.testChar | 300 | 10.204 | 10.082 | 1.012100774
> ArrayCopyUnalignedDst.testChar | 600 | 16.978 | 17.135 | 0.990837467
> ArrayCopyUnalignedDst.testChar | 1200 | 32.351 | 31.996 | 1.011095137
> ArrayCopyUnalignedSrc.testByte | 1 | 5.414 | 2.696 | 2.008160237
> ArrayCopyUnalignedSrc.testByte | 3 | 5.494 | 2.637 | 2.083428138
> ArrayCopyUnalignedSrc.testByte | 5 | 5.431 | 2.637 | 2.059537353
> ArrayCopyUnalignedSrc.testByte | 10 | 5.344 | 2.703 | 1.977062523
> ArrayCopyUnalignedSrc.testByte | 20 | 5.834 | 2.696 | 2.163946588
> ArrayCopyUnalignedSrc.testByte | 70 | 5.883 | 6.009 | 0.979031453
> ArrayCopyUnalignedSrc.testByte | 150 | 6.729 | 6.87 | 0.979475983
> ArrayCopyUnalignedSrc.testByte | 300 | 7.603 | 7.97 | 0.953952321
> ArrayCopyUnalignedSrc.testByte | 600 | 12.004 | 12.16 | 0.987171053
> ArrayCopyUnalignedSrc.testByte | 1200 | 16.534 | 16.643 | 0.9934507
> ArrayCopyUnalignedSrc.testChar | 1 | 5.021 | 2.762 | 1.81788559
> ArrayCopyUnalignedSrc.testChar | 3 | 5.13 | 2.762 | 1.857349747
> ArrayCopyUnalignedSrc.testChar | 5 | 5.042 | 2.827 | 1.783516095
> ArrayCopyUnalignedSrc.testChar | 10 | 5.726 | 2.761 | 2.073886273
> ArrayCopyUnalignedSrc.testChar | 20 | 5.112 | 5.401 | 0.94649139
> ArrayCopyUnalignedSrc.testChar | 70 | 6.113 | 6.227 | 0.981692629
> ArrayCopyUnalignedSrc.testChar | 150 | 7.493 | 7.888 | 0.949923935
> ArrayCopyUnalignedSrc.testChar | 300 | 10.234 | 10.501 | 0.97457385
> ArrayCopyUnalignedSrc.testChar | 600 | 17.175 | 17.142 | 1.001925096
> ArrayCopyUnalignedSrc.testChar | 1200 | 31.926 | 31.987 | 0.998092975
>
> Detailed Reports:
> Baseline :
> [http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt)
> WithOpt :
>
[http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt) Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848 - 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions. ------------- Changes: https://git.openjdk.java.net/jdk/pull/302/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=302&range=01 Stats: 516 lines in 23 files changed: 495 ins; 0 del; 21 mod Patch: https://git.openjdk.java.net/jdk/pull/302.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/302/head:pull/302 PR: https://git.openjdk.java.net/jdk/pull/302 From jbhateja at openjdk.java.net Tue Oct 13 18:03:27 2020 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Tue, 13 Oct 2020 18:03:27 GMT Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v3] In-Reply-To: References: Message-ID: <94qadtiTzSkdsJAc_8IWrLxpBvmfiBXMf_W9Z965P80=.9a59a5db-2209-4007-94bb-16ccd8ff0b77@github.com> > Summary: > > 1) Partial in-lining technique avoids call overhead penalty for small array copy operations with size less than 32 > bytes. 2) At runtime, a conditional check based on copy length either calls an array-copy stub or executes an optimized > instruction sequence using AVX-512 masked instructions emitted at the call site. 3) New runtime flag > ArrayCopyPartialInlineSize=0/32(default)/64 bytes determines the maximum size for partial in-lining. 4) Based on the > perf results seen in benchmarks currently partial in-lining is performed only for arraycopy involving sub-word types > (bool/byte/char/short). Once PR-61 gets integrated we can extend this patch to cover all the primitive types. 
> Performance Results:
> System : CascadeLake Server, Intel(R) Xeon(R) Platinum 8280L CPU @ 2.70GHz
> Micros : test/micro/org/openjdk/bench/java/lang/ArrayCopy*.java
> ArrayCopyPartialInlineSize : 32
>
> JMH | Block Size | Baseline (ns/op) | Partial Inlining (ns/op) | Gain
> -- | -- | -- | -- | --
> ArrayCopyAligned.testByte | 1 | 5.417 | 2.696 | 2.009272997
> ArrayCopyAligned.testByte | 3 | 5.494 | 2.702 | 2.03330866
> ArrayCopyAligned.testByte | 5 | 5.417 | 2.637 | 2.05422829
> ArrayCopyAligned.testByte | 10 | 5.343 | 2.703 | 1.976692564
> ArrayCopyAligned.testByte | 20 | 5.837 | 2.636 | 2.214339909
> ArrayCopyAligned.testByte | 70 | 5.86 | 6 | 0.976666667
> ArrayCopyAligned.testByte | 150 | 6.766 | 6.906 | 0.979727773
> ArrayCopyAligned.testByte | 300 | 7.605 | 7.952 | 0.956363179
> ArrayCopyAligned.testByte | 600 | 11.989 | 12.007 | 0.998500874
> ArrayCopyAligned.testByte | 1200 | 16.447 | 16.585 | 0.991679228
> ArrayCopyAligned.testChar | 1 | 5.02 | 2.828 | 1.775106082
> ArrayCopyAligned.testChar | 3 | 5.129 | 2.762 | 1.85698769
> ArrayCopyAligned.testChar | 5 | 5.041 | 2.762 | 1.82512672
> ArrayCopyAligned.testChar | 10 | 5.716 | 2.762 | 2.069514844
> ArrayCopyAligned.testChar | 20 | 5.111 | 5.399 | 0.946656788
> ArrayCopyAligned.testChar | 70 | 6.271 | 6.242 | 1.004645947
> ArrayCopyAligned.testChar | 150 | 7.45 | 7.599 | 0.980392157
> ArrayCopyAligned.testChar | 300 | 9.904 | 10.112 | 0.97943038
> ArrayCopyAligned.testChar | 600 | 17.131 | 17.167 | 0.997902953
> ArrayCopyAligned.testChar | 1200 | 29.556 | 29.851 | 0.990117584
> ArrayCopyUnalignedBoth.testByte | 1 | 5.419 | 2.702 | 2.005551443
> ArrayCopyUnalignedBoth.testByte | 3 | 5.558 | 2.636 | 2.108497724
> ArrayCopyUnalignedBoth.testByte | 5 | 5.43 | 2.636 | 2.059939302
> ArrayCopyUnalignedBoth.testByte | 10 | 5.378 | 2.637 | 2.039438756
> ArrayCopyUnalignedBoth.testByte | 20 | 5.914 | 2.636 | 2.243550836
> ArrayCopyUnalignedBoth.testByte | 70 | 5.882 | 5.954 | 0.987907289
> ArrayCopyUnalignedBoth.testByte | 150 | 6.784 | 6.88 | 0.986046512
> ArrayCopyUnalignedBoth.testByte | 300 | 7.635 | 7.968 | 0.958207831
> ArrayCopyUnalignedBoth.testByte | 600 | 12.226 | 12.129 | 1.007997362
> ArrayCopyUnalignedBoth.testByte | 1200 | 16.992 | 20.717 | 0.820195974
> ArrayCopyUnalignedBoth.testChar | 1 | 5.019 | 2.828 | 1.774752475
> ArrayCopyUnalignedBoth.testChar | 3 | 5.163 | 2.763 | 1.868621064
> ArrayCopyUnalignedBoth.testChar | 5 | 5.042 | 2.827 | 1.783516095
> ArrayCopyUnalignedBoth.testChar | 10 | 5.718 | 2.828 | 2.021923621
> ArrayCopyUnalignedBoth.testChar | 20 | 5.111 | 5.404 | 0.945780903
> ArrayCopyUnalignedBoth.testChar | 70 | 6.367 | 6.235 | 1.02117081
> ArrayCopyUnalignedBoth.testChar | 150 | 7.367 | 8.269 | 0.890917886
> ArrayCopyUnalignedBoth.testChar | 300 | 10.358 | 10.642 | 0.973313287
> ArrayCopyUnalignedBoth.testChar | 600 | 20.84 | 17.522 | 1.189361945
> ArrayCopyUnalignedBoth.testChar | 1200 | 31.895 | 31.892 | 1.000094067
> ArrayCopyUnalignedDst.testByte | 1 | 5.455 | 2.637 | 2.068638604
> ArrayCopyUnalignedDst.testByte | 3 | 5.562 | 2.702 | 2.058475204
> ArrayCopyUnalignedDst.testByte | 5 | 5.427 | 2.702 | 2.008512213
> ArrayCopyUnalignedDst.testByte | 10 | 5.367 | 2.696 | 1.990727003
> ArrayCopyUnalignedDst.testByte | 20 | 5.839 | 2.637 | 2.214258627
> ArrayCopyUnalignedDst.testByte | 70 | 5.888 | 5.968 | 0.986595174
> ArrayCopyUnalignedDst.testByte | 150 | 6.785 | 6.773 | 1.001771741
> ArrayCopyUnalignedDst.testByte | 300 | 7.606 | 7.972 | 0.954089313
> ArrayCopyUnalignedDst.testByte | 600 | 11.986 | 21.195 | 0.565510734
> ArrayCopyUnalignedDst.testByte | 1200 | 16.54 | 16.784 | 0.985462345
> ArrayCopyUnalignedDst.testChar | 1 | 5.02 | 2.827 | 1.775733994
> ArrayCopyUnalignedDst.testChar | 3 | 5.131 | 2.762 | 1.857711803
> ArrayCopyUnalignedDst.testChar | 5 | 5.038 | 2.762 | 1.82404055
> ArrayCopyUnalignedDst.testChar | 10 | 5.718 | 2.762 | 2.070238957
> ArrayCopyUnalignedDst.testChar | 20 | 5.113 | 5.401 | 0.946676541
> ArrayCopyUnalignedDst.testChar | 70 | 6.222 | 6.214 | 1.001287416
> ArrayCopyUnalignedDst.testChar | 150 | 7.367 | 8.125 | 0.906707692
> ArrayCopyUnalignedDst.testChar | 300 | 10.204 | 10.082 | 1.012100774
> ArrayCopyUnalignedDst.testChar | 600 | 16.978 | 17.135 | 0.990837467
> ArrayCopyUnalignedDst.testChar | 1200 | 32.351 | 31.996 | 1.011095137
> ArrayCopyUnalignedSrc.testByte | 1 | 5.414 | 2.696 | 2.008160237
> ArrayCopyUnalignedSrc.testByte | 3 | 5.494 | 2.637 | 2.083428138
> ArrayCopyUnalignedSrc.testByte | 5 | 5.431 | 2.637 | 2.059537353
> ArrayCopyUnalignedSrc.testByte | 10 | 5.344 | 2.703 | 1.977062523
> ArrayCopyUnalignedSrc.testByte | 20 | 5.834 | 2.696 | 2.163946588
> ArrayCopyUnalignedSrc.testByte | 70 | 5.883 | 6.009 | 0.979031453
> ArrayCopyUnalignedSrc.testByte | 150 | 6.729 | 6.87 | 0.979475983
> ArrayCopyUnalignedSrc.testByte | 300 | 7.603 | 7.97 | 0.953952321
> ArrayCopyUnalignedSrc.testByte | 600 | 12.004 | 12.16 | 0.987171053
> ArrayCopyUnalignedSrc.testByte | 1200 | 16.534 | 16.643 | 0.9934507
> ArrayCopyUnalignedSrc.testChar | 1 | 5.021 | 2.762 | 1.81788559
> ArrayCopyUnalignedSrc.testChar | 3 | 5.13 | 2.762 | 1.857349747
> ArrayCopyUnalignedSrc.testChar | 5 | 5.042 | 2.827 | 1.783516095
> ArrayCopyUnalignedSrc.testChar | 10 | 5.726 | 2.761 | 2.073886273
> ArrayCopyUnalignedSrc.testChar | 20 | 5.112 | 5.401 | 0.94649139
> ArrayCopyUnalignedSrc.testChar | 70 | 6.113 | 6.227 | 0.981692629
> ArrayCopyUnalignedSrc.testChar | 150 | 7.493 | 7.888 | 0.949923935
> ArrayCopyUnalignedSrc.testChar | 300 | 10.234 | 10.501 | 0.97457385
> ArrayCopyUnalignedSrc.testChar | 600 | 17.175 | 17.142 | 1.001925096
> ArrayCopyUnalignedSrc.testChar | 1200 | 31.926 | 31.987 | 0.998092975
>
> Detailed Reports:
> Baseline :
> [http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt)
> WithOpt :
>
[http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt) Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Replacing explicit type checks with existing type checking routines ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/302/files - new: https://git.openjdk.java.net/jdk/pull/302/files/9ab77283..2679fe66 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=302&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=302&range=01-02 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/302.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/302/head:pull/302 PR: https://git.openjdk.java.net/jdk/pull/302 From github.com+69653380+katyapav at openjdk.java.net Tue Oct 13 22:13:18 2020 From: github.com+69653380+katyapav at openjdk.java.net (Ekaterina Pavlova) Date: Tue, 13 Oct 2020 22:13:18 GMT Subject: RFR: 8223347: Integration of Vector API (Incubator) [v4] In-Reply-To: References: <-PE4TwXgvq2bemAn_8csjn4_j7zoAolnQz6QQt3z0Wk=.eaa9999f-0713-4349-b31d-934717aa37a1@github.com> Message-ID: <1Nola-xPBRFEk3kdLVEoJ1sO8U4OpFldV2sq_Ta4Jxs=.075fcfb4-ebba-4351-9265-e41066b5da96@github.com> On Mon, 12 Oct 2020 12:56:10 GMT, Erik Joelsson wrote: >> Paul Sandoz has updated the pull request with a new target base due to a merge or a rebase. The pull request now >> contains ten commits: >> - Merge master >> - Fix related to merge >> - HotspotIntrinsicCandidate to IntrinsicCandidate >> - Merge master >> - Fix permissions >> - Fix permissions >> - Merge master >> - Vector API new files >> - Integration of Vector API (Incubator) > > Build changes look good. There are several gc tests crashed in panama-vector tier3 testing which seems are not observed in openjdk repo. 
The crashes look like: # assert(oopDesc::is_oop(obj)) failed: not an oop: 0xfffffffffffffff1 # # JRE version: Java(TM) SE Runtime Environment (16.0+3) (fastdebug build 16-panama+3-216) # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 16-panama+3-216, mixed mode, sharing, tiered, compressed oops, g1 gc, linux-amd64) # Problematic frame: # V [libjvm.so+0xd8ef94] HandleArea::allocate_handle(oop)+0x144 and the issue is actually tracked by JDK-8233199. This issue needs to be at least analyzed before integrating Vector API. ------------- PR: https://git.openjdk.java.net/jdk/pull/367 From sviswanathan at openjdk.java.net Wed Oct 14 00:37:22 2020 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Wed, 14 Oct 2020 00:37:22 GMT Subject: RFR: 8223347: Integration of Vector API (Incubator) [v4] In-Reply-To: <1Nola-xPBRFEk3kdLVEoJ1sO8U4OpFldV2sq_Ta4Jxs=.075fcfb4-ebba-4351-9265-e41066b5da96@github.com> References: <-PE4TwXgvq2bemAn_8csjn4_j7zoAolnQz6QQt3z0Wk=.eaa9999f-0713-4349-b31d-934717aa37a1@github.com> <1Nola-xPBRFEk3kdLVEoJ1sO8U4OpFldV2sq_Ta4Jxs=.075fcfb4-ebba-4351-9265-e41066b5da96@github.com> Message-ID: On Tue, 13 Oct 2020 21:29:52 GMT, Ekaterina Pavlova wrote: >> Build changes look good. > > There are several gc tests crashed in panama-vector tier3 testing which seems are not observed in openjdk repo. > The crashes look like: > # assert(oopDesc::is_oop(obj)) failed: not an oop: 0xfffffffffffffff1 > # > # JRE version: Java(TM) SE Runtime Environment (16.0+3) (fastdebug build 16-panama+3-216) > # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 16-panama+3-216, mixed mode, sharing, tiered, compressed oops, > g1 gc, linux-amd64) # Problematic frame: > # V [libjvm.so+0xd8ef94] HandleArea::allocate_handle(oop)+0x144 > > and the issue is actually tracked by JDK-8233199. > > This issue needs to be at least analyzed before integrating Vector API. @katyapav Is the failure observed on vector-unstable branch of panama-vector? 
The code in this pull request is from vector-unstable branch. The bug report https://bugs.openjdk.java.net/browse/JDK-8233199 refers to repo-valhalla and not panama-vector:vector-unstable. @PaulSandoz is doing final testing of the pull request today before integration tomorrow hopefully. ------------- PR: https://git.openjdk.java.net/jdk/pull/367 From github.com+69653380+katyapav at openjdk.java.net Wed Oct 14 00:50:25 2020 From: github.com+69653380+katyapav at openjdk.java.net (Ekaterina Pavlova) Date: Wed, 14 Oct 2020 00:50:25 GMT Subject: RFR: 8223347: Integration of Vector API (Incubator) [v4] In-Reply-To: References: <-PE4TwXgvq2bemAn_8csjn4_j7zoAolnQz6QQt3z0Wk=.eaa9999f-0713-4349-b31d-934717aa37a1@github.com> <1Nola-xPBRFEk3kdLVEoJ1sO8U4OpFldV2sq_Ta4Jxs=.075fcfb4-ebba-4351-9265-e41066b5da96@github.com> Message-ID: <57GgbYK9zqtp_hlgSwgHG-vN0th0LEmLksSntjQ7mW8=.3c2a2d5f-e407-41ff-a2b1-3d89043144a2@github.com> On Wed, 14 Oct 2020 00:34:04 GMT, Sandhya Viswanathan wrote: >> There are several gc tests crashed in panama-vector tier3 testing which seems are not observed in openjdk repo. >> The crashes look like: >> # assert(oopDesc::is_oop(obj)) failed: not an oop: 0xfffffffffffffff1 >> # >> # JRE version: Java(TM) SE Runtime Environment (16.0+3) (fastdebug build 16-panama+3-216) >> # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 16-panama+3-216, mixed mode, sharing, tiered, compressed oops, >> g1 gc, linux-amd64) # Problematic frame: >> # V [libjvm.so+0xd8ef94] HandleArea::allocate_handle(oop)+0x144 >> >> and the issue is actually tracked by JDK-8233199. >> >> This issue needs to be at least analyzed before integrating Vector API. > > @katyapav Is the failure observed on vector-unstable branch of panama-vector? > The code in this pull request is from vector-unstable branch. > The bug report https://bugs.openjdk.java.net/browse/JDK-8233199 refers to repo-valhalla and not > panama-vector:vector-unstable. 
@PaulSandoz is doing final testing of the pull request today before integration tomorrow > hopefully. @sviswa7 you are right, the failure is observed on the vector-unstable branch of panama-vector. I referred to JDK-8233199 because it seems both panama-vector and valhalla-repo have the same issue/crash. @PaulSandoz also mentioned that panama-vector was not in sync with master, and this is perhaps why the issue is seen in vector-unstable. He said that he tested the PR separately and didn't observe this issue in the PR. ------------- PR: https://git.openjdk.java.net/jdk/pull/367 From github.com+8792647+robcasloz at openjdk.java.net Wed Oct 14 06:42:12 2020 From: github.com+8792647+robcasloz at openjdk.java.net (Roberto Castañeda Lozano) Date: Wed, 14 Oct 2020 06:42:12 GMT Subject: RFR: 8254602: compiler/debug/TestStressCM.java failed with "RuntimeException: got the same optimization stats for different seeds: expected 45" In-Reply-To: References: Message-ID: On Tue, 13 Oct 2020 16:18:39 GMT, Vladimir Kozlov wrote: >> Remove test assertion checking that different random seeds lead to different code motion decisions. This was the case >> for the specific pair of random seeds, IR fed to code motion, and target platforms tested originally; but does not need >> to hold in general. Remove similar test assertion in IGVN randomization test case. Re-enable the test case. > > Good. Thanks for the reviews, Aleksey and Vladimir! ------------- PR: https://git.openjdk.java.net/jdk/pull/626 From github.com+8792647+robcasloz at openjdk.java.net Wed Oct 14 06:44:16 2020 From: github.com+8792647+robcasloz at openjdk.java.net (Roberto Castañeda Lozano) Date: Wed, 14 Oct 2020 06:44:16 GMT Subject: RFR: 8254575: C2: Clean up unused TRACK_PHI_INPUTS assertion code In-Reply-To: References: Message-ID: On Tue, 13 Oct 2020 16:19:41 GMT, Vladimir Kozlov wrote: >> Remove assertion code that was disabled in all build configurations. > > Good. Thanks for the reviews!
------------- PR: https://git.openjdk.java.net/jdk/pull/606 From chagedorn at openjdk.java.net Wed Oct 14 07:04:12 2020 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Wed, 14 Oct 2020 07:04:12 GMT Subject: RFR: 8254252: Generic arraycopy stub overwrites callee-save rdi register on 64-bit Windows [v3] In-Reply-To: References: Message-ID: On Tue, 13 Oct 2020 16:11:31 GMT, Tobias Hartmann wrote: >> Since [JDK-8241825](https://bugs.openjdk.java.net/browse/JDK-8241825), MacroAssembler::load_klass requires a temporary >> register to decode the klass pointer. In the generic arraycopy stub, rdi is used for that on 64-bit Windows because r9 >> is already used as an argument register: https://hg.openjdk.java.net/jdk/jdk/rev/0bb101fbeb10#l17.32 >> The problem is that rdi is callee-save [1] but not restored when returning from the stub. This leads to register >> corruption and more or less random crashes in the caller. >> Although JDK-8241825 is part of JDK 15, this was never a problem because we did not set the _WIN64 macro in adlc and as >> a result accidentally treated rdi (and rsi) as caller-save: >> https://github.com/openjdk/jdk/blob/b9873e18330b7e43ca47bc1c0655e7ab20828f7a/src/hotspot/cpu/x86/x86_64.ad#L89 >> Now that this got fixed as part of [JDK-8248238](https://bugs.openjdk.java.net/browse/JDK-8248238) [2], we hit the bug. >> >> Unfortunately, we are running out of caller-save registers on Windows. Since rcx, rdx, r8 and r9 are used for >> arguments, only rax, r10 and r11 remain which are already used as temporary registers and live throughout the stub >> code. I've decided to free up r11 by postponing the load of the array length which is only needed on Windows anyway >> (because only Windows passes the 5th argument on stack). 
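The register accounting in the description above can be written out as a small set computation. This is only a sketch: the class and method names are hypothetical, and the volatile (caller-save) set is taken from the Microsoft x64 calling convention document cited as [1] in the message, restricted to general-purpose registers.

```java
import java.util.EnumSet;

// Hypothetical helper that models the register bookkeeping; not HotSpot code.
final class Win64ScratchRegisters {
    enum Reg { RAX, RCX, RDX, RBX, RSP, RBP, RSI, RDI, R8, R9, R10, R11, R12, R13, R14, R15 }

    // General-purpose registers a callee may clobber on 64-bit Windows.
    static final EnumSet<Reg> CALLER_SAVE =
            EnumSet.of(Reg.RAX, Reg.RCX, Reg.RDX, Reg.R8, Reg.R9, Reg.R10, Reg.R11);

    // Registers that carry the first four integer arguments.
    static final EnumSet<Reg> ARGUMENTS = EnumSet.of(Reg.RCX, Reg.RDX, Reg.R8, Reg.R9);

    // Caller-save registers left over once the argument registers are taken.
    static EnumSet<Reg> freeTemporaries() {
        EnumSet<Reg> free = EnumSet.copyOf(CALLER_SAVE);
        free.removeAll(ARGUMENTS);
        return free;
    }

    // Anything outside the caller-save set must be preserved by the callee.
    static boolean isCalleeSave(Reg r) {
        return !CALLER_SAVE.contains(r);
    }

    public static void main(String[] args) {
        System.out.println(freeTemporaries());      // prints [RAX, R10, R11]
        System.out.println(isCalleeSave(Reg.RDI));  // prints true
    }
}
```

The result matches the statement that only rax, r10 and r11 remain, and it shows why using rdi as a temporary without saving and restoring it corrupts the caller's state.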
Thanks, Tobias >> >> [1] https://docs.microsoft.com/en-us/cpp/build/x64-calling-convention?view=vs-2019 >> [2] https://openjdk.github.io/cr/?repo=jdk&pr=212&range=11#sdiff-8 > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Fixed indentation Looks good. ------------- Marked as reviewed by chagedorn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/603 From thartmann at openjdk.java.net Wed Oct 14 07:29:13 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Wed, 14 Oct 2020 07:29:13 GMT Subject: RFR: 8254252: Generic arraycopy stub overwrites callee-save rdi register on 64-bit Windows [v3] In-Reply-To: References: Message-ID: On Wed, 14 Oct 2020 07:01:38 GMT, Christian Hagedorn wrote: >> Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: >> >> Fixed indentation > > Looks good. Thanks Christian! ------------- PR: https://git.openjdk.java.net/jdk/pull/603 From thartmann at openjdk.java.net Wed Oct 14 07:29:14 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Wed, 14 Oct 2020 07:29:14 GMT Subject: Integrated: 8254252: Generic arraycopy stub overwrites callee-save rdi register on 64-bit Windows In-Reply-To: References: Message-ID: On Mon, 12 Oct 2020 10:32:48 GMT, Tobias Hartmann wrote: > Since [JDK-8241825](https://bugs.openjdk.java.net/browse/JDK-8241825), MacroAssembler::load_klass requires a temporary > register to decode the klass pointer. In the generic arraycopy stub, rdi is used for that on 64-bit Windows because r9 > is already used as an argument register: https://hg.openjdk.java.net/jdk/jdk/rev/0bb101fbeb10#l17.32 > The problem is that rdi is callee-save [1] but not restored when returning from the stub. This leads to register > corruption and more or less random crashes in the caller. 
> Although JDK-8241825 is part of JDK 15, this was never a problem because we did not set the _WIN64 macro in adlc and as > a result accidentally treated rdi (and rsi) as caller-save: > https://github.com/openjdk/jdk/blob/b9873e18330b7e43ca47bc1c0655e7ab20828f7a/src/hotspot/cpu/x86/x86_64.ad#L89 > Now that this got fixed as part of [JDK-8248238](https://bugs.openjdk.java.net/browse/JDK-8248238) [2], we hit the bug. > > Unfortunately, we are running out of caller-save registers on Windows. Since rcx, rdx, r8 and r9 are used for > arguments, only rax, r10 and r11 remain which are already used as temporary registers and live throughout the stub > code. I've decided to free up r11 by postponing the load of the array length which is only needed on Windows anyway > (because only Windows passes the 5th argument on stack). Thanks, Tobias > > [1] https://docs.microsoft.com/en-us/cpp/build/x64-calling-convention?view=vs-2019 > [2] https://openjdk.github.io/cr/?repo=jdk&pr=212&range=11#sdiff-8 This pull request has now been integrated. 
Changeset: 31d9b7fe Author: Tobias Hartmann URL: https://git.openjdk.java.net/jdk/commit/31d9b7fe Stats: 21 lines in 1 file changed: 18 ins; 1 del; 2 mod 8254252: Generic arraycopy stub overwrites callee-save rdi register on 64-bit Windows Reviewed-by: kvn, chagedorn ------------- PR: https://git.openjdk.java.net/jdk/pull/603 From roland at openjdk.java.net Wed Oct 14 08:00:16 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Wed, 14 Oct 2020 08:00:16 GMT Subject: RFR: 8254734: "dead loop detected" assert failure with patch from 8223051 Message-ID: compiler/c2/TestDeadDataLoopIGVN.java ran with -server -Xcomp -XX:+CreateCoredumpOnCrash -ea -esa -XX:CompileThreshold=100 -XX:+UnlockExperimentalVMOptions -server -XX:-TieredCompilation and the patch from 8223051 (long counted loops, not integrated yet) fails with: assert(no_dead_loop) failed: dead loop detected I can reproduce the failure with only this change from 8223051: diff --git a/src/hotspot/share/opto/callnode.cpp b/src/hotspot/share/opto/callnode.cpp index 268cec7732c..295cb4ecaf9 100644 --- a/src/hotspot/share/opto/callnode.cpp +++ b/src/hotspot/share/opto/callnode.cpp @@ -1159,7 +1159,7 @@ Node* SafePointNode::Identity(PhaseGVN* phase) { if( in(TypeFunc::Control)->is_SafePoint() ) return in(TypeFunc::Control); - if( in(0)->is_Proj() ) { + if (in(0)->is_Proj() && !phase->C->major_progress()) { Node *n0 = in(0)->in(0); // Check if he is a call projection (except Leaf Call) if( n0->is_Catch() ) { diff --git a/src/hotspot/share/opto/parse1.cpp b/src/hotspot/share/opto/parse1.cpp index 4b88c379dea..baf2bf9bacc 100644 --- a/src/hotspot/share/opto/parse1.cpp +++ b/src/hotspot/share/opto/parse1.cpp @@ -2254,23 +2254,7 @@ void Parse::return_current(Node* value) { //------------------------------add_safepoint---------------------------------- void Parse::add_safepoint() { - // See if we can avoid this safepoint. No need for a SafePoint immediately - // after a Call (except Leaf Call) or another SafePoint. 
- Node *proj = control(); uint parms = TypeFunc::Parms+1; - if( proj->is_Proj() ) { - Node *n0 = proj->in(0); - if( n0->is_Catch() ) { - n0 = n0->in(0)->in(0); - assert( n0->is_Call(), "expect a call here" ); - } - if( n0->is_Call() ) { - if( n0->as_Call()->guaranteed_safepoint() ) - return; - } else if( n0->is_SafePoint() && n0->req() >= parms ) { - return; - } - } // Clear out dead values from the debug info. kill_dead_locals(); so it's unrelated to the long counted loops transformation itself. At IGVN time, a dead loop is optimized. The loop contains the following subgraph: (LoadI .. (AddP (CastPP .. (Phi (Proj (CallStaticJava .. (AddI The projection is on the backedge of the phi. The AddI points back to the LoadI. The CallStaticJava is a boxing method call. The LoadI is a boxed value load. The phi is optimized out as the loop is unreachable: (LoadI .. (AddP (Proj (CallStaticJava .. (AddI The LoadI is then optimized by code MemNode::can_see_stored_value(): // Load boxed value from result of valueOf() call is input parameter. if (this->is_Load() && ld_adr->is_AddP() && (tp != NULL) && tp->is_ptr_to_boxed_value()) { intptr_t ignore = 0; Node* base = AddPNode::Ideal_base_and_offset(ld_adr, phase, ignore); BarrierSetC2* bs = BarrierSet::barrier_set()->barrier_set_c2(); base = bs->step_over_gc_barrier(base); if (base != NULL && base->is_Proj() && base->as_Proj()->_con == TypeFunc::Parms && base->in(0)->is_CallStaticJava() && base->in(0)->as_CallStaticJava()->is_boxing_method()) { return base->in(0)->in(TypeFunc::Parms); } } to the AddI. Because the AddI has the LoadI as input we end up with a dead loop. I propose extending the dead loop safe logic to take the pattern with a boxing method load into account. 
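To make the cycle concrete, here is a toy Java model of the subgraph described above. It is not HotSpot code: the class, the node representation and the method names are all invented for illustration. It wires up LoadI -> AddP -> Proj -> CallStaticJava -> AddI -> LoadI, applies the "a load from a boxing call's result sees the call's argument" shortcut, and shows a reachability guard that would decline the replacement. The guard is only one plausible shape for a dead-loop-safe check, not the change proposed in the PR.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of the subgraph; none of these names are HotSpot APIs.
final class DeadLoopSketch {
    static final class Node {
        final String op;
        final List<Node> in = new ArrayList<>();

        Node(String op, Node... inputs) {
            this.op = op;
            for (Node n : inputs) {
                in.add(n);
            }
        }
    }

    // True if 'target' can be reached from 'start' by following input edges.
    static boolean reaches(Node start, Node target) {
        Deque<Node> work = new ArrayDeque<>();
        Set<Node> seen = new HashSet<>();
        work.push(start);
        while (!work.isEmpty()) {
            Node n = work.pop();
            if (n == target) {
                return true;
            }
            if (seen.add(n)) {
                for (Node i : n.in) {
                    work.push(i);
                }
            }
        }
        return false;
    }

    // Unguarded shortcut: LoadI(AddP(Proj(boxing call))) sees the call's argument.
    static Node seenValue(Node load) {
        Node addP = load.in.get(0);
        Node proj = addP.in.get(0);
        Node call = proj.in.get(0);
        return call.in.get(0);
    }

    // Guarded variant: decline when the candidate value itself depends on the load.
    static Node safeSeenValue(Node load) {
        Node value = seenValue(load);
        return reaches(value, load) ? null : value;
    }

    // Builds the dead-loop shape once the Phi has been removed.
    static Node buildDeadLoop() {
        Node addI = new Node("AddI");
        Node call = new Node("CallStaticJava", addI);
        Node proj = new Node("Proj", call);
        Node addP = new Node("AddP", proj);
        Node load = new Node("LoadI", addP);
        addI.in.add(load); // the AddI points back to the LoadI
        return load;
    }

    public static void main(String[] args) {
        Node load = buildDeadLoop();
        System.out.println("unguarded replacement: " + seenValue(load).op);
        System.out.println("guard declines: " + (safeSeenValue(load) == null));
    }
}
```

Replacing the LoadI with the unguarded result would make the AddI its own input, which is the cycle the assert reports.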
------------- Commit messages: - fix Changes: https://git.openjdk.java.net/jdk/pull/649/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=649&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8254734 Stats: 28 lines in 2 files changed: 23 ins; 4 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/649.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/649/head:pull/649 PR: https://git.openjdk.java.net/jdk/pull/649 From github.com+8792647+robcasloz at openjdk.java.net Wed Oct 14 08:10:15 2020 From: github.com+8792647+robcasloz at openjdk.java.net (Roberto Castañeda Lozano) Date: Wed, 14 Oct 2020 08:10:15 GMT Subject: Integrated: 8254575: C2: Clean up unused TRACK_PHI_INPUTS assertion code In-Reply-To: References: Message-ID: On Mon, 12 Oct 2020 11:28:00 GMT, Roberto Castañeda Lozano wrote: > Remove assertion code that was disabled in all build configurations. This pull request has now been integrated. Changeset: 9fe9b24b Author: Roberto Castañeda Lozano Committer: Tobias Hartmann URL: https://git.openjdk.java.net/jdk/commit/9fe9b24b Stats: 23 lines in 1 file changed: 0 ins; 23 del; 0 mod 8254575: C2: Clean up unused TRACK_PHI_INPUTS assertion code Remove assertion code that was disabled in all build configurations. Co-authored-by: Vladimir Ivanov Reviewed-by: vlivanov, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/606 From github.com+8792647+robcasloz at openjdk.java.net Wed Oct 14 08:11:11 2020 From: github.com+8792647+robcasloz at openjdk.java.net (Roberto Castañeda Lozano) Date: Wed, 14 Oct 2020 08:11:11 GMT Subject: Integrated: 8254602: compiler/debug/TestStressCM.java failed with "RuntimeException: got the same optimization stats for different seeds: expected 45" In-Reply-To: References: Message-ID: On Tue, 13 Oct 2020 08:23:42 GMT, Roberto Castañeda Lozano wrote: > Remove test assertion checking that different random seeds lead to different code motion decisions.
This was the case > for the specific pair of random seeds, IR fed to code motion, and target platforms tested originally; but does not need > to hold in general. Remove similar test assertion in IGVN randomization test case. Re-enable the test case. This pull request has now been integrated. Changeset: b509e31e Author: Roberto Castañeda Lozano Committer: Tobias Hartmann URL: https://git.openjdk.java.net/jdk/commit/b509e31e Stats: 14 lines in 3 files changed: 0 ins; 9 del; 5 mod 8254602: compiler/debug/TestStressCM.java failed with "RuntimeException: got the same optimization stats for different seeds: expected 45" Remove test assertion checking that different random seeds lead to different code motion decisions. This was the case for the specific pair of random seeds, IR fed to code motion, and target platforms tested originally; but does not need to hold in general. Remove similar test assertion in IGVN randomization test case. Re-enable the test case. Reviewed-by: shade, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/626 From chagedorn at openjdk.java.net Wed Oct 14 10:05:09 2020 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Wed, 14 Oct 2020 10:05:09 GMT Subject: RFR: 8254734: "dead loop detected" assert failure with patch from 8223051 In-Reply-To: References: Message-ID: On Wed, 14 Oct 2020 07:52:58 GMT, Roland Westrelin wrote: > compiler/c2/TestDeadDataLoopIGVN.java ran with -server -Xcomp > -XX:+CreateCoredumpOnCrash -ea -esa -XX:CompileThreshold=100 > -XX:+UnlockExperimentalVMOptions -server -XX:-TieredCompilation and > the patch from 8223051 (long counted loops, not integrated yet) fails > with: > > assert(no_dead_loop) failed: dead loop detected > > I can reproduce the failure with only this change from 8223051: > diff --git a/src/hotspot/share/opto/callnode.cpp b/src/hotspot/share/opto/callnode.cpp > index 268cec7732c..295cb4ecaf9 100644 > --- a/src/hotspot/share/opto/callnode.cpp > +++ b/src/hotspot/share/opto/callnode.cpp >
@@ -1159,7 +1159,7 @@ Node* SafePointNode::Identity(PhaseGVN* phase) { > if( in(TypeFunc::Control)->is_SafePoint() ) > return in(TypeFunc::Control); > > - if( in(0)->is_Proj() ) { > + if (in(0)->is_Proj() && !phase->C->major_progress()) { > Node *n0 = in(0)->in(0); > // Check if he is a call projection (except Leaf Call) > if( n0->is_Catch() ) { > diff --git a/src/hotspot/share/opto/parse1.cpp b/src/hotspot/share/opto/parse1.cpp > index 4b88c379dea..baf2bf9bacc 100644 > --- a/src/hotspot/share/opto/parse1.cpp > +++ b/src/hotspot/share/opto/parse1.cpp > @@ -2254,23 +2254,7 @@ void Parse::return_current(Node* value) { > > //------------------------------add_safepoint---------------------------------- > void Parse::add_safepoint() { > - // See if we can avoid this safepoint. No need for a SafePoint immediately > - // after a Call (except Leaf Call) or another SafePoint. > - Node *proj = control(); > uint parms = TypeFunc::Parms+1; > - if( proj->is_Proj() ) { > - Node *n0 = proj->in(0); > - if( n0->is_Catch() ) { > - n0 = n0->in(0)->in(0); > - assert( n0->is_Call(), "expect a call here" ); > - } > - if( n0->is_Call() ) { > - if( n0->as_Call()->guaranteed_safepoint() ) > - return; > - } else if( n0->is_SafePoint() && n0->req() >= parms ) { > - return; > - } > - } > > // Clear out dead values from the debug info. > kill_dead_locals(); > so it's unrelated to the long counted loops transformation itself. > > At IGVN time, a dead loop is optimized. The loop contains the > following subgraph: > > (LoadI .. (AddP (CastPP .. (Phi (Proj (CallStaticJava .. (AddI > > The projection is on the backedge of the phi. The AddI points back to > the LoadI. The CallStaticJava is a boxing method call. The LoadI is a > boxed value load. > > The phi is optimized out as the loop is unreachable: > > (LoadI .. (AddP (Proj (CallStaticJava .. (AddI > > The LoadI is then optimized by code MemNode::can_see_stored_value(): > // Load boxed value from result of valueOf() call is input parameter. 
> if (this->is_Load() && ld_adr->is_AddP() && > (tp != NULL) && tp->is_ptr_to_boxed_value()) { > intptr_t ignore = 0; > Node* base = AddPNode::Ideal_base_and_offset(ld_adr, phase, ignore); > BarrierSetC2* bs = BarrierSet::barrier_set()->barrier_set_c2(); > base = bs->step_over_gc_barrier(base); > if (base != NULL && base->is_Proj() && > base->as_Proj()->_con == TypeFunc::Parms && > base->in(0)->is_CallStaticJava() && > base->in(0)->as_CallStaticJava()->is_boxing_method()) { > return base->in(0)->in(TypeFunc::Parms); > } > } > to the AddI. Because the AddI has the LoadI as input we end up with a > dead loop. I propose extending the dead loop safe logic to take the > pattern with a boxing method load into account. That's a reasonable fix. I was thinking about something similar before when I was fixing JDK-8251544. Looks good to me. ------------- Marked as reviewed by chagedorn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/649 From neliasso at openjdk.java.net Wed Oct 14 12:06:19 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Wed, 14 Oct 2020 12:06:19 GMT Subject: RFR: 8173585: Intrinsify StringLatin1.indexOf(char) [v6] In-Reply-To: References: <_T0873dC5tfUtGn9r1_Y21JkPKRt-za3MM9hPN2GQKQ=.b865fe53-5417-424f-81b6-1566a330640e@github.com> Message-ID: On Mon, 12 Oct 2020 11:17:25 GMT, Jason Tatton wrote: >> This is an implementation of the indexOf(char) intrinsic for StringLatin1 (1 byte encoded Strings). It is provided for >> x86 and ARM64. The implementation is greatly inspired by the indexOf(char) intrinsic for StringUTF16. To incorporate it >> I had to make a small change to StringLatin1.java (refactor of functionality to intrisified private method) as well as >> code for C2. 
Submitted to: hotspot-compiler-dev and core-libs-dev as this patch contains a change to hotspot and >> java/lang/StringLatin1.java https://bugs.openjdk.java.net/browse/JDK-8173585 >> >> Details of testing: >> ============ >> I have created a jtreg test "compiler/intrinsics/string/TestStringLatin1IndexOfChar" to cover this new intrinsic. Note >> that, particularly for the x86 implementation of the intrinsic, the code path taken is dependent upon the length of the >> input String. Hence the test has been designed to cover all these cases. In summary they are: >> - A "short" string of < 16 characters. >> - A SIMD String of 16-31 characters. >> - An AVX2 SIMD String of 32 or more characters. >> >> Hardware used for testing: >> ----------------------------- >> >> - Intel Xeon CPU E5-2680 (JVM did not recognize this as having AVX2 support). >> - Intel i7 processor (with AVX2 support). >> - AWS Graviton 2 (ARM 64 processor). >> >> I also ran "run-test-tier1" and "run-test-tier2" for x86_64 and aarch64. >> >> Possible future enhancements: >> ==================== >> For the x86 implementation there may be two further improvements we can make in order to improve performance of both >> the StringUTF16 and StringLatin1 indexOf(char) intrinsics: >> 1. Make use of AVX-512 instructions. >> 2. For "short" Strings (see below), I think it may be possible to modify the existing algorithm to still use SSE SIMD >> instructions instead of a loop. >> Benchmark results: >> ============ >> **Without** the new StringLatin1 indexOf(char) intrinsic: >> >> | Benchmark | Mode | Cnt | Score | Error | Units | >> | ------------- | ------------- |------------- |------------- |------------- |------------- | >> | IndexOfBenchmark.latin1_mixed_char | avgt | 5 | **26,389.129** | ± 182.581 | ns/op | >> | IndexOfBenchmark.utf16_mixed_char | avgt | 5 | 17,885.383 | ±
435.933 | ns/op | >> >> >> **With** the new StringLatin1 indexOf(char) intrinsic: >> >> | Benchmark | Mode | Cnt | Score | Error | Units | >> | ------------- | ------------- |------------- |------------- |------------- |------------- | >> | IndexOfBenchmark.latin1_mixed_char | avgt | 5 | **17,875.185** | ± 407.716 | ns/op | >> | IndexOfBenchmark.utf16_mixed_char | avgt | 5 | 18,292.802 | ± 167.306 | ns/op | >> >> >> The objective of the patch is to bring the performance of StringLatin1 indexOf(char) in line with StringUTF16 >> indexOf(char) for x86 and ARM64. We can see above that this has been achieved. Similar results were obtained when >> running on ARM. > > Jason Tatton has updated the pull request incrementally with one additional commit since the last revision: > > Added missing copyright notices Marked as reviewed by neliasso (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/71 From neliasso at openjdk.java.net Wed Oct 14 12:11:11 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Wed, 14 Oct 2020 12:11:11 GMT Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions In-Reply-To: References: <_0-zfIDPieC0Xnc17GaSSsS7Sz9EEUrfjRyqWDtphfU=.298bacde-f330-486a-8bea-03ff1523d00c@github.com> Message-ID: On Thu, 8 Oct 2020 17:29:27 GMT, Jatin Bhateja wrote: >>> Can you explain why 32 bytes are such a distinct performance cliff? >>> >>> Is there any performance difference between doing a single 64 bytes masked copy or two 32 bytes? >> >> Hi Nils, >> Copies of sizes <= 32 bytes can be done using one YMM register; the AVX-512 vector length extension allows masked >> instructions to operate on YMM and XMM registers. Using the newly added flag -XX:ArrayCopyPartialInlineSize=64, one can >> inline copies of up to 64 bytes. Since that uses a ZMM register, the CPU will operate at a lower frequency, but it >> could still give better performance depending on the application.
A single 64-byte masked copy may have a performance >> hit if the CPU operates at its highest frequency for the majority of the application runtime. There is a switchover penalty from >> the higher frequency level to the lower frequency level, along with some hysteresis which forces subsequent instructions to >> operate at a lower frequency for some cycles. The current implementation has been kept simple to avoid emitting too many >> instructions at the call site, considering arraycopy is a very high frequency operation. > > Hi @neliasso , @vnkozlov , kindly let me know your review comments. Hi Jatin, I'm ready to approve it, but I would like to kick it through some performance testing first. Best regards, Nils Eliasson ------------- PR: https://git.openjdk.java.net/jdk/pull/302 From github.com+70893615+jasontatton-aws at openjdk.java.net Wed Oct 14 13:01:24 2020 From: github.com+70893615+jasontatton-aws at openjdk.java.net (Jason Tatton) Date: Wed, 14 Oct 2020 13:01:24 GMT Subject: Integrated: 8173585: Intrinsify StringLatin1.indexOf(char) In-Reply-To: <_T0873dC5tfUtGn9r1_Y21JkPKRt-za3MM9hPN2GQKQ=.b865fe53-5417-424f-81b6-1566a330640e@github.com> References: <_T0873dC5tfUtGn9r1_Y21JkPKRt-za3MM9hPN2GQKQ=.b865fe53-5417-424f-81b6-1566a330640e@github.com> Message-ID: On Tue, 8 Sep 2020 11:59:36 GMT, Jason Tatton wrote: > This is an implementation of the indexOf(char) intrinsic for StringLatin1 (1 byte encoded Strings). It is provided for > x86 and ARM64. The implementation is greatly inspired by the indexOf(char) intrinsic for StringUTF16. To incorporate it > I had to make a small change to StringLatin1.java (refactor of functionality to intrinsified private method) as well as > code for C2. Submitted to: hotspot-compiler-dev and core-libs-dev as this patch contains a change to hotspot and > java/lang/StringLatin1.java https://bugs.openjdk.java.net/browse/JDK-8173585 > > Details of testing: > ============ > I have created a jtreg test "compiler/intrinsics/string/TestStringLatin1IndexOfChar"
to cover this new intrinsic. Note > that, particularly for the x86 implementation of the intrinsic, the code path taken is dependent upon the length of the > input String. Hence the test has been designed to cover all these cases. In summary they are: > - A "short" string of < 16 characters. > - A SIMD String of 16-31 characters. > - An AVX2 SIMD String of 32 or more characters. > > Hardware used for testing: > ----------------------------- > > - Intel Xeon CPU E5-2680 (JVM did not recognize this as having AVX2 support). > - Intel i7 processor (with AVX2 support). > - AWS Graviton 2 (ARM 64 processor). > > I also ran "run-test-tier1" and "run-test-tier2" for x86_64 and aarch64. > > Possible future enhancements: > ==================== > For the x86 implementation there may be two further improvements we can make in order to improve performance of both > the StringUTF16 and StringLatin1 indexOf(char) intrinsics: > 1. Make use of AVX-512 instructions. > 2. For "short" Strings (see below), I think it may be possible to modify the existing algorithm to still use SSE SIMD > instructions instead of a loop. > Benchmark results: > ============ > **Without** the new StringLatin1 indexOf(char) intrinsic: > > | Benchmark | Mode | Cnt | Score | Error | Units | > | ------------- | ------------- |------------- |------------- |------------- |------------- | > | IndexOfBenchmark.latin1_mixed_char | avgt | 5 | **26,389.129** | ± 182.581 | ns/op | > | IndexOfBenchmark.utf16_mixed_char | avgt | 5 | 17,885.383 | ± 435.933 | ns/op | > > > **With** the new StringLatin1 indexOf(char) intrinsic: > > | Benchmark | Mode | Cnt | Score | Error | Units | > | ------------- | ------------- |------------- |------------- |------------- |------------- | > | IndexOfBenchmark.latin1_mixed_char | avgt | 5 | **17,875.185** | ± 407.716 | ns/op | > | IndexOfBenchmark.utf16_mixed_char | avgt | 5 | 18,292.802 | ±
167.306 | ns/op | > > > The objective of the patch is to bring the performance of StringLatin1 indexOf(char) in line with StringUTF16 > indexOf(char) for x86 and ARM64. We can see above that this has been achieved. Similar results were obtained when > running on ARM. This pull request has now been integrated. Changeset: f71e8a61 Author: Jason Tatton (AWS) Committer: Paul Hohensee URL: https://git.openjdk.java.net/jdk/commit/f71e8a61 Stats: 613 lines in 15 files changed: 597 ins; 0 del; 16 mod 8173585: Intrinsify StringLatin1.indexOf(char) Reviewed-by: neliasso ------------- PR: https://git.openjdk.java.net/jdk/pull/71 From rriggs at openjdk.java.net Wed Oct 14 13:21:21 2020 From: rriggs at openjdk.java.net (Roger Riggs) Date: Wed, 14 Oct 2020 13:21:21 GMT Subject: RFR: 8173585: Intrinsify StringLatin1.indexOf(char) [v6] In-Reply-To: References: <_T0873dC5tfUtGn9r1_Y21JkPKRt-za3MM9hPN2GQKQ=.b865fe53-5417-424f-81b6-1566a330640e@github.com> Message-ID: On Wed, 14 Oct 2020 12:03:42 GMT, Nils Eliasson wrote: >> Jason Tatton has updated the pull request incrementally with one additional commit since the last revision: >> >> Added missing copyright notices > > Marked as reviewed by neliasso (Reviewer). Due to the requirement for multiple reviewers, I had been waiting to add my review of the Core-Libs files until the HotSpot reviewers had approved! I see only one reviewer credited in the commit. ------------- PR: https://git.openjdk.java.net/jdk/pull/71 From shade at openjdk.java.net Wed Oct 14 14:35:17 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 14 Oct 2020 14:35:17 GMT Subject: RFR: 8254769: Remove unimplemented BCEscapeAnalyzer::{add_dependence, propagate_dependencies} Message-ID: These methods are not implemented, and there were no definitions since the initial load. Can be removed. 
Testing: - [x] Linux x86_64 build - [x] Text searches for `add_dependence` and `propagate_dependencies` in `src/hotspot` ------------- Commit messages: - 8254769: Remove unimplemented BCEscapeAnalyzer::{add_dependence, propagate_dependencies} Changes: https://git.openjdk.java.net/jdk/pull/656/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=656&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8254769 Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/656.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/656/head:pull/656 PR: https://git.openjdk.java.net/jdk/pull/656 From shade at openjdk.java.net Wed Oct 14 14:43:22 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 14 Oct 2020 14:43:22 GMT Subject: RFR: 8254771: Remove unimplemented ciSignature::get_all_klasses Message-ID: This method is not implemented, and there was no definition since the initial load. Can be removed. Testing: - [x] Linux x86_64 build - [x] Text search for `get_all_klasses` in `src/hotspot`; there is `JdkJfrEvent::get_all_klasses` that looks unrelated. ------------- Commit messages: - 8254771: Remove unimplemented ciSignature::get_all_klasses Changes: https://git.openjdk.java.net/jdk/pull/657/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=657&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8254771 Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/657.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/657/head:pull/657 PR: https://git.openjdk.java.net/jdk/pull/657 From shade at openjdk.java.net Wed Oct 14 14:53:23 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 14 Oct 2020 14:53:23 GMT Subject: RFR: 8254773: Remove unimplemented ciReplay::is_loaded(Klass* klass) Message-ID: Both overloads for Klass* and Method* were added by JDK-6830717. 
But Klass* overload was never implemented: http://hg.openjdk.java.net/hsx/hsx25/hotspot/rev/bd7a7ce2e264#l32.934 Testing: - [x] Linux x86_64 build ------------- Commit messages: - 8254773: Remove unimplemented ciReplay::is_loaded(Klass* klass) Changes: https://git.openjdk.java.net/jdk/pull/658/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=658&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8254773 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/658.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/658/head:pull/658 PR: https://git.openjdk.java.net/jdk/pull/658 From thartmann at openjdk.java.net Wed Oct 14 14:57:19 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Wed, 14 Oct 2020 14:57:19 GMT Subject: RFR: 8254771: Remove unimplemented ciSignature::get_all_klasses In-Reply-To: References: Message-ID: On Wed, 14 Oct 2020 14:36:59 GMT, Aleksey Shipilev wrote: > This method is not implemented, and there was no definition since the initial load. Can be removed. > > Testing: > - [x] Linux x86_64 build > - [x] Text search for `get_all_klasses` in `src/hotspot`; there is `JdkJfrEvent::get_all_klasses` that looks unrelated. Marked as reviewed by thartmann (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/657 From thartmann at openjdk.java.net Wed Oct 14 14:58:15 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Wed, 14 Oct 2020 14:58:15 GMT Subject: RFR: 8254773: Remove unimplemented ciReplay::is_loaded(Klass* klass) In-Reply-To: References: Message-ID: On Wed, 14 Oct 2020 14:44:44 GMT, Aleksey Shipilev wrote: > Both overloads for Klass* and Method* were added by JDK-6830717. But Klass* overload was never implemented: > http://hg.openjdk.java.net/hsx/hsx25/hotspot/rev/bd7a7ce2e264#l32.934 > > Testing: > - [x] Linux x86_64 build Marked as reviewed by thartmann (Reviewer). 
------------- PR: https://git.openjdk.java.net/jdk/pull/658 From thartmann at openjdk.java.net Wed Oct 14 14:59:12 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Wed, 14 Oct 2020 14:59:12 GMT Subject: RFR: 8254769: Remove unimplemented BCEscapeAnalyzer::{add_dependence, propagate_dependencies} In-Reply-To: References: Message-ID: <-BhdCIty3ONUHbejiQ2R5ftUJ1YuLlCRsf2wDHJLRDU=.9eb77aba-12b9-494f-8c0b-e7f7fddf1404@github.com> On Wed, 14 Oct 2020 14:30:38 GMT, Aleksey Shipilev wrote: > These methods are not implemented, and there were no definitions since the initial load. Can be removed. > > Testing: > - [x] Linux x86_64 build > - [x] Text searches for `add_dependence` and `propagate_dependencies` in `src/hotspot` Marked as reviewed by thartmann (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/656 From rriggs at openjdk.java.net Wed Oct 14 15:05:28 2020 From: rriggs at openjdk.java.net (Roger Riggs) Date: Wed, 14 Oct 2020 15:05:28 GMT Subject: RFR: 8173585: Intrinsify StringLatin1.indexOf(char) [v6] In-Reply-To: References: <_T0873dC5tfUtGn9r1_Y21JkPKRt-za3MM9hPN2GQKQ=.b865fe53-5417-424f-81b6-1566a330640e@github.com> Message-ID: On Wed, 14 Oct 2020 13:18:19 GMT, Roger Riggs wrote: >> Marked as reviewed by neliasso (Reviewer). > > Due to the requirement for multiple reviewers, I had been waiting to add my review of the Core-Libs files until the > HotSpot reviewers had approved! I see only one reviewer credited in the commit. This change was integrated without testing against a current merge from master and has caused two build failures. JDK-8254761: Wrong intrinsic annotation used for StringLatin1.indexOfChar Also, there is a raw Unicode character in the JMH test that causes a compilation error.
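The three unmappable bytes that javac reports for this failure (0xE2, 0x98, 0xBA) are the UTF-8 encoding of U+263A. A common way to keep such a source file compilable under an ASCII source encoding is to write the character as a Unicode escape. The sketch below assumes that approach and uses a hypothetical helper class; it is not taken from the benchmark or from the actual fix.

```java
// Hypothetical helper, shown only to illustrate an ASCII-safe char literal.
final class AsciiSafeLiteral {
    // Builds a string of n characters: a non-Latin1 char when isUtf16 is set, else 'b'.
    static String fill(boolean isUtf16, int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            // The escaped literal keeps the source file pure ASCII.
            sb.append(isUtf16 ? '\u263A' : 'b');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Print the code point in hex rather than the glyph, so the output is ASCII too.
        System.out.println(Integer.toHexString(fill(true, 1).charAt(0)));
        System.out.println(fill(false, 3));
    }
}
```

Because the escape is translated before lexing, the compiled class is the same as if the raw character had been used with a UTF-8 source encoding.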
== Output from failing command(s) repeated here === [2020-10-14T14:39:09,608Z] * For target support_test_micro_classes__the.BUILD_JDK_MICROBENCHMARK_batch: [2020-10-14T14:39:09,611Z] /opt/mach5/mesos/work_dir/slaves/4076d11c-c6ed-4d07-84c1-4ab8d55cd975-S108796/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/400e1f56-2d49-42d8-8879-97d4fbb6c909/runs/c49da2bc-a8fe-4a5d-8159-57a9b0316fd2/workspace/open/test/micro/org/openjdk/bench/java/lang/StringIndexOfChar.java:71: error: unmappable character (0xE2) for encoding ascii [2020-10-14T14:39:09,611Z] sb.append(isUtf16?'???':'b'); [2020-10-14T14:39:09,611Z] ^ [2020-10-14T14:39:09,611Z] /opt/mach5/mesos/work_dir/slaves/4076d11c-c6ed-4d07-84c1-4ab8d55cd975-S108796/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/400e1f56-2d49-42d8-8879-97d4fbb6c909/runs/c49da2bc-a8fe-4a5d-8159-57a9b0316fd2/workspace/open/test/micro/org/openjdk/bench/java/lang/StringIndexOfChar.java:71: error: unmappable character (0x98) for encoding ascii [2020-10-14T14:39:09,611Z] sb.append(isUtf16?'???':'b'); [2020-10-14T14:39:09,611Z] ^ [2020-10-14T14:39:09,611Z] /opt/mach5/mesos/work_dir/slaves/4076d11c-c6ed-4d07-84c1-4ab8d55cd975-S108796/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/400e1f56-2d49-42d8-8879-97d4fbb6c909/runs/c49da2bc-a8fe-4a5d-8159-57a9b0316fd2/workspace/open/test/micro/org/openjdk/bench/java/lang/StringIndexOfChar.java:71: error: unmappable character (0xBA) for encoding ascii [2020-10-14T14:39:09,611Z] sb.append(isUtf16?'???':'b'); [2020-10-14T14:39:09,611Z] ^ [2020-10-14T14:39:09,611Z] /opt/mach5/mesos/work_dir/slaves/4076d11c-c6ed-4d07-84c1-4ab8d55cd975-S108796/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/400e1f56-2d49-42d8-8879-97d4fbb6c909/runs/c49da2bc-a8fe-4a5d-8159-57a9b0316fd2/workspace/open/test/micro/org/openjdk/bench/java/lang/StringIndexOfChar.java:71: error: unclosed character literal [2020-10-14T14:39:09,611Z] sb.append(isUtf16?'???':'b'); 
[2020-10-14T14:39:09,611Z] ^ [2020-10-14T14:39:09,611Z] /opt/mach5/mesos/work_dir/slaves/4076d11c-c6ed-4d07-84c1-4ab8d55cd975-S108796/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/400e1f56-2d49-42d8-8879-97d4fbb6c909/runs/c49da2bc-a8fe-4a5d-8159-57a9b0316fd2/workspace/open/test/micro/org/openjdk/bench/java/lang/StringIndexOfChar.java:71: error: unclosed character literal [2020-10-14T14:39:09,611Z] sb.append(isUtf16?'???':'b');``` ------------- PR: https://git.openjdk.java.net/jdk/pull/71 From psandoz at openjdk.java.net Wed Oct 14 15:16:48 2020 From: psandoz at openjdk.java.net (Paul Sandoz) Date: Wed, 14 Oct 2020 15:16:48 GMT Subject: RFR: 8223347: Integration of Vector API (Incubator) [v5] In-Reply-To: <-PE4TwXgvq2bemAn_8csjn4_j7zoAolnQz6QQt3z0Wk=.eaa9999f-0713-4349-b31d-934717aa37a1@github.com> References: <-PE4TwXgvq2bemAn_8csjn4_j7zoAolnQz6QQt3z0Wk=.eaa9999f-0713-4349-b31d-934717aa37a1@github.com> Message-ID: > This pull request is for integration of the Vector API. It was previously reviewed under conditions when mercurial was > used for the source code control system. Review threads can be found here (searching for issue number 8223347 in the > title): https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-April/thread.html > https://mail.openjdk.java.net/pipermail/core-libs-dev/2020-May/thread.html > https://mail.openjdk.java.net/pipermail/core-libs-dev/2020-July/thread.html > > If mercurial was still being used the code would be pushed directly, once the CSR is approved. However, in this case a > pull request is required and needs explicit reviewer approval. Between the final review and this pull request no code > has changed, except for that related to merging. Paul Sandoz has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 10 commits: - Merge master - Merge master - Fix related to merge - HotspotIntrinsicCandidate to IntrinsicCandidate - Merge master - Fix permissions - Fix permissions - Merge master - Vector API new files - Integration of Vector API (Incubator) ------------- Changes: https://git.openjdk.java.net/jdk/pull/367/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=367&range=04 Stats: 295107 lines in 336 files changed: 292957 ins; 1062 del; 1088 mod Patch: https://git.openjdk.java.net/jdk/pull/367.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/367/head:pull/367 PR: https://git.openjdk.java.net/jdk/pull/367 From xliu at openjdk.java.net Wed Oct 14 16:19:16 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Wed, 14 Oct 2020 16:19:16 GMT Subject: RFR: 8254369: Node::disconnect_inputs may skip precedences Message-ID: <6V3fsxH8JB6tTos4TqibCVyWFnSPieFxp6VSwm-im3I=.60e92bca-62ab-435e-bc05-fcdc0b23555d@github.com> 8254369: Node::disconnect_inputs may skip precedences ------------- Commit messages: - 8254369: Node::disconnect_inputs may skip precedences Changes: https://git.openjdk.java.net/jdk/pull/664/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=664&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8254369 Stats: 18 lines in 2 files changed: 15 ins; 0 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/664.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/664/head:pull/664 PR: https://git.openjdk.java.net/jdk/pull/664 From xliu at openjdk.java.net Wed Oct 14 16:19:16 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Wed, 14 Oct 2020 16:19:16 GMT Subject: RFR: 8254369: Node::disconnect_inputs may skip precedences In-Reply-To: <6V3fsxH8JB6tTos4TqibCVyWFnSPieFxp6VSwm-im3I=.60e92bca-62ab-435e-bc05-fcdc0b23555d@github.com> References: <6V3fsxH8JB6tTos4TqibCVyWFnSPieFxp6VSwm-im3I=.60e92bca-62ab-435e-bc05-fcdc0b23555d@github.com> Message-ID: On Wed, 14 Oct 2020 16:04:32 GMT, Xin Liu wrote: > 
8254369: Node::disconnect_inputs may skip precedences disconnect_inputs() needs to iterate precedences edges in reverse order because rm_prec(i) may backfill _in[i] with a value afterward. also remove the predicate if (n != NULL) in set_prec because it's always true. ------------- PR: https://git.openjdk.java.net/jdk/pull/664 From psandoz at openjdk.java.net Wed Oct 14 17:10:43 2020 From: psandoz at openjdk.java.net (Paul Sandoz) Date: Wed, 14 Oct 2020 17:10:43 GMT Subject: RFR: 8223347: Integration of Vector API (Incubator) [v6] In-Reply-To: <-PE4TwXgvq2bemAn_8csjn4_j7zoAolnQz6QQt3z0Wk=.eaa9999f-0713-4349-b31d-934717aa37a1@github.com> References: <-PE4TwXgvq2bemAn_8csjn4_j7zoAolnQz6QQt3z0Wk=.eaa9999f-0713-4349-b31d-934717aa37a1@github.com> Message-ID: > This pull request is for integration of the Vector API. It was previously reviewed under conditions when mercurial was > used for the source code control system. Review threads can be found here (searching for issue number 8223347 in the > title): https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-April/thread.html > https://mail.openjdk.java.net/pipermail/core-libs-dev/2020-May/thread.html > https://mail.openjdk.java.net/pipermail/core-libs-dev/2020-July/thread.html > > If mercurial was still being used the code would be pushed directly, once the CSR is approved. However, in this case a > pull request is required and needs explicit reviewer approval. Between the final review and this pull request no code > has changed, except for that related to merging. Paul Sandoz has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: - Merge master - Merge master - Merge master - Fix related to merge - HotspotIntrinsicCandidate to IntrinsicCandidate - Merge master - Fix permissions - Fix permissions - Merge master - Vector API new files - ... 
and 1 more: https://git.openjdk.java.net/jdk/compare/96a1f08e...3346d292 ------------- Changes: https://git.openjdk.java.net/jdk/pull/367/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=367&range=05 Stats: 295107 lines in 336 files changed: 292957 ins; 1062 del; 1088 mod Patch: https://git.openjdk.java.net/jdk/pull/367.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/367/head:pull/367 PR: https://git.openjdk.java.net/jdk/pull/367 From rriggs at openjdk.java.net Wed Oct 14 18:02:21 2020 From: rriggs at openjdk.java.net (Roger Riggs) Date: Wed, 14 Oct 2020 18:02:21 GMT Subject: RFR: 8173585: Intrinsify StringLatin1.indexOf(char) [v6] In-Reply-To: References: <_T0873dC5tfUtGn9r1_Y21JkPKRt-za3MM9hPN2GQKQ=.b865fe53-5417-424f-81b6-1566a330640e@github.com> Message-ID: On Wed, 14 Oct 2020 15:01:42 GMT, Roger Riggs wrote: >> Due to the requirement for multiple reviewers, I had been waiting to add my review of the Core-Libs files until the >> HotSpot reviewers had approved! I see only one reviewer credited in the commit. > > This integration without testing with a current merge from the master and has caused two build failures. > > JDK-8254761: Wrong intrinsic annotation used for StringLatin1.indexOfChar > > JDK-8254775: Microbenchmark StringIndexOfChar doesn't compile > > There is a raw unicode character in the JMH test that causes a compilation error. 
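For reference, a minimal sketch of one way to avoid this class of failure, assuming the literal on line 71 was U+263A (the bytes 0xE2 0x98 0xBA reported in the log below are the UTF-8 encoding of that code point). Writing the character as a Unicode escape keeps the source pure ASCII, so it compiles regardless of the `-encoding` javac is run with. The class and method names here are hypothetical, not the benchmark's, and the actual fix for JDK-8254775 may differ.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the benchmark's string setup; not the real StringIndexOfChar code.
public class UnicodeEscapeSketch {
    // U+263A written as a Unicode escape, so this source file contains only ASCII bytes.
    static final char UTF16_FILLER = '\u263A';

    // Builds a string of the given length from either the non-Latin1 filler or 'b'.
    static String fill(boolean isUtf16, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(isUtf16 ? UTF16_FILLER : 'b');
        }
        return sb.toString();
    }

    // Decodes the three bytes that javac reported as unmappable for encoding ascii.
    static String decodeReportedBytes() {
        byte[] reported = {(byte) 0xE2, (byte) 0x98, (byte) 0xBA};
        return new String(reported, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Checks that the escaped literal is the same character as the raw bytes in the log.
        System.out.println(decodeReportedBytes().equals(String.valueOf(UTF16_FILLER)));
        System.out.println(fill(true, 4).length());
        System.out.println(fill(false, 4));
    }
}
```

The alternative would be to keep the raw character and make sure the build passes a matching `-encoding`, but the escape does not depend on build configuration.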
> == Output from failing command(s) repeated here === > [2020-10-14T14:39:09,608Z] * For target support_test_micro_classes__the.BUILD_JDK_MICROBENCHMARK_batch: > [2020-10-14T14:39:09,611Z] > /opt/mach5/mesos/work_dir/slaves/4076d11c-c6ed-4d07-84c1-4ab8d55cd975-S108796/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/400e1f56-2d49-42d8-8879-97d4fbb6c909/runs/c49da2bc-a8fe-4a5d-8159-57a9b0316fd2/workspace/open/test/micro/org/openjdk/bench/java/lang/StringIndexOfChar.java:71: > error: unmappable character (0xE2) for encoding ascii [2020-10-14T14:39:09,611Z] > sb.append(isUtf16?'???':'b'); [2020-10-14T14:39:09,611Z] ^ [2020-10-14T14:39:09,611Z] > /opt/mach5/mesos/work_dir/slaves/4076d11c-c6ed-4d07-84c1-4ab8d55cd975-S108796/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/400e1f56-2d49-42d8-8879-97d4fbb6c909/runs/c49da2bc-a8fe-4a5d-8159-57a9b0316fd2/workspace/open/test/micro/org/openjdk/bench/java/lang/StringIndexOfChar.java:71: > error: unmappable character (0x98) for encoding ascii [2020-10-14T14:39:09,611Z] > sb.append(isUtf16?'???':'b'); [2020-10-14T14:39:09,611Z] ^ [2020-10-14T14:39:09,611Z] > /opt/mach5/mesos/work_dir/slaves/4076d11c-c6ed-4d07-84c1-4ab8d55cd975-S108796/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/400e1f56-2d49-42d8-8879-97d4fbb6c909/runs/c49da2bc-a8fe-4a5d-8159-57a9b0316fd2/workspace/open/test/micro/org/openjdk/bench/java/lang/StringIndexOfChar.java:71: > error: unmappable character (0xBA) for encoding ascii [2020-10-14T14:39:09,611Z] > sb.append(isUtf16?'???':'b'); [2020-10-14T14:39:09,611Z] ^ [2020-10-14T14:39:09,611Z] > /opt/mach5/mesos/work_dir/slaves/4076d11c-c6ed-4d07-84c1-4ab8d55cd975-S108796/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/400e1f56-2d49-42d8-8879-97d4fbb6c909/runs/c49da2bc-a8fe-4a5d-8159-57a9b0316fd2/workspace/open/test/micro/org/openjdk/bench/java/lang/StringIndexOfChar.java:71: > error: unclosed character literal [2020-10-14T14:39:09,611Z] 
sb.append(isUtf16?'???':'b'); > [2020-10-14T14:39:09,611Z] ^ [2020-10-14T14:39:09,611Z] > /opt/mach5/mesos/work_dir/slaves/4076d11c-c6ed-4d07-84c1-4ab8d55cd975-S108796/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/400e1f56-2d49-42d8-8879-97d4fbb6c909/runs/c49da2bc-a8fe-4a5d-8159-57a9b0316fd2/workspace/open/test/micro/org/openjdk/bench/java/lang/StringIndexOfChar.java:71: > error: unclosed character literal [2020-10-14T14:39:09,611Z] sb.append(isUtf16?'???':'b');``` And also a failed Graal test because of the new intrinsic. And JDK-8254785: compiler/graalunit/HotspotTest.java failed with "missing Graal intrinsics for: java/lang/StringLatin1.indexOfChar([BIII)I" @phohensee don't be so quick to type `/sponsor`; there are three separate build and test failures. ------------- PR: https://git.openjdk.java.net/jdk/pull/71 From dcubed at openjdk.java.net Wed Oct 14 18:49:20 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Wed, 14 Oct 2020 18:49:20 GMT Subject: RFR: 8254789: ProblemList compiler/graalunit/HotspotTest.java Message-ID: This is a trivial change to ProblemList compiler/graalunit/HotspotTest.java. 
------------- Commit messages: - 8254789: ProblemList compiler/graalunit/HotspotTest.java Changes: https://git.openjdk.java.net/jdk/pull/666/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=666&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8254789 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/666.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/666/head:pull/666 PR: https://git.openjdk.java.net/jdk/pull/666 From rriggs at openjdk.java.net Wed Oct 14 18:49:20 2020 From: rriggs at openjdk.java.net (Roger Riggs) Date: Wed, 14 Oct 2020 18:49:20 GMT Subject: RFR: 8254789: ProblemList compiler/graalunit/HotspotTest.java In-Reply-To: References: Message-ID: <2atPIagsa8VCzmwgTjvYylCsDdhWOIAdoNAgel1aXZ4=.bab148e5-4d98-4b76-96fe-05c3d4023999@github.com> On Wed, 14 Oct 2020 18:41:10 GMT, Daniel D. Daugherty wrote: > This is a trivial change to ProblemList compiler/graalunit/HotspotTest.java. Marked as reviewed by rriggs (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/666 From dcubed at openjdk.java.net Wed Oct 14 18:49:20 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Wed, 14 Oct 2020 18:49:20 GMT Subject: RFR: 8254789: ProblemList compiler/graalunit/HotspotTest.java In-Reply-To: References: <2atPIagsa8VCzmwgTjvYylCsDdhWOIAdoNAgel1aXZ4=.bab148e5-4d98-4b76-96fe-05c3d4023999@github.com> Message-ID: <4bqa4Y8QHwDUJ1TBe2P-uzTeA1M6Be7_bI_ZvmGR9Co=.3b99e49d-ca16-4514-a5db-782cfb1c478f@github.com> On Wed, 14 Oct 2020 18:43:44 GMT, Igor Ignatyev wrote: >> Marked as reviewed by rriggs (Reviewer). 
> > @dcubed-ojdk > graal unit tests can be excluded individually by adding entries to `ProblemList-graal.txt` : > diff --git a/test/hotspot/jtreg/ProblemList-graal.txt b/test/hotspot/jtreg/ProblemList-graal.txt > index f73a4883f42..634f2bc12f6 100644 > --- a/test/hotspot/jtreg/ProblemList-graal.txt > +++ b/test/hotspot/jtreg/ProblemList-graal.txt > @@ -239,3 +239,5 @@ org.graalvm.compiler.replacements.test.classfile.ClassfileBytecodeProviderTest > org.graalvm.compiler.core.test.deopt.CompiledMethodTest 8202955 > > org.graalvm.compiler.hotspot.test.ReservedStackAccessTest 8213567 windows-all > + > +org.graalvm.compiler.hotspot.test.CheckGraalIntrinsics 8254785 @iignatev - I forgot about that file! ------------- PR: https://git.openjdk.java.net/jdk/pull/666 From iignatyev at openjdk.java.net Wed Oct 14 18:49:20 2020 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Wed, 14 Oct 2020 18:49:20 GMT Subject: RFR: 8254789: ProblemList compiler/graalunit/HotspotTest.java In-Reply-To: <2atPIagsa8VCzmwgTjvYylCsDdhWOIAdoNAgel1aXZ4=.bab148e5-4d98-4b76-96fe-05c3d4023999@github.com> References: <2atPIagsa8VCzmwgTjvYylCsDdhWOIAdoNAgel1aXZ4=.bab148e5-4d98-4b76-96fe-05c3d4023999@github.com> Message-ID: On Wed, 14 Oct 2020 18:43:19 GMT, Roger Riggs wrote: >> This is a trivial change to ProblemList compiler/graalunit/HotspotTest.java. > > Marked as reviewed by rriggs (Reviewer). 
@dcubed-ojdk graal unit tests can be excluded individually by adding entries to `ProblemList-graal.txt`:

diff --git a/test/hotspot/jtreg/ProblemList-graal.txt b/test/hotspot/jtreg/ProblemList-graal.txt
index f73a4883f42..634f2bc12f6 100644
--- a/test/hotspot/jtreg/ProblemList-graal.txt
+++ b/test/hotspot/jtreg/ProblemList-graal.txt
@@ -239,3 +239,5 @@ org.graalvm.compiler.replacements.test.classfile.ClassfileBytecodeProviderTest
 org.graalvm.compiler.core.test.deopt.CompiledMethodTest 8202955
 
 org.graalvm.compiler.hotspot.test.ReservedStackAccessTest 8213567 windows-all
+
+org.graalvm.compiler.hotspot.test.CheckGraalIntrinsics 8254785

------------- PR: https://git.openjdk.java.net/jdk/pull/666 From dcubed at openjdk.java.net Wed Oct 14 18:54:21 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Wed, 14 Oct 2020 18:54:21 GMT Subject: RFR: 8254789: ProblemList compiler/graalunit/HotspotTest.java [v2] In-Reply-To: References: Message-ID: > This is a trivial change to ProblemList compiler/graalunit/HotspotTest.java. Daniel D. Daugherty has updated the pull request incrementally with one additional commit since the last revision: Move ProblemListing of compiler/graalunit/HotspotTest.java from test/hotspot/jtreg/ProblemList.txt to the specific subtest in test/hotspot/jtreg/ProblemList-graal.txt.
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/666/files - new: https://git.openjdk.java.net/jdk/pull/666/files/8b01c7c2..4cafa5b7 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=666&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=666&range=00-01 Stats: 5 lines in 2 files changed: 2 ins; 3 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/666.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/666/head:pull/666 PR: https://git.openjdk.java.net/jdk/pull/666 From iignatyev at openjdk.java.net Wed Oct 14 19:11:25 2020 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Wed, 14 Oct 2020 19:11:25 GMT Subject: RFR: 8254789: ProblemList compiler/graalunit/HotspotTest.java [v2] In-Reply-To: References: Message-ID: <-Hw9lUUB12UzjLI1ZosOdqpexCJpI-KQBVG2tHS_eOM=.88e8433d-bf47-43fc-a3f3-68bf8466e8c5@github.com> On Wed, 14 Oct 2020 18:54:21 GMT, Daniel D. Daugherty wrote: >> This is a trivial change to ProblemList compiler/graalunit/HotspotTest.java. > > Daniel D. Daugherty has updated the pull request incrementally with one additional commit since the last revision: > > Move ProblemListing of compiler/graalunit/HotspotTest.java from test/hotspot/jtreg/ProblemList.txt to the specific > subtest in test/hotspot/jtreg/ProblemList-graal.txt. LGTM. but, in JBS, @vnkozlov said that it's easy to fix the test, and he is working on the fix, so depending on the timing we might want to skip this one. ------------- Marked as reviewed by iignatyev (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/666 From dcubed at openjdk.java.net Wed Oct 14 19:11:25 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Wed, 14 Oct 2020 19:11:25 GMT Subject: RFR: 8254789: ProblemList compiler/graalunit/HotspotTest.java [v3] In-Reply-To: References: Message-ID: > This is a trivial change to ProblemList compiler/graalunit/HotspotTest.java. Daniel D. 
Daugherty has updated the pull request incrementally with one additional commit since the last revision: Add missing platform specification. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/666/files - new: https://git.openjdk.java.net/jdk/pull/666/files/4cafa5b7..869576d2 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=666&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=666&range=01-02 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/666.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/666/head:pull/666 PR: https://git.openjdk.java.net/jdk/pull/666 From kvn at openjdk.java.net Wed Oct 14 19:11:26 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 14 Oct 2020 19:11:26 GMT Subject: RFR: 8254789: ProblemList compiler/graalunit/HotspotTest.java [v3] In-Reply-To: References: Message-ID: On Wed, 14 Oct 2020 19:08:18 GMT, Daniel D. Daugherty wrote: >> This is a trivial change to ProblemList compiler/graalunit/HotspotTest.java. > > Daniel D. Daugherty has updated the pull request incrementally with one additional commit since the last revision: > > Add missing platform specification. Marked as reviewed by kvn (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/666 From dcubed at openjdk.java.net Wed Oct 14 19:11:26 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Wed, 14 Oct 2020 19:11:26 GMT Subject: RFR: 8254789: ProblemList compiler/graalunit/HotspotTest.java [v3] In-Reply-To: <4bqa4Y8QHwDUJ1TBe2P-uzTeA1M6Be7_bI_ZvmGR9Co=.3b99e49d-ca16-4514-a5db-782cfb1c478f@github.com> References: <2atPIagsa8VCzmwgTjvYylCsDdhWOIAdoNAgel1aXZ4=.bab148e5-4d98-4b76-96fe-05c3d4023999@github.com> <4bqa4Y8QHwDUJ1TBe2P-uzTeA1M6Be7_bI_ZvmGR9Co=.3b99e49d-ca16-4514-a5db-782cfb1c478f@github.com> Message-ID: <65VId5w16rWeIAhkQlO1s6FZK8Qzqqb-1_hizOrCfqU=.95f2e016-32ee-45a4-a836-348069e609a7@github.com> On Wed, 14 Oct 2020 18:44:51 GMT, Daniel D. 
Daugherty wrote: >> @dcubed-ojdk >> graal unit tests can be excluded individually by adding entries to `ProblemList-graal.txt` : >> diff --git a/test/hotspot/jtreg/ProblemList-graal.txt b/test/hotspot/jtreg/ProblemList-graal.txt >> index f73a4883f42..634f2bc12f6 100644 >> --- a/test/hotspot/jtreg/ProblemList-graal.txt >> +++ b/test/hotspot/jtreg/ProblemList-graal.txt >> @@ -239,3 +239,5 @@ org.graalvm.compiler.replacements.test.classfile.ClassfileBytecodeProviderTest >> org.graalvm.compiler.core.test.deopt.CompiledMethodTest 8202955 >> >> org.graalvm.compiler.hotspot.test.ReservedStackAccessTest 8213567 windows-all >> + >> +org.graalvm.compiler.hotspot.test.CheckGraalIntrinsics 8254785 > > @iignatev - I forgot about that file! @iignatev - should be fixed now. @RogerRiggs - can you also re-review? ------------- PR: https://git.openjdk.java.net/jdk/pull/666 From iignatyev at openjdk.java.net Wed Oct 14 19:11:27 2020 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Wed, 14 Oct 2020 19:11:27 GMT Subject: RFR: 8254789: ProblemList compiler/graalunit/HotspotTest.java [v2] In-Reply-To: References: <-Hw9lUUB12UzjLI1ZosOdqpexCJpI-KQBVG2tHS_eOM=.88e8433d-bf47-43fc-a3f3-68bf8466e8c5@github.com> Message-ID: On Wed, 14 Oct 2020 18:58:46 GMT, Daniel D. Daugherty wrote: > Hold on. I somehow lost the platform part of the entry. 
you don't need it, and IIRC, it's not actually read by `compiler.graalunit.common.GraalUnitTestLauncher` ------------- PR: https://git.openjdk.java.net/jdk/pull/666 From dcubed at openjdk.java.net Wed Oct 14 19:11:27 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Wed, 14 Oct 2020 19:11:27 GMT Subject: RFR: 8254789: ProblemList compiler/graalunit/HotspotTest.java [v2] In-Reply-To: <-Hw9lUUB12UzjLI1ZosOdqpexCJpI-KQBVG2tHS_eOM=.88e8433d-bf47-43fc-a3f3-68bf8466e8c5@github.com> References: <-Hw9lUUB12UzjLI1ZosOdqpexCJpI-KQBVG2tHS_eOM=.88e8433d-bf47-43fc-a3f3-68bf8466e8c5@github.com> Message-ID: On Wed, 14 Oct 2020 18:58:42 GMT, Igor Ignatyev wrote: >> Daniel D. Daugherty has updated the pull request incrementally with one additional commit since the last revision: >> >> Move ProblemListing of compiler/graalunit/HotspotTest.java from test/hotspot/jtreg/ProblemList.txt to the specific >> subtest in test/hotspot/jtreg/ProblemList-graal.txt. > > LGTM. but, in JBS, @vnkozlov said that it's easy to fix the test, and he is working on the fix, so depending on the > timing we might want to skip this one. Hold on. I somehow lost the platform part of the entry. ------------- PR: https://git.openjdk.java.net/jdk/pull/666 From dcubed at openjdk.java.net Wed Oct 14 19:11:27 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Wed, 14 Oct 2020 19:11:27 GMT Subject: RFR: 8254789: ProblemList compiler/graalunit/HotspotTest.java [v2] In-Reply-To: References: <-Hw9lUUB12UzjLI1ZosOdqpexCJpI-KQBVG2tHS_eOM=.88e8433d-bf47-43fc-a3f3-68bf8466e8c5@github.com> Message-ID: On Wed, 14 Oct 2020 19:00:49 GMT, Daniel D. Daugherty wrote: >>> Hold on. I somehow lost the platform part of the entry. >> >> you don't need it, and IIRC, it's not actually read by `compiler.graalunit.common.GraalUnitTestLauncher` > > Okay. I added it anyway since the previous entry had it... I want to integrate this ProblemListing to get the noise down. 
------------- PR: https://git.openjdk.java.net/jdk/pull/666 From dcubed at openjdk.java.net Wed Oct 14 19:11:27 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Wed, 14 Oct 2020 19:11:27 GMT Subject: RFR: 8254789: ProblemList compiler/graalunit/HotspotTest.java [v2] In-Reply-To: References: <-Hw9lUUB12UzjLI1ZosOdqpexCJpI-KQBVG2tHS_eOM=.88e8433d-bf47-43fc-a3f3-68bf8466e8c5@github.com> Message-ID: On Wed, 14 Oct 2020 18:59:51 GMT, Igor Ignatyev wrote: >> Hold on. I somehow lost the platform part of the entry. > >> Hold on. I somehow lost the platform part of the entry. > > you don't need it, and IIRC, it's not actually read by `compiler.graalunit.common.GraalUnitTestLauncher` Okay. I added it anyway since the previous entry had it... ------------- PR: https://git.openjdk.java.net/jdk/pull/666 From kvn at openjdk.java.net Wed Oct 14 19:11:28 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 14 Oct 2020 19:11:28 GMT Subject: RFR: 8254789: ProblemList compiler/graalunit/HotspotTest.java [v2] In-Reply-To: <-Hw9lUUB12UzjLI1ZosOdqpexCJpI-KQBVG2tHS_eOM=.88e8433d-bf47-43fc-a3f3-68bf8466e8c5@github.com> References: <-Hw9lUUB12UzjLI1ZosOdqpexCJpI-KQBVG2tHS_eOM=.88e8433d-bf47-43fc-a3f3-68bf8466e8c5@github.com> Message-ID: <2EmgCT55Xu8GSspN2sI9Lwcw7toAjgWrAn2uqKV7DFQ=.57a13fa1-8ad4-4409-bd8b-b9854e6bef9e@github.com> On Wed, 14 Oct 2020 18:58:42 GMT, Igor Ignatyev wrote: > LGTM. but, in JBS, @vnkozlov said that it's easy to fix the test, and he is working on the fix, so depending on the > timing we might want to skip this one. I am fine with problem-listing the test first, especially if you can list only CheckGraalIntrinsics. I am also inclined to do it permanently for time in JDK. I am tired of fixing it each time people push a new intrinsic. It gives us (JPG) nothing.
------------- PR: https://git.openjdk.java.net/jdk/pull/666 From dcubed at openjdk.java.net Wed Oct 14 19:11:28 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Wed, 14 Oct 2020 19:11:28 GMT Subject: RFR: 8254789: ProblemList compiler/graalunit/HotspotTest.java [v3] In-Reply-To: References: Message-ID: On Wed, 14 Oct 2020 19:03:33 GMT, Vladimir Kozlov wrote: >> Daniel D. Daugherty has updated the pull request incrementally with one additional commit since the last revision: >> >> Add missing platform specification. > > Marked as reviewed by kvn (Reviewer). @RogerRiggs, @iignatev and @vnkozlov - Thanks for the fast reviews and the help with getting this right. ------------- PR: https://git.openjdk.java.net/jdk/pull/666 From dcubed at openjdk.java.net Wed Oct 14 19:11:29 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Wed, 14 Oct 2020 19:11:29 GMT Subject: Integrated: 8254789: ProblemList compiler/graalunit/HotspotTest.java In-Reply-To: References: Message-ID: On Wed, 14 Oct 2020 18:41:10 GMT, Daniel D. Daugherty wrote: > This is a trivial change to ProblemList compiler/graalunit/HotspotTest.java. This pull request has now been integrated. Changeset: 386e7e8b Author: Daniel D. 
Daugherty URL: https://git.openjdk.java.net/jdk/commit/386e7e8b Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod 8254789: ProblemList compiler/graalunit/HotspotTest.java Reviewed-by: rriggs, iignatyev, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/666 From psandoz at openjdk.java.net Wed Oct 14 20:06:30 2020 From: psandoz at openjdk.java.net (Paul Sandoz) Date: Wed, 14 Oct 2020 20:06:30 GMT Subject: Integrated: 8223347: Integration of Vector API (Incubator) In-Reply-To: <-PE4TwXgvq2bemAn_8csjn4_j7zoAolnQz6QQt3z0Wk=.eaa9999f-0713-4349-b31d-934717aa37a1@github.com> References: <-PE4TwXgvq2bemAn_8csjn4_j7zoAolnQz6QQt3z0Wk=.eaa9999f-0713-4349-b31d-934717aa37a1@github.com> Message-ID: On Fri, 25 Sep 2020 20:14:29 GMT, Paul Sandoz wrote: > This pull request is for integration of the Vector API. It was previously reviewed under conditions when mercurial was > used for the source code control system. Review threads can be found here (searching for issue number 8223347 in the > title): https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-April/thread.html > https://mail.openjdk.java.net/pipermail/core-libs-dev/2020-May/thread.html > https://mail.openjdk.java.net/pipermail/core-libs-dev/2020-July/thread.html > > If mercurial was still being used the code would be pushed directly, once the CSR is approved. However, in this case a > pull request is required and needs explicit reviewer approval. Between the final review and this pull request no code > has changed, except for that related to merging. This pull request has now been integrated. 
Changeset: 0c99b192 Author: Paul Sandoz URL: https://git.openjdk.java.net/jdk/commit/0c99b192 Stats: 295107 lines in 336 files changed: 292957 ins; 1062 del; 1088 mod 8223347: Integration of Vector API (Incubator) Co-authored-by: Vivek Deshpande Co-authored-by: Qi Feng Co-authored-by: Ian Graves Co-authored-by: Jean-Philippe Halimi Co-authored-by: Vladimir Ivanov Co-authored-by: Ningsheng Jian Co-authored-by: Razvan Lupusoru Co-authored-by: Smita Kamath Co-authored-by: Rahul Kandu Co-authored-by: Kishor Kharbas Co-authored-by: Eric Liu Co-authored-by: Aaloan Miftah Co-authored-by: John R Rose Co-authored-by: Shravya Rukmannagari Co-authored-by: Paul Sandoz Co-authored-by: Sandhya Viswanathan Co-authored-by: Lauren Walkowski Co-authored-by: Yang Zang Co-authored-by: Joshua Zhu Co-authored-by: Wang Zhuo Co-authored-by: Jatin Bhateja Reviewed-by: erikj, chegar, kvn, darcy, forax, briangoetz, aph, epavlova, coleenp ------------- PR: https://git.openjdk.java.net/jdk/pull/367 From dcubed at openjdk.java.net Wed Oct 14 20:55:17 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Wed, 14 Oct 2020 20:55:17 GMT Subject: RFR: 8173585: Intrinsify StringLatin1.indexOf(char) [v6] In-Reply-To: References: <_T0873dC5tfUtGn9r1_Y21JkPKRt-za3MM9hPN2GQKQ=.b865fe53-5417-424f-81b6-1566a330640e@github.com> Message-ID: On Wed, 14 Oct 2020 17:59:53 GMT, Roger Riggs wrote: >> This integration without testing with a current merge from the master and has caused two build failures. >> >> JDK-8254761: Wrong intrinsic annotation used for StringLatin1.indexOfChar >> >> JDK-8254775: Microbenchmark StringIndexOfChar doesn't compile >> >> There is a raw unicode character in the JMH test that causes a compilation error. 
>> == Output from failing command(s) repeated here === >> [2020-10-14T14:39:09,608Z] * For target support_test_micro_classes__the.BUILD_JDK_MICROBENCHMARK_batch: >> [2020-10-14T14:39:09,611Z] >> /opt/mach5/mesos/work_dir/slaves/4076d11c-c6ed-4d07-84c1-4ab8d55cd975-S108796/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/400e1f56-2d49-42d8-8879-97d4fbb6c909/runs/c49da2bc-a8fe-4a5d-8159-57a9b0316fd2/workspace/open/test/micro/org/openjdk/bench/java/lang/StringIndexOfChar.java:71: >> error: unmappable character (0xE2) for encoding ascii [2020-10-14T14:39:09,611Z] >> sb.append(isUtf16?'???':'b'); [2020-10-14T14:39:09,611Z] ^ [2020-10-14T14:39:09,611Z] >> /opt/mach5/mesos/work_dir/slaves/4076d11c-c6ed-4d07-84c1-4ab8d55cd975-S108796/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/400e1f56-2d49-42d8-8879-97d4fbb6c909/runs/c49da2bc-a8fe-4a5d-8159-57a9b0316fd2/workspace/open/test/micro/org/openjdk/bench/java/lang/StringIndexOfChar.java:71: >> error: unmappable character (0x98) for encoding ascii [2020-10-14T14:39:09,611Z] >> sb.append(isUtf16?'???':'b'); [2020-10-14T14:39:09,611Z] ^ [2020-10-14T14:39:09,611Z] >> /opt/mach5/mesos/work_dir/slaves/4076d11c-c6ed-4d07-84c1-4ab8d55cd975-S108796/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/400e1f56-2d49-42d8-8879-97d4fbb6c909/runs/c49da2bc-a8fe-4a5d-8159-57a9b0316fd2/workspace/open/test/micro/org/openjdk/bench/java/lang/StringIndexOfChar.java:71: >> error: unmappable character (0xBA) for encoding ascii [2020-10-14T14:39:09,611Z] >> sb.append(isUtf16?'???':'b'); [2020-10-14T14:39:09,611Z] ^ [2020-10-14T14:39:09,611Z] >> /opt/mach5/mesos/work_dir/slaves/4076d11c-c6ed-4d07-84c1-4ab8d55cd975-S108796/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/400e1f56-2d49-42d8-8879-97d4fbb6c909/runs/c49da2bc-a8fe-4a5d-8159-57a9b0316fd2/workspace/open/test/micro/org/openjdk/bench/java/lang/StringIndexOfChar.java:71: >> error: unclosed character literal [2020-10-14T14:39:09,611Z] 
sb.append(isUtf16?'???':'b'); >> [2020-10-14T14:39:09,611Z] ^ [2020-10-14T14:39:09,611Z] >> /opt/mach5/mesos/work_dir/slaves/4076d11c-c6ed-4d07-84c1-4ab8d55cd975-S108796/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/400e1f56-2d49-42d8-8879-97d4fbb6c909/runs/c49da2bc-a8fe-4a5d-8159-57a9b0316fd2/workspace/open/test/micro/org/openjdk/bench/java/lang/StringIndexOfChar.java:71: >> error: unclosed character literal [2020-10-14T14:39:09,611Z] sb.append(isUtf16?'???':'b');``` > > And also a failed Graal test because of the new intrinsic. > > And JDK-8254785: compiler/graalunit/HotspotTest.java failed with "missing Graal intrinsics for: > java/lang/StringLatin1.indexOfChar([BIII)I" > @phohensee don't be so quick to type `/sponsor`; there are three separate build and test failures. @phohensee - @vnkozlov has determined that a new Tier2 test failure is also caused by this fix. See https://bugs.openjdk.java.net/browse/JDK-8254790. ------------- PR: https://git.openjdk.java.net/jdk/pull/71 From hohensee at amazon.com Wed Oct 14 21:28:09 2020 From: hohensee at amazon.com (Hohensee, Paul) Date: Wed, 14 Oct 2020 21:28:09 +0000 Subject: RFR: 8173585: Intrinsify StringLatin1.indexOf(char) [v6] Message-ID: <838553DC-4961-4681-990E-4254D3316538@amazon.com> My apologies. I relied on the other reviewers. I'll do an independent review in the future. Thanks, Paul On 10/14/20, 11:02 AM, "core-libs-dev on behalf of Roger Riggs" wrote: On Wed, 14 Oct 2020 15:01:42 GMT, Roger Riggs wrote: >> Due to the requirement for multiple reviewers, I had been waiting to add my review of the Core-Libs files until the >> HotSpot reviewers had approved! I see only one reviewer credited in the commit. > > This integration without testing with a current merge from the master and has caused two build failures.
> > JDK-8254761: Wrong intrinsic annotation used for StringLatin1.indexOfChar > > JDK-8254775: Microbenchmark StringIndexOfChar doesn't compile > > There is a raw unicode character in the JMH test that causes a compilation error. > == Output from failing command(s) repeated here === > [2020-10-14T14:39:09,608Z] * For target support_test_micro_classes__the.BUILD_JDK_MICROBENCHMARK_batch: > [2020-10-14T14:39:09,611Z] > /opt/mach5/mesos/work_dir/slaves/4076d11c-c6ed-4d07-84c1-4ab8d55cd975-S108796/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/400e1f56-2d49-42d8-8879-97d4fbb6c909/runs/c49da2bc-a8fe-4a5d-8159-57a9b0316fd2/workspace/open/test/micro/org/openjdk/bench/java/lang/StringIndexOfChar.java:71: > error: unmappable character (0xE2) for encoding ascii [2020-10-14T14:39:09,611Z] > sb.append(isUtf16?'???':'b'); [2020-10-14T14:39:09,611Z] ^ [2020-10-14T14:39:09,611Z] > /opt/mach5/mesos/work_dir/slaves/4076d11c-c6ed-4d07-84c1-4ab8d55cd975-S108796/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/400e1f56-2d49-42d8-8879-97d4fbb6c909/runs/c49da2bc-a8fe-4a5d-8159-57a9b0316fd2/workspace/open/test/micro/org/openjdk/bench/java/lang/StringIndexOfChar.java:71: > error: unmappable character (0x98) for encoding ascii [2020-10-14T14:39:09,611Z] > sb.append(isUtf16?'???':'b'); [2020-10-14T14:39:09,611Z] ^ [2020-10-14T14:39:09,611Z] > /opt/mach5/mesos/work_dir/slaves/4076d11c-c6ed-4d07-84c1-4ab8d55cd975-S108796/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/400e1f56-2d49-42d8-8879-97d4fbb6c909/runs/c49da2bc-a8fe-4a5d-8159-57a9b0316fd2/workspace/open/test/micro/org/openjdk/bench/java/lang/StringIndexOfChar.java:71: > error: unmappable character (0xBA) for encoding ascii [2020-10-14T14:39:09,611Z] > sb.append(isUtf16?'???':'b'); [2020-10-14T14:39:09,611Z] ^ [2020-10-14T14:39:09,611Z] > 
/opt/mach5/mesos/work_dir/slaves/4076d11c-c6ed-4d07-84c1-4ab8d55cd975-S108796/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/400e1f56-2d49-42d8-8879-97d4fbb6c909/runs/c49da2bc-a8fe-4a5d-8159-57a9b0316fd2/workspace/open/test/micro/org/openjdk/bench/java/lang/StringIndexOfChar.java:71: > error: unclosed character literal [2020-10-14T14:39:09,611Z] sb.append(isUtf16?'???':'b'); > [2020-10-14T14:39:09,611Z] ^ [2020-10-14T14:39:09,611Z] > /opt/mach5/mesos/work_dir/slaves/4076d11c-c6ed-4d07-84c1-4ab8d55cd975-S108796/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/400e1f56-2d49-42d8-8879-97d4fbb6c909/runs/c49da2bc-a8fe-4a5d-8159-57a9b0316fd2/workspace/open/test/micro/org/openjdk/bench/java/lang/StringIndexOfChar.java:71: > error: unclosed character literal [2020-10-14T14:39:09,611Z] sb.append(isUtf16?'???':'b');``` And also a failed Graal test because of the new intrinsic. And JDK-8254785: compiler/graalunit/HotspotTest.java failed with "missing Graal intrinsics for: java/lang/StringLatin1.indexOfChar([BIII)I" @phohensee don't be so quick to type `/sponsor`; there are three separate build and test failures. ------------- PR: https://git.openjdk.java.net/jdk/pull/71 From kvn at openjdk.java.net Wed Oct 14 22:27:17 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 14 Oct 2020 22:27:17 GMT Subject: RFR: 8254792: Disable intrinsic StringLatin1.indexOf until 8254790 is fixed Message-ID: Temporarily disable the new intrinsic StringLatin1.indexOf to keep testing clean while the fix for JDK-8254790 is prepared. Tested hs-tier1 and the failed test. Currently running tier2 and tier3.
------------- Commit messages: - 8254792: Disable intrinsic StringLatin1.indexOf until 8254790 is fixed Changes: https://git.openjdk.java.net/jdk/pull/670/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=670&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8254792 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/670.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/670/head:pull/670 PR: https://git.openjdk.java.net/jdk/pull/670 From kvn at openjdk.java.net Wed Oct 14 23:14:12 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 14 Oct 2020 23:14:12 GMT Subject: RFR: 8254734: "dead loop detected" assert failure with patch from 8223051 In-Reply-To: References: Message-ID: On Wed, 14 Oct 2020 07:52:58 GMT, Roland Westrelin wrote: > compiler/c2/TestDeadDataLoopIGVN.java ran with -server -Xcomp > -XX:+CreateCoredumpOnCrash -ea -esa -XX:CompileThreshold=100 > -XX:+UnlockExperimentalVMOptions -server -XX:-TieredCompilation and > the patch from 8223051 (long counted loops, not integrated yet) fails > with: > > assert(no_dead_loop) failed: dead loop detected > > I can reproduce the failure with only this change from 8223051: > diff --git a/src/hotspot/share/opto/callnode.cpp b/src/hotspot/share/opto/callnode.cpp > index 268cec7732c..295cb4ecaf9 100644 > --- a/src/hotspot/share/opto/callnode.cpp > +++ b/src/hotspot/share/opto/callnode.cpp > @@ -1159,7 +1159,7 @@ Node* SafePointNode::Identity(PhaseGVN* phase) { > if( in(TypeFunc::Control)->is_SafePoint() ) > return in(TypeFunc::Control); > > - if( in(0)->is_Proj() ) { > + if (in(0)->is_Proj() && !phase->C->major_progress()) { > Node *n0 = in(0)->in(0); > // Check if he is a call projection (except Leaf Call) > if( n0->is_Catch() ) { > diff --git a/src/hotspot/share/opto/parse1.cpp b/src/hotspot/share/opto/parse1.cpp > index 4b88c379dea..baf2bf9bacc 100644 > --- a/src/hotspot/share/opto/parse1.cpp > +++ b/src/hotspot/share/opto/parse1.cpp 
> @@ -2254,23 +2254,7 @@ void Parse::return_current(Node* value) { > > //------------------------------add_safepoint---------------------------------- > void Parse::add_safepoint() { > - // See if we can avoid this safepoint. No need for a SafePoint immediately > - // after a Call (except Leaf Call) or another SafePoint. > - Node *proj = control(); > uint parms = TypeFunc::Parms+1; > - if( proj->is_Proj() ) { > - Node *n0 = proj->in(0); > - if( n0->is_Catch() ) { > - n0 = n0->in(0)->in(0); > - assert( n0->is_Call(), "expect a call here" ); > - } > - if( n0->is_Call() ) { > - if( n0->as_Call()->guaranteed_safepoint() ) > - return; > - } else if( n0->is_SafePoint() && n0->req() >= parms ) { > - return; > - } > - } > > // Clear out dead values from the debug info. > kill_dead_locals(); > so it's unrelated to the long counted loops transformation itself. > > At IGVN time, a dead loop is optimized. The loop contains the > following subgraph: > > (LoadI .. (AddP (CastPP .. (Phi (Proj (CallStaticJava .. (AddI > > The projection is on the backedge of the phi. The AddI points back to > the LoadI. The CallStaticJava is a boxing method call. The LoadI is a > boxed value load. > > The phi is optimized out as the loop is unreachable: > > (LoadI .. (AddP (Proj (CallStaticJava .. (AddI > > The LoadI is then optimized by code MemNode::can_see_stored_value(): > // Load boxed value from result of valueOf() call is input parameter. > if (this->is_Load() && ld_adr->is_AddP() && > (tp != NULL) && tp->is_ptr_to_boxed_value()) { > intptr_t ignore = 0; > Node* base = AddPNode::Ideal_base_and_offset(ld_adr, phase, ignore); > BarrierSetC2* bs = BarrierSet::barrier_set()->barrier_set_c2(); > base = bs->step_over_gc_barrier(base); > if (base != NULL && base->is_Proj() && > base->as_Proj()->_con == TypeFunc::Parms && > base->in(0)->is_CallStaticJava() && > base->in(0)->as_CallStaticJava()->is_boxing_method()) { > return base->in(0)->in(TypeFunc::Parms); > } > } > to the AddI. 
Because the AddI has the LoadI as input we end up with a > dead loop. I propose extending the dead loop safe logic to take the > pattern with a boxing method load into account. good ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/649 From kvn at openjdk.java.net Wed Oct 14 23:24:08 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 14 Oct 2020 23:24:08 GMT Subject: RFR: 8254769: Remove unimplemented BCEscapeAnalyzer::{add_dependence, propagate_dependencies} In-Reply-To: References: Message-ID: On Wed, 14 Oct 2020 14:30:38 GMT, Aleksey Shipilev wrote: > These methods are not implemented, and there were no definitions since the initial load. Can be removed. > > Testing: > - [x] Linux x86_64 build > - [x] Text searches for `add_dependence` and `propagate_dependencies` in `src/hotspot` Marked as reviewed by kvn (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/656 From kvn at openjdk.java.net Wed Oct 14 23:25:14 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 14 Oct 2020 23:25:14 GMT Subject: RFR: 8254773: Remove unimplemented ciReplay::is_loaded(Klass* klass) In-Reply-To: References: Message-ID: On Wed, 14 Oct 2020 14:44:44 GMT, Aleksey Shipilev wrote: > Both overloads for Klass* and Method* were added by JDK-6830717. But Klass* overload was never implemented: > http://hg.openjdk.java.net/hsx/hsx25/hotspot/rev/bd7a7ce2e264#l32.934 > > Testing: > - [x] Linux x86_64 build Marked as reviewed by kvn (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/658 From kvn at openjdk.java.net Wed Oct 14 23:26:14 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 14 Oct 2020 23:26:14 GMT Subject: RFR: 8254771: Remove unimplemented ciSignature::get_all_klasses In-Reply-To: References: Message-ID: On Wed, 14 Oct 2020 14:36:59 GMT, Aleksey Shipilev wrote: > This method is not implemented, and there was no definition since the initial load. Can be removed. 
> > Testing: > - [x] Linux x86_64 build > - [x] Text search for `get_all_klasses` in `src/hotspot`; there is `JdkJfrEvent::get_all_klasses` that looks unrelated. Marked as reviewed by kvn (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/657 From kvn at openjdk.java.net Wed Oct 14 23:35:10 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 14 Oct 2020 23:35:10 GMT Subject: RFR: 8254369: Node::disconnect_inputs may skip precedences In-Reply-To: <6V3fsxH8JB6tTos4TqibCVyWFnSPieFxp6VSwm-im3I=.60e92bca-62ab-435e-bc05-fcdc0b23555d@github.com> References: <6V3fsxH8JB6tTos4TqibCVyWFnSPieFxp6VSwm-im3I=.60e92bca-62ab-435e-bc05-fcdc0b23555d@github.com> Message-ID: <4tKcqdZXXAmHA6zzDeFa4Mvei7x6rYJq2HUU2xVlD2Y=.cd967efa-009e-4dae-822d-b3554362356b@github.com> On Wed, 14 Oct 2020 16:04:32 GMT, Xin Liu wrote: > 8254369: Node::disconnect_inputs may skip precedences Changes requested by kvn (Reviewer). src/hotspot/share/opto/node.cpp line 918: > 916: } > 917: > 918: #ifndef PRODUCT This should be #ifdef ASSERT src/hotspot/share/opto/node.cpp line 914: > 912: // Remove precedence edges if any exist > 913: // Note: Safepoints may have precedence edges, even during parsing > 914: for (uint i = len() - 1; i < len() && i >= req(); --i) { i < len() check is not needed ------------- PR: https://git.openjdk.java.net/jdk/pull/664 From kvn at openjdk.java.net Wed Oct 14 23:40:09 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 14 Oct 2020 23:40:09 GMT Subject: RFR: 8254369: Node::disconnect_inputs may skip precedences In-Reply-To: <4tKcqdZXXAmHA6zzDeFa4Mvei7x6rYJq2HUU2xVlD2Y=.cd967efa-009e-4dae-822d-b3554362356b@github.com> References: <6V3fsxH8JB6tTos4TqibCVyWFnSPieFxp6VSwm-im3I=.60e92bca-62ab-435e-bc05-fcdc0b23555d@github.com> <4tKcqdZXXAmHA6zzDeFa4Mvei7x6rYJq2HUU2xVlD2Y=.cd967efa-009e-4dae-822d-b3554362356b@github.com> Message-ID: On Wed, 14 Oct 2020 23:32:31 GMT, Vladimir Kozlov wrote: >> 8254369: Node::disconnect_inputs may skip precedences 
> > src/hotspot/share/opto/node.cpp line 914: > >> 912: // Remove precedence edges if any exist >> 913: // Note: Safepoints may have precedence edges, even during parsing >> 914: for (uint i = len() - 1; i < len() && i >= req(); --i) { > > i < len() check is not needed I got now why you need to scan reverse. ------------- PR: https://git.openjdk.java.net/jdk/pull/664 From dcubed at openjdk.java.net Thu Oct 15 00:04:11 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Thu, 15 Oct 2020 00:04:11 GMT Subject: RFR: 8254792: Disable intrinsic StringLatin1.indexOf until 8254790 is fixed In-Reply-To: References: Message-ID: On Wed, 14 Oct 2020 22:20:21 GMT, Vladimir Kozlov wrote: > Temporary disable new intrinsic StringLatin1.indexOf to keep testing clean while the fix for JDK-8254790 is prepared. > > Tested hs-tier1 and failed test. > Currently running tier2 and tier3. Thumbs up! ------------- Marked as reviewed by dcubed (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/670 From kvn at openjdk.java.net Thu Oct 15 00:11:09 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 15 Oct 2020 00:11:09 GMT Subject: RFR: 8254792: Disable intrinsic StringLatin1.indexOf until 8254790 is fixed In-Reply-To: References: Message-ID: On Thu, 15 Oct 2020 00:01:28 GMT, Daniel D. Daugherty wrote: >> Temporary disable new intrinsic StringLatin1.indexOf to keep testing clean while the fix for JDK-8254790 is prepared. >> >> Tested hs-tier1 and failed test. >> Currently running tier2 and tier3. > > Thumbs up! Tier2 testing passed without failures. 
------------- PR: https://git.openjdk.java.net/jdk/pull/670 From kvn at openjdk.java.net Thu Oct 15 00:11:10 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 15 Oct 2020 00:11:10 GMT Subject: Integrated: 8254792: Disable intrinsic StringLatin1.indexOf until 8254790 is fixed In-Reply-To: References: Message-ID: <5Yy4eF7B8qFOpPf1d_ORy-hVNg_xFqe2DxtsJWOHZz8=.dbd4560e-61f8-4f02-8aca-4b02b6dade31@github.com> On Wed, 14 Oct 2020 22:20:21 GMT, Vladimir Kozlov wrote: > Temporary disable new intrinsic StringLatin1.indexOf to keep testing clean while the fix for JDK-8254790 is prepared. > > Tested hs-tier1 and failed test. > Currently running tier2 and tier3. This pull request has now been integrated. Changeset: 5194f11b Author: Vladimir Kozlov URL: https://git.openjdk.java.net/jdk/commit/5194f11b Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8254792: Disable intrinsic StringLatin1.indexOf until 8254790 is fixed Reviewed-by: dcubed ------------- PR: https://git.openjdk.java.net/jdk/pull/670 From xliu at openjdk.java.net Thu Oct 15 00:23:12 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Thu, 15 Oct 2020 00:23:12 GMT Subject: RFR: 8254369: Node::disconnect_inputs may skip precedences In-Reply-To: References: <6V3fsxH8JB6tTos4TqibCVyWFnSPieFxp6VSwm-im3I=.60e92bca-62ab-435e-bc05-fcdc0b23555d@github.com> <4tKcqdZXXAmHA6zzDeFa4Mvei7x6rYJq2HUU2xVlD2Y=.cd967efa-009e-4dae-822d-b3554362356b@github.com> Message-ID: <6KDyUMTH9NnDNZzaAFA_9jtmsVW47OPeNqfucRtq2_U=.b7c283c1-2f0e-48b8-a748-3ebd9475fdea@github.com> On Wed, 14 Oct 2020 23:37:34 GMT, Vladimir Kozlov wrote: >> src/hotspot/share/opto/node.cpp line 914: >> >>> 912: // Remove precedence edges if any exist >>> 913: // Note: Safepoints may have precedence edges, even during parsing >>> 914: for (uint i = len() - 1; i < len() && i >= req(); --i) { >> >> i < len() check is not needed > > I got now why you need to scan reverse. hi, @vnkozlov , Thank you to review it. 
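A side note on the loop under review: why a reverse scan and the `i < len()` guard can matter is easiest to see on a toy model. The `ToyNode` type, its `rm_prec` policy (move the last occupied precedence slot into the freed one), and the helper names below are assumptions made for illustration. They are not copied from `node.cpp`, and HotSpot's real `Node` may differ in detail.

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for a node's input array (not HotSpot's Node class).
// Slots [0, req) are required inputs; slots [req, len()) hold precedence
// edges, packed to the front, with 0 meaning "empty".
struct ToyNode {
    std::vector<int> in;
    unsigned req;

    unsigned len() const { return static_cast<unsigned>(in.size()); }

    // Assumed removal policy: move the last occupied precedence slot into
    // slot j and clear that last slot, so the list stays packed.
    // Precondition: slot j is occupied.
    void rm_prec(unsigned j) {
        unsigned k = j;
        while (k < len() && in[k] != 0) ++k;  // k = one past last occupied
        in[j] = in[k - 1];
        in[k - 1] = 0;
    }

    // Number of precedence edges still present.
    unsigned prec_count() const {
        unsigned n = 0;
        for (unsigned i = req; i < len(); ++i) {
            if (in[i] != 0) ++n;
        }
        return n;
    }
};

// Forward scan: the edge moved into slot i is never revisited, so some
// edges survive. Takes the node by value so the caller's copy is untouched.
unsigned remove_forward(ToyNode n) {
    for (unsigned i = n.req; i < n.len(); ++i) {
        if (n.in[i] != 0) n.rm_prec(i);
    }
    return n.prec_count();
}

// Reverse scan in the shape of the quoted loop. With an unsigned index,
// --i wraps past 0 to a huge value; the i < len() test then ends the loop.
// That matters when req is 0, where i >= req is always true.
unsigned remove_reverse(ToyNode n) {
    for (unsigned i = n.len() - 1; i < n.len() && i >= n.req; --i) {
        if (n.in[i] != 0) n.rm_prec(i);
    }
    return n.prec_count();
}
```

In this model, a node with four precedence edges and `req == 0` keeps two of them after `remove_forward`, while `remove_reverse` clears all four and still terminates on an empty input array.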
i References: <6V3fsxH8JB6tTos4TqibCVyWFnSPieFxp6VSwm-im3I=.60e92bca-62ab-435e-bc05-fcdc0b23555d@github.com> <4tKcqdZXXAmHA6zzDeFa4Mvei7x6rYJq2HUU2xVlD2Y=.cd967efa-009e-4dae-822d-b3554362356b@github.com> Message-ID: On Wed, 14 Oct 2020 23:25:24 GMT, Vladimir Kozlov wrote: >> 8254369: Node::disconnect_inputs may skip precedences > > src/hotspot/share/opto/node.cpp line 918: > >> 916: } >> 917: >> 918: #ifndef PRODUCT > > This should be #ifdef ASSERT got it. I will update it in next revision. ------------- PR: https://git.openjdk.java.net/jdk/pull/664 From jptatton at amazon.com Thu Oct 15 00:42:35 2020 From: jptatton at amazon.com (Tatton, Jason) Date: Thu, 15 Oct 2020 00:42:35 +0000 Subject: Howto replicate failure of 8254790? Message-ID: <4bd7e9f73ea24ae09f1bb0f1808ce5a7@EX13D46EUB003.ant.amazon.com> Hi all, I am trying to replicate the failure of the tier2 test mentioned in 8254790 but I am only seeing it pass under an x86 linux machine. Are there any specific architectural constraints under which this test should be run in order to make it fail? I am running the test via: make test TEST="test/jdk/javax/xml/crypto/dsig/GenerationTests.java" Note that I am running the test against master without the commit: "8254792: Disable intrinsic StringLatin1.indexOf until 8254790 is fixed" which disables the intrinsic that is causing the test to fail. Thanks -- Jason From david.holmes at oracle.com Thu Oct 15 00:48:04 2020 From: david.holmes at oracle.com (David Holmes) Date: Thu, 15 Oct 2020 10:48:04 +1000 Subject: Howto replicate failure of 8254790? In-Reply-To: <4bd7e9f73ea24ae09f1bb0f1808ce5a7@EX13D46EUB003.ant.amazon.com> References: <4bd7e9f73ea24ae09f1bb0f1808ce5a7@EX13D46EUB003.ant.amazon.com> Message-ID: <661485ab-7de7-26cb-b2b1-3a4f643125eb@oracle.com> Hi Jason, On 15/10/2020 10:42 am, Tatton, Jason wrote: > Hi all, > > > > I am trying to replicate the failure of the tier2 test mentioned in 8254790 but I am only seeing it pass under an x86 linux machine. 
Are there any specific architectural constraints under which this test should be run in order to make it fail? It failed on a Mac, not Linux. Cheers, David > > > I am running the test via: make test TEST="test/jdk/javax/xml/crypto/dsig/GenerationTests.java" > > > > Note that I am running the test against master without the commit: "8254792: Disable intrinsic StringLatin1.indexOf until 8254790 is fixed" which disables the intrinsic that is causing the test to fail. > > > > Thanks > -- > Jason > From shade at openjdk.java.net Thu Oct 15 06:33:18 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 15 Oct 2020 06:33:18 GMT Subject: Integrated: 8254769: Remove unimplemented BCEscapeAnalyzer::{add_dependence, propagate_dependencies} In-Reply-To: References: Message-ID: On Wed, 14 Oct 2020 14:30:38 GMT, Aleksey Shipilev wrote: > These methods are not implemented, and there were no definitions since the initial load. Can be removed. > > Testing: > - [x] Linux x86_64 build > - [x] Text searches for `add_dependence` and `propagate_dependencies` in `src/hotspot` This pull request has now been integrated. Changeset: 81a8ff1d Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/81a8ff1d Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod 8254769: Remove unimplemented BCEscapeAnalyzer::{add_dependence, propagate_dependencies} Reviewed-by: thartmann, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/656 From jiefu at openjdk.java.net Thu Oct 15 06:33:25 2020 From: jiefu at openjdk.java.net (Jie Fu) Date: Thu, 15 Oct 2020 06:33:25 GMT Subject: RFR: 8254814: [Vector API] Fix an AVX512 crash after JDK-8223347 Message-ID: <9yFclJuYNDD8ZUHHTI5OazZPR4sMcvzScmXgXxyIqTU=.e46792cb-bdb9-4a24-9f52-b5952f7df23d@github.com> As discussed here[1], it's time to integrate this fix. And the original PR is here[2]. 
Testing: - All jdk/incubator/vector/ tests passed on Linux/x64 AVX512 machines [1] https://github.com/openjdk/jdk/pull/367#issuecomment-701560441 [2] https://github.com/openjdk/panama-vector/pull/1 ------------- Commit messages: - 8254814: [Vector API] Fix an AVX512 crash after JDK-8223347 Changes: https://git.openjdk.java.net/jdk/pull/676/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=676&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8254814 Stats: 22 lines in 1 file changed: 0 ins; 20 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/676.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/676/head:pull/676 PR: https://git.openjdk.java.net/jdk/pull/676 From shade at openjdk.java.net Thu Oct 15 06:35:11 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 15 Oct 2020 06:35:11 GMT Subject: Integrated: 8254771: Remove unimplemented ciSignature::get_all_klasses In-Reply-To: References: Message-ID: On Wed, 14 Oct 2020 14:36:59 GMT, Aleksey Shipilev wrote: > This method is not implemented, and there was no definition since the initial load. Can be removed. > > Testing: > - [x] Linux x86_64 build > - [x] Text search for `get_all_klasses` in `src/hotspot`; there is `JdkJfrEvent::get_all_klasses` that looks unrelated. This pull request has now been integrated. 
Changeset: 167c1924 Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/167c1924 Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod 8254771: Remove unimplemented ciSignature::get_all_klasses Reviewed-by: thartmann, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/657 From shade at openjdk.java.net Thu Oct 15 06:36:11 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 15 Oct 2020 06:36:11 GMT Subject: Integrated: 8254773: Remove unimplemented ciReplay::is_loaded(Klass* klass) In-Reply-To: References: Message-ID: On Wed, 14 Oct 2020 14:44:44 GMT, Aleksey Shipilev wrote: > Both overloads for Klass* and Method* were added by JDK-6830717. But Klass* overload was never implemented: > http://hg.openjdk.java.net/hsx/hsx25/hotspot/rev/bd7a7ce2e264#l32.934 > > Testing: > - [x] Linux x86_64 build This pull request has now been integrated. Changeset: 7f73474f Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/7f73474f Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod 8254773: Remove unimplemented ciReplay::is_loaded(Klass* klass) Reviewed-by: thartmann, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/658 From roland at openjdk.java.net Thu Oct 15 06:56:14 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Thu, 15 Oct 2020 06:56:14 GMT Subject: RFR: 8254734: "dead loop detected" assert failure with patch from 8223051 In-Reply-To: References: Message-ID: <3C2nUYpcbRbo9EareICY0zgd7Kx-dQRPS1vFbMK0ts4=.52c1f68f-6f6f-47a9-a4b7-a1d556404023@github.com> On Wed, 14 Oct 2020 10:01:59 GMT, Christian Hagedorn wrote: >> compiler/c2/TestDeadDataLoopIGVN.java ran with -server -Xcomp >> -XX:+CreateCoredumpOnCrash -ea -esa -XX:CompileThreshold=100 >> -XX:+UnlockExperimentalVMOptions -server -XX:-TieredCompilation and >> the patch from 8223051 (long counted loops, not integrated yet) fails >> with: >> >> assert(no_dead_loop) failed: dead loop detected >> >> I can reproduce the failure with only this 
change from 8223051: >> diff --git a/src/hotspot/share/opto/callnode.cpp b/src/hotspot/share/opto/callnode.cpp >> index 268cec7732c..295cb4ecaf9 100644 >> --- a/src/hotspot/share/opto/callnode.cpp >> +++ b/src/hotspot/share/opto/callnode.cpp >> @@ -1159,7 +1159,7 @@ Node* SafePointNode::Identity(PhaseGVN* phase) { >> if( in(TypeFunc::Control)->is_SafePoint() ) >> return in(TypeFunc::Control); >> >> - if( in(0)->is_Proj() ) { >> + if (in(0)->is_Proj() && !phase->C->major_progress()) { >> Node *n0 = in(0)->in(0); >> // Check if he is a call projection (except Leaf Call) >> if( n0->is_Catch() ) { >> diff --git a/src/hotspot/share/opto/parse1.cpp b/src/hotspot/share/opto/parse1.cpp >> index 4b88c379dea..baf2bf9bacc 100644 >> --- a/src/hotspot/share/opto/parse1.cpp >> +++ b/src/hotspot/share/opto/parse1.cpp >> @@ -2254,23 +2254,7 @@ void Parse::return_current(Node* value) { >> >> //------------------------------add_safepoint---------------------------------- >> void Parse::add_safepoint() { >> - // See if we can avoid this safepoint. No need for a SafePoint immediately >> - // after a Call (except Leaf Call) or another SafePoint. >> - Node *proj = control(); >> uint parms = TypeFunc::Parms+1; >> - if( proj->is_Proj() ) { >> - Node *n0 = proj->in(0); >> - if( n0->is_Catch() ) { >> - n0 = n0->in(0)->in(0); >> - assert( n0->is_Call(), "expect a call here" ); >> - } >> - if( n0->is_Call() ) { >> - if( n0->as_Call()->guaranteed_safepoint() ) >> - return; >> - } else if( n0->is_SafePoint() && n0->req() >= parms ) { >> - return; >> - } >> - } >> >> // Clear out dead values from the debug info. >> kill_dead_locals(); >> so it's unrelated to the long counted loops transformation itself. >> >> At IGVN time, a dead loop is optimized. The loop contains the >> following subgraph: >> >> (LoadI .. (AddP (CastPP .. (Phi (Proj (CallStaticJava .. (AddI >> >> The projection is on the backedge of the phi. The AddI points back to >> the LoadI. The CallStaticJava is a boxing method call. 
The LoadI is a >> boxed value load. >> >> The phi is optimized out as the loop is unreachable: >> >> (LoadI .. (AddP (Proj (CallStaticJava .. (AddI >> >> The LoadI is then optimized by code MemNode::can_see_stored_value(): >> // Load boxed value from result of valueOf() call is input parameter. >> if (this->is_Load() && ld_adr->is_AddP() && >> (tp != NULL) && tp->is_ptr_to_boxed_value()) { >> intptr_t ignore = 0; >> Node* base = AddPNode::Ideal_base_and_offset(ld_adr, phase, ignore); >> BarrierSetC2* bs = BarrierSet::barrier_set()->barrier_set_c2(); >> base = bs->step_over_gc_barrier(base); >> if (base != NULL && base->is_Proj() && >> base->as_Proj()->_con == TypeFunc::Parms && >> base->in(0)->is_CallStaticJava() && >> base->in(0)->as_CallStaticJava()->is_boxing_method()) { >> return base->in(0)->in(TypeFunc::Parms); >> } >> } >> to the AddI. Because the AddI has the LoadI as input we end up with a >> dead loop. I propose extending the dead loop safe logic to take the >> pattern with a boxing method load into account. > > That's a reasonable fix. I was thinking about something similar before when I was fixing JDK-8251544. Looks good to me. 
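The cycle described in the quoted analysis can be reproduced on a toy graph. This is only a sketch: `ToyIr`, `replace_uses`, and `feeds_itself` are hypothetical names, and the model ignores everything about C2 except the input edges named in the description (LoadI, AddP, Proj, CallStaticJava, AddI).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Minimal IR node: an opcode name plus input edges. Hypothetical type,
// not C2's Node.
struct ToyIr {
    std::string op;
    std::vector<ToyIr*> in;
};

// Rewire every use of `from` in `graph` to `to`, the way replacing the
// LoadI with the boxing call's argument would.
void replace_uses(std::vector<ToyIr*>& graph, ToyIr* from, ToyIr* to) {
    for (ToyIr* n : graph) {
        for (ToyIr*& input : n->in) {
            if (input == from) input = to;
        }
    }
}

// True if the node lists itself among its direct inputs.
bool feeds_itself(const ToyIr* n) {
    for (const ToyIr* input : n->in) {
        if (input == n) return true;
    }
    return false;
}

// Builds the shape left after the Phi is removed, where each node takes
// the next one as input: LoadI, AddP, Proj, CallStaticJava, AddI, and the
// AddI takes the LoadI again. Then applies the replacement.
bool boxing_shortcut_creates_self_loop() {
    ToyIr load{"LoadI", {}}, addp{"AddP", {}}, proj{"Proj", {}};
    ToyIr call{"CallStaticJava", {}}, addi{"AddI", {}};
    load.in = {&addp};
    addp.in = {&proj};
    proj.in = {&call};
    call.in = {&addi};   // argument of the boxing call
    addi.in = {&load};   // AddI points back to the LoadI
    std::vector<ToyIr*> graph = {&load, &addp, &proj, &call, &addi};

    assert(!feeds_itself(&addi));
    replace_uses(graph, &load, &addi);  // LoadI := call argument (AddI)
    return feeds_itself(&addi);
}
```

Once the LoadI is replaced by the call's argument, the AddI ends up as its own input, which is the dead loop the assert reports.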
@chhagedorn @vnkozlov thanks for the review ------------- PR: https://git.openjdk.java.net/jdk/pull/649 From roland at openjdk.java.net Thu Oct 15 06:56:15 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Thu, 15 Oct 2020 06:56:15 GMT Subject: Integrated: 8254734: "dead loop detected" assert failure with patch from 8223051 In-Reply-To: References: Message-ID: On Wed, 14 Oct 2020 07:52:58 GMT, Roland Westrelin wrote: > compiler/c2/TestDeadDataLoopIGVN.java ran with -server -Xcomp > -XX:+CreateCoredumpOnCrash -ea -esa -XX:CompileThreshold=100 > -XX:+UnlockExperimentalVMOptions -server -XX:-TieredCompilation and > the patch from 8223051 (long counted loops, not integrated yet) fails > with: > > assert(no_dead_loop) failed: dead loop detected > > I can reproduce the failure with only this change from 8223051: > diff --git a/src/hotspot/share/opto/callnode.cpp b/src/hotspot/share/opto/callnode.cpp > index 268cec7732c..295cb4ecaf9 100644 > --- a/src/hotspot/share/opto/callnode.cpp > +++ b/src/hotspot/share/opto/callnode.cpp > @@ -1159,7 +1159,7 @@ Node* SafePointNode::Identity(PhaseGVN* phase) { > if( in(TypeFunc::Control)->is_SafePoint() ) > return in(TypeFunc::Control); > > - if( in(0)->is_Proj() ) { > + if (in(0)->is_Proj() && !phase->C->major_progress()) { > Node *n0 = in(0)->in(0); > // Check if he is a call projection (except Leaf Call) > if( n0->is_Catch() ) { > diff --git a/src/hotspot/share/opto/parse1.cpp b/src/hotspot/share/opto/parse1.cpp > index 4b88c379dea..baf2bf9bacc 100644 > --- a/src/hotspot/share/opto/parse1.cpp > +++ b/src/hotspot/share/opto/parse1.cpp > @@ -2254,23 +2254,7 @@ void Parse::return_current(Node* value) { > > //------------------------------add_safepoint---------------------------------- > void Parse::add_safepoint() { > - // See if we can avoid this safepoint. No need for a SafePoint immediately > - // after a Call (except Leaf Call) or another SafePoint. 
> - Node *proj = control(); > uint parms = TypeFunc::Parms+1; > - if( proj->is_Proj() ) { > - Node *n0 = proj->in(0); > - if( n0->is_Catch() ) { > - n0 = n0->in(0)->in(0); > - assert( n0->is_Call(), "expect a call here" ); > - } > - if( n0->is_Call() ) { > - if( n0->as_Call()->guaranteed_safepoint() ) > - return; > - } else if( n0->is_SafePoint() && n0->req() >= parms ) { > - return; > - } > - } > > // Clear out dead values from the debug info. > kill_dead_locals(); > so it's unrelated to the long counted loops transformation itself. > > At IGVN time, a dead loop is optimized. The loop contains the > following subgraph: > > (LoadI .. (AddP (CastPP .. (Phi (Proj (CallStaticJava .. (AddI > > The projection is on the backedge of the phi. The AddI points back to > the LoadI. The CallStaticJava is a boxing method call. The LoadI is a > boxed value load. > > The phi is optimized out as the loop is unreachable: > > (LoadI .. (AddP (Proj (CallStaticJava .. (AddI > > The LoadI is then optimized by code MemNode::can_see_stored_value(): > // Load boxed value from result of valueOf() call is input parameter. > if (this->is_Load() && ld_adr->is_AddP() && > (tp != NULL) && tp->is_ptr_to_boxed_value()) { > intptr_t ignore = 0; > Node* base = AddPNode::Ideal_base_and_offset(ld_adr, phase, ignore); > BarrierSetC2* bs = BarrierSet::barrier_set()->barrier_set_c2(); > base = bs->step_over_gc_barrier(base); > if (base != NULL && base->is_Proj() && > base->as_Proj()->_con == TypeFunc::Parms && > base->in(0)->is_CallStaticJava() && > base->in(0)->as_CallStaticJava()->is_boxing_method()) { > return base->in(0)->in(TypeFunc::Parms); > } > } > to the AddI. Because the AddI has the LoadI as input we end up with a > dead loop. I propose extending the dead loop safe logic to take the > pattern with a boxing method load into account. This pull request has now been integrated. 
Changeset: f44fc6de Author: Roland Westrelin URL: https://git.openjdk.java.net/jdk/commit/f44fc6de Stats: 28 lines in 2 files changed: 23 ins; 4 del; 1 mod 8254734: "dead loop detected" assert failure with patch from 8223051 Reviewed-by: chagedorn, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/649 From vlivanov at openjdk.java.net Thu Oct 15 10:40:16 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Thu, 15 Oct 2020 10:40:16 GMT Subject: RFR: 8254814: [Vector API] Fix an AVX512 crash after JDK-8223347 In-Reply-To: <9yFclJuYNDD8ZUHHTI5OazZPR4sMcvzScmXgXxyIqTU=.e46792cb-bdb9-4a24-9f52-b5952f7df23d@github.com> References: <9yFclJuYNDD8ZUHHTI5OazZPR4sMcvzScmXgXxyIqTU=.e46792cb-bdb9-4a24-9f52-b5952f7df23d@github.com> Message-ID: On Thu, 15 Oct 2020 06:22:28 GMT, Jie Fu wrote: > As discussed here[1], it's time to integrate this fix. > And the original PR is here[2]. > > Testing: > - All jdk/incubator/vector/ tests passed on Linux/x64 AVX512 machines > > [1] https://github.com/openjdk/jdk/pull/367#issuecomment-701560441 > [2] https://github.com/openjdk/panama-vector/pull/1 Looks good. ------------- Marked as reviewed by vlivanov (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/676 From thartmann at openjdk.java.net Thu Oct 15 13:16:27 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 15 Oct 2020 13:16:27 GMT Subject: RFR: 8251535: Partial peeling at unsigned test adds incorrect loop exit check Message-ID: <8yUxN4k_USTB2kvNX_utmsVliIBuuK2mOibFhiUZXCE=.ce08b033-b2db-456a-90f0-b5b697f81662@github.com> C2's `PhaseIdealLoop::partial_peel` searches for loop exit tests on the induction variable as cut point for partial peeling. If no suitable signed test is found and `PartialPeelAtUnsignedTests` is enabled (default), unsigned `i <u limit` checks are used. Since the exit condition `!(i <u limit)` can be split into `i < 0 || i >= limit`, `PhaseIdealLoop::insert_cmpi_loop_exit` either clones the lower or upper bound check and inserts it as cut point before the unsigned test.
For example: loop: i += 1000; if (i <u 10_000) { goto loop; } goto exit; exit: return i; Is converted to: loop: i += 1000; if (!(i < 10_000)) { <-- Loop exit test as cut point for partial peeling goto exit; } if (i <u 10_000) { goto loop; } goto exit; exit: return i; Now the problem is that if the unsigned check is inverted, i.e. we exit if the check **passes**, the newly inserted test is incorrect: loop: i += 1000; if (i <u 10_000) { goto exit; } goto loop; exit: return i; Is converted to: loop: i += 1000; if (i < 10_000) { <-- This exit condition is wrong! For example, we should not exit for i = -1. goto exit; } if (i <u 10_000) { goto exit; } goto loop; exit: return i; This leads to incorrect results because the loop is left too early. The fix is to simply bail out when the loop exit condition is `i <u limit` -> `i >= 0 && i < limit` because it can't be split into a single signed exit check. Thanks, Tobias ------------- Commit messages: - Removed trailing whitespace - 8251535: Partial peeling at unsigned test adds incorrect loop exit check Changes: https://git.openjdk.java.net/jdk/pull/681/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=681&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8251535 Stats: 192 lines in 2 files changed: 191 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/681.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/681/head:pull/681 PR: https://git.openjdk.java.net/jdk/pull/681 From thartmann at openjdk.java.net Thu Oct 15 13:18:11 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 15 Oct 2020 13:18:11 GMT Subject: RFR: 8254814: [Vector API] Fix an AVX512 crash after JDK-8223347 In-Reply-To: <9yFclJuYNDD8ZUHHTI5OazZPR4sMcvzScmXgXxyIqTU=.e46792cb-bdb9-4a24-9f52-b5952f7df23d@github.com> References: <9yFclJuYNDD8ZUHHTI5OazZPR4sMcvzScmXgXxyIqTU=.e46792cb-bdb9-4a24-9f52-b5952f7df23d@github.com> Message-ID: On Thu, 15 Oct 2020 06:22:28 GMT, Jie Fu wrote: > As discussed here[1], it's time to integrate this fix. > And the original PR is here[2]. > > Testing: > - All jdk/incubator/vector/ tests passed on Linux/x64 AVX512 machines > > [1] https://github.com/openjdk/jdk/pull/367#issuecomment-701560441 > [2] https://github.com/openjdk/panama-vector/pull/1 Marked as reviewed by thartmann (Reviewer).
------------- PR: https://git.openjdk.java.net/jdk/pull/676 From chagedorn at openjdk.java.net Thu Oct 15 14:05:09 2020 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Thu, 15 Oct 2020 14:05:09 GMT Subject: RFR: 8251535: Partial peeling at unsigned test adds incorrect loop exit check In-Reply-To: <8yUxN4k_USTB2kvNX_utmsVliIBuuK2mOibFhiUZXCE=.ce08b033-b2db-456a-90f0-b5b697f81662@github.com> References: <8yUxN4k_USTB2kvNX_utmsVliIBuuK2mOibFhiUZXCE=.ce08b033-b2db-456a-90f0-b5b697f81662@github.com> Message-ID: On Thu, 15 Oct 2020 12:42:53 GMT, Tobias Hartmann wrote: > C2's `PhaseIdealLoop::partial_peel` searches for loop exit tests on the induction variable as cut point for partial > peeling. If no suitable signed test is found and `PartialPeelAtUnsignedTests` is enabled (default), unsigned `i <u > limit` checks are used. Since the exit condition `!(i <u limit)` can be split into `i < 0 || i >= limit`, > `PhaseIdealLoop::insert_cmpi_loop_exit` either clones the lower or upper bound check and inserts it as cut point before > the unsigned test. For example: > loop: > i += 1000; > if (i <u 10_000) { > goto loop; > } > goto exit; > exit: > return i; > > Is converted to: > > loop: > i += 1000; > if (!(i < 10_000)) { <-- Loop exit test as cut point for partial peeling > goto exit; > } > if (i <u 10_000) { > goto loop; > } > goto exit; > exit: > return i; > > Now the problem is that if the unsigned check is inverted, i.e. we exit if the check **passes**, the newly inserted > test is incorrect: > > loop: > i += 1000; > if (i <u 10_000) { > goto exit; > } > goto loop; > exit: > return i; > > Is converted to: > > loop: > i += 1000; > if (i < 10_000) { <-- This exit condition is wrong! For example, we should not exit for i = -1. > goto exit; > } > if (i <u 10_000) { > goto exit; > } > goto loop; > exit: > return i; > > This leads to incorrect results because the loop is left too early. > > The fix is to simply bail out when the loop exit condition is `i <u limit` -> `i >= 0 && i < limit` because it can't be > split into a single signed exit check.
> Thanks, > Tobias Nice explanation! Looks reasonable to me. ------------- Marked as reviewed by chagedorn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/681 From dnsimon at openjdk.java.net Thu Oct 15 14:19:19 2020 From: dnsimon at openjdk.java.net (Doug Simon) Date: Thu, 15 Oct 2020 14:19:19 GMT Subject: RFR: 8254842: [JVMCI] copy thread name when attaching libgraal thread to HotSpot Message-ID: This PR modifies `HotSpotJVMCIRuntime.attachCurrentThread` when it is called from within libgraal so that the name of the thread in the libgraal heap is used as the name of the peer thread created in HotSpot. This is useful when viewing output such as `-XX:JVMCITraceLevel=1`. For example, here's sample output without this PR: JVMCITrace-1[Thread-0]: initializing JVMCI runtime -1 JVMCITrace-1[Thread-0]: initialized JVMCI runtime -1 and then with: JVMCITrace-1[LibGraalHotSpotGraalManagementInitialization]: initializing JVMCI runtime -1 JVMCITrace-1[LibGraalHotSpotGraalManagementInitialization]: initialized JVMCI runtime -1 ------------- Commit messages: - 8254842: [JVMCI] copy thread name when attaching libgraal thread to HotSpot Changes: https://git.openjdk.java.net/jdk/pull/684/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=684&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8254842 Stats: 18 lines in 3 files changed: 12 ins; 0 del; 6 mod Patch: https://git.openjdk.java.net/jdk/pull/684.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/684/head:pull/684 PR: https://git.openjdk.java.net/jdk/pull/684 From neliasso at openjdk.java.net Thu Oct 15 14:21:16 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Thu, 15 Oct 2020 14:21:16 GMT Subject: RFR: 8251535: Partial peeling at unsigned test adds incorrect loop exit check In-Reply-To: <8yUxN4k_USTB2kvNX_utmsVliIBuuK2mOibFhiUZXCE=.ce08b033-b2db-456a-90f0-b5b697f81662@github.com> References:
<8yUxN4k_USTB2kvNX_utmsVliIBuuK2mOibFhiUZXCE=.ce08b033-b2db-456a-90f0-b5b697f81662@github.com> Message-ID: On Thu, 15 Oct 2020 12:42:53 GMT, Tobias Hartmann wrote: > C2's `PhaseIdealLoop::partial_peel` searches for loop exit tests on the induction variable as cut point for partial > peeling. If no suitable signed test is found and `PartialPeelAtUnsignedTests` is enabled (default), unsigned `i <u > limit` checks are used. Since the exit condition `!(i <u limit)` can be split into `i < 0 || i >= limit`, > `PhaseIdealLoop::insert_cmpi_loop_exit` either clones the lower or upper bound check and inserts it as cut point before > the unsigned test. For example: > loop: > i += 1000; > if (i <u 10_000) { > goto loop; > } > goto exit; > exit: > return i; > > Is converted to: > > loop: > i += 1000; > if (!(i < 10_000)) { <-- Loop exit test as cut point for partial peeling > goto exit; > } > if (i <u 10_000) { > goto loop; > } > goto exit; > exit: > return i; > > Now the problem is that if the unsigned check is inverted, i.e. we exit if the check **passes**, the newly inserted > test is incorrect: > > loop: > i += 1000; > if (i <u 10_000) { > goto exit; > } > goto loop; > exit: > return i; > > Is converted to: > > loop: > i += 1000; > if (i < 10_000) { <-- This exit condition is wrong! For example, we should not exit for i = -1. > goto exit; > } > if (i <u 10_000) { > goto exit; > } > goto loop; > exit: > return i; > > This leads to incorrect results because the loop is left too early. > > The fix is to simply bail out when the loop exit condition is `i <u limit` -> `i >= 0 && i < limit` because it can't be > split into a single signed exit check. > Thanks, > Tobias Looks good! ------------- Marked as reviewed by neliasso (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/681

From thartmann at openjdk.java.net Thu Oct 15 14:35:13 2020
From: thartmann at openjdk.java.net (Tobias Hartmann)
Date: Thu, 15 Oct 2020 14:35:13 GMT
Subject: RFR: 8251535: Partial peeling at unsigned test adds incorrect loop exit check
In-Reply-To:
References: <8yUxN4k_USTB2kvNX_utmsVliIBuuK2mOibFhiUZXCE=.ce08b033-b2db-456a-90f0-b5b697f81662@github.com>
Message-ID:

On Thu, 15 Oct 2020 14:18:47 GMT, Nils Eliasson wrote:

>> C2's `PhaseIdealLoop::partial_peel` searches for loop exit tests on the induction variable as cut point for partial
>> peeling. If no suitable signed test is found and `PartialPeelAtUnsignedTests` is enabled (default), unsigned
>> `i <u limit` checks are used. Since the exit condition `!(i <u limit)` can be split into `i < 0 || i >= limit`,
>> `PhaseIdealLoop::insert_cmpi_loop_exit` either clones the lower or upper bound check and inserts it as cut point before
>> the unsigned test. For example:
>> loop:
>>   i += 1000;
>>   if (i <u 10_000) {
>>     goto loop;
>>   }
>>   goto exit;
>> exit:
>>   return i;
>>
>> Is converted to:
>>
>> loop:
>>   i += 1000;
>>   if (!(i < 10_000)) { <-- Loop exit test as cut point for partial peeling
>>     goto exit;
>>   }
>>   if (i <u 10_000) {
>>     goto loop;
>>   }
>>   goto exit;
>> exit:
>>   return i;
>>
>> Now the problem is that if the unsigned check is inverted, i.e. we exit if the check **passes**, the newly inserted
>> test is incorrect:
>>
>> loop:
>>   i += 1000;
>>   if (i <u 10_000) {
>>     goto exit;
>>   }
>>   goto loop;
>> exit:
>>   return i;
>>
>> Is converted to:
>>
>> loop:
>>   i += 1000;
>>   if (i < 10_000) { <-- This exit condition is wrong! For example, we should not exit for i = -1.
>>     goto exit;
>>   }
>>   if (i <u 10_000) {
>>     goto exit;
>>   }
>>   goto loop;
>> exit:
>>   return i;
>>
>> This leads to incorrect results because the loop is left too early.
>>
>> The fix is to simply bail out when the loop exit condition is `i <u limit` -> `i >= 0 && i < limit` because it can't be
>> split into a single signed exit check.
>> Thanks,
>> Tobias
>
> Looks good!

Thanks Christian and Nils!
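To make the `i = -1` case from the quoted explanation concrete, here is a minimal C++ sketch of the inverted loop before and after the incorrect transformation. It is only an illustration and not C2 code: `unsigned_less`, `loop_original` and `loop_with_cloned_signed_test` are hypothetical names, the unsigned test `i <u limit` is modelled by casting to `uint32_t` (an assumption), and the step 1000 and limit 10_000 are taken from the example in this thread.

```cpp
#include <cassert>
#include <cstdint>

// Models the unsigned test `i <u limit` on 32-bit ints.
// Assumption: a cast to uint32_t stands in for C2's unsigned compare.
inline bool unsigned_less(std::int32_t i, std::int32_t limit) {
    return static_cast<std::uint32_t>(i) < static_cast<std::uint32_t>(limit);
}

// Inverted loop from the example: leave the loop when `i <u 10_000` passes.
inline std::int32_t loop_original(std::int32_t i) {
    assert(i < 0);  // with a negative start the loop exits before i can overflow
    for (;;) {
        i += 1000;
        if (unsigned_less(i, 10000)) {
            return i;
        }
    }
}

// Same loop with the signed test cloned in front of the unsigned one.
// The signed test also passes for negative i, so the loop is left too early.
inline std::int32_t loop_with_cloned_signed_test(std::int32_t i) {
    assert(i < 0);  // same precondition as above
    for (;;) {
        i += 1000;
        if (i < 10000) {  // wrong cut point: true for i = -1 as well
            return i;
        }
        if (unsigned_less(i, 10000)) {
            return i;
        }
    }
}
```

Starting from `i = -3000`, `loop_original` keeps iterating until `i` reaches 0, while `loop_with_cloned_signed_test` already returns -2000 on the first iteration.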
------------- PR: https://git.openjdk.java.net/jdk/pull/681 From neliasso at openjdk.java.net Thu Oct 15 15:07:13 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Thu, 15 Oct 2020 15:07:13 GMT Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v3] In-Reply-To: <94qadtiTzSkdsJAc_8IWrLxpBvmfiBXMf_W9Z965P80=.9a59a5db-2209-4007-94bb-16ccd8ff0b77@github.com> References: <94qadtiTzSkdsJAc_8IWrLxpBvmfiBXMf_W9Z965P80=.9a59a5db-2209-4007-94bb-16ccd8ff0b77@github.com> Message-ID: <9lPNMo1V33tQD6qp-1l78dII5Hfle8Ea5VWwuY1l_qA=.2e420c11-6e70-41f8-80b4-5992dcdd02eb@github.com> On Tue, 13 Oct 2020 18:03:27 GMT, Jatin Bhateja wrote: >> Summary: >> >> 1) Partial in-lining technique avoids call overhead penalty for small array copy operations with size less than 32 >> bytes. 2) At runtime, a conditional check based on copy length either calls an array-copy stub or executes an optimized >> instruction sequence using AVX-512 masked instructions emitted at the call site. 3) New runtime flag >> ArrayCopyPartialInlineSize=0/32(default)/64 bytes determines the maximum size for partial in-lining. 4) Based on the >> perf results seen in benchmarks currently partial in-lining is performed only for arraycopy involving sub-word types >> (bool/byte/char/short). Once PR-61 gets integrated we can extend this patch to cover all the primitive types. 
>> Performance Results: >> System : CascadeLake Server, Intel(R) Xeon(R) Platinum 8280L CPU @ 2.70GHz >> Micros : test/micro/org/openjdk/bench/java/lang/ArrayCopy*.java >> ArrayCopyPartialInlineSize : 32 >> >> JMH | Block Size | Baseline (ns/op) | Partial Inling (ns/op) | Gain >> -- | -- | -- | -- | -- >> ArrayCopyAligned.testByte | 1 | 5.417 | 2.696 | 2.009272997 >> ArrayCopyAligned.testByte | 3 | 5.494 | 2.702 | 2.03330866 >> ArrayCopyAligned.testByte | 5 | 5.417 | 2.637 | 2.05422829 >> ArrayCopyAligned.testByte | 10 | 5.343 | 2.703 | 1.976692564 >> ArrayCopyAligned.testByte | 20 | 5.837 | 2.636 | 2.214339909 >> ArrayCopyAligned.testByte | 70 | 5.86 | 6 | 0.976666667 >> ArrayCopyAligned.testByte | 150 | 6.766 | 6.906 | 0.979727773 >> ArrayCopyAligned.testByte | 300 | 7.605 | 7.952 | 0.956363179 >> ArrayCopyAligned.testByte | 600 | 11.989 | 12.007 | 0.998500874 >> ArrayCopyAligned.testByte | 1200 | 16.447 | 16.585 | 0.991679228 >> ArrayCopyAligned.testChar | 1 | 5.02 | 2.828 | 1.775106082 >> ArrayCopyAligned.testChar | 3 | 5.129 | 2.762 | 1.85698769 >> ArrayCopyAligned.testChar | 5 | 5.041 | 2.762 | 1.82512672 >> ArrayCopyAligned.testChar | 10 | 5.716 | 2.762 | 2.069514844 >> ArrayCopyAligned.testChar | 20 | 5.111 | 5.399 | 0.946656788 >> ArrayCopyAligned.testChar | 70 | 6.271 | 6.242 | 1.004645947 >> ArrayCopyAligned.testChar | 150 | 7.45 | 7.599 | 0.980392157 >> ArrayCopyAligned.testChar | 300 | 9.904 | 10.112 | 0.97943038 >> ArrayCopyAligned.testChar | 600 | 17.131 | 17.167 | 0.997902953 >> ArrayCopyAligned.testChar | 1200 | 29.556 | 29.851 | 0.990117584 >> ArrayCopyUnalignedBoth.testByte | 1 | 5.419 | 2.702 | 2.005551443 >> ArrayCopyUnalignedBoth.testByte | 3 | 5.558 | 2.636 | 2.108497724 >> ArrayCopyUnalignedBoth.testByte | 5 | 5.43 | 2.636 | 2.059939302 >> ArrayCopyUnalignedBoth.testByte | 10 | 5.378 | 2.637 | 2.039438756 >> ArrayCopyUnalignedBoth.testByte | 20 | 5.914 | 2.636 | 2.243550836 >> ArrayCopyUnalignedBoth.testByte | 70 | 5.882 | 5.954 | 
0.987907289 >> ArrayCopyUnalignedBoth.testByte | 150 | 6.784 | 6.88 | 0.986046512 >> ArrayCopyUnalignedBoth.testByte | 300 | 7.635 | 7.968 | 0.958207831 >> ArrayCopyUnalignedBoth.testByte | 600 | 12.226 | 12.129 | 1.007997362 >> ArrayCopyUnalignedBoth.testByte | 1200 | 16.992 | 20.717 | 0.820195974 >> ArrayCopyUnalignedBoth.testChar | 1 | 5.019 | 2.828 | 1.774752475 >> ArrayCopyUnalignedBoth.testChar | 3 | 5.163 | 2.763 | 1.868621064 >> ArrayCopyUnalignedBoth.testChar | 5 | 5.042 | 2.827 | 1.783516095 >> ArrayCopyUnalignedBoth.testChar | 10 | 5.718 | 2.828 | 2.021923621 >> ArrayCopyUnalignedBoth.testChar | 20 | 5.111 | 5.404 | 0.945780903 >> ArrayCopyUnalignedBoth.testChar | 70 | 6.367 | 6.235 | 1.02117081 >> ArrayCopyUnalignedBoth.testChar | 150 | 7.367 | 8.269 | 0.890917886 >> ArrayCopyUnalignedBoth.testChar | 300 | 10.358 | 10.642 | 0.973313287 >> ArrayCopyUnalignedBoth.testChar | 600 | 20.84 | 17.522 | 1.189361945 >> ArrayCopyUnalignedBoth.testChar | 1200 | 31.895 | 31.892 | 1.000094067 >> ArrayCopyUnalignedDst.testByte | 1 | 5.455 | 2.637 | 2.068638604 >> ArrayCopyUnalignedDst.testByte | 3 | 5.562 | 2.702 | 2.058475204 >> ArrayCopyUnalignedDst.testByte | 5 | 5.427 | 2.702 | 2.008512213 >> ArrayCopyUnalignedDst.testByte | 10 | 5.367 | 2.696 | 1.990727003 >> ArrayCopyUnalignedDst.testByte | 20 | 5.839 | 2.637 | 2.214258627 >> ArrayCopyUnalignedDst.testByte | 70 | 5.888 | 5.968 | 0.986595174 >> ArrayCopyUnalignedDst.testByte | 150 | 6.785 | 6.773 | 1.001771741 >> ArrayCopyUnalignedDst.testByte | 300 | 7.606 | 7.972 | 0.954089313 >> ArrayCopyUnalignedDst.testByte | 600 | 11.986 | 21.195 | 0.565510734 >> ArrayCopyUnalignedDst.testByte | 1200 | 16.54 | 16.784 | 0.985462345 >> ArrayCopyUnalignedDst.testChar | 1 | 5.02 | 2.827 | 1.775733994 >> ArrayCopyUnalignedDst.testChar | 3 | 5.131 | 2.762 | 1.857711803 >> ArrayCopyUnalignedDst.testChar | 5 | 5.038 | 2.762 | 1.82404055 >> ArrayCopyUnalignedDst.testChar | 10 | 5.718 | 2.762 | 2.070238957 >> 
ArrayCopyUnalignedDst.testChar | 20 | 5.113 | 5.401 | 0.946676541 >> ArrayCopyUnalignedDst.testChar | 70 | 6.222 | 6.214 | 1.001287416 >> ArrayCopyUnalignedDst.testChar | 150 | 7.367 | 8.125 | 0.906707692 >> ArrayCopyUnalignedDst.testChar | 300 | 10.204 | 10.082 | 1.012100774 >> ArrayCopyUnalignedDst.testChar | 600 | 16.978 | 17.135 | 0.990837467 >> ArrayCopyUnalignedDst.testChar | 1200 | 32.351 | 31.996 | 1.011095137 >> ArrayCopyUnalignedSrc.testByte | 1 | 5.414 | 2.696 | 2.008160237 >> ArrayCopyUnalignedSrc.testByte | 3 | 5.494 | 2.637 | 2.083428138 >> ArrayCopyUnalignedSrc.testByte | 5 | 5.431 | 2.637 | 2.059537353 >> ArrayCopyUnalignedSrc.testByte | 10 | 5.344 | 2.703 | 1.977062523 >> ArrayCopyUnalignedSrc.testByte | 20 | 5.834 | 2.696 | 2.163946588 >> ArrayCopyUnalignedSrc.testByte | 70 | 5.883 | 6.009 | 0.979031453 >> ArrayCopyUnalignedSrc.testByte | 150 | 6.729 | 6.87 | 0.979475983 >> ArrayCopyUnalignedSrc.testByte | 300 | 7.603 | 7.97 | 0.953952321 >> ArrayCopyUnalignedSrc.testByte | 600 | 12.004 | 12.16 | 0.987171053 >> ArrayCopyUnalignedSrc.testByte | 1200 | 16.534 | 16.643 | 0.9934507 >> ArrayCopyUnalignedSrc.testChar | 1 | 5.021 | 2.762 | 1.81788559 >> ArrayCopyUnalignedSrc.testChar | 3 | 5.13 | 2.762 | 1.857349747 >> ArrayCopyUnalignedSrc.testChar | 5 | 5.042 | 2.827 | 1.783516095 >> ArrayCopyUnalignedSrc.testChar | 10 | 5.726 | 2.761 | 2.073886273 >> ArrayCopyUnalignedSrc.testChar | 20 | 5.112 | 5.401 | 0.94649139 >> ArrayCopyUnalignedSrc.testChar | 70 | 6.113 | 6.227 | 0.981692629 >> ArrayCopyUnalignedSrc.testChar | 150 | 7.493 | 7.888 | 0.949923935 >> ArrayCopyUnalignedSrc.testChar | 300 | 10.234 | 10.501 | 0.97457385 >> ArrayCopyUnalignedSrc.testChar | 600 | 17.175 | 17.142 | 1.001925096 >> ArrayCopyUnalignedSrc.testChar | 1200 | 31.926 | 31.987 | 0.998092975 >> >> Detailed Reports: >> Baseline : >> [http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt) 
>> WithOpt : >> [http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt) > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Replacing explicit type checks with existing type checking routines Changes requested by neliasso (Reviewer). src/hotspot/share/opto/macroArrayCopy.cpp line 859: > 857: #ifdef ASSERT > 858: const TypeOopPtr* dest_t = _igvn.type(dest)->is_oopptr(); > 859: if (dest_t->is_known_instance() && false == is_partial_array_copy) { "false == is_partial_array_copy" change to (!is_partial_array_copy) src/hotspot/share/opto/memnode.hpp line 1188: > 1186: TrailingLoadStore, > 1187: LeadingLoadStore, > 1188: AfterPartialArrayCopy Change to keep consistent with the other names: AfterPartialArrayCopy -> TrailingPartialArrayCopy Why is a special kind needed for partial array copy? ------------- PR: https://git.openjdk.java.net/jdk/pull/302 From burban at openjdk.java.net Thu Oct 15 15:07:18 2020 From: burban at openjdk.java.net (Bernhard Urban-Forster) Date: Thu, 15 Oct 2020 15:07:18 GMT Subject: RFR: 8254827: JVMCI: Enable it for Windows+AArch64 Message-ID: Use r18 as allocatable register on Linux only. A bootstrap works now (it has been crashing before due to r18 being allocated): $ ./windows-aarch64-server-fastdebug/bin/java.exe -XX:+UnlockExperimentalVMOptions -XX:+UseJVMCICompiler -XX:+BootstrapJVMCI -version Bootstrapping JVMCI................................. in 17990 ms (compiled 3330 methods) openjdk version "16-internal" 2021-03-16 OpenJDK Runtime Environment (fastdebug build 16-internal+0-adhoc.NORTHAMERICAbeurba.openjdk-jdk) OpenJDK 64-Bit Server VM (fastdebug build 16-internal+0-adhoc.NORTHAMERICAbeurba.openjdk-jdk, mixed mode) Jtreg tests `test/hotspot/jtreg/compiler/jvmci` are passing as well. 
------------- Commit messages: - 8254827: JVMCI: Enable it for Windows+AArch64 Changes: https://git.openjdk.java.net/jdk/pull/685/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=685&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8254827 Stats: 15 lines in 3 files changed: 8 ins; 0 del; 7 mod Patch: https://git.openjdk.java.net/jdk/pull/685.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/685/head:pull/685 PR: https://git.openjdk.java.net/jdk/pull/685 From xliu at openjdk.java.net Thu Oct 15 16:01:23 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Thu, 15 Oct 2020 16:01:23 GMT Subject: RFR: 8254369: Node::disconnect_inputs may skip precedences [v2] In-Reply-To: <6V3fsxH8JB6tTos4TqibCVyWFnSPieFxp6VSwm-im3I=.60e92bca-62ab-435e-bc05-fcdc0b23555d@github.com> References: <6V3fsxH8JB6tTos4TqibCVyWFnSPieFxp6VSwm-im3I=.60e92bca-62ab-435e-bc05-fcdc0b23555d@github.com> Message-ID: <5y1OMvYVYTPb4KUF0hEnzTYVUtXwwKq3MkKMe3r5p04=.f4d349f7-c6de-4983-86ed-1efa95d66181@github.com> > 8254369: Node::disconnect_inputs may skip precedences Xin Liu has updated the pull request incrementally with one additional commit since the last revision: 8254369: Node::disconnect_inputs may skip precedences use ASSERT for the sanity check ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/664/files - new: https://git.openjdk.java.net/jdk/pull/664/files/f3f9c27b..bc809ffc Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=664&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=664&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/664.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/664/head:pull/664 PR: https://git.openjdk.java.net/jdk/pull/664 From psandoz at openjdk.java.net Thu Oct 15 16:07:31 2020 From: psandoz at openjdk.java.net (Paul Sandoz) Date: Thu, 15 Oct 2020 16:07:31 GMT Subject: RFR: 8254814: [Vector API] Fix an AVX512 crash after JDK-8223347 
In-Reply-To: <9yFclJuYNDD8ZUHHTI5OazZPR4sMcvzScmXgXxyIqTU=.e46792cb-bdb9-4a24-9f52-b5952f7df23d@github.com> References: <9yFclJuYNDD8ZUHHTI5OazZPR4sMcvzScmXgXxyIqTU=.e46792cb-bdb9-4a24-9f52-b5952f7df23d@github.com> Message-ID: On Thu, 15 Oct 2020 06:22:28 GMT, Jie Fu wrote: > As discussed here[1], it's time to integrate this fix. > And the original PR is here[2]. > > Testing: > - All jdk/incubator/vector/ tests passed on Linux/x64 AVX512 machines > > [1] https://github.com/openjdk/jdk/pull/367#issuecomment-701560441 > [2] https://github.com/openjdk/panama-vector/pull/1 Marked as reviewed by psandoz (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/676 From redestad at openjdk.java.net Thu Oct 15 17:36:09 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Thu, 15 Oct 2020 17:36:09 GMT Subject: RFR: 8254369: Node::disconnect_inputs may skip precedences [v2] In-Reply-To: <5y1OMvYVYTPb4KUF0hEnzTYVUtXwwKq3MkKMe3r5p04=.f4d349f7-c6de-4983-86ed-1efa95d66181@github.com> References: <6V3fsxH8JB6tTos4TqibCVyWFnSPieFxp6VSwm-im3I=.60e92bca-62ab-435e-bc05-fcdc0b23555d@github.com> <5y1OMvYVYTPb4KUF0hEnzTYVUtXwwKq3MkKMe3r5p04=.f4d349f7-c6de-4983-86ed-1efa95d66181@github.com> Message-ID: On Thu, 15 Oct 2020 16:01:23 GMT, Xin Liu wrote: >> 8254369: Node::disconnect_inputs may skip precedences > > Xin Liu has updated the pull request incrementally with one additional commit since the last revision: > > 8254369: Node::disconnect_inputs may skip precedences > > use ASSERT for the sanity check Looks good, whether you decide to incorporate my suggestion below or not ------------- Marked as reviewed by redestad (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/664 From redestad at openjdk.java.net Thu Oct 15 17:36:10 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Thu, 15 Oct 2020 17:36:10 GMT Subject: RFR: 8254369: Node::disconnect_inputs may skip precedences [v2] In-Reply-To: <6KDyUMTH9NnDNZzaAFA_9jtmsVW47OPeNqfucRtq2_U=.b7c283c1-2f0e-48b8-a748-3ebd9475fdea@github.com> References: <6V3fsxH8JB6tTos4TqibCVyWFnSPieFxp6VSwm-im3I=.60e92bca-62ab-435e-bc05-fcdc0b23555d@github.com> <4tKcqdZXXAmHA6zzDeFa4Mvei7x6rYJq2HUU2xVlD2Y=.cd967efa-009e-4dae-822d-b3554362356b@github.com> <6KDyUMTH9NnDNZzaAFA_9jtmsVW47OPeNqfucRtq2_U=.b7c283c1-2f0e-48b8-a748-3ebd9475fdea@github.com> Message-ID: <7653rvxIx1cj4vqLdGrUDvPdGkV602ibiPH7Cz0KTuc=.ae517fea-0ec8-4126-9dc0-35f91fa80ef1@github.com> On Thu, 15 Oct 2020 00:19:34 GMT, Xin Liu wrote: >> I got now why you need to scan reverse. > > hi, @vnkozlov , > Thank you to review it. > > i 0. I think you could write it like this to hoist the `i < len()` test from the loop while avoiding issues when `len()` or `req()` is 0: uint i = len(); if (i > 0) { while(i > req()) { rm_prec(--i); } } ------------- PR: https://git.openjdk.java.net/jdk/pull/664 From kvn at openjdk.java.net Thu Oct 15 18:22:14 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 15 Oct 2020 18:22:14 GMT Subject: RFR: 8254369: Node::disconnect_inputs may skip precedences [v2] In-Reply-To: <7653rvxIx1cj4vqLdGrUDvPdGkV602ibiPH7Cz0KTuc=.ae517fea-0ec8-4126-9dc0-35f91fa80ef1@github.com> References: <6V3fsxH8JB6tTos4TqibCVyWFnSPieFxp6VSwm-im3I=.60e92bca-62ab-435e-bc05-fcdc0b23555d@github.com> <4tKcqdZXXAmHA6zzDeFa4Mvei7x6rYJq2HUU2xVlD2Y=.cd967efa-009e-4dae-822d-b3554362356b@github.com> <6KDyUMTH9NnDNZzaAFA_9jtmsVW47OPeNqfucRtq2_U=.b7c283c1-2f0e-48b8-a748-3ebd9475fdea@github.com> <7653rvxIx1cj4vqLdGrUDvPdGkV602ibiPH7Cz0KTuc=.ae517fea-0ec8-4126-9dc0-35f91fa80ef1@github.com> Message-ID: On Thu, 15 Oct 2020 17:29:48 GMT, Claes Redestad wrote: >> hi, @vnkozlov , >> Thank you to 
review it.
>>
>> i> 0.
>
> I think you could write it like this to hoist the `i < len()` test from the loop while avoiding issues when `len()` or
> `req()` is 0:
> uint i = len();
> if (i > 0) {
>   while(i > req()) {
>     rm_prec(--i);
>   }
> }

Nice suggestion!

-------------

PR: https://git.openjdk.java.net/jdk/pull/664

From kvn at openjdk.java.net Thu Oct 15 19:14:17 2020
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Thu, 15 Oct 2020 19:14:17 GMT
Subject: RFR: 8251535: Partial peeling at unsigned test adds incorrect loop exit check
In-Reply-To: <8yUxN4k_USTB2kvNX_utmsVliIBuuK2mOibFhiUZXCE=.ce08b033-b2db-456a-90f0-b5b697f81662@github.com>
References: <8yUxN4k_USTB2kvNX_utmsVliIBuuK2mOibFhiUZXCE=.ce08b033-b2db-456a-90f0-b5b697f81662@github.com>
Message-ID:

On Thu, 15 Oct 2020 12:42:53 GMT, Tobias Hartmann wrote:

> C2's `PhaseIdealLoop::partial_peel` searches for loop exit tests on the induction variable as cut point for partial
> peeling. If no suitable signed test is found and `PartialPeelAtUnsignedTests` is enabled (default), unsigned
> `i <u limit` checks are used. Since the exit condition `!(i <u limit)` can be split into `i < 0 || i >= limit`,
> `PhaseIdealLoop::insert_cmpi_loop_exit` either clones the lower or upper bound check and inserts it as cut point before
> the unsigned test. For example:
> loop:
>   i += 1000;
>   if (i <u 10_000) {
>     goto loop;
>   }
>   goto exit;
> exit:
>   return i;
>
> Is converted to:
>
> loop:
>   i += 1000;
>   if (!(i < 10_000)) { <-- Loop exit test as cut point for partial peeling
>     goto exit;
>   }
>   if (i <u 10_000) {
>     goto loop;
>   }
>   goto exit;
> exit:
>   return i;
>
> Now the problem is that if the unsigned check is inverted, i.e. we exit if the check **passes**, the newly inserted
> test is incorrect:
>
> loop:
>   i += 1000;
>   if (i <u 10_000) {
>     goto exit;
>   }
>   goto loop;
> exit:
>   return i;
>
> Is converted to:
>
> loop:
>   i += 1000;
>   if (i < 10_000) { <-- This exit condition is wrong! For example, we should not exit for i = -1.
>     goto exit;
>   }
>   if (i <u 10_000) {
>     goto exit;
>   }
>   goto loop;
> exit:
>   return i;
>
> This leads to incorrect results because the loop is left too early.
>
> The fix is to simply bail out when the loop exit condition is `i <u limit` -> `i >= 0 && i < limit` because it can't be
> split into a single signed exit check.
> Thanks,
> Tobias

good

-------------

Marked as reviewed by kvn (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/681

From kvn at openjdk.java.net Thu Oct 15 19:18:19 2020
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Thu, 15 Oct 2020 19:18:19 GMT
Subject: RFR: 8254842: [JVMCI] copy thread name when attaching libgraal thread to HotSpot
In-Reply-To:
References:
Message-ID:

On Thu, 15 Oct 2020 13:29:39 GMT, Doug Simon wrote:

> This PR modifies `HotSpotJVMCIRuntime.attachCurrentThread` when it is called from within libgraal so that the name of
> the thread in the libgraal heap is used as the name of the peer thread created in HotSpot.
> This is useful when viewing output such as `-XX:JVMCITraceLevel=1`. For example, here's sample output without this PR:
> JVMCITrace-1[Thread-0]: initializing JVMCI runtime -1
> JVMCITrace-1[Thread-0]: initialized JVMCI runtime -1
> and then with:
> JVMCITrace-1[LibGraalHotSpotGraalManagementInitialization]: initializing JVMCI runtime -1
> JVMCITrace-1[LibGraalHotSpotGraalManagementInitialization]: initialized JVMCI runtime -1

src/hotspot/share/jvmci/jvmciCompilerToVM.cpp line 2358:

> 2356: JNIEnv* hotspotEnv;
> 2357:
> 2358: int name_len = env->GetArrayLength(name);

`name` could be NULL based on code in attachCurrentThread. Should we check for NULL here?

-------------

PR: https://git.openjdk.java.net/jdk/pull/684

From vladimir.kozlov at oracle.com Thu Oct 15 19:24:59 2020
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 15 Oct 2020 12:24:59 -0700
Subject: Howto replicate failure of 8254790?
In-Reply-To: <661485ab-7de7-26cb-b2b1-3a4f643125eb@oracle.com> References: <4bd7e9f73ea24ae09f1bb0f1808ce5a7@EX13D46EUB003.ant.amazon.com> <661485ab-7de7-26cb-b2b1-3a4f643125eb@oracle.com> Message-ID: <617f010e-629d-7329-ac72-dce797bf3075@oracle.com> Note, we have old Mac machines in our testing env: cx8, cmov, fxsr, ht, mmx, 3dnowpref, sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, lzcnt, tsc, tscinvbit, avx, avx2, aes, erms, clmul, bmi1, bmi2, rtm, adx, fma, vzeroupper, clflush, clflushopt Use -XX:UseAVX=2 But I was not able reproduce failure on my Skylake Linux machine even with -XX:UseAVX=2. Maybe there are other factors on MacOS. Regards, Vladimir K On 10/14/20 5:48 PM, David Holmes wrote: > Hi Jason, > > On 15/10/2020 10:42 am, Tatton, Jason wrote: >> Hi all, >> >> >> >> I am trying to replicate the failure of the tier2 test mentioned in >> 8254790 but I am only seeing it pass under an x86 linux machine. Are >> there any specific architectural constraints under which this test should be run in order to make it fail? > > It failed on a Mac, not Linux. > > Cheers, > David > >> >> >> I am running the test via: make test TEST="test/jdk/javax/xml/crypto/dsig/GenerationTests.java" >> >> >> >> Note that I am running the test against master without the commit: "8254792: Disable intrinsic StringLatin1.indexOf >> until 8254790 is fixed" which disables the intrinsic that is causing the test to fail. >> >> >> >> Thanks >> -- >> Jason >> From dnsimon at openjdk.java.net Thu Oct 15 19:26:12 2020 From: dnsimon at openjdk.java.net (Doug Simon) Date: Thu, 15 Oct 2020 19:26:12 GMT Subject: RFR: 8254842: [JVMCI] copy thread name when attaching libgraal thread to HotSpot In-Reply-To: References: Message-ID: On Thu, 15 Oct 2020 19:15:44 GMT, Vladimir Kozlov wrote: >> This PR modifies `HotSpotJVMCIRuntime.attachCurrentThread` when it is called from within libgraal so that the name of >> the thread in the libgraal heap is used as the name of the peer thread created in HotSpot. 
>> This useful when viewing output such as `-XX:JVMCITraceLevel=1`. For example, here's sample output without this PR: >> JVMCITrace-1[Thread-0]: initializing JVMCI runtime -1 >> JVMCITrace-1[Thread-0]: initialized JVMCI runtime -1 >> and then with: >> JVMCITrace-1[LibGraalHotSpotGraalManagementInitialization]: initializing JVMCI runtime -1 >> JVMCITrace-1[LibGraalHotSpotGraalManagementInitialization]: initialized JVMCI runtime -1 > > src/hotspot/share/jvmci/jvmciCompilerToVM.cpp line 2358: > >> 2356: JNIEnv* hotspotEnv; >> 2357: >> 2358: int name_len = env->GetArrayLength(name); > > `name` could be NULL based on code in attachCurrentThread. Should we check for NULL here? It's not obvious, but the test against `IS_IN_NATIVE_IMAGE` in `HotSpotJVMCIRuntime.attachCurrentThread` is equivalent to the `thread == NULL` test here. I'll add a comment (and maybe a `guarantee`) to clarify this. ------------- PR: https://git.openjdk.java.net/jdk/pull/684 From jptatton at amazon.com Thu Oct 15 21:10:24 2020 From: jptatton at amazon.com (Tatton, Jason) Date: Thu, 15 Oct 2020 21:10:24 +0000 Subject: Howto replicate failure of 8254790? In-Reply-To: <617f010e-629d-7329-ac72-dce797bf3075@oracle.com> References: <4bd7e9f73ea24ae09f1bb0f1808ce5a7@EX13D46EUB003.ant.amazon.com> <661485ab-7de7-26cb-b2b1-3a4f643125eb@oracle.com> <617f010e-629d-7329-ac72-dce797bf3075@oracle.com> Message-ID: Thanks Vladimir and David, I have access to a new macbook with an Intel i7-9750H (supports AVX2) so I will try on that. -----Original Message----- From: Vladimir Kozlov Sent: 15 October 2020 20:25 To: David Holmes ; Tatton, Jason ; hotspot-compiler-dev at openjdk.java.net; core-libs-dev at openjdk.java.net Subject: RE: [EXTERNAL] Howto replicate failure of 8254790?
Note, we have old Mac machines in our testing env: cx8, cmov, fxsr, ht, mmx, 3dnowpref, sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, lzcnt, tsc, tscinvbit, avx, avx2, aes, erms, clmul, bmi1, bmi2, rtm, adx, fma, vzeroupper, clflush, clflushopt Use -XX:UseAVX=2 But I was not able reproduce failure on my Skylake Linux machine even with -XX:UseAVX=2. Maybe there are other factors on MacOS. Regards, Vladimir K On 10/14/20 5:48 PM, David Holmes wrote: > Hi Jason, > > On 15/10/2020 10:42 am, Tatton, Jason wrote: >> Hi all, >> >> >> >> I am trying to replicate the failure of the tier2 test mentioned in >> 8254790 but I am >> only seeing it pass under an x86 linux machine. Are there any specific architectural constraints under which this test should be run in order to make it fail? > > It failed on a Mac, not Linux. > > Cheers, > David > >> >> >> I am running the test via: make test TEST="test/jdk/javax/xml/crypto/dsig/GenerationTests.java" >> >> >> >> Note that I am running the test against master without the commit: >> "8254792: Disable intrinsic StringLatin1.indexOf until 8254790 is fixed" which disables the intrinsic that is causing the test to fail. 
>> >> >> >> Thanks >> -- >> Jason >> From xliu at openjdk.java.net Thu Oct 15 22:14:22 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Thu, 15 Oct 2020 22:14:22 GMT Subject: RFR: 8254369: Node::disconnect_inputs may skip precedences [v3] In-Reply-To: <6V3fsxH8JB6tTos4TqibCVyWFnSPieFxp6VSwm-im3I=.60e92bca-62ab-435e-bc05-fcdc0b23555d@github.com> References: <6V3fsxH8JB6tTos4TqibCVyWFnSPieFxp6VSwm-im3I=.60e92bca-62ab-435e-bc05-fcdc0b23555d@github.com> Message-ID: > 8254369: Node::disconnect_inputs may skip precedences Xin Liu has updated the pull request incrementally with one additional commit since the last revision: 8254369: Node::disconnect_inputs may skip precedences ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/664/files - new: https://git.openjdk.java.net/jdk/pull/664/files/bc809ffc..58460574 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=664&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=664&range=01-02 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/664.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/664/head:pull/664 PR: https://git.openjdk.java.net/jdk/pull/664 From redestad at openjdk.java.net Thu Oct 15 22:25:16 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Thu, 15 Oct 2020 22:25:16 GMT Subject: RFR: 8254369: Node::disconnect_inputs may skip precedences [v3] In-Reply-To: References: <6V3fsxH8JB6tTos4TqibCVyWFnSPieFxp6VSwm-im3I=.60e92bca-62ab-435e-bc05-fcdc0b23555d@github.com> Message-ID: On Thu, 15 Oct 2020 22:14:22 GMT, Xin Liu wrote: >> 8254369: Node::disconnect_inputs may skip precedences > > Xin Liu has updated the pull request incrementally with one additional commit since the last revision: > > 8254369: Node::disconnect_inputs may skip precedences "where's the if(len() > 0)" I thought for a second, but you're right: len() >= req(), so if len() is 0, then req() must be 0, too, which means len() > req() is false and 
we won't accidentally call rm_prec(UINT_MAX) ------------- Marked as reviewed by redestad (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/664 From kvn at openjdk.java.net Thu Oct 15 22:25:16 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 15 Oct 2020 22:25:16 GMT Subject: RFR: 8254369: Node::disconnect_inputs may skip precedences [v3] In-Reply-To: References: <6V3fsxH8JB6tTos4TqibCVyWFnSPieFxp6VSwm-im3I=.60e92bca-62ab-435e-bc05-fcdc0b23555d@github.com> Message-ID: On Thu, 15 Oct 2020 22:14:22 GMT, Xin Liu wrote: >> 8254369: Node::disconnect_inputs may skip precedences > > Xin Liu has updated the pull request incrementally with one additional commit since the last revision: > > 8254369: Node::disconnect_inputs may skip precedences Good. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/664 From xliu at openjdk.java.net Thu Oct 15 22:46:17 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Thu, 15 Oct 2020 22:46:17 GMT Subject: RFR: 8254369: Node::disconnect_inputs may skip precedences [v3] In-Reply-To: References: <6V3fsxH8JB6tTos4TqibCVyWFnSPieFxp6VSwm-im3I=.60e92bca-62ab-435e-bc05-fcdc0b23555d@github.com> <4tKcqdZXXAmHA6zzDeFa4Mvei7x6rYJq2HUU2xVlD2Y=.cd967efa-009e-4dae-822d-b3554362356b@github.com> <6KDyUMTH9NnDNZzaAFA_9jtmsVW47OPeNqfucRtq2_U=.b7c283c1-2f0e-48b8-a748-3ebd9475fdea@github.com> <7653rvxIx1cj4vqLdGrUDvPdGkV602ibiPH7Cz0KTuc=.ae517fea-0ec8-4126-9dc0-35f91fa80ef1@github.com> Message-ID: <5Gd7UMyu6BTKDsJimUsIUsce3EXLuqe_nfRvK8Ng-V4=.dc98a18e-4d2c-43f4-9134-20c13ebc6b0e@github.com> On Thu, 15 Oct 2020 18:19:42 GMT, Vladimir Kozlov wrote: >> I think you could write it like this to hoist the `i < len()` test from the loop while avoiding issues when `len()` or >> `req()` is 0: >> uint i = len(); >> if (i > 0) { >> while(i > req()) { >> rm_prec(--i); >> } >> } > > Nice suggestion! 
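The reasoning about the reverse scan and the `--i` wrap-around can be tried on a toy model. The sketch below is a stand-in under stated assumptions, not HotSpot's `Node` class: `ToyNode`, its `in` vector, the use of 0 as an empty slot, and the backfill rule inside `rm_prec` are all hypothetical and only mimic the behaviour discussed for this change (removing a precedence edge may refill the slot with a later edge).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a node: slots [0, req) are required inputs and slots
// [req, in.size()) are precedence edges. 0 marks an empty slot.
struct ToyNode {
    std::vector<int> in;
    std::size_t req;

    // Remove the precedence edge in slot j. If a later slot still holds an
    // edge, move the last such edge into slot j (the "backfill").
    void rm_prec(std::size_t j) {
        assert(j >= req && j < in.size());
        in[j] = 0;
        for (std::size_t k = in.size(); k-- > j + 1;) {
            if (in[k] != 0) {
                in[j] = in[k];
                in[k] = 0;
                break;
            }
        }
    }

    // Number of precedence edges that are still set.
    std::size_t live_prec() const {
        std::size_t n = 0;
        for (std::size_t k = req; k < in.size(); ++k) {
            if (in[k] != 0) ++n;
        }
        return n;
    }
};

// Forward scan: slot i may be refilled by rm_prec(i) and is then skipped by ++i.
inline void disconnect_forward(ToyNode& n) {
    for (std::size_t i = n.req; i < n.in.size(); ++i) {
        n.rm_prec(i);
    }
}

// Reverse scan in the shape suggested in the review. The body only runs
// when i > req >= 0, so i >= 1 there and --i cannot wrap around.
inline void disconnect_reverse(ToyNode& n) {
    std::size_t i = n.in.size();
    while (i > n.req) {
        n.rm_prec(--i);
    }
}
```

With required inputs {1, 2} and precedence edges {10, 20, 30, 40}, the forward scan in this model leaves two edges behind, while the reverse scan clears all four and leaves the required inputs untouched.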
> "where's the if(len() > 0)" I thought for a second, but you're right: len() >= req(), so if len() is 0, then req() must > be 0, too, which means len() > req() is false and we won't accidentally call rm_prec(UINT_MAX) Yes, that's why I think it's safe to save if (i > 0). Actually, I tried to hoist some corner cases out of loop before, but that made code less straightforward. I give it up because I don't want to win some "performance" at expense of readability and gcc/llvm can do job anyway. Your code rm_prec(--i) is brilliant. thanks! let's see the result. ------------- PR: https://git.openjdk.java.net/jdk/pull/664 From vladimir.kozlov at oracle.com Thu Oct 15 22:59:20 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 15 Oct 2020 15:59:20 -0700 Subject: Howto replicate failure of 8254790? In-Reply-To: References: <4bd7e9f73ea24ae09f1bb0f1808ce5a7@EX13D46EUB003.ant.amazon.com> <661485ab-7de7-26cb-b2b1-3a4f643125eb@oracle.com> <617f010e-629d-7329-ac72-dce797bf3075@oracle.com> Message-ID: Hi Jason, I added surrounding instructions dump from hs_err file we have so you can reconstruct x86 assembler from it. If you look on si_addr: 0x00000000e41d2718 which case memory map failure, it looks like R8 =0x00000007e41d2700 is an oop: [B with upper 32-bits zeroed. It seems uppers 32-bits of address were cut. But I don't see it can happens in stringL_indexof_char() sub. You correctly used movptr() and addptr() instructions. Vladimir K On 10/15/20 2:10 PM, Tatton, Jason wrote: > Thanks Vladimir and David, I have access to a new macbook with an Intel i7-9750H (supports AVX2) so I will try on that. > > -----Original Message----- > From: Vladimir Kozlov > Sent: 15 October 2020 20:25 > To: David Holmes ; Tatton, Jason ; hotspot-compiler-dev at openjdk.java.net; core-libs-dev at openjdk.java.net > Subject: RE: [EXTERNAL] Howto replicate failure of 8254790? > > CAUTION: This email originated from outside of the organization. 
Do not click links or open attachments unless you can confirm the sender and know the content is safe. > > > > Note, we have old Mac machines in our testing env: > cx8, cmov, fxsr, ht, mmx, 3dnowpref, sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, lzcnt, tsc, tscinvbit, avx, avx2, aes, erms, clmul, bmi1, bmi2, rtm, adx, fma, vzeroupper, clflush, clflushopt > > Use -XX:UseAVX=2 > > But I was not able reproduce failure on my Skylake Linux machine even with -XX:UseAVX=2. Maybe there are other factors on MacOS. > > Regards, > Vladimir K > > On 10/14/20 5:48 PM, David Holmes wrote: >> Hi Jason, >> >> On 15/10/2020 10:42 am, Tatton, Jason wrote: >>> Hi all, >>> >>> >>> >>> I am trying to replicate the failure of the tier2 test mentioned in >>> 8254790 but I am >>> only seeing it pass under an x86 linux machine. Are there any specific architectural constraints under which this test should be run in order to make it fail? >> >> It failed on a Mac, not Linux. >> >> Cheers, >> David >> >>> >>> >>> I am running the test via: make test TEST="test/jdk/javax/xml/crypto/dsig/GenerationTests.java" >>> >>> >>> >>> Note that I am running the test against master without the commit: >>> "8254792: Disable intrinsic StringLatin1.indexOf until 8254790 is fixed" which disables the intrinsic that is causing the test to fail. >>> >>> >>> >>> Thanks >>> -- >>> Jason >>> From kvn at openjdk.java.net Fri Oct 16 02:02:14 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 16 Oct 2020 02:02:14 GMT Subject: RFR: 8254369: Node::disconnect_inputs may skip precedences [v3] In-Reply-To: References: <6V3fsxH8JB6tTos4TqibCVyWFnSPieFxp6VSwm-im3I=.60e92bca-62ab-435e-bc05-fcdc0b23555d@github.com> Message-ID: On Thu, 15 Oct 2020 22:22:27 GMT, Vladimir Kozlov wrote: >> Xin Liu has updated the pull request incrementally with one additional commit since the last revision: >> >> 8254369: Node::disconnect_inputs may skip precedences > > Good. 
hs-tier1 passed ------------- PR: https://git.openjdk.java.net/jdk/pull/664 From xliu at openjdk.java.net Fri Oct 16 02:02:15 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Fri, 16 Oct 2020 02:02:15 GMT Subject: Integrated: 8254369: Node::disconnect_inputs may skip precedences In-Reply-To: <6V3fsxH8JB6tTos4TqibCVyWFnSPieFxp6VSwm-im3I=.60e92bca-62ab-435e-bc05-fcdc0b23555d@github.com> References: <6V3fsxH8JB6tTos4TqibCVyWFnSPieFxp6VSwm-im3I=.60e92bca-62ab-435e-bc05-fcdc0b23555d@github.com> Message-ID: On Wed, 14 Oct 2020 16:04:32 GMT, Xin Liu wrote: > 8254369: Node::disconnect_inputs may skip precedences This pull request has now been integrated. Changeset: bdda2058 Author: Xin Liu Committer: Vladimir Kozlov URL: https://git.openjdk.java.net/jdk/commit/bdda2058 Stats: 18 lines in 2 files changed: 15 ins; 0 del; 3 mod 8254369: Node::disconnect_inputs may skip precedences disconnect_inputs() needs to iterate precedences edges in reverse order because rm_prec(i) may backfill _in[i] with a value afterward. also remove the predicate if (n != NULL) in set_prec because it's always true. Reviewed-by: kvn, redestad ------------- PR: https://git.openjdk.java.net/jdk/pull/664 From thartmann at openjdk.java.net Fri Oct 16 06:29:11 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Fri, 16 Oct 2020 06:29:11 GMT Subject: RFR: 8251535: Partial peeling at unsigned test adds incorrect loop exit check In-Reply-To: References: <8yUxN4k_USTB2kvNX_utmsVliIBuuK2mOibFhiUZXCE=.ce08b033-b2db-456a-90f0-b5b697f81662@github.com> Message-ID: <66cEPo5_js5tACHCVrdh8O-_CM_tYHenfTM6ImdiILo=.af3dae04-5908-4717-ba3a-bb9bd5b99770@github.com> On Thu, 15 Oct 2020 19:11:22 GMT, Vladimir Kozlov wrote: >> C2's `PhaseIdealLoop::partial_peel` searches for loop exit tests on the induction variable as cut point for partial >> peeling. If no suitable signed test is found and `PartialPeelAtUnsignedTests` is enabled (default), unsigned `i <u limit` checks are used.
Since the exit condition `!(i <u limit)` is equivalent to `i < 0 || i >= limit`, >> `PhaseIdealLoop::insert_cmpi_loop_exit` either clones the lower or upper bound check and inserts it as cut point before >> the unsigned test. For example: >> loop: >> i += 1000; >> if (i <u 10_000) { >> goto loop; >> } >> goto exit; >> exit: >> return i; >> >> Is converted to: >> >> loop: >> i += 1000; >> if (!(i < 10_000)) { <-- Loop exit test as cut point for partial peeling >> goto exit; >> } >> if (i <u 10_000) { >> goto loop; >> } >> goto exit; >> exit: >> return i; >> >> Now the problem is that if the unsigned check is inverted, i.e. we exit if the check **passes**, the newly inserted >> test is incorrect: >> >> loop: >> i += 1000; >> if (i <u 10_000) { >> goto exit; >> } >> goto loop; >> exit: >> return i; >> >> Is converted to: >> >> loop: >> i += 1000; >> if (i < 10_000) { <-- This exit condition is wrong! For example, we should not exit for i = -1. >> goto exit; >> } >> if (i <u 10_000) { >> goto exit; >> } >> goto loop; >> exit: >> return i; >> >> This leads to incorrect results because the loop is left too early. >> >> The fix is to simply bail out when the loop exit condition is `i <u limit`, i.e. `i >= 0 && i < limit`, because it can't be >> split into a single signed exit check. >> Thanks, >> Tobias > > good Thanks Vladimir! ------------- PR: https://git.openjdk.java.net/jdk/pull/681 From thartmann at openjdk.java.net Fri Oct 16 06:29:13 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Fri, 16 Oct 2020 06:29:13 GMT Subject: Integrated: 8251535: Partial peeling at unsigned test adds incorrect loop exit check In-Reply-To: <8yUxN4k_USTB2kvNX_utmsVliIBuuK2mOibFhiUZXCE=.ce08b033-b2db-456a-90f0-b5b697f81662@github.com> References: <8yUxN4k_USTB2kvNX_utmsVliIBuuK2mOibFhiUZXCE=.ce08b033-b2db-456a-90f0-b5b697f81662@github.com> Message-ID: On Thu, 15 Oct 2020 12:42:53 GMT, Tobias Hartmann wrote: > C2's `PhaseIdealLoop::partial_peel` searches for loop exit tests on the induction variable as cut point for partial > peeling.
If no suitable signed test is found and `PartialPeelAtUnsignedTests` is enabled (default), unsigned `i <u limit` checks are used. Since the exit condition `!(i <u limit)` is equivalent to `i < 0 || i >= limit`, > `PhaseIdealLoop::insert_cmpi_loop_exit` either clones the lower or upper bound check and inserts it as cut point before > the unsigned test. For example: > loop: > i += 1000; > if (i <u 10_000) { > goto loop; > } > goto exit; > exit: > return i; > > Is converted to: > > loop: > i += 1000; > if (!(i < 10_000)) { <-- Loop exit test as cut point for partial peeling > goto exit; > } > if (i <u 10_000) { > goto loop; > } > goto exit; > exit: > return i; > > Now the problem is that if the unsigned check is inverted, i.e. we exit if the check **passes**, the newly inserted > test is incorrect: > > loop: > i += 1000; > if (i <u 10_000) { > goto exit; > } > goto loop; > exit: > return i; > > Is converted to: > > loop: > i += 1000; > if (i < 10_000) { <-- This exit condition is wrong! For example, we should not exit for i = -1. > goto exit; > } > if (i <u 10_000) { > goto exit; > } > goto loop; > exit: > return i; > > This leads to incorrect results because the loop is left too early. > > The fix is to simply bail out when the loop exit condition is `i <u limit`, i.e. `i >= 0 && i < limit`, because it can't be > split into a single signed exit check. > Thanks, > Tobias This pull request has now been integrated.
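[Editor's illustration, not part of the original message.] The `i = -1` counterexample in the description can be checked at the Java level. The sketch below assumes a non-negative limit; the class and method names are made up for illustration, and `Integer.compareUnsigned` stands in for C2's unsigned compare. It shows that the unsigned test matches the two-sided signed range check, while the upper-bound check alone (the one wrongly inserted as an exit test) disagrees for negative values.

```java
public class UnsignedExitCheck {
    // Unsigned comparison, i.e. the "i <u limit" test from the description.
    static boolean unsignedLess(int i, int limit) {
        return Integer.compareUnsigned(i, limit) < 0;
    }

    // For a non-negative limit, the unsigned test equals this two-sided signed test.
    static boolean signedRange(int i, int limit) {
        return i >= 0 && i < limit;
    }

    // Only the upper-bound half of the range check; on its own it is not
    // equivalent to the unsigned test.
    static boolean signedUpperOnly(int i, int limit) {
        return i < limit;
    }

    public static void main(String[] args) {
        int limit = 10_000; // same limit as in the example loops
        int[] samples = {-1, 0, 9_999, 10_000, Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int i : samples) {
            System.out.printf("i=%d unsigned=%b range=%b upperOnly=%b%n",
                    i, unsignedLess(i, limit), signedRange(i, limit), signedUpperOnly(i, limit));
        }
    }
}
```

For `i = -1` the unsigned test is false but `i < 10_000` is true, so an exit guarded by the signed check alone would leave the loop where the original unsigned exit test would not.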
Changeset: 7c0d4170 Author: Tobias Hartmann URL: https://git.openjdk.java.net/jdk/commit/7c0d4170 Stats: 192 lines in 2 files changed: 191 ins; 0 del; 1 mod 8251535: Partial peeling at unsigned test adds incorrect loop exit check Reviewed-by: chagedorn, neliasso, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/681 From jiefu at openjdk.java.net Fri Oct 16 07:00:13 2020 From: jiefu at openjdk.java.net (Jie Fu) Date: Fri, 16 Oct 2020 07:00:13 GMT Subject: RFR: 8254814: [Vector API] Fix an AVX512 crash after JDK-8223347 In-Reply-To: References: <9yFclJuYNDD8ZUHHTI5OazZPR4sMcvzScmXgXxyIqTU=.e46792cb-bdb9-4a24-9f52-b5952f7df23d@github.com> Message-ID: <2sg9sv0qtqHBn7hNMU_m1-XNhnBER2bb1RwGlW7MtRs=.e75f5d97-92fe-45f2-90ee-d35d4e67dd48@github.com> On Thu, 15 Oct 2020 16:04:22 GMT, Paul Sandoz wrote: >> As discussed here[1], it's time to integrate this fix. >> And the original PR is here[2]. >> >> Testing: >> - All jdk/incubator/vector/ tests passed on Linux/x64 AVX512 machines >> >> [1] https://github.com/openjdk/jdk/pull/367#issuecomment-701560441 >> [2] https://github.com/openjdk/panama-vector/pull/1 > > Marked as reviewed by psandoz (Reviewer). Thanks @iwanowww , @TobiHartmann and @PaulSandoz for your review. ------------- PR: https://git.openjdk.java.net/jdk/pull/676 From jiefu at openjdk.java.net Fri Oct 16 07:00:15 2020 From: jiefu at openjdk.java.net (Jie Fu) Date: Fri, 16 Oct 2020 07:00:15 GMT Subject: Integrated: 8254814: [Vector API] Fix an AVX512 crash after JDK-8223347 In-Reply-To: <9yFclJuYNDD8ZUHHTI5OazZPR4sMcvzScmXgXxyIqTU=.e46792cb-bdb9-4a24-9f52-b5952f7df23d@github.com> References: <9yFclJuYNDD8ZUHHTI5OazZPR4sMcvzScmXgXxyIqTU=.e46792cb-bdb9-4a24-9f52-b5952f7df23d@github.com> Message-ID: On Thu, 15 Oct 2020 06:22:28 GMT, Jie Fu wrote: > As discussed here[1], it's time to integrate this fix. > And the original PR is here[2]. 
> > Testing: > - All jdk/incubator/vector/ tests passed on Linux/x64 AVX512 machines > > [1] https://github.com/openjdk/jdk/pull/367#issuecomment-701560441 > [2] https://github.com/openjdk/panama-vector/pull/1 This pull request has now been integrated. Changeset: 3d23bd8e Author: Jie Fu URL: https://git.openjdk.java.net/jdk/commit/3d23bd8e Stats: 22 lines in 1 file changed: 0 ins; 20 del; 2 mod 8254814: [Vector API] Fix an AVX512 crash after JDK-8223347 Reviewed-by: vlivanov, thartmann, psandoz ------------- PR: https://git.openjdk.java.net/jdk/pull/676 From github.com+8792647+robcasloz at openjdk.java.net Fri Oct 16 07:09:19 2020 From: github.com+8792647+robcasloz at openjdk.java.net (Roberto Castañeda Lozano) Date: Fri, 16 Oct 2020 07:09:19 GMT Subject: RFR: 8254805: compiler/debug/TestStressCM.java is still failing Message-ID: Use the code motion trace produced by `TraceOptoPipelining` (excluding traces of stubs) to assert that two compilations with the same seed cause `StressLCM` and `StressGCM` to take the same randomized decisions. Previously, the entire output produced by `PrintOptoStatistics` was used instead, which has shown to be too fragile. Also, disable inlining in both `TestStressCM.java` and the similar `TestStressIGVN.java` to prevent flaky behavior, and run both tests for ten different seeds to improve coverage.
------------- Commit messages: - 8254805: compiler/debug/TestStressCM.java is still failing Changes: https://git.openjdk.java.net/jdk/pull/693/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=693&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8254805 Stats: 34 lines in 2 files changed: 21 ins; 3 del; 10 mod Patch: https://git.openjdk.java.net/jdk/pull/693.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/693/head:pull/693 PR: https://git.openjdk.java.net/jdk/pull/693 From github.com+8792647+robcasloz at openjdk.java.net Fri Oct 16 07:09:19 2020 From: github.com+8792647+robcasloz at openjdk.java.net (Roberto Castañeda Lozano) Date: Fri, 16 Oct 2020 07:09:19 GMT Subject: RFR: 8254805: compiler/debug/TestStressCM.java is still failing In-Reply-To: <0v3gnspZqLqrYzPGlKylyHrIdKMwUukJayw50XUX7Fw=.04b0cee4-dbcb-4cc6-a41d-d7b1f9450ff2@github.com> References: <0v3gnspZqLqrYzPGlKylyHrIdKMwUukJayw50XUX7Fw=.04b0cee4-dbcb-4cc6-a41d-d7b1f9450ff2@github.com> Message-ID: On Fri, 16 Oct 2020 06:58:18 GMT, Roberto Castañeda Lozano wrote: >> Use the code motion trace produced by `TraceOptoPipelining` (excluding traces of stubs) to assert that two compilations >> with the same seed cause `StressLCM` and `StressGCM` to take the same randomized decisions. Previously, the entire >> output produced by `PrintOptoStatistics` was used instead, which has shown to be too fragile. Also, disable inlining in >> both `TestStressCM.java` and the similar `TestStressIGVN.java` to prevent flaky behavior, and run both tests for ten >> different seeds to improve coverage. > > Use the code motion trace produced by TraceOptoPipelining (excluding traces of > stubs) to assert that two compilations with the same seed cause StressLCM and > StressGCM to take the same randomized decisions. Previously, the entire output > produced by PrintOptoStatistics was used instead, which has shown to be too > fragile.
Also, disable inlining in both TestStressCM.java and the similar > TestStressIGVN.java to prevent flaky behavior, and run both tests for ten > different seeds to improve coverage. Tested by running `TestStressCM.java` and `TestStressIGVN.java` 100 times each on windows-x64, linux-x64, linux-aarch64, and macosx-x64 (all debug). ------------- PR: https://git.openjdk.java.net/jdk/pull/693 From github.com+8792647+robcasloz at openjdk.java.net Fri Oct 16 07:09:19 2020 From: github.com+8792647+robcasloz at openjdk.java.net (Roberto Castañeda Lozano) Date: Fri, 16 Oct 2020 07:09:19 GMT Subject: RFR: 8254805: compiler/debug/TestStressCM.java is still failing In-Reply-To: References: Message-ID: <0v3gnspZqLqrYzPGlKylyHrIdKMwUukJayw50XUX7Fw=.04b0cee4-dbcb-4cc6-a41d-d7b1f9450ff2@github.com> On Fri, 16 Oct 2020 06:57:33 GMT, Roberto Castañeda Lozano wrote: > Use the code motion trace produced by `TraceOptoPipelining` (excluding traces of stubs) to assert that two compilations > with the same seed cause `StressLCM` and `StressGCM` to take the same randomized decisions. Previously, the entire > output produced by `PrintOptoStatistics` was used instead, which has shown to be too fragile. Also, disable inlining in > both `TestStressCM.java` and the similar `TestStressIGVN.java` to prevent flaky behavior, and run both tests for ten > different seeds to improve coverage. Use the code motion trace produced by TraceOptoPipelining (excluding traces of stubs) to assert that two compilations with the same seed cause StressLCM and StressGCM to take the same randomized decisions. Previously, the entire output produced by PrintOptoStatistics was used instead, which has shown to be too fragile. Also, disable inlining in both TestStressCM.java and the similar TestStressIGVN.java to prevent flaky behavior, and run both tests for ten different seeds to improve coverage.
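[Editor's illustration, not part of the original message.] The property the tests assert, that the same seed yields the same sequence of randomized decisions, can be sketched in plain Java. This is only an analogy: `SeededDecisions` and `trace` are hypothetical names, `java.util.Random` stands in for the compiler's stress randomization, and the real tests compare trace output from separate JVM runs rather than in-process lists.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class SeededDecisions {
    // Hypothetical stand-in for a stress mode: records n randomized yes/no decisions.
    static List<Boolean> trace(long seed, int n) {
        Random rng = new Random(seed);
        List<Boolean> decisions = new ArrayList<>();
        for (int k = 0; k < n; k++) {
            decisions.add(rng.nextBoolean());
        }
        return decisions;
    }

    public static void main(String[] args) {
        // Ten seeds, mirroring the "ten different seeds" mentioned in the description.
        for (long seed = 0; seed < 10; seed++) {
            if (!trace(seed, 64).equals(trace(seed, 64))) {
                throw new AssertionError("seed " + seed + " did not reproduce its decisions");
            }
        }
        System.out.println("every seed reproduced its decision trace");
    }
}
```

Comparing only the decision trace, rather than all diagnostic output, is what makes such a check robust against unrelated differences between runs.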
------------- PR: https://git.openjdk.java.net/jdk/pull/693 From neliasso at openjdk.java.net Fri Oct 16 15:02:17 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Fri, 16 Oct 2020 15:02:17 GMT Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v3] In-Reply-To: <94qadtiTzSkdsJAc_8IWrLxpBvmfiBXMf_W9Z965P80=.9a59a5db-2209-4007-94bb-16ccd8ff0b77@github.com> References: <94qadtiTzSkdsJAc_8IWrLxpBvmfiBXMf_W9Z965P80=.9a59a5db-2209-4007-94bb-16ccd8ff0b77@github.com> Message-ID: On Tue, 13 Oct 2020 18:03:27 GMT, Jatin Bhateja wrote: >> Summary: >> >> 1) Partial in-lining technique avoids call overhead penalty for small array copy operations with size less than 32 >> bytes. 2) At runtime, a conditional check based on copy length either calls an array-copy stub or executes an optimized >> instruction sequence using AVX-512 masked instructions emitted at the call site. 3) New runtime flag >> ArrayCopyPartialInlineSize=0/32(default)/64 bytes determines the maximum size for partial in-lining. 4) Based on the >> perf results seen in benchmarks currently partial in-lining is performed only for arraycopy involving sub-word types >> (bool/byte/char/short). Once PR-61 gets integrated we can extend this patch to cover all the primitive types. 
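[Editor's illustration, not part of the original message.] The length-based dispatch described in points 1) and 2) of the summary can be pictured with a rough Java-level analogy. It is a sketch under stated assumptions: `PartialInlineCopy` and `PARTIAL_INLINE_SIZE` are hypothetical names, the simple loop merely stands in for the AVX-512 masked sequence that C2 would emit at the call site, the `<=` comparison is a guess at the exact threshold semantics, and source and destination are assumed to be distinct arrays.

```java
public class PartialInlineCopy {
    // Hypothetical constant mirroring the default of 32 bytes quoted in the summary.
    static final int PARTIAL_INLINE_SIZE = 32;

    // Copies length bytes; assumes src and dst do not overlap.
    static void copy(byte[] src, int srcPos, byte[] dst, int dstPos, int length) {
        if (length <= PARTIAL_INLINE_SIZE) {
            // Fast path: stands in for the inlined masked load/store sequence.
            for (int k = 0; k < length; k++) {
                dst[dstPos + k] = src[srcPos + k];
            }
        } else {
            // Slow path: stands in for the call to the array-copy stub.
            System.arraycopy(src, srcPos, dst, dstPos, length);
        }
    }

    public static void main(String[] args) {
        byte[] src = new byte[128];
        for (int k = 0; k < src.length; k++) {
            src[k] = (byte) k;
        }
        byte[] dst = new byte[128];
        copy(src, 0, dst, 0, 20);   // takes the fast path
        copy(src, 20, dst, 20, 100); // takes the slow path
        System.out.println(java.util.Arrays.equals(src, 0, 120, dst, 0, 120));
    }
}
```

The benefit claimed in the summary comes from skipping the call overhead for short copies, which the benchmark rows for block sizes up to the threshold are meant to show.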
>> Performance Results: >> System : CascadeLake Server, Intel(R) Xeon(R) Platinum 8280L CPU @ 2.70GHz >> Micros : test/micro/org/openjdk/bench/java/lang/ArrayCopy*.java >> ArrayCopyPartialInlineSize : 32 >> >> JMH | Block Size | Baseline (ns/op) | Partial Inling (ns/op) | Gain >> -- | -- | -- | -- | -- >> ArrayCopyAligned.testByte | 1 | 5.417 | 2.696 | 2.009272997 >> ArrayCopyAligned.testByte | 3 | 5.494 | 2.702 | 2.03330866 >> ArrayCopyAligned.testByte | 5 | 5.417 | 2.637 | 2.05422829 >> ArrayCopyAligned.testByte | 10 | 5.343 | 2.703 | 1.976692564 >> ArrayCopyAligned.testByte | 20 | 5.837 | 2.636 | 2.214339909 >> ArrayCopyAligned.testByte | 70 | 5.86 | 6 | 0.976666667 >> ArrayCopyAligned.testByte | 150 | 6.766 | 6.906 | 0.979727773 >> ArrayCopyAligned.testByte | 300 | 7.605 | 7.952 | 0.956363179 >> ArrayCopyAligned.testByte | 600 | 11.989 | 12.007 | 0.998500874 >> ArrayCopyAligned.testByte | 1200 | 16.447 | 16.585 | 0.991679228 >> ArrayCopyAligned.testChar | 1 | 5.02 | 2.828 | 1.775106082 >> ArrayCopyAligned.testChar | 3 | 5.129 | 2.762 | 1.85698769 >> ArrayCopyAligned.testChar | 5 | 5.041 | 2.762 | 1.82512672 >> ArrayCopyAligned.testChar | 10 | 5.716 | 2.762 | 2.069514844 >> ArrayCopyAligned.testChar | 20 | 5.111 | 5.399 | 0.946656788 >> ArrayCopyAligned.testChar | 70 | 6.271 | 6.242 | 1.004645947 >> ArrayCopyAligned.testChar | 150 | 7.45 | 7.599 | 0.980392157 >> ArrayCopyAligned.testChar | 300 | 9.904 | 10.112 | 0.97943038 >> ArrayCopyAligned.testChar | 600 | 17.131 | 17.167 | 0.997902953 >> ArrayCopyAligned.testChar | 1200 | 29.556 | 29.851 | 0.990117584 >> ArrayCopyUnalignedBoth.testByte | 1 | 5.419 | 2.702 | 2.005551443 >> ArrayCopyUnalignedBoth.testByte | 3 | 5.558 | 2.636 | 2.108497724 >> ArrayCopyUnalignedBoth.testByte | 5 | 5.43 | 2.636 | 2.059939302 >> ArrayCopyUnalignedBoth.testByte | 10 | 5.378 | 2.637 | 2.039438756 >> ArrayCopyUnalignedBoth.testByte | 20 | 5.914 | 2.636 | 2.243550836 >> ArrayCopyUnalignedBoth.testByte | 70 | 5.882 | 5.954 | 
0.987907289 >> ArrayCopyUnalignedBoth.testByte | 150 | 6.784 | 6.88 | 0.986046512 >> ArrayCopyUnalignedBoth.testByte | 300 | 7.635 | 7.968 | 0.958207831 >> ArrayCopyUnalignedBoth.testByte | 600 | 12.226 | 12.129 | 1.007997362 >> ArrayCopyUnalignedBoth.testByte | 1200 | 16.992 | 20.717 | 0.820195974 >> ArrayCopyUnalignedBoth.testChar | 1 | 5.019 | 2.828 | 1.774752475 >> ArrayCopyUnalignedBoth.testChar | 3 | 5.163 | 2.763 | 1.868621064 >> ArrayCopyUnalignedBoth.testChar | 5 | 5.042 | 2.827 | 1.783516095 >> ArrayCopyUnalignedBoth.testChar | 10 | 5.718 | 2.828 | 2.021923621 >> ArrayCopyUnalignedBoth.testChar | 20 | 5.111 | 5.404 | 0.945780903 >> ArrayCopyUnalignedBoth.testChar | 70 | 6.367 | 6.235 | 1.02117081 >> ArrayCopyUnalignedBoth.testChar | 150 | 7.367 | 8.269 | 0.890917886 >> ArrayCopyUnalignedBoth.testChar | 300 | 10.358 | 10.642 | 0.973313287 >> ArrayCopyUnalignedBoth.testChar | 600 | 20.84 | 17.522 | 1.189361945 >> ArrayCopyUnalignedBoth.testChar | 1200 | 31.895 | 31.892 | 1.000094067 >> ArrayCopyUnalignedDst.testByte | 1 | 5.455 | 2.637 | 2.068638604 >> ArrayCopyUnalignedDst.testByte | 3 | 5.562 | 2.702 | 2.058475204 >> ArrayCopyUnalignedDst.testByte | 5 | 5.427 | 2.702 | 2.008512213 >> ArrayCopyUnalignedDst.testByte | 10 | 5.367 | 2.696 | 1.990727003 >> ArrayCopyUnalignedDst.testByte | 20 | 5.839 | 2.637 | 2.214258627 >> ArrayCopyUnalignedDst.testByte | 70 | 5.888 | 5.968 | 0.986595174 >> ArrayCopyUnalignedDst.testByte | 150 | 6.785 | 6.773 | 1.001771741 >> ArrayCopyUnalignedDst.testByte | 300 | 7.606 | 7.972 | 0.954089313 >> ArrayCopyUnalignedDst.testByte | 600 | 11.986 | 21.195 | 0.565510734 >> ArrayCopyUnalignedDst.testByte | 1200 | 16.54 | 16.784 | 0.985462345 >> ArrayCopyUnalignedDst.testChar | 1 | 5.02 | 2.827 | 1.775733994 >> ArrayCopyUnalignedDst.testChar | 3 | 5.131 | 2.762 | 1.857711803 >> ArrayCopyUnalignedDst.testChar | 5 | 5.038 | 2.762 | 1.82404055 >> ArrayCopyUnalignedDst.testChar | 10 | 5.718 | 2.762 | 2.070238957 >> 
ArrayCopyUnalignedDst.testChar | 20 | 5.113 | 5.401 | 0.946676541 >> ArrayCopyUnalignedDst.testChar | 70 | 6.222 | 6.214 | 1.001287416 >> ArrayCopyUnalignedDst.testChar | 150 | 7.367 | 8.125 | 0.906707692 >> ArrayCopyUnalignedDst.testChar | 300 | 10.204 | 10.082 | 1.012100774 >> ArrayCopyUnalignedDst.testChar | 600 | 16.978 | 17.135 | 0.990837467 >> ArrayCopyUnalignedDst.testChar | 1200 | 32.351 | 31.996 | 1.011095137 >> ArrayCopyUnalignedSrc.testByte | 1 | 5.414 | 2.696 | 2.008160237 >> ArrayCopyUnalignedSrc.testByte | 3 | 5.494 | 2.637 | 2.083428138 >> ArrayCopyUnalignedSrc.testByte | 5 | 5.431 | 2.637 | 2.059537353 >> ArrayCopyUnalignedSrc.testByte | 10 | 5.344 | 2.703 | 1.977062523 >> ArrayCopyUnalignedSrc.testByte | 20 | 5.834 | 2.696 | 2.163946588 >> ArrayCopyUnalignedSrc.testByte | 70 | 5.883 | 6.009 | 0.979031453 >> ArrayCopyUnalignedSrc.testByte | 150 | 6.729 | 6.87 | 0.979475983 >> ArrayCopyUnalignedSrc.testByte | 300 | 7.603 | 7.97 | 0.953952321 >> ArrayCopyUnalignedSrc.testByte | 600 | 12.004 | 12.16 | 0.987171053 >> ArrayCopyUnalignedSrc.testByte | 1200 | 16.534 | 16.643 | 0.9934507 >> ArrayCopyUnalignedSrc.testChar | 1 | 5.021 | 2.762 | 1.81788559 >> ArrayCopyUnalignedSrc.testChar | 3 | 5.13 | 2.762 | 1.857349747 >> ArrayCopyUnalignedSrc.testChar | 5 | 5.042 | 2.827 | 1.783516095 >> ArrayCopyUnalignedSrc.testChar | 10 | 5.726 | 2.761 | 2.073886273 >> ArrayCopyUnalignedSrc.testChar | 20 | 5.112 | 5.401 | 0.94649139 >> ArrayCopyUnalignedSrc.testChar | 70 | 6.113 | 6.227 | 0.981692629 >> ArrayCopyUnalignedSrc.testChar | 150 | 7.493 | 7.888 | 0.949923935 >> ArrayCopyUnalignedSrc.testChar | 300 | 10.234 | 10.501 | 0.97457385 >> ArrayCopyUnalignedSrc.testChar | 600 | 17.175 | 17.142 | 1.001925096 >> ArrayCopyUnalignedSrc.testChar | 1200 | 31.926 | 31.987 | 0.998092975 >> >> Detailed Reports: >> Baseline : >> [http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt) 
>> WithOpt : >> [http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt) > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Replacing explicit type checks with existing type checking routines Changes requested by neliasso (Reviewer). src/hotspot/cpu/x86/x86.ad line 5473: > 5471: BasicType elmType = this->bottom_type()->is_vect()->element_basic_type(); > 5472: int vector_len = vector_length_encoding(this); > 5473: //TODO: KRegister to be made valid "bound" operand to promote sharing. Remove todo - create a RFE instead. src/hotspot/cpu/x86/x86.ad line 5506: > 5504: BasicType elmType = src_node->bottom_type()->is_vect()->element_basic_type(); > 5505: int vector_len = vector_length_encoding(src_node); > 5506: //TODO: KRegister to be made valid "bound" operand to promote sharing. Remove todo - create a RFE instead. src/hotspot/share/adlc/forms.cpp line 271: > 269: if( strcmp(opType,"LoadS")==0 ) return Form::idealS; > 270: if( strcmp(opType,"LoadVector")==0 ) return Form::idealV; > 271: if( strcmp(opType,"VectorMaskedLoad")==0 ) return Form::idealV; More of a bike shedding question: The patterns is LoadRange, LoadS, LoadVector - why not name it in the same style - LoadVectorMasked? src/hotspot/share/adlc/forms.cpp line 288: > 286: if( strcmp(opType,"StoreNKlass")==0) return Form::idealNKlass; > 287: if( strcmp(opType,"StoreVector")==0 ) return Form::idealV; > 288: if( strcmp(opType,"VectorMaskedStore")==0 ) return Form::idealV; Same comment but for store - what do you think about naming it StoreVectorMasked? src/hotspot/share/opto/cfgnode.cpp line 423: > 421: // If a two input non-loop region has dead input > 422: // edge[s] degenerate any phi node contained within it. 
> 423: bool RegionNode::try_phi_disintegration(PhaseGVN *phase) { RegionNode::try_phi_disintegration - is it a requirement for this enhancement? or a separate issue? Also - I know we already remove phis that only have one input. If the input is set to top - PhiNode::Ideal should reduce the phi. If you have found a case where this doesn't happen - we should investigate and fix. src/hotspot/share/opto/cfgnode.cpp line 396: > 394: } > 395: > 396: bool RegionNode::is_self_loop(Node* n) { A bit expensive to DFS the entire graph to find a self loop. You don't need to visit nodes outside the loop. But you might not need to do this at all - see my comments further down. src/hotspot/share/opto/cfgnode.cpp line 436: > 434: Node* rep_node = NULL; > 435: PhaseIterGVN *igvn = phase->is_IterGVN(); > 436: if (in(1)->is_top() && !in(2)->is_top()) { The Phi-nodes for loops are always normalized - in(1) will be loop-entry and in(2) is the backedge. So if in(1) is top - in(2) will be a self loop. ------------- PR: https://git.openjdk.java.net/jdk/pull/302 From dnsimon at openjdk.java.net Fri Oct 16 16:56:21 2020 From: dnsimon at openjdk.java.net (Doug Simon) Date: Fri, 16 Oct 2020 16:56:21 GMT Subject: RFR: 8254842: [JVMCI] copy thread name when attaching libgraal thread to HotSpot [v2] In-Reply-To: References: Message-ID: > This PR modifies `HotSpotJVMCIRuntime.attachCurrentThread` when it is called from within libgraal so that the name of > the thread in the libgraal heap is used as the name of the peer thread created in HotSpot. > This useful when viewing output such as `-XX:JVMCITraceLevel=1`. 
For example, here's sample output without this PR: > JVMCITrace-1[Thread-0]: initializing JVMCI runtime -1 > JVMCITrace-1[Thread-0]: initialized JVMCI runtime -1 > and then with: > JVMCITrace-1[LibGraalHotSpotGraalManagementInitialization]: initializing JVMCI runtime -1 > JVMCITrace-1[LibGraalHotSpotGraalManagementInitialization]: initialized JVMCI runtime -1 Doug Simon has updated the pull request incrementally with one additional commit since the last revision: add guarantee to clarify name argument must not be null ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/684/files - new: https://git.openjdk.java.net/jdk/pull/684/files/eb531fad..726faca5 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=684&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=684&range=00-01 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/684.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/684/head:pull/684 PR: https://git.openjdk.java.net/jdk/pull/684 From jbhateja at openjdk.java.net Fri Oct 16 17:24:13 2020 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Fri, 16 Oct 2020 17:24:13 GMT Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v3] In-Reply-To: <9lPNMo1V33tQD6qp-1l78dII5Hfle8Ea5VWwuY1l_qA=.2e420c11-6e70-41f8-80b4-5992dcdd02eb@github.com> References: <94qadtiTzSkdsJAc_8IWrLxpBvmfiBXMf_W9Z965P80=.9a59a5db-2209-4007-94bb-16ccd8ff0b77@github.com> <9lPNMo1V33tQD6qp-1l78dII5Hfle8Ea5VWwuY1l_qA=.2e420c11-6e70-41f8-80b4-5992dcdd02eb@github.com> Message-ID: On Thu, 15 Oct 2020 14:54:26 GMT, Nils Eliasson wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Replacing explicit type checks with existing type checking routines > > src/hotspot/share/opto/memnode.hpp line 1188: > >> 1186: TrailingLoadStore, >> 1187: LeadingLoadStore, 
>> 1188: AfterPartialArrayCopy > > Change to keep consistent with the other names: > AfterPartialArrayCopy -> TrailingPartialArrayCopy > > Why is a special kind needed for partial array copy? Idea here is to prevent bypassing arraycopy operation post expansion during memory chain discovery [and] optimization. Currently a memory barrier is inserted after array copy macro expansion into a stub call, this pattern is being checked during memory chain discovery, with partial in-lining we create additional control structure for selection b/w slow path (stub call) and fast path (partially in-lined code). To prevent increasing the complexity of patter matching introduced a flag in MemBarrier node which is set only if partial in-lining takes place. ------------- PR: https://git.openjdk.java.net/jdk/pull/302 From kvn at openjdk.java.net Fri Oct 16 17:54:11 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 16 Oct 2020 17:54:11 GMT Subject: RFR: 8254842: [JVMCI] copy thread name when attaching libgraal thread to HotSpot [v2] In-Reply-To: References: Message-ID: On Fri, 16 Oct 2020 16:56:21 GMT, Doug Simon wrote: >> This PR modifies `HotSpotJVMCIRuntime.attachCurrentThread` when it is called from within libgraal so that the name of >> the thread in the libgraal heap is used as the name of the peer thread created in HotSpot. >> This useful when viewing output such as `-XX:JVMCITraceLevel=1`. 
For example, here's sample output without this PR: >> JVMCITrace-1[Thread-0]: initializing JVMCI runtime -1 >> JVMCITrace-1[Thread-0]: initialized JVMCI runtime -1 >> and then with: >> JVMCITrace-1[LibGraalHotSpotGraalManagementInitialization]: initializing JVMCI runtime -1 >> JVMCITrace-1[LibGraalHotSpotGraalManagementInitialization]: initialized JVMCI runtime -1 > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > add guarantee to clarify name argument must not be null src/hotspot/share/jvmci/jvmciCompilerToVM.cpp line 2354: > 2352: C2V_VMENTRY_PREFIX(jboolean, attachCurrentThread, (JNIEnv* env, jobject c2vm, jbyteArray name, jboolean > as_daemon)) 2353: if (thread == NULL) { > 2354: // Called from unattached JVMCI shared library thread What "unattached" means? ------------- PR: https://git.openjdk.java.net/jdk/pull/684 From jbhateja at openjdk.java.net Fri Oct 16 18:09:11 2020 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Fri, 16 Oct 2020 18:09:11 GMT Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v3] In-Reply-To: References: <94qadtiTzSkdsJAc_8IWrLxpBvmfiBXMf_W9Z965P80=.9a59a5db-2209-4007-94bb-16ccd8ff0b77@github.com> Message-ID: On Fri, 16 Oct 2020 14:35:11 GMT, Nils Eliasson wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Replacing explicit type checks with existing type checking routines > > src/hotspot/share/opto/cfgnode.cpp line 423: > >> 421: // If a two input non-loop region has dead input >> 422: // edge[s] degenerate any phi node contained within it. >> 423: bool RegionNode::try_phi_disintegration(PhaseGVN *phase) { > > RegionNode::try_phi_disintegration - is it a requirement for this enhancement? or a separate issue? > > Also - I know we already remove phis that only have one input. 
If the input is set to top - PhiNode::Ideal should > reduce the phi. If you have found a case where this doesn't happen - we should investigate and fix. This transformation is being done during RegionNode idealization, A phi may be intact (have both valid inputs), but if its parent region has one control edge connected to top() in that case the phi-input corresponding to top() edge is being removed and phi is disintegrated. Currently for dead loops all its phi nodes are replaced by top() https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/cfgnode.cpp#L575 For partial in-lining I introduced a mem phi node at the convergence of fast and slow path during node expansion which was getting replaced by top() during RegionNode idealization in case of a dead loop. This mem_phi had a user outside loop. while ( ) { // dead loop detection if ( len < 32 ) fast_path else slow_path mem_Phi = Memory(fast_path, slow_path) } memory = MemMerge(mem_Phi); ------------- PR: https://git.openjdk.java.net/jdk/pull/302 From dnsimon at openjdk.java.net Fri Oct 16 18:16:17 2020 From: dnsimon at openjdk.java.net (Doug Simon) Date: Fri, 16 Oct 2020 18:16:17 GMT Subject: RFR: 8254842: [JVMCI] copy thread name when attaching libgraal thread to HotSpot [v2] In-Reply-To: References: Message-ID: On Fri, 16 Oct 2020 17:51:33 GMT, Vladimir Kozlov wrote: >> Doug Simon has updated the pull request incrementally with one additional commit since the last revision: >> >> add guarantee to clarify name argument must not be null > > src/hotspot/share/jvmci/jvmciCompilerToVM.cpp line 2354: > >> 2352: C2V_VMENTRY_PREFIX(jboolean, attachCurrentThread, (JNIEnv* env, jobject c2vm, jbyteArray name, jboolean >> as_daemon)) 2353: if (thread == NULL) { >> 2354: // Called from unattached JVMCI shared library thread > > What "unattached" means? This is a thread that was created and started in libgraal. 
Libgraal is a JNI library and so any thread it creates that wants to call into the VM must first [attach](https://docs.oracle.com/javase/7/docs/technotes/guides/jni/spec/invocation.html) itself to the VM. ------------- PR: https://git.openjdk.java.net/jdk/pull/684 From kvn at openjdk.java.net Fri Oct 16 19:14:09 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 16 Oct 2020 19:14:09 GMT Subject: RFR: 8254842: [JVMCI] copy thread name when attaching libgraal thread to HotSpot [v2] In-Reply-To: References: Message-ID: On Fri, 16 Oct 2020 16:56:21 GMT, Doug Simon wrote: >> This PR modifies `HotSpotJVMCIRuntime.attachCurrentThread` when it is called from within libgraal so that the name of >> the thread in the libgraal heap is used as the name of the peer thread created in HotSpot. >> This useful when viewing output such as `-XX:JVMCITraceLevel=1`. For example, here's sample output without this PR: >> JVMCITrace-1[Thread-0]: initializing JVMCI runtime -1 >> JVMCITrace-1[Thread-0]: initialized JVMCI runtime -1 >> and then with: >> JVMCITrace-1[LibGraalHotSpotGraalManagementInitialization]: initializing JVMCI runtime -1 >> JVMCITrace-1[LibGraalHotSpotGraalManagementInitialization]: initialized JVMCI runtime -1 > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > add guarantee to clarify name argument must not be null Good. ------------- Marked as reviewed by kvn (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/684 From kvn at openjdk.java.net Fri Oct 16 19:14:11 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 16 Oct 2020 19:14:11 GMT Subject: RFR: 8254842: [JVMCI] copy thread name when attaching libgraal thread to HotSpot [v2] In-Reply-To: References: Message-ID: On Fri, 16 Oct 2020 18:13:33 GMT, Doug Simon wrote: >> src/hotspot/share/jvmci/jvmciCompilerToVM.cpp line 2354: >> >>> 2352: C2V_VMENTRY_PREFIX(jboolean, attachCurrentThread, (JNIEnv* env, jobject c2vm, jbyteArray name, jboolean >>> as_daemon)) 2353: if (thread == NULL) { >>> 2354: // Called from unattached JVMCI shared library thread >> >> What "unattached" means? > > This is a thread that was created and started in libgraal. Libgraal is a JNI library and so any thread it creates that > wants to call into the VM must first > [attach](https://docs.oracle.com/javase/7/docs/technotes/guides/jni/spec/invocation.html) itself to the VM. okay. ------------- PR: https://git.openjdk.java.net/jdk/pull/684 From kvn at openjdk.java.net Fri Oct 16 19:20:12 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 16 Oct 2020 19:20:12 GMT Subject: RFR: 8254805: compiler/debug/TestStressCM.java is still failing In-Reply-To: References: Message-ID: On Fri, 16 Oct 2020 06:57:33 GMT, Roberto Castañeda Lozano wrote: > Use the code motion trace produced by `TraceOptoPipelining` (excluding traces of stubs) to assert that two compilations > with the same seed cause `StressLCM` and `StressGCM` to take the same randomized decisions. Previously, the entire > output produced by `PrintOptoStatistics` was used instead, which has shown to be too fragile. Also, disable inlining in > both `TestStressCM.java` and the similar `TestStressIGVN.java` to prevent flaky behavior, and run both tests for ten > different seeds to improve coverage. Good. ------------- Marked as reviewed by kvn (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/693 From github.com+8022952+nhat-nguyen at openjdk.java.net Fri Oct 16 23:09:15 2020 From: github.com+8022952+nhat-nguyen at openjdk.java.net (Nhat Nguyen) Date: Fri, 16 Oct 2020 23:09:15 GMT Subject: RFR: 8251271: C2: Compile::_for_igvn list is corrupted after RenumberLiveNodes Message-ID: <3lcmFjOqS5m9qn0sqq0p6VVHAp16nAlapeVUAFeBiOI=.d100117c-a727-473c-82a4-80fc98b17895@github.com> I'm following up on this thread http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-September/039910.html I'm clearing `for_igvn` before restoring as suggested by Xin ------------- Commit messages: - 8251271: C2: Compile::_for_igvn list is corrupted after RenumberLiveNodes Changes: https://git.openjdk.java.net/jdk/pull/713/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=713&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8251271 Stats: 7 lines in 1 file changed: 7 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/713.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/713/head:pull/713 PR: https://git.openjdk.java.net/jdk/pull/713 From xliu at openjdk.java.net Sat Oct 17 04:00:09 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Sat, 17 Oct 2020 04:00:09 GMT Subject: RFR: 8251271: C2: Compile::_for_igvn list is corrupted after RenumberLiveNodes In-Reply-To: <3lcmFjOqS5m9qn0sqq0p6VVHAp16nAlapeVUAFeBiOI=.d100117c-a727-473c-82a4-80fc98b17895@github.com> References: <3lcmFjOqS5m9qn0sqq0p6VVHAp16nAlapeVUAFeBiOI=.d100117c-a727-473c-82a4-80fc98b17895@github.com> Message-ID: On Fri, 16 Oct 2020 23:02:46 GMT, Nhat Nguyen wrote: > I'm following up on this thread http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-September/039910.html > I'm clearing `for_igvn` before restoring as suggested by Xin Looks good to me, but we need at least a reviewer to approve it. 
------------- PR: https://git.openjdk.java.net/jdk/pull/713 From redestad at openjdk.java.net Sat Oct 17 11:36:14 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Sat, 17 Oct 2020 11:36:14 GMT Subject: RFR: 8254955: x86: MethodHandlesAdapterBlob is too big Message-ID: At some point JSR 292 was reworked to generate all but a small handful of interpreter stubs lazily, leaving the MethodHandlesAdapterBlob with a bit too much room to spare. The remaining stubs use less than 1000 bytes of memory in product builds, and less than 3k in debug builds. This patch adjust the sizes accordingly. Other platforms (except zero) seem like they could use a similar adjustment, but I don't have hardware available to check how big the interpreter stubs get on anything but x86 so I'll leave them untouched unless someone can run the numbers (`java -XX:+UnlockDiagnosticVMOptions -XX:+VerifyMethodHandles -XX:+LogCompilation` - grep the hotspot_log generated for MethodHandlesAdapterBlob or just blob since it's the first one) ------------- Commit messages: - Merge branch 'master' into mh_adapters_sizing - methodHandlesAdapterBlob is excessively sized Changes: https://git.openjdk.java.net/jdk/pull/717/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=717&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8254955 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/717.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/717/head:pull/717 PR: https://git.openjdk.java.net/jdk/pull/717 From vlivanov at openjdk.java.net Sat Oct 17 13:57:09 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Sat, 17 Oct 2020 13:57:09 GMT Subject: RFR: 8251271: C2: Compile::_for_igvn list is corrupted after RenumberLiveNodes In-Reply-To: <3lcmFjOqS5m9qn0sqq0p6VVHAp16nAlapeVUAFeBiOI=.d100117c-a727-473c-82a4-80fc98b17895@github.com> References: 
<3lcmFjOqS5m9qn0sqq0p6VVHAp16nAlapeVUAFeBiOI=.d100117c-a727-473c-82a4-80fc98b17895@github.com> Message-ID: On Fri, 16 Oct 2020 23:02:46 GMT, Nhat Nguyen wrote: > I'm following up on this thread http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-September/039910.html > I'm clearing `for_igvn` before restoring as suggested by Xin src/hotspot/share/opto/compile.cpp line 2115: > 2113: // Clear the for_igvn list because it may have irrelevant nodes > 2114: // from the previous PhaseRenumberLive run. > 2115: save_for_igvn->clear(); Since `PhaseRenumberLive::PhaseRenumberLive` moves nodes from `for_igvn()` to `new_worklist`, does it make more sense to drain original `for_igvn()` worklist there? ------------- PR: https://git.openjdk.java.net/jdk/pull/713 From xliu at openjdk.java.net Sat Oct 17 21:41:09 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Sat, 17 Oct 2020 21:41:09 GMT Subject: RFR: 8251271: C2: Compile::_for_igvn list is corrupted after RenumberLiveNodes In-Reply-To: References: <3lcmFjOqS5m9qn0sqq0p6VVHAp16nAlapeVUAFeBiOI=.d100117c-a727-473c-82a4-80fc98b17895@github.com> Message-ID: On Sat, 17 Oct 2020 13:54:20 GMT, Vladimir Ivanov wrote: >> I'm following up on this thread http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-September/039910.html >> I'm clearing `for_igvn` before restoring as suggested by Xin > > src/hotspot/share/opto/compile.cpp line 2115: > >> 2113: // Clear the for_igvn list because it may have irrelevant nodes >> 2114: // from the previous PhaseRenumberLive run. >> 2115: save_for_igvn->clear(); > > Since `PhaseRenumberLive::PhaseRenumberLive` moves nodes from `for_igvn()` to `new_worklist`, does it make more sense > to drain original `for_igvn()` worklist there? 
+1 ------------- PR: https://git.openjdk.java.net/jdk/pull/713 From redestad at openjdk.java.net Sat Oct 17 22:58:21 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Sat, 17 Oct 2020 22:58:21 GMT Subject: RFR: 8254955: x86: MethodHandlesAdapterBlob is too big [v2] In-Reply-To: References: Message-ID: > At some point JSR 292 was reworked to generate all but a small handful of interpreter stubs lazily, leaving the > MethodHandlesAdapterBlob with a bit too much room to spare. > The remaining stubs use less than 1000 bytes of memory in product builds, and less than 3k in debug builds. This patch > adjust the sizes accordingly. > Other platforms (except zero) seem like they could use a similar adjustment, but I don't have hardware available to > check how big the interpreter stubs get on anything but x86 so I'll leave them untouched unless someone can run the > numbers (`java -XX:+UnlockDiagnosticVMOptions -XX:+VerifyMethodHandles -XX:+LogCompilation` - grep the hotspot_log > generated for MethodHandlesAdapterBlob or just blob since it's the first one) Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: ZGC & Shenandoah requires a little more space ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/717/files - new: https://git.openjdk.java.net/jdk/pull/717/files/0a219c3d..d21c108d Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=717&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=717&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/717.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/717/head:pull/717 PR: https://git.openjdk.java.net/jdk/pull/717 From redestad at openjdk.java.net Sun Oct 18 09:31:15 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Sun, 18 Oct 2020 09:31:15 GMT Subject: RFR: 8254965: Remove unused Matcher::_ruleName and make ruleName non-product 
Message-ID: - Matcher::_ruleName is unused - remove - ruleName, defined via ad, is only used by a non-product routine. #ifdef it accordingly This reduces the JVM size (-63Kb on linux-x64, or about 0.25%) ------------- Commit messages: - Matcher::_ruleName unused and ruleName only used by non-product code Changes: https://git.openjdk.java.net/jdk/pull/722/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=722&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8254965 Stats: 5 lines in 3 files changed: 2 ins; 3 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/722.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/722/head:pull/722 PR: https://git.openjdk.java.net/jdk/pull/722 From redestad at openjdk.java.net Sun Oct 18 13:34:17 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Sun, 18 Oct 2020 13:34:17 GMT Subject: RFR: 8254966: Remove unused code from Matcher Message-ID: These are unused: interpreter_method_reg* compiler_method_reg* pd_implicit_null_fixup (only existed to workaround some Win95 issues) Quite some code in the .ad files shook loose when removing these, so I will need some help verifying that the s390 and ppc changes are fine. 
------------- Commit messages: - Remove interpreter_method_reg from AD files, adjust parser - Clean out compiler_method_reg left-overs - Remove unused methods in matcher and corresponding code in ad files Changes: https://git.openjdk.java.net/jdk/pull/723/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=723&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8254966 Stats: 273 lines in 11 files changed: 1 ins; 259 del; 13 mod Patch: https://git.openjdk.java.net/jdk/pull/723.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/723/head:pull/723 PR: https://git.openjdk.java.net/jdk/pull/723 From redestad at openjdk.java.net Sun Oct 18 17:03:13 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Sun, 18 Oct 2020 17:03:13 GMT Subject: RFR: 8253453: SourceFileInfoTable should be allocated lazily Message-ID: Avoids allocating and clearing a 128Kb large ResourceHashTable eagerly on startup It's only used when PrintInterpreter is enabled - which prints out the interpreter during bootstrap - so proper synchronization seems excessive. 
Testing: tier1, verified source line comments retained with `-XX:+PrintInterpreter` ------------- Commit messages: - Allocate SourceFileInfoTable into C_HEAP - Lazily allocate SourceFileInfoTable in disassembler Changes: https://git.openjdk.java.net/jdk/pull/724/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=724&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8253453 Stats: 12 lines in 1 file changed: 7 ins; 0 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/724.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/724/head:pull/724 PR: https://git.openjdk.java.net/jdk/pull/724 From jbhateja at openjdk.java.net Sun Oct 18 18:39:18 2020 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Sun, 18 Oct 2020 18:39:18 GMT Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v4] In-Reply-To: References: Message-ID: > Summary: > > 1) Partial in-lining technique avoids call overhead penalty for small array copy operations with size less than 32 > bytes. 2) At runtime, a conditional check based on copy length either calls an array-copy stub or executes an optimized > instruction sequence using AVX-512 masked instructions emitted at the call site. 3) New runtime flag > ArrayCopyPartialInlineSize=0/32(default)/64 bytes determines the maximum size for partial in-lining. 4) Based on the > perf results seen in benchmarks currently partial in-lining is performed only for arraycopy involving sub-word types > (bool/byte/char/short). Once PR-61 gets integrated we can extend this patch to cover all the primitive types. 
> Performance Results: > System : CascadeLake Server, Intel(R) Xeon(R) Platinum 8280L CPU @ 2.70GHz > Micros : test/micro/org/openjdk/bench/java/lang/ArrayCopy*.java > ArrayCopyPartialInlineSize : 32 > > JMH | Block Size | Baseline (ns/op) | Partial Inling (ns/op) | Gain > -- | -- | -- | -- | -- > ArrayCopyAligned.testByte | 1 | 5.417 | 2.696 | 2.009272997 > ArrayCopyAligned.testByte | 3 | 5.494 | 2.702 | 2.03330866 > ArrayCopyAligned.testByte | 5 | 5.417 | 2.637 | 2.05422829 > ArrayCopyAligned.testByte | 10 | 5.343 | 2.703 | 1.976692564 > ArrayCopyAligned.testByte | 20 | 5.837 | 2.636 | 2.214339909 > ArrayCopyAligned.testByte | 70 | 5.86 | 6 | 0.976666667 > ArrayCopyAligned.testByte | 150 | 6.766 | 6.906 | 0.979727773 > ArrayCopyAligned.testByte | 300 | 7.605 | 7.952 | 0.956363179 > ArrayCopyAligned.testByte | 600 | 11.989 | 12.007 | 0.998500874 > ArrayCopyAligned.testByte | 1200 | 16.447 | 16.585 | 0.991679228 > ArrayCopyAligned.testChar | 1 | 5.02 | 2.828 | 1.775106082 > ArrayCopyAligned.testChar | 3 | 5.129 | 2.762 | 1.85698769 > ArrayCopyAligned.testChar | 5 | 5.041 | 2.762 | 1.82512672 > ArrayCopyAligned.testChar | 10 | 5.716 | 2.762 | 2.069514844 > ArrayCopyAligned.testChar | 20 | 5.111 | 5.399 | 0.946656788 > ArrayCopyAligned.testChar | 70 | 6.271 | 6.242 | 1.004645947 > ArrayCopyAligned.testChar | 150 | 7.45 | 7.599 | 0.980392157 > ArrayCopyAligned.testChar | 300 | 9.904 | 10.112 | 0.97943038 > ArrayCopyAligned.testChar | 600 | 17.131 | 17.167 | 0.997902953 > ArrayCopyAligned.testChar | 1200 | 29.556 | 29.851 | 0.990117584 > ArrayCopyUnalignedBoth.testByte | 1 | 5.419 | 2.702 | 2.005551443 > ArrayCopyUnalignedBoth.testByte | 3 | 5.558 | 2.636 | 2.108497724 > ArrayCopyUnalignedBoth.testByte | 5 | 5.43 | 2.636 | 2.059939302 > ArrayCopyUnalignedBoth.testByte | 10 | 5.378 | 2.637 | 2.039438756 > ArrayCopyUnalignedBoth.testByte | 20 | 5.914 | 2.636 | 2.243550836 > ArrayCopyUnalignedBoth.testByte | 70 | 5.882 | 5.954 | 0.987907289 > 
ArrayCopyUnalignedBoth.testByte | 150 | 6.784 | 6.88 | 0.986046512 > ArrayCopyUnalignedBoth.testByte | 300 | 7.635 | 7.968 | 0.958207831 > ArrayCopyUnalignedBoth.testByte | 600 | 12.226 | 12.129 | 1.007997362 > ArrayCopyUnalignedBoth.testByte | 1200 | 16.992 | 20.717 | 0.820195974 > ArrayCopyUnalignedBoth.testChar | 1 | 5.019 | 2.828 | 1.774752475 > ArrayCopyUnalignedBoth.testChar | 3 | 5.163 | 2.763 | 1.868621064 > ArrayCopyUnalignedBoth.testChar | 5 | 5.042 | 2.827 | 1.783516095 > ArrayCopyUnalignedBoth.testChar | 10 | 5.718 | 2.828 | 2.021923621 > ArrayCopyUnalignedBoth.testChar | 20 | 5.111 | 5.404 | 0.945780903 > ArrayCopyUnalignedBoth.testChar | 70 | 6.367 | 6.235 | 1.02117081 > ArrayCopyUnalignedBoth.testChar | 150 | 7.367 | 8.269 | 0.890917886 > ArrayCopyUnalignedBoth.testChar | 300 | 10.358 | 10.642 | 0.973313287 > ArrayCopyUnalignedBoth.testChar | 600 | 20.84 | 17.522 | 1.189361945 > ArrayCopyUnalignedBoth.testChar | 1200 | 31.895 | 31.892 | 1.000094067 > ArrayCopyUnalignedDst.testByte | 1 | 5.455 | 2.637 | 2.068638604 > ArrayCopyUnalignedDst.testByte | 3 | 5.562 | 2.702 | 2.058475204 > ArrayCopyUnalignedDst.testByte | 5 | 5.427 | 2.702 | 2.008512213 > ArrayCopyUnalignedDst.testByte | 10 | 5.367 | 2.696 | 1.990727003 > ArrayCopyUnalignedDst.testByte | 20 | 5.839 | 2.637 | 2.214258627 > ArrayCopyUnalignedDst.testByte | 70 | 5.888 | 5.968 | 0.986595174 > ArrayCopyUnalignedDst.testByte | 150 | 6.785 | 6.773 | 1.001771741 > ArrayCopyUnalignedDst.testByte | 300 | 7.606 | 7.972 | 0.954089313 > ArrayCopyUnalignedDst.testByte | 600 | 11.986 | 21.195 | 0.565510734 > ArrayCopyUnalignedDst.testByte | 1200 | 16.54 | 16.784 | 0.985462345 > ArrayCopyUnalignedDst.testChar | 1 | 5.02 | 2.827 | 1.775733994 > ArrayCopyUnalignedDst.testChar | 3 | 5.131 | 2.762 | 1.857711803 > ArrayCopyUnalignedDst.testChar | 5 | 5.038 | 2.762 | 1.82404055 > ArrayCopyUnalignedDst.testChar | 10 | 5.718 | 2.762 | 2.070238957 > ArrayCopyUnalignedDst.testChar | 20 | 5.113 | 5.401 | 0.946676541 > 
ArrayCopyUnalignedDst.testChar | 70 | 6.222 | 6.214 | 1.001287416 > ArrayCopyUnalignedDst.testChar | 150 | 7.367 | 8.125 | 0.906707692 > ArrayCopyUnalignedDst.testChar | 300 | 10.204 | 10.082 | 1.012100774 > ArrayCopyUnalignedDst.testChar | 600 | 16.978 | 17.135 | 0.990837467 > ArrayCopyUnalignedDst.testChar | 1200 | 32.351 | 31.996 | 1.011095137 > ArrayCopyUnalignedSrc.testByte | 1 | 5.414 | 2.696 | 2.008160237 > ArrayCopyUnalignedSrc.testByte | 3 | 5.494 | 2.637 | 2.083428138 > ArrayCopyUnalignedSrc.testByte | 5 | 5.431 | 2.637 | 2.059537353 > ArrayCopyUnalignedSrc.testByte | 10 | 5.344 | 2.703 | 1.977062523 > ArrayCopyUnalignedSrc.testByte | 20 | 5.834 | 2.696 | 2.163946588 > ArrayCopyUnalignedSrc.testByte | 70 | 5.883 | 6.009 | 0.979031453 > ArrayCopyUnalignedSrc.testByte | 150 | 6.729 | 6.87 | 0.979475983 > ArrayCopyUnalignedSrc.testByte | 300 | 7.603 | 7.97 | 0.953952321 > ArrayCopyUnalignedSrc.testByte | 600 | 12.004 | 12.16 | 0.987171053 > ArrayCopyUnalignedSrc.testByte | 1200 | 16.534 | 16.643 | 0.9934507 > ArrayCopyUnalignedSrc.testChar | 1 | 5.021 | 2.762 | 1.81788559 > ArrayCopyUnalignedSrc.testChar | 3 | 5.13 | 2.762 | 1.857349747 > ArrayCopyUnalignedSrc.testChar | 5 | 5.042 | 2.827 | 1.783516095 > ArrayCopyUnalignedSrc.testChar | 10 | 5.726 | 2.761 | 2.073886273 > ArrayCopyUnalignedSrc.testChar | 20 | 5.112 | 5.401 | 0.94649139 > ArrayCopyUnalignedSrc.testChar | 70 | 6.113 | 6.227 | 0.981692629 > ArrayCopyUnalignedSrc.testChar | 150 | 7.493 | 7.888 | 0.949923935 > ArrayCopyUnalignedSrc.testChar | 300 | 10.234 | 10.501 | 0.97457385 > ArrayCopyUnalignedSrc.testChar | 600 | 17.175 | 17.142 | 1.001925096 > ArrayCopyUnalignedSrc.testChar | 1200 | 31.926 | 31.987 | 0.998092975 > > Detailed Reports: > Baseline : > [http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt) > WithOpt : > 
[http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt) Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848 - Replacing explicit type checks with existing type checking routines - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848 - 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions. ------------- Changes: https://git.openjdk.java.net/jdk/pull/302/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=302&range=03 Stats: 518 lines in 23 files changed: 494 ins; 0 del; 24 mod Patch: https://git.openjdk.java.net/jdk/pull/302.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/302/head:pull/302 PR: https://git.openjdk.java.net/jdk/pull/302 From neliasso at openjdk.java.net Sun Oct 18 18:59:12 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Sun, 18 Oct 2020 18:59:12 GMT Subject: RFR: 8253453: SourceFileInfoTable should be allocated lazily In-Reply-To: References: Message-ID: On Sun, 18 Oct 2020 16:50:31 GMT, Claes Redestad wrote: > Avoids allocating and clearing a 128Kb large ResourceHashTable eagerly on startup > > It's only used when PrintInterpreter is enabled - which prints out the interpreter during bootstrap - so proper > synchronization seems excessive. > Testing: tier1, verified source line comments retained with `-XX:+PrintInterpreter` Looks good to me! ------------- Marked as reviewed by neliasso (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/724 From neliasso at openjdk.java.net Sun Oct 18 19:01:09 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Sun, 18 Oct 2020 19:01:09 GMT Subject: RFR: 8254966: Remove unused code from Matcher In-Reply-To: References: Message-ID: On Sun, 18 Oct 2020 12:11:41 GMT, Claes Redestad wrote: > These are unused: > > interpreter_method_reg* > compiler_method_reg* > pd_implicit_null_fixup (only existed to workaround some Win95 issues) > > Quite some code in the .ad files shook loose when removing these, so I will need some help verifying that the s390 and > ppc changes are fine. Nice clean up! Reviewed. ------------- Marked as reviewed by neliasso (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/723 From neliasso at openjdk.java.net Sun Oct 18 19:03:10 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Sun, 18 Oct 2020 19:03:10 GMT Subject: RFR: 8254965: Remove unused Matcher::_ruleName and make ruleName non-product In-Reply-To: References: Message-ID: On Sun, 18 Oct 2020 09:22:45 GMT, Claes Redestad wrote: > - Matcher::_ruleName is unused - remove > - ruleName, defined via ad, is only used by a non-product routine. #ifdef it accordingly > > This reduces the JVM size (-63Kb on linux-x64, or about 0.25%) Looks good. ------------- Marked as reviewed by neliasso (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/722 From neliasso at openjdk.java.net Sun Oct 18 19:04:14 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Sun, 18 Oct 2020 19:04:14 GMT Subject: RFR: 8254955: x86: MethodHandlesAdapterBlob is too big [v2] In-Reply-To: References: Message-ID: On Sat, 17 Oct 2020 22:58:21 GMT, Claes Redestad wrote: >> At some point JSR 292 was reworked to generate all but a small handful of interpreter stubs lazily, leaving the >> MethodHandlesAdapterBlob with a bit too much room to spare. >> The remaining stubs use less than 1000 bytes of memory in product builds, and less than 3k in debug builds. 
This patch >> adjust the sizes accordingly. >> Other platforms (except zero) seem like they could use a similar adjustment, but I don't have hardware available to >> check how big the interpreter stubs get on anything but x86 so I'll leave them untouched unless someone can run the >> numbers (`java -XX:+UnlockDiagnosticVMOptions -XX:+VerifyMethodHandles -XX:+LogCompilation` - grep the hotspot_log >> generated for MethodHandlesAdapterBlob or just blob since it's the first one) > > Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: > > ZGC & Shenandoah requires a little more space Looks good! ------------- Marked as reviewed by neliasso (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/717 From kvn at openjdk.java.net Mon Oct 19 03:07:08 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 19 Oct 2020 03:07:08 GMT Subject: RFR: 8254965: Remove unused Matcher::_ruleName and make ruleName non-product In-Reply-To: References: Message-ID: On Sun, 18 Oct 2020 09:22:45 GMT, Claes Redestad wrote: > - Matcher::_ruleName is unused - remove > - ruleName, defined via ad, is only used by a non-product routine. #ifdef it accordingly > > This reduces the JVM size (-63Kb on linux-x64, or about 0.25%) Unfortunately we may soon use it (problem was found only in product VM during Graal testing). Following JDK-8247350 fix to catch issues in product VM we may want to use rule names: tty->print_cr("fatal error context information: rule: (%s) n_size (%d), current_offset (%d), instr_offset (%d)", mach->rule() == 9999999 ? 
"" : ruleName[mach->rule()], n_size, current_offset, instr_offset); ------------- PR: https://git.openjdk.java.net/jdk/pull/722 From ecaspole at openjdk.java.net Mon Oct 19 07:04:18 2020 From: ecaspole at openjdk.java.net (Eric Caspole) Date: Mon, 19 Oct 2020 07:04:18 GMT Subject: RFR: 8254913: Increase InlineSmallCode default from 2000 to 2500 for x64 Message-ID: <-eTclEV5zK_DIJgV8kYWFRBlUNtEqzq5H_fNwmbVJ7A=.ee694b8b-2a1e-43a0-9df6-52958e8f1d23@github.com> We have seen some specific benefits to increasing InlineSmallCode to 2500 from 2000, and across the whole promo build perf test collection the change is neutral to slightly positive, where the tests are run on modern OCI systems. Passed tier1 testing, some ad-hoc perf testing and more compiler related parts of the weekly promo performance set. JBS: https://bugs.openjdk.java.net/browse/JDK-8254913 Thanks, Eric ------------- Commit messages: - 8254913: Increase InlineSmallCode default from 2000 to 2500 for x64 Changes: https://git.openjdk.java.net/jdk/pull/705/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=705&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8254913 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/705.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/705/head:pull/705 PR: https://git.openjdk.java.net/jdk/pull/705 From redestad at openjdk.java.net Mon Oct 19 07:04:19 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Mon, 19 Oct 2020 07:04:19 GMT Subject: RFR: 8254913: Increase InlineSmallCode default from 2000 to 2500 for x64 In-Reply-To: <-eTclEV5zK_DIJgV8kYWFRBlUNtEqzq5H_fNwmbVJ7A=.ee694b8b-2a1e-43a0-9df6-52958e8f1d23@github.com> References: <-eTclEV5zK_DIJgV8kYWFRBlUNtEqzq5H_fNwmbVJ7A=.ee694b8b-2a1e-43a0-9df6-52958e8f1d23@github.com> Message-ID: On Fri, 16 Oct 2020 17:31:58 GMT, Eric Caspole wrote: > We have seen some specific benefits to increasing InlineSmallCode to 2500 from 2000, and across the whole 
promo build > perf test collection the change is neutral to slightly positive, where the tests are run on modern OCI systems. > Passed tier1 testing, some ad-hoc perf testing and more compiler related parts of the weekly promo performance set. > > JBS: https://bugs.openjdk.java.net/browse/JDK-8254913 > Thanks, > Eric LGTM! ------------- Marked as reviewed by redestad (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/705 From shade at openjdk.java.net Mon Oct 19 07:04:19 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 19 Oct 2020 07:04:19 GMT Subject: RFR: 8254913: Increase InlineSmallCode default from 2000 to 2500 for x64 In-Reply-To: <-eTclEV5zK_DIJgV8kYWFRBlUNtEqzq5H_fNwmbVJ7A=.ee694b8b-2a1e-43a0-9df6-52958e8f1d23@github.com> References: <-eTclEV5zK_DIJgV8kYWFRBlUNtEqzq5H_fNwmbVJ7A=.ee694b8b-2a1e-43a0-9df6-52958e8f1d23@github.com> Message-ID: On Fri, 16 Oct 2020 17:31:58 GMT, Eric Caspole wrote: > We have seen some specific benefits to increasing InlineSmallCode to 2500 from 2000, and across the whole promo build > perf test collection the change is neutral to slightly positive, where the tests are run on modern OCI systems. > Passed tier1 testing, some ad-hoc perf testing and more compiler related parts of the weekly promo performance set. > > JBS: https://bugs.openjdk.java.net/browse/JDK-8254913 > Thanks, > Eric Changing the global inlining defaults without providing the justifying performance data is not acceptable. @cl4es, please rescind your review if you can, so that change is not integrated by accident. ------------- Changes requested by shade (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/705 From azeemj at openjdk.java.net Mon Oct 19 07:04:20 2020 From: azeemj at openjdk.java.net (Azeem Jiva) Date: Mon, 19 Oct 2020 07:04:20 GMT Subject: RFR: 8254913: Increase InlineSmallCode default from 2000 to 2500 for x64 In-Reply-To: References: <-eTclEV5zK_DIJgV8kYWFRBlUNtEqzq5H_fNwmbVJ7A=.ee694b8b-2a1e-43a0-9df6-52958e8f1d23@github.com> Message-ID: On Fri, 16 Oct 2020 17:33:43 GMT, Claes Redestad wrote: >> We have seen some specific benefits to increasing InlineSmallCode to 2500 from 2000, and across the whole promo build >> perf test collection the change is neutral to slightly positive, where the tests are run on modern OCI systems. >> Passed tier1 testing, some ad-hoc perf testing and more compiler related parts of the weekly promo performance set. >> >> JBS: https://bugs.openjdk.java.net/browse/JDK-8254913 >> Thanks, >> Eric > > LGTM! Do you have any results? Changing defaults without published results makes the process opaque. ------------- PR: https://git.openjdk.java.net/jdk/pull/705 From avoitylov at openjdk.java.net Mon Oct 19 07:04:20 2020 From: avoitylov at openjdk.java.net (Aleksei Voitylov) Date: Mon, 19 Oct 2020 07:04:20 GMT Subject: RFR: 8254913: Increase InlineSmallCode default from 2000 to 2500 for x64 In-Reply-To: References: <-eTclEV5zK_DIJgV8kYWFRBlUNtEqzq5H_fNwmbVJ7A=.ee694b8b-2a1e-43a0-9df6-52958e8f1d23@github.com> Message-ID: On Fri, 16 Oct 2020 20:47:45 GMT, Azeem Jiva wrote: >> LGTM! > > Do you have any results? Changing defaults without published results makes the process opaque. It would be interesting to understand the impact on code cache size as well and see a definition of OCI system so that the other members of the community could check some other HW. 
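For readers who want to gather the kind of data requested above on their own hardware, the following is a minimal sketch (not part of the PR) that reports the effective `InlineSmallCode` value and the current code cache usage of the running JVM. It assumes a HotSpot JVM that exposes `com.sun.management.HotSpotDiagnosticMXBean`; the class name `InlineSmallCodeProbe` and the rule of matching memory pools whose name contains "Code" are illustrative assumptions, since pool names differ between JDK versions and code cache configurations.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.Optional;

// Hypothetical helper, not part of the JDK or of the PR under review.
final class InlineSmallCodeProbe {

    // Returns the current value of a VM option, or empty if the JVM
    // does not expose the diagnostic bean or does not know the option.
    static Optional<String> vmOption(String name) {
        try {
            HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            if (bean == null) {
                return Optional.empty();
            }
            return Optional.of(bean.getVMOption(name).getValue());
        } catch (IllegalArgumentException e) {
            // Thrown when the option does not exist in this VM.
            return Optional.empty();
        }
    }

    // Sums the used bytes of all non-heap pools whose name contains "Code".
    // Assumption: this matches both the single code cache pool and the
    // segmented code heap pools; adjust the filter if your JDK differs.
    static long codeCacheUsedBytes() {
        long used = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.NON_HEAP && pool.getName().contains("Code")) {
                MemoryUsage usage = pool.getUsage();
                if (usage != null) {
                    used += usage.getUsed();
                }
            }
        }
        return used;
    }

    public static void main(String[] args) {
        System.out.println("InlineSmallCode = "
            + vmOption("InlineSmallCode").orElse("(not available in this VM)"));
        System.out.println("code cache used = " + codeCacheUsedBytes() + " bytes");
    }
}
```

Running the same workload once with `-XX:InlineSmallCode=2000` and once with `-XX:InlineSmallCode=2500` and calling `codeCacheUsedBytes()` at the end of each run would give a rough, workload-specific comparison; it is not a substitute for the benchmark data requested in the thread.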
------------- PR: https://git.openjdk.java.net/jdk/pull/705 From rcastanedalo at openjdk.java.net Mon Oct 19 07:45:13 2020 From: rcastanedalo at openjdk.java.net (Roberto Castañeda Lozano) Date: Mon, 19 Oct 2020 07:45:13 GMT Subject: RFR: 8254805: compiler/debug/TestStressCM.java is still failing In-Reply-To: References: Message-ID: On Fri, 16 Oct 2020 19:16:36 GMT, Vladimir Kozlov wrote: > Good. Thanks Vladimir! ------------- PR: https://git.openjdk.java.net/jdk/pull/693 From njian at openjdk.java.net Mon Oct 19 08:41:21 2020 From: njian at openjdk.java.net (Ningsheng Jian) Date: Mon, 19 Oct 2020 08:41:21 GMT Subject: RFR: 8254884: Make sure jvm does not crash with Arm SVE and Vector API Message-ID: Currently we have not implemented all Arm SVE code generation for Vector API specific nodes. To make sure hotspot does not crash with bad AD file (as NEON has implemented them), we simply add those OPs to unsupported op list. This is the port and minor cleanup of JDK-8253211 in repo-panama: https://github.com/openjdk/panama-vector/pull/7 with Op_VectorUnbox (not for codegen) and Op_VectorMaskWrapper (actually unused node. dead code?) removed from the unsupported op list and Op_VectorLoadConst added. Test: tier1-3 on AArch64 and x86_64 as well as Vector API tests on AArch64 SVE.
------------- Commit messages: - 8254884: Make sure jvm does not crash with Arm SVE and Vector API Changes: https://git.openjdk.java.net/jdk/pull/726/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=726&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8254884 Stats: 130 lines in 4 files changed: 117 ins; 0 del; 13 mod Patch: https://git.openjdk.java.net/jdk/pull/726.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/726/head:pull/726 PR: https://git.openjdk.java.net/jdk/pull/726 From vlivanov at openjdk.java.net Mon Oct 19 09:04:10 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Mon, 19 Oct 2020 09:04:10 GMT Subject: RFR: 8254884: Make sure jvm does not crash with Arm SVE and Vector API In-Reply-To: References: Message-ID: <0Ch1KNiVJC7a09Nqq7_0c6t-iYUZLCW4BkJ3TMKdfiQ=.61d883c7-b05a-445d-bb25-cc99f5da2045@github.com> On Mon, 19 Oct 2020 08:31:38 GMT, Ningsheng Jian wrote: > Currently we have not implemented all Arm SVE code generation for Vector API specific nodes. To make sure hotspot does > not crash with bad AD file (as NEON has implemented them), we simply add those OPs to unsupported op list. > This is the port and minor cleanup of JDK-8253211 in repo-panama: https://github.com/openjdk/panama-vector/pull/7 with > Op_VectorUnbox (not for codegen) and Op_VectorMaskWrapper (actually unused node. dead code?) removed from the > unsupported op list and Op_VectorLoadConst added. Test: tier1-3 on AArch64 and x86_64 as well as Vector API tests on > AArch64 SVE. Changes requested by vlivanov (Reviewer). src/hotspot/share/opto/vectorIntrinsics.cpp line 601: > 599: int byte_num_elem = num_elem * type2aelembytes(elem_bt); > 600: if (!arch_supports_vector(is_store ? Op_StoreVector : Op_LoadVector, byte_num_elem, T_BYTE, VecMaskNotUsed) > 601: || !arch_supports_vector(Op_VectorReinterpret, num_elem, T_BYTE, VecMaskNotUsed)) { s/num_elem/byte_num_elem/ ? 
------------- PR: https://git.openjdk.java.net/jdk/pull/726 From rcastanedalo at openjdk.java.net Mon Oct 19 09:21:21 2020 From: rcastanedalo at openjdk.java.net (Roberto Castañeda Lozano) Date: Mon, 19 Oct 2020 09:21:21 GMT Subject: RFR: 8254317: C2: Resource consumption of ConvI2LNode::Ideal() grows exponentially Message-ID: <9FJMNUDr4xvtcIlGtEk2Y_7tA17Us29ImExhFpzs87s=.66c146c6-7629-4e1b-a62b-d68714636f32@github.com> In the optimization ConvI2L(AddI(x, y)) -> AddL(ConvI2L(x), ConvI2L(y)) within `ConvI2LNode::Ideal()`, handle the special case x = y by feeding both inputs of AddL from a single ConvI2L node rather than creating two semantically equivalent ConvI2L nodes. This avoids an exponential number of calls to `ConvI2LNode::Ideal()` when dealing with long chains of AddI nodes. Disable the optimization for the pattern ConvI2L(SubI(x, x)), which is simplified to zero during parsing anyway. Add a set of regression tests for the transformation that cover different shapes of AddI subgraphs. Also add a microbenchmark that exercises the special case, for performance regression testing.
------------- Commit messages: - Merge master - 8254317: C2: Resource consumption of ConvI2LNode::Ideal() grows exponentially Changes: https://git.openjdk.java.net/jdk/pull/727/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=727&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8254317 Stats: 248 lines in 5 files changed: 243 ins; 0 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/727.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/727/head:pull/727 PR: https://git.openjdk.java.net/jdk/pull/727 From rcastanedalo at openjdk.java.net Mon Oct 19 09:21:21 2020 From: rcastanedalo at openjdk.java.net (Roberto Castañeda Lozano) Date: Mon, 19 Oct 2020 09:21:21 GMT Subject: RFR: 8254317: C2: Resource consumption of ConvI2LNode::Ideal() grows exponentially In-Reply-To: <9FJMNUDr4xvtcIlGtEk2Y_7tA17Us29ImExhFpzs87s=.66c146c6-7629-4e1b-a62b-d68714636f32@github.com> References: <9FJMNUDr4xvtcIlGtEk2Y_7tA17Us29ImExhFpzs87s=.66c146c6-7629-4e1b-a62b-d68714636f32@github.com> Message-ID: On Mon, 19 Oct 2020 08:36:53 GMT, Roberto Castañeda Lozano wrote: > In the optimization ConvI2L(AddI(x, y)) -> AddL(ConvI2L(x), ConvI2L(y)) within `ConvI2LNode::Ideal()`, handle the > special case x = y by feeding both inputs of AddL from a single ConvI2L node rather than creating two semantically > equivalent ConvI2L nodes. This avoids an exponential number of calls to `ConvI2LNode::Ideal()` when dealing with long > chains of AddI nodes. Disable the optimization for the pattern ConvI2L(SubI(x, x)), which is simplified to zero during > parsing anyway. Add a set of regression tests for the transformation that cover different shapes of AddI subgraphs. > Also add a microbenchmark that exercises the special case, for performance regression testing. Tested on tier1-3 on windows-x64, linux-x64, linux-aarch64, and macosx-x64 in both release and debug mode.
------------- PR: https://git.openjdk.java.net/jdk/pull/727 From rcastanedalo at openjdk.java.net Mon Oct 19 09:21:21 2020 From: rcastanedalo at openjdk.java.net (Roberto Castañeda Lozano) Date: Mon, 19 Oct 2020 09:21:21 GMT Subject: RFR: 8254317: C2: Resource consumption of ConvI2LNode::Ideal() grows exponentially In-Reply-To: References: <9FJMNUDr4xvtcIlGtEk2Y_7tA17Us29ImExhFpzs87s=.66c146c6-7629-4e1b-a62b-d68714636f32@github.com> Message-ID: On Mon, 19 Oct 2020 08:45:28 GMT, Roberto Castañeda Lozano wrote: >> In the optimization ConvI2L(AddI(x, y)) -> AddL(ConvI2L(x), ConvI2L(y)) within `ConvI2LNode::Ideal()`, handle the >> special case x = y by feeding both inputs of AddL from a single ConvI2L node rather than creating two semantically >> equivalent ConvI2L nodes. This avoids an exponential number of calls to `ConvI2LNode::Ideal()` when dealing with long >> chains of AddI nodes. Disable the optimization for the pattern ConvI2L(SubI(x, x)), which is simplified to zero during >> parsing anyway. Add a set of regression tests for the transformation that cover different shapes of AddI subgraphs. >> Also add a microbenchmark that exercises the special case, for performance regression testing. > > Tested on tier1-3 on windows-x64, linux-x64, linux-aarch64, and macosx-x64 in both release and debug mode. The microbenchmark gives the following results before and after the proposed fix (linux-x86_64-server-release build run on Intel Core i7-9850H CPU @ 2.60GHz with 32 GB main memory):

| Version | Mode | Cnt | Score | Error | Units |
|---|---|---|---|---|---|
| before the fix | ss | 10 | 15402.110 | 171.255 | ms/op |
| after the fix | ss | 10 | 0.976 | 0.069 | ms/op |

"Single-shot" (ss) benchmarking mode is used here because the focus is on measuring C2 execution time.
------------- PR: https://git.openjdk.java.net/jdk/pull/727 From rcastanedalo at openjdk.java.net Mon Oct 19 09:21:22 2020 From: rcastanedalo at openjdk.java.net (Roberto Castañeda Lozano) Date: Mon, 19 Oct 2020 09:21:22 GMT Subject: RFR: 8254317: C2: Resource consumption of ConvI2LNode::Ideal() grows exponentially In-Reply-To: References: <9FJMNUDr4xvtcIlGtEk2Y_7tA17Us29ImExhFpzs87s=.66c146c6-7629-4e1b-a62b-d68714636f32@github.com> Message-ID: On Mon, 19 Oct 2020 09:01:48 GMT, Roberto Castañeda Lozano wrote: >> Tested on tier1-3 on windows-x64, linux-x64, linux-aarch64, and macosx-x64 in both release and debug mode. > > The microbenchmark gives the following results before and after the proposed fix (linux-x86_64-server-release build run > on Intel Core i7-9850H CPU @ 2.60GHz with 32 GB main memory):
>
> | Version | Mode | Cnt | Score | Error | Units |
> |---|---|---|---|---|---|
> | before the fix | ss | 10 | 15402.110 | 171.255 | ms/op |
> | after the fix | ss | 10 | 0.976 | 0.069 | ms/op |
>
> "Single-shot" (ss) benchmarking mode is used here because the focus is on measuring C2 execution time.

In the optimization ConvI2L(AddI(x, y)) -> AddL(ConvI2L(x), ConvI2L(y)) within ConvI2LNode::Ideal(), handle the special case x = y by feeding both inputs of AddL from a single ConvI2L node rather than creating two semantically equivalent ConvI2L nodes. This avoids an exponential number of calls to ConvI2LNode::Ideal() when dealing with long chains of AddI nodes. Disable the optimization for the pattern ConvI2L(SubI(x, x)), which is simplified to zero during parsing anyway. Add a set of regression tests for the transformation that cover different shapes of AddI subgraphs. Also add a microbenchmark that exercises the special case, for performance regression testing.
------------- PR: https://git.openjdk.java.net/jdk/pull/727 From redestad at openjdk.java.net Mon Oct 19 09:32:11 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Mon, 19 Oct 2020 09:32:11 GMT Subject: RFR: 8254965: Remove unused Matcher::_ruleName and make ruleName non-product In-Reply-To: References: Message-ID: On Mon, 19 Oct 2020 03:04:16 GMT, Vladimir Kozlov wrote: >> - Matcher::_ruleName is unused - remove >> - ruleName, defined via ad, is only used by a non-product routine. #ifdef it accordingly >> >> This reduces the JVM size (-63Kb on linux-x64, or about 0.25%) > > Unfortunately we may soon use it (problem was found only in product VM during Graal testing). Following JDK-8247350 fix > to catch issues in product VM we may want to use rule names: > tty->print_cr("fatal error context information: rule: (%s) n_size (%d), current_offset (%d), instr_offset (%d)", > mach->rule() == 9999999 ? "" : ruleName[mach->rule()], n_size, current_offset, instr_offset); Does that print really need to print the name of the rule - or would just printing `mach->rule()` suffice? For sustainability purposes I suppose having the name spelled out is a nice-to-have rather than a strict necessity. ------------- PR: https://git.openjdk.java.net/jdk/pull/722 From shade at openjdk.java.net Mon Oct 19 10:22:17 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 19 Oct 2020 10:22:17 GMT Subject: RFR: 8254994: [x86] C1 StubAssembler::call_RT, "call_offset might not be initialized" Message-ID: <8NE6tuVKcgl7wBq7dSmsKmeLWxdYr4oPAPnIfXW2Ml0=.050ad557-d640-495c-88cc-e35e2eae1513@github.com> Static analyzers complain that `call_offset` might not be initialized in `StubAssembler::call_RT` in `c1_Runtime1_x86.cpp`. The way I see it, it depends on the `align_stack` value, and it is set whether `align_stack` is `true` or `false`. But we can probably make it cleaner so that future errors would be clearly detectable.
Since it is initialized off `offset()`, it should not be negative. Testing: - [x] Linux x86_64 tier1 ------------- Commit messages: - 8254994: [x86] C1 StubAssembler::call_RT, "call_offset might not be initialized" Changes: https://git.openjdk.java.net/jdk/pull/730/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=730&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8254994 Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/730.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/730/head:pull/730 PR: https://git.openjdk.java.net/jdk/pull/730 From njian at openjdk.java.net Mon Oct 19 10:26:22 2020 From: njian at openjdk.java.net (Ningsheng Jian) Date: Mon, 19 Oct 2020 10:26:22 GMT Subject: RFR: 8254884: Make sure jvm does not crash with Arm SVE and Vector API [v2] In-Reply-To: References: Message-ID: > Currently we have not implemented all Arm SVE code generation for Vector API specific nodes. To make sure hotspot does > not crash with bad AD file (as NEON has implemented them), we simply add those OPs to unsupported op list. > This is the port and minor cleanup of JDK-8253211 in repo-panama: https://github.com/openjdk/panama-vector/pull/7 with > Op_VectorUnbox (not for codegen) and Op_VectorMaskWrapper (actually unused node. dead code?) removed from the > unsupported op list and Op_VectorLoadConst added. Test: tier1-3 on AArch64 and x86_64 as well as Vector API tests on > AArch64 SVE. Ningsheng Jian has updated the pull request incrementally with one additional commit since the last revision: Address review comments from Vladimir. 
Change-Id: Ia484dc59f41d265972f1253193d1d7d7cb032a12 ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/726/files - new: https://git.openjdk.java.net/jdk/pull/726/files/1e51f419..9bfa188c Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=726&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=726&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/726.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/726/head:pull/726 PR: https://git.openjdk.java.net/jdk/pull/726 From njian at openjdk.java.net Mon Oct 19 10:26:23 2020 From: njian at openjdk.java.net (Ningsheng Jian) Date: Mon, 19 Oct 2020 10:26:23 GMT Subject: RFR: 8254884: Make sure jvm does not crash with Arm SVE and Vector API [v2] In-Reply-To: <0Ch1KNiVJC7a09Nqq7_0c6t-iYUZLCW4BkJ3TMKdfiQ=.61d883c7-b05a-445d-bb25-cc99f5da2045@github.com> References: <0Ch1KNiVJC7a09Nqq7_0c6t-iYUZLCW4BkJ3TMKdfiQ=.61d883c7-b05a-445d-bb25-cc99f5da2045@github.com> Message-ID: On Mon, 19 Oct 2020 09:01:23 GMT, Vladimir Ivanov wrote: >> Ningsheng Jian has updated the pull request incrementally with one additional commit since the last revision: >> >> Address review comments from Vladimir. >> >> Change-Id: Ia484dc59f41d265972f1253193d1d7d7cb032a12 > > src/hotspot/share/opto/vectorIntrinsics.cpp line 601: > >> 599: int byte_num_elem = num_elem * type2aelembytes(elem_bt); >> 600: if (!arch_supports_vector(is_store ? Op_StoreVector : Op_LoadVector, byte_num_elem, T_BYTE, VecMaskNotUsed) >> 601: || !arch_supports_vector(Op_VectorReinterpret, num_elem, T_BYTE, VecMaskNotUsed)) { > > s/num_elem/byte_num_elem/ ? Yes. Thanks! Fixed in the new commit. 
------------- PR: https://git.openjdk.java.net/jdk/pull/726 From chagedorn at openjdk.java.net Mon Oct 19 10:31:09 2020 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Mon, 19 Oct 2020 10:31:09 GMT Subject: RFR: 8254994: [x86] C1 StubAssembler::call_RT, "call_offset might not be initialized" In-Reply-To: <8NE6tuVKcgl7wBq7dSmsKmeLWxdYr4oPAPnIfXW2Ml0=.050ad557-d640-495c-88cc-e35e2eae1513@github.com> References: <8NE6tuVKcgl7wBq7dSmsKmeLWxdYr4oPAPnIfXW2Ml0=.050ad557-d640-495c-88cc-e35e2eae1513@github.com> Message-ID: On Mon, 19 Oct 2020 10:14:03 GMT, Aleksey Shipilev wrote: > Static analyzers complain `call_offset` might not be initialized in `StubAssembler::call_RT` in `c1_Runtime1_x86.cpp`. > The way I see it, it depends on `align_stack` value, and it is set whether it `align_stack` is `true` or `false`. But > we can probably make it cleaner so that future errors would be clearly detectable. Since it is initialized off > `offset()`, it should not be negative. Testing: > - [x] Linux x86_64 tier1 Marked as reviewed by chagedorn (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/730 From chagedorn at openjdk.java.net Mon Oct 19 10:37:10 2020 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Mon, 19 Oct 2020 10:37:10 GMT Subject: RFR: 8253453: SourceFileInfoTable should be allocated lazily In-Reply-To: References: Message-ID: <6oO-53qsVoukp8Wuy9BjQO0zNs5KZJ9rwqR-ABIGeB0=.731e61b8-277a-460a-8a0d-553f10598a97@github.com> On Sun, 18 Oct 2020 16:50:31 GMT, Claes Redestad wrote: > Avoids allocating and clearing a 128Kb large ResourceHashTable eagerly on startup > > It's only used when PrintInterpreter is enabled - which prints out the interpreter during bootstrap - so proper > synchronization seems excessive. > Testing: tier1, verified source line comments retained with `-XX:+PrintInterpreter` Looks good. ------------- Marked as reviewed by chagedorn (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/724 From vlivanov at openjdk.java.net Mon Oct 19 10:38:15 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Mon, 19 Oct 2020 10:38:15 GMT Subject: RFR: 8254884: Make sure jvm does not crash with Arm SVE and Vector API [v2] In-Reply-To: References: Message-ID: On Mon, 19 Oct 2020 10:26:22 GMT, Ningsheng Jian wrote: >> Currently we have not implemented all Arm SVE code generation for Vector API specific nodes. To make sure hotspot does >> not crash with bad AD file (as NEON has implemented them), we simply add those OPs to unsupported op list. >> This is the port and minor cleanup of JDK-8253211 in repo-panama: https://github.com/openjdk/panama-vector/pull/7 with >> Op_VectorUnbox (not for codegen) and Op_VectorMaskWrapper (actually unused node. dead code?) removed from the >> unsupported op list and Op_VectorLoadConst added. Test: tier1-3 on AArch64 and x86_64 as well as Vector API tests on >> AArch64 SVE. > > Ningsheng Jian has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments from Vladimir. > > Change-Id: Ia484dc59f41d265972f1253193d1d7d7cb032a12 Shared code changes look good. ------------- Marked as reviewed by vlivanov (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/726 From redestad at openjdk.java.net Mon Oct 19 10:54:14 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Mon, 19 Oct 2020 10:54:14 GMT Subject: RFR: 8253453: SourceFileInfoTable should be allocated lazily In-Reply-To: <6oO-53qsVoukp8Wuy9BjQO0zNs5KZJ9rwqR-ABIGeB0=.731e61b8-277a-460a-8a0d-553f10598a97@github.com> References: <6oO-53qsVoukp8Wuy9BjQO0zNs5KZJ9rwqR-ABIGeB0=.731e61b8-277a-460a-8a0d-553f10598a97@github.com> Message-ID: On Mon, 19 Oct 2020 10:34:35 GMT, Christian Hagedorn wrote: >> Avoids allocating and clearing a 128Kb large ResourceHashTable eagerly on startup >> >> It's only used when PrintInterpreter is enabled - which prints out the interpreter during bootstrap - so proper >> synchronization seems excessive. >> Testing: tier1, verified source line comments retained with `-XX:+PrintInterpreter` > > Looks good. @neliasso @chhagedorn: Thank you both for reviewing! ------------- PR: https://git.openjdk.java.net/jdk/pull/724 From redestad at openjdk.java.net Mon Oct 19 10:54:15 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Mon, 19 Oct 2020 10:54:15 GMT Subject: Integrated: 8253453: SourceFileInfoTable should be allocated lazily In-Reply-To: References: Message-ID: On Sun, 18 Oct 2020 16:50:31 GMT, Claes Redestad wrote: > Avoids allocating and clearing a 128Kb large ResourceHashTable eagerly on startup > > It's only used when PrintInterpreter is enabled - which prints out the interpreter during bootstrap - so proper > synchronization seems excessive. > Testing: tier1, verified source line comments retained with `-XX:+PrintInterpreter` This pull request has now been integrated. 
Changeset: e9be2db7 Author: Claes Redestad URL: https://git.openjdk.java.net/jdk/commit/e9be2db7 Stats: 12 lines in 1 file changed: 7 ins; 0 del; 5 mod 8253453: SourceFileInfoTable should be allocated lazily Reviewed-by: neliasso, chagedorn ------------- PR: https://git.openjdk.java.net/jdk/pull/724 From ihse at openjdk.java.net Mon Oct 19 11:06:14 2020 From: ihse at openjdk.java.net (Magnus Ihse Bursie) Date: Mon, 19 Oct 2020 11:06:14 GMT Subject: RFR: 8254827: JVMCI: Enable it for Windows+AArch64 In-Reply-To: References: Message-ID: On Thu, 15 Oct 2020 15:00:47 GMT, Bernhard Urban-Forster wrote: > Use r18 as allocatable register on Linux only. > > A bootstrap works now (it has been crashing before due to r18 being allocated): > $ > ./windows-aarch64-server-fastdebug/bin/java.exe -XX:+UnlockExperimentalVMOptions -XX:+UseJVMCICompiler -XX:+BootstrapJVMCI -version > Bootstrapping JVMCI................................. in 17990 ms (compiled 3330 methods) > openjdk version "16-internal" 2021-03-16 > OpenJDK Runtime Environment (fastdebug build 16-internal+0-adhoc.NORTHAMERICAbeurba.openjdk-jdk) > OpenJDK 64-Bit Server VM (fastdebug build 16-internal+0-adhoc.NORTHAMERICAbeurba.openjdk-jdk, mixed mode) > > Jtreg tests `test/hotspot/jtreg/compiler/jvmci` are passing as well. Build changes look good, but you'll need a review on the hotspot part as well. ------------- Marked as reviewed by ihse (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/685 From fyang at openjdk.java.net Mon Oct 19 11:14:22 2020 From: fyang at openjdk.java.net (Fei Yang) Date: Mon, 19 Oct 2020 11:14:22 GMT Subject: RFR: 8252204: AArch64: Implement SHA3 accelerator/intrinsic [v9] In-Reply-To: References: Message-ID: > Contributed-by: ard.biesheuvel at linaro.org, dongbo4 at huawei.com > > This added an intrinsic for SHA3 using aarch64 v8.2 SHA3 Crypto Extensions. 
> Reference implementation for core SHA-3 transform using ARMv8.2 Crypto Extensions: > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/arch/arm64/crypto/sha3-ce-core.S?h=v5.4.52 > > Trivial adaptation in SHA3. implCompress is needed for the purpose of adding the intrinsic. > For SHA3, we need to pass one extra parameter "digestLength" to the stub for the calculation of block size. > "digestLength" is also used in for the EOR loop before keccak to differentiate different SHA3 variants. > > We added jtreg tests for SHA3 and used QEMU system emulator which supports SHA3 instructions to test the functionality. > Patch passed jtreg tier1-3 tests with QEMU system emulator. > Also verified with jtreg tier1-3 tests without SHA3 instructions on aarch64-linux-gnu and x86_64-linux-gnu, to make > sure that there's no regression. > We used one existing JMH test for performance test: test/micro/org/openjdk/bench/java/security/MessageDigests.java > We measured the performance benefit with an aarch64 cycle-accurate simulator. > Patch delivers 20% - 40% performance improvement depending on specific SHA3 digest length and size of the message. > > For now, this feature will not be enabled automatically for aarch64. We can auto-enable this when it is fully tested on > real hardware. But for the above testing purposes, this is auto-enabled when the corresponding hardware feature is > detected. Fei Yang has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: - Merge master - Remove unnecessary code changes in vm_version_aarch64.cpp - Merge master - Merge master - Merge master - Merge master - Add sha3 instructions to cpu/aarch64/aarch64-asmtest.py and regenerate the test in assembler_aarch64.cpp:asm_check - Rebase - Merge master - Fix trailing whitespace issue - ... 
and 1 more: https://git.openjdk.java.net/jdk/compare/e9be2db7...05551701 ------------- Changes: https://git.openjdk.java.net/jdk/pull/207/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=207&range=08 Stats: 1262 lines in 36 files changed: 1007 ins; 22 del; 233 mod Patch: https://git.openjdk.java.net/jdk/pull/207.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/207/head:pull/207 PR: https://git.openjdk.java.net/jdk/pull/207 From roland at openjdk.java.net Mon Oct 19 11:33:13 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Mon, 19 Oct 2020 11:33:13 GMT Subject: Integrated: 8223051: support loops with long (64b) trip counts In-Reply-To: References: Message-ID: <__IkG_mxIWT_OUVUjF4n_WpcDK4ODNGgJGfqUa3B2Sw=.95dc911c-a21b-4818-98df-cb137f504354@github.com> On Wed, 23 Sep 2020 09:08:59 GMT, Roland Westrelin wrote: > Last webrev was: > > https://cr.openjdk.java.net/~roland/8223051/webrev.03/ > > This PR includes a few minor changes: > > - The change in callnode.cpp that Vladimir requested in: > https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-August/039551.html > > - Extra comments that John requested in: > https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-August/039621.html > > - A couple extra counters to collect more detailed statistics > > 8252696 (Loop unswitching may cause out of bound array load to be > executed) was the only bug that was uncovered by extended testing and > it's fixed now. > > This was previously reviewed by Tobias, Vladimir and John. Given the > last changes were either requested by reviewers or a straightforward > improvement to statistics, and unless someone objects, I intend to > push this in the next few days with the reviewer list I just > mentioned. This pull request has now been integrated.
Changeset: e76de189 Author: Roland Westrelin URL: https://git.openjdk.java.net/jdk/commit/e76de189 Stats: 902 lines in 11 files changed: 823 ins; 63 del; 16 mod 8223051: support loops with long (64b) trip counts Reviewed-by: vlivanov, thartmann, jrose ------------- PR: https://git.openjdk.java.net/jdk/pull/318 From redestad at openjdk.java.net Mon Oct 19 11:59:15 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Mon, 19 Oct 2020 11:59:15 GMT Subject: RFR: 8254913: Increase InlineSmallCode default from 2000 to 2500 for x64 In-Reply-To: References: <-eTclEV5zK_DIJgV8kYWFRBlUNtEqzq5H_fNwmbVJ7A=.ee694b8b-2a1e-43a0-9df6-52958e8f1d23@github.com> Message-ID: On Mon, 19 Oct 2020 06:43:23 GMT, Aleksey Shipilev wrote: >> We have seen some specific benefits to increasing InlineSmallCode to 2500 from 2000, and across the whole promo build >> perf test collection the change is neutral to slightly positive, where the tests are run on modern OCI systems. >> Passed tier1 testing, some ad-hoc perf testing and more compiler related parts of the weekly promo performance set. >> >> JBS: https://bugs.openjdk.java.net/browse/JDK-8254913 >> Thanks, >> Eric > > Changing the global inlining defaults without providing the justifying performance data is not acceptable. @cl4es, > please rescind your review if you can, so that change is not integrated by accident. What constitutes a full promotion is proprietary, but I can verify no regressions across at least: - SPECjvm2008* - SPECjbb2005 - SPECjbb2015** - Dacapo - Renaissance Notable improvements on Renaissance-DecTree (6%), Renaissance-Reactors (5%), Dacapo-lusearch (2%) Various micros in the OpenJDK corpus improve significantly, e.g., AES_ECB_NoPadding 10% ChaCha20Poly1305 7%. No detected regressions. I agree that some more info about code cache utilization across a more complete range of workloads is a reasonable request, but we've detected no regressions in our sample of footprint tests. 
I'm sure @ericcaspole won't integrate this without first coming back with more data on this. * not running the broken compiler sub-benchmarks ** interestingly most SPECjbb2015 publications in recent years have included -XX:InlineSmallCode=10k or more, see https://www.spec.org/jbb2015/results/ ------------- PR: https://git.openjdk.java.net/jdk/pull/705 From roland at openjdk.java.net Mon Oct 19 12:18:22 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Mon, 19 Oct 2020 12:18:22 GMT Subject: RFR: 8254998: C2: assert(!n->as_Loop()->is_transformed_long_loop()) failure with -XX:StressLongCountedLoop=1 Message-ID: Transformation of a long counted loop into a loop nest with a counted loop inner loop happens in 2 passes of loop opts: first the loop nest is created and then the inner loop is transformed into a counted loop. The assert fires because the second step doesn't have a chance to run given the first step was executed as the last loop opts pass before running out of allowed loop opts passes. The fix I propose for this corner case is to relax the assert to account for this. 
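As a rough source-level analogy for the loop nest described above, the sketch below rewrites a loop with a 64-bit induction variable as an outer 64-bit loop around an inner loop with a 32-bit counter. This is only an illustration under stated assumptions: C2 performs the transformation on its own IR in two loop-opts passes, not on source code, and the names `sum_long_loop`, `sum_loop_nest` and the `chunk` parameter are hypothetical. The sketch also assumes `chunk > 0` and that `limit` is far enough from the 64-bit maximum that `base += chunk` cannot overflow.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Reference form: a loop whose induction variable is 64 bits wide.
std::int64_t sum_long_loop(std::int64_t limit) {
  std::int64_t sum = 0;
  for (std::int64_t i = 0; i < limit; ++i) sum += i;
  return sum;
}

// Same computation as a two-level nest (hypothetical illustration):
// the outer loop advances a 64-bit base in steps of `chunk`, and the
// inner loop runs with a 32-bit counter over at most `chunk` iterations.
std::int64_t sum_loop_nest(std::int64_t limit, std::int32_t chunk) {
  std::int64_t sum = 0;
  for (std::int64_t base = 0; base < limit; base += chunk) {
    // The remaining trip count is clamped so it always fits in 32 bits.
    std::int32_t inner_limit = static_cast<std::int32_t>(
        std::min<std::int64_t>(chunk, limit - base));
    for (std::int32_t j = 0; j < inner_limit; ++j) sum += base + j;
  }
  return sum;
}
```

In this picture, the state the assert guards against corresponds to stopping after the nest has been built but before the inner loop has been recognized as a counted loop.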
------------- Commit messages: - test & fix Changes: https://git.openjdk.java.net/jdk/pull/735/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=735&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8254998 Stats: 56 lines in 2 files changed: 55 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/735.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/735/head:pull/735 PR: https://git.openjdk.java.net/jdk/pull/735 From azeemj at openjdk.java.net Mon Oct 19 13:01:10 2020 From: azeemj at openjdk.java.net (Azeem Jiva) Date: Mon, 19 Oct 2020 13:01:10 GMT Subject: RFR: 8254966: Remove unused code from Matcher In-Reply-To: References: Message-ID: On Sun, 18 Oct 2020 12:11:41 GMT, Claes Redestad wrote: > These are unused: > > interpreter_method_reg* > compiler_method_reg* > pd_implicit_null_fixup (only existed to workaround some Win95 issues) > > Quite some code in the .ad files shook loose when removing these, so I will need some help verifying that the s390 and > ppc changes are fine. Marked as reviewed by azeemj (Author). ------------- PR: https://git.openjdk.java.net/jdk/pull/723 From adinn at openjdk.java.net Mon Oct 19 13:05:11 2020 From: adinn at openjdk.java.net (Andrew Dinn) Date: Mon, 19 Oct 2020 13:05:11 GMT Subject: RFR: 8254884: Make sure jvm does not crash with Arm SVE and Vector API [v2] In-Reply-To: References: Message-ID: On Mon, 19 Oct 2020 10:26:22 GMT, Ningsheng Jian wrote: >> Currently we have not implemented all Arm SVE code generation for Vector API specific nodes. To make sure hotspot does >> not crash with bad AD file (as NEON has implemented them), we simply add those OPs to unsupported op list. >> This is the port and minor cleanup of JDK-8253211 in repo-panama: https://github.com/openjdk/panama-vector/pull/7 with >> Op_VectorUnbox (not for codegen) and Op_VectorMaskWrapper (actually unused node. dead code?) removed from the >> unsupported op list and Op_VectorLoadConst added. 
Test: tier1-3 on AArch64 and x86_64 as well as Vector API tests on >> AArch64 SVE. > > Ningsheng Jian has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments from Vladimir. > > Change-Id: Ia484dc59f41d265972f1253193d1d7d7cb032a12 src/hotspot/cpu/aarch64/aarch64_sve.ad line 897: > 895: predicate(UseSVE > 0 && n->in(2)->bottom_type()->is_vect()->length_in_bytes() >= 16 && > 896: (n->in(2)->bottom_type()->is_vect()->element_basic_type() == T_SHORT || > 897: (n->in(2)->bottom_type()->is_vect()->element_basic_type() == T_CHAR))); Is it appropriate to do this reduction when element_basic_type() == T_CHAR? ------------- PR: https://git.openjdk.java.net/jdk/pull/726 From dnsimon at openjdk.java.net Mon Oct 19 13:30:20 2020 From: dnsimon at openjdk.java.net (Doug Simon) Date: Mon, 19 Oct 2020 13:30:20 GMT Subject: RFR: 8255004: expose JVM_ACC_FIELD_INITIALIZED_FINAL_UPDATE access flag Message-ID: <3b5xjSpzOrKfNnpaz0sOe0FGFNP6zQYcWRI6G_dMG7M=.6d5abd66-9746-4896-ba27-5614f86f1789@github.com> This PR exposes to Graal the JVM_ACC_FIELD_INITIALIZED_FINAL_UPDATE access flag added in https://bugs.openjdk.java.net/browse/JDK-8157181. This allows Graal to take the correct action for final fields that are updated outside of `` and `` methods. 
------------- Commit messages: - 8255004: expose JVM_ACC_FIELD_INITIALIZED_FINAL_UPDATE access flag Changes: https://git.openjdk.java.net/jdk/pull/737/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=737&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8255004 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/737.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/737/head:pull/737 PR: https://git.openjdk.java.net/jdk/pull/737 From vlivanov at openjdk.java.net Mon Oct 19 16:00:13 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Mon, 19 Oct 2020 16:00:13 GMT Subject: RFR: 8254317: C2: Resource consumption of ConvI2LNode::Ideal() grows exponentially In-Reply-To: <9FJMNUDr4xvtcIlGtEk2Y_7tA17Us29ImExhFpzs87s=.66c146c6-7629-4e1b-a62b-d68714636f32@github.com> References: <9FJMNUDr4xvtcIlGtEk2Y_7tA17Us29ImExhFpzs87s=.66c146c6-7629-4e1b-a62b-d68714636f32@github.com> Message-ID: On Mon, 19 Oct 2020 08:36:53 GMT, Roberto Castañeda Lozano wrote: > In the optimization ConvI2L(AddI(x, y)) -> AddL(ConvI2L(x), ConvI2L(y)) within `ConvI2LNode::Ideal()`, handle the > special case x = y by feeding both inputs of AddL from a single ConvI2L node rather than creating two semantically > equivalent ConvI2L nodes. This avoids an exponential number of calls to `ConvI2LNode::Ideal()` when dealing with long > chains of AddI nodes. Disable the optimization for the pattern ConvI2L(SubI(x, x)), which is simplified to zero during > parsing anyway. Add a set of regression tests for the transformation that cover different shapes of AddI subgraphs. > Also add a microbenchmark that exercises the special case, for performance regression testing. 2 concerns with the proposed fix on my side:
- I'm not persuaded that covering "x == y" is enough to completely eliminate the issue;
- The issue demonstrates there's still a chance to introduce very deep recursion involving `Compile::constrained_convI2L` and `PhaseIterGVN`, which can cause a crash.

IMO the root cause is an eager transformation happening top-down on `ConvI2L` nodes; it defeats the memoization GVN naturally provides, so it causes a combinatorial explosion. If subsequent `Compile::constrained_convI2L()` calls could share the same `ConvI2L` node for the same input, it would be a more reliable fix for the problem. Otherwise, the transformation may be extracted from GVN and turned into a separate pass (take a look at `Compile::optimize_logic_cones` as an example). Some comments on the tests: (1) please group the individual test cases into a single test class; and (2) I suggest turning the benchmark into a test case that fails with a timeout when the fix is absent. ------------- Changes requested by vlivanov (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/723 From kvn at openjdk.java.net Mon Oct 19 17:10:12 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 19 Oct 2020 17:10:12 GMT Subject: RFR: 8254965: Remove unused Matcher::_ruleName and make ruleName non-product In-Reply-To: References: Message-ID: On Mon, 19 Oct 2020 09:29:36 GMT, Claes Redestad wrote: > Does that print really need to print the name of the rule - or would just printing `mach->rule()` suffice? For > sustainability purposes I suppose having the name spelled out is a nice to have rather than a strict necessity. mach->rule() value is only available in files generated from .ad file during build. If you don't have those files available it will be hard to map the value to mach instruction. You have to rebuild VM again and for that you need to have exact sources as one failed. It is doable but hassle. ------------- PR: https://git.openjdk.java.net/jdk/pull/722 From kvn at openjdk.java.net Mon Oct 19 17:12:18 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 19 Oct 2020 17:12:18 GMT Subject: RFR: 8254955: x86: MethodHandlesAdapterBlob is too big [v2] In-Reply-To: References: Message-ID: On Sat, 17 Oct 2020 22:58:21 GMT, Claes Redestad wrote: >> At some point JSR 292 was reworked to generate all but a small handful of interpreter stubs lazily, leaving the >> MethodHandlesAdapterBlob with a bit too much room to spare. >> The remaining stubs use less than 1000 bytes of memory in product builds, and less than 3k in debug builds. This patch >> adjust the sizes accordingly. 
>> Other platforms (except zero) seem like they could use a similar adjustment, but I don't have hardware available to >> check how big the interpreter stubs get on anything but x86 so I'll leave them untouched unless someone can run the >> numbers (`java -XX:+UnlockDiagnosticVMOptions -XX:+VerifyMethodHandles -XX:+LogCompilation` - grep the hotspot_log >> generated for MethodHandlesAdapterBlob or just blob since it's the first one) > > Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: > > ZGC & Shenandoah requires a little more space Good. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/717 From vlivanov at openjdk.java.net Mon Oct 19 17:29:16 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Mon, 19 Oct 2020 17:29:16 GMT Subject: RFR: 8254998: C2: assert(!n->as_Loop()->is_transformed_long_loop()) failure with -XX:StressLongCountedLoop=1 In-Reply-To: References: Message-ID: On Mon, 19 Oct 2020 12:07:46 GMT, Roland Westrelin wrote: > Transformation of a long counted loop into a loop nest with a counted > loop inner loop happens in 2 passes of loop opts: first the loop nest > is created and then the inner loop is transformed into a counted > loop. The assert fires because the second step doesn't have a chance > to run given the first step was executed as the last loop opts pass > before running out of allowed loop opts passes. The fix I propose for > this corner case is to relax the assert to account for this. Looks good. ------------- Marked as reviewed by vlivanov (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/735 From redestad at openjdk.java.net Mon Oct 19 17:33:15 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Mon, 19 Oct 2020 17:33:15 GMT Subject: RFR: 8254965: Remove unused Matcher::_ruleName and make ruleName non-product In-Reply-To: References: Message-ID: <6c09kl39yOXIoFO93ETZaCH9rwqvtra86CP07QSMjJ8=.532deaae-a7e3-4d98-a391-8306256c4598@github.com> On Mon, 19 Oct 2020 17:07:46 GMT, Vladimir Kozlov wrote: > > Does that print really need to print the name of the rule - or would just printing `mach->rule()` suffice? For > > sustainability purposes I suppose having the name spelled out is a nice to have rather than a strict necessity. > > mach->rule() value is only available in files generated from .ad file during build. If you don't have those files > available it will be hard to map the value to mach instruction. You have to rebuild VM again and for that you need to > have exact sources as one failed. It is doable but hassle. A hassle surely, but sustaining engineers routinely must set up exact source to debug an issue. So I think it's a judgement call, and if you prefer keeping it in I'm ok with that: reducing static footprint is nice, but a lump of things like this should never be paged in anyhow. A side-note: Matcher has several of these fields that only take a pointer to a .ad-defined array, e.g., _ruleName, _must_clone, _swallowed, ... - does this do any good, or could the code refer to the static globals directly? If only used for debugging or printouts like _ruleName it can surely go, but how about the others? 
------------- PR: https://git.openjdk.java.net/jdk/pull/722 From redestad at openjdk.java.net Mon Oct 19 17:51:16 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Mon, 19 Oct 2020 17:51:16 GMT Subject: RFR: 8254955: x86: MethodHandlesAdapterBlob is too big [v2] In-Reply-To: References: Message-ID: On Mon, 19 Oct 2020 17:09:20 GMT, Vladimir Kozlov wrote: >> Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: >> >> ZGC & Shenandoah requires a little more space > > Good. @neliasso @vnkozlov - thanks! ------------- PR: https://git.openjdk.java.net/jdk/pull/717 From redestad at openjdk.java.net Mon Oct 19 17:54:09 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Mon, 19 Oct 2020 17:54:09 GMT Subject: Integrated: 8254955: x86: MethodHandlesAdapterBlob is too big In-Reply-To: References: Message-ID: On Sat, 17 Oct 2020 11:29:23 GMT, Claes Redestad wrote: > At some point JSR 292 was reworked to generate all but a small handful of interpreter stubs lazily, leaving the > MethodHandlesAdapterBlob with a bit too much room to spare. > The remaining stubs use less than 1000 bytes of memory in product builds, and less than 3k in debug builds. This patch > adjust the sizes accordingly. > Other platforms (except zero) seem like they could use a similar adjustment, but I don't have hardware available to > check how big the interpreter stubs get on anything but x86 so I'll leave them untouched unless someone can run the > numbers (`java -XX:+UnlockDiagnosticVMOptions -XX:+VerifyMethodHandles -XX:+LogCompilation` - grep the hotspot_log > generated for MethodHandlesAdapterBlob or just blob since it's the first one) This pull request has now been integrated. 
Changeset: e2e11d34 Author: Claes Redestad URL: https://git.openjdk.java.net/jdk/commit/e2e11d34 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8254955: x86: MethodHandlesAdapterBlob is too big Reviewed-by: neliasso, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/717 From never at openjdk.java.net Mon Oct 19 18:12:11 2020 From: never at openjdk.java.net (Tom Rodriguez) Date: Mon, 19 Oct 2020 18:12:11 GMT Subject: RFR: 8255004: [JVMCI] expose JVM_ACC_FIELD_INITIALIZED_FINAL_UPDATE In-Reply-To: <3b5xjSpzOrKfNnpaz0sOe0FGFNP6zQYcWRI6G_dMG7M=.6d5abd66-9746-4896-ba27-5614f86f1789@github.com> References: <3b5xjSpzOrKfNnpaz0sOe0FGFNP6zQYcWRI6G_dMG7M=.6d5abd66-9746-4896-ba27-5614f86f1789@github.com> Message-ID: On Mon, 19 Oct 2020 13:18:55 GMT, Doug Simon wrote: > This PR exposes to Graal the JVM_ACC_FIELD_INITIALIZED_FINAL_UPDATE access flag added in > https://bugs.openjdk.java.net/browse/JDK-8157181. This allows Graal to take the correct action for final fields that > are updated outside of `` and `` methods. Marked as reviewed by never (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/737 From kvn at openjdk.java.net Mon Oct 19 18:12:16 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 19 Oct 2020 18:12:16 GMT Subject: RFR: 8254965: Remove unused Matcher::_ruleName and make ruleName non-product In-Reply-To: <6c09kl39yOXIoFO93ETZaCH9rwqvtra86CP07QSMjJ8=.532deaae-a7e3-4d98-a391-8306256c4598@github.com> References: <6c09kl39yOXIoFO93ETZaCH9rwqvtra86CP07QSMjJ8=.532deaae-a7e3-4d98-a391-8306256c4598@github.com> Message-ID: On Mon, 19 Oct 2020 17:30:19 GMT, Claes Redestad wrote: >>> Does that print really need to print the name of the rule - or would just printing `mach->rule()` suffice? For >>> sustainability purposes I suppose having the name spelled out is a nice to have rather than a strict necessity. >> >> mach->rule() value is only available in files generated from .ad file during build. 
If you don't have those files >> available it will be hard to map the value to mach instruction. You have to rebuild VM again and for that you need to >> have exact sources as one failed. It is doable but hassle. > > A hassle surely, but sustaining engineers routinely must set up exact source to debug an issue. So I think it's a > judgement call, and if you prefer keeping it in I'm ok with that: reducing static footprint is nice, but a lump of > things like this should never be paged in anyhow. A side-note: Matcher has several of these fields that only take a > pointer to a .ad-defined array, e.g., _ruleName, _must_clone, _swallowed, ... - does this do any good, or could the > code refer to the static globals directly? If only used for debugging or printouts like _ruleName it can surely go, but > how about the others? _must_clone[] and _swallowed[] (which should be boolean) are used for code generation. Looks like ruleName[] is not used in product but again it is useful for debugging product VM. I am not sure how you can convert array element references to static globals.
------------- PR: https://git.openjdk.java.net/jdk/pull/722 From kvn at openjdk.java.net Mon Oct 19 18:30:15 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 19 Oct 2020 18:30:15 GMT Subject: RFR: 8255004: [JVMCI] expose JVM_ACC_FIELD_INITIALIZED_FINAL_UPDATE In-Reply-To: <3b5xjSpzOrKfNnpaz0sOe0FGFNP6zQYcWRI6G_dMG7M=.6d5abd66-9746-4896-ba27-5614f86f1789@github.com> References: <3b5xjSpzOrKfNnpaz0sOe0FGFNP6zQYcWRI6G_dMG7M=.6d5abd66-9746-4896-ba27-5614f86f1789@github.com> Message-ID: <5OmI5O49TP9ZVdqnn-FPwLlsS7xZaAIw9kELjKtrpyE=.f254fd61-e0ad-4e69-834c-95a837470570@github.com> On Mon, 19 Oct 2020 13:18:55 GMT, Doug Simon wrote: > This PR exposes to Graal the JVM_ACC_FIELD_INITIALIZED_FINAL_UPDATE access flag added in > https://bugs.openjdk.java.net/browse/JDK-8157181. This allows Graal to take the correct action for final fields that > are updated outside of `` and `` methods. What about adding it to HotSpotVMConfig.java ? I see other flags there. ------------- PR: https://git.openjdk.java.net/jdk/pull/737 From redestad at openjdk.java.net Mon Oct 19 18:32:13 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Mon, 19 Oct 2020 18:32:13 GMT Subject: RFR: 8254965: Remove unused Matcher::_ruleName and make ruleName non-product In-Reply-To: References: <6c09kl39yOXIoFO93ETZaCH9rwqvtra86CP07QSMjJ8=.532deaae-a7e3-4d98-a391-8306256c4598@github.com> Message-ID: On Mon, 19 Oct 2020 18:09:17 GMT, Vladimir Kozlov wrote: > _must_clone[] and _swallowed[] (which should be boolean) Yeah, I was surprised to see `char[]` rather than `bool[]` too (but apparently `sizeof(bool) >= sizeof(char)`). Perhaps we could profitably generate these boolean arrays as constexpr'd `BitMap`s at some point. > Looks like ruleName[] is not used in product but again it is useful for debugging product VM. Ok, I'll repurpose this RFE to remove the `Matcher::_ruleName` field then, shall I?
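To illustrate the `char[]` versus packed-bits point above, here is a minimal sketch that packs a byte-per-flag table into 64-bit words at compile time. It is only an illustration under stated assumptions: `must_clone_bytes`, `kNumRules`, and `PackedFlags` are hypothetical names with made-up values, and the code uses plain standard C++ rather than HotSpot's own `BitMap` class.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for an .ad-generated flag table (values are made up).
constexpr std::size_t kNumRules = 10;
constexpr char must_clone_bytes[kNumRules] = {0, 1, 0, 0, 1, 1, 0, 0, 0, 1};

// Packs one flag per bit into 64-bit words during constant evaluation.
template <std::size_t N>
struct PackedFlags {
  static constexpr std::size_t kWords = (N + 63) / 64;
  std::uint64_t words[kWords] = {};

  constexpr explicit PackedFlags(const char (&flags)[N]) {
    for (std::size_t i = 0; i < N; ++i) {
      if (flags[i] != 0) {
        words[i / 64] |= std::uint64_t{1} << (i % 64);
      }
    }
  }

  // Returns true if flag i was non-zero in the source table.
  constexpr bool test(std::size_t i) const {
    return ((words[i / 64] >> (i % 64)) & 1u) != 0;
  }
};

// Built entirely at compile time from the byte table.
constexpr PackedFlags<kNumRules> must_clone_bits(must_clone_bytes);

// sizeof(char) is 1 by definition, so bool can never be smaller than char.
static_assert(sizeof(bool) >= sizeof(char), "bool is never smaller than char");
static_assert(must_clone_bits.test(1) && !must_clone_bits.test(2),
              "bit lookup matches the byte table");
```

With ten flags the packed form occupies one 64-bit word instead of ten bytes; the saving grows with the table size, at the cost of a shift and a mask on each lookup.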
------------- PR: https://git.openjdk.java.net/jdk/pull/722 From kvn at openjdk.java.net Mon Oct 19 18:36:15 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 19 Oct 2020 18:36:15 GMT Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v4] In-Reply-To: References: Message-ID: On Sun, 18 Oct 2020 18:39:18 GMT, Jatin Bhateja wrote: >> Summary: >> >> 1) Partial in-lining technique avoids call overhead penalty for small array copy operations with size less than 32 >> bytes. 2) At runtime, a conditional check based on copy length either calls an array-copy stub or executes an optimized >> instruction sequence using AVX-512 masked instructions emitted at the call site. 3) New runtime flag >> ArrayCopyPartialInlineSize=0/32(default)/64 bytes determines the maximum size for partial in-lining. 4) Based on the >> perf results seen in benchmarks currently partial in-lining is performed only for arraycopy involving sub-word types >> (bool/byte/char/short). Once PR-61 gets integrated we can extend this patch to cover all the primitive types. 
>> Performance Results:
>> System : CascadeLake Server, Intel(R) Xeon(R) Platinum 8280L CPU @ 2.70GHz
>> Micros : test/micro/org/openjdk/bench/java/lang/ArrayCopy*.java
>> ArrayCopyPartialInlineSize : 32
>>
>> JMH | Block Size | Baseline (ns/op) | Partial Inlining (ns/op) | Gain
>> -- | -- | -- | -- | --
>> ArrayCopyAligned.testByte | 1 | 5.417 | 2.696 | 2.009272997
>> ArrayCopyAligned.testByte | 3 | 5.494 | 2.702 | 2.03330866
>> ArrayCopyAligned.testByte | 5 | 5.417 | 2.637 | 2.05422829
>> ArrayCopyAligned.testByte | 10 | 5.343 | 2.703 | 1.976692564
>> ArrayCopyAligned.testByte | 20 | 5.837 | 2.636 | 2.214339909
>> ArrayCopyAligned.testByte | 70 | 5.86 | 6 | 0.976666667
>> ArrayCopyAligned.testByte | 150 | 6.766 | 6.906 | 0.979727773
>> ArrayCopyAligned.testByte | 300 | 7.605 | 7.952 | 0.956363179
>> ArrayCopyAligned.testByte | 600 | 11.989 | 12.007 | 0.998500874
>> ArrayCopyAligned.testByte | 1200 | 16.447 | 16.585 | 0.991679228
>> ArrayCopyAligned.testChar | 1 | 5.02 | 2.828 | 1.775106082
>> ArrayCopyAligned.testChar | 3 | 5.129 | 2.762 | 1.85698769
>> ArrayCopyAligned.testChar | 5 | 5.041 | 2.762 | 1.82512672
>> ArrayCopyAligned.testChar | 10 | 5.716 | 2.762 | 2.069514844
>> ArrayCopyAligned.testChar | 20 | 5.111 | 5.399 | 0.946656788
>> ArrayCopyAligned.testChar | 70 | 6.271 | 6.242 | 1.004645947
>> ArrayCopyAligned.testChar | 150 | 7.45 | 7.599 | 0.980392157
>> ArrayCopyAligned.testChar | 300 | 9.904 | 10.112 | 0.97943038
>> ArrayCopyAligned.testChar | 600 | 17.131 | 17.167 | 0.997902953
>> ArrayCopyAligned.testChar | 1200 | 29.556 | 29.851 | 0.990117584
>> ArrayCopyUnalignedBoth.testByte | 1 | 5.419 | 2.702 | 2.005551443
>> ArrayCopyUnalignedBoth.testByte | 3 | 5.558 | 2.636 | 2.108497724
>> ArrayCopyUnalignedBoth.testByte | 5 | 5.43 | 2.636 | 2.059939302
>> ArrayCopyUnalignedBoth.testByte | 10 | 5.378 | 2.637 | 2.039438756
>> ArrayCopyUnalignedBoth.testByte | 20 | 5.914 | 2.636 | 2.243550836
>> ArrayCopyUnalignedBoth.testByte | 70 | 5.882 | 5.954 | 0.987907289
>> ArrayCopyUnalignedBoth.testByte | 150 | 6.784 | 6.88 | 0.986046512
>> ArrayCopyUnalignedBoth.testByte | 300 | 7.635 | 7.968 | 0.958207831
>> ArrayCopyUnalignedBoth.testByte | 600 | 12.226 | 12.129 | 1.007997362
>> ArrayCopyUnalignedBoth.testByte | 1200 | 16.992 | 20.717 | 0.820195974
>> ArrayCopyUnalignedBoth.testChar | 1 | 5.019 | 2.828 | 1.774752475
>> ArrayCopyUnalignedBoth.testChar | 3 | 5.163 | 2.763 | 1.868621064
>> ArrayCopyUnalignedBoth.testChar | 5 | 5.042 | 2.827 | 1.783516095
>> ArrayCopyUnalignedBoth.testChar | 10 | 5.718 | 2.828 | 2.021923621
>> ArrayCopyUnalignedBoth.testChar | 20 | 5.111 | 5.404 | 0.945780903
>> ArrayCopyUnalignedBoth.testChar | 70 | 6.367 | 6.235 | 1.02117081
>> ArrayCopyUnalignedBoth.testChar | 150 | 7.367 | 8.269 | 0.890917886
>> ArrayCopyUnalignedBoth.testChar | 300 | 10.358 | 10.642 | 0.973313287
>> ArrayCopyUnalignedBoth.testChar | 600 | 20.84 | 17.522 | 1.189361945
>> ArrayCopyUnalignedBoth.testChar | 1200 | 31.895 | 31.892 | 1.000094067
>> ArrayCopyUnalignedDst.testByte | 1 | 5.455 | 2.637 | 2.068638604
>> ArrayCopyUnalignedDst.testByte | 3 | 5.562 | 2.702 | 2.058475204
>> ArrayCopyUnalignedDst.testByte | 5 | 5.427 | 2.702 | 2.008512213
>> ArrayCopyUnalignedDst.testByte | 10 | 5.367 | 2.696 | 1.990727003
>> ArrayCopyUnalignedDst.testByte | 20 | 5.839 | 2.637 | 2.214258627
>> ArrayCopyUnalignedDst.testByte | 70 | 5.888 | 5.968 | 0.986595174
>> ArrayCopyUnalignedDst.testByte | 150 | 6.785 | 6.773 | 1.001771741
>> ArrayCopyUnalignedDst.testByte | 300 | 7.606 | 7.972 | 0.954089313
>> ArrayCopyUnalignedDst.testByte | 600 | 11.986 | 21.195 | 0.565510734
>> ArrayCopyUnalignedDst.testByte | 1200 | 16.54 | 16.784 | 0.985462345
>> ArrayCopyUnalignedDst.testChar | 1 | 5.02 | 2.827 | 1.775733994
>> ArrayCopyUnalignedDst.testChar | 3 | 5.131 | 2.762 | 1.857711803
>> ArrayCopyUnalignedDst.testChar | 5 | 5.038 | 2.762 | 1.82404055
>> ArrayCopyUnalignedDst.testChar | 10 | 5.718 | 2.762 | 2.070238957
>> ArrayCopyUnalignedDst.testChar | 20 | 5.113 | 5.401 | 0.946676541
>> ArrayCopyUnalignedDst.testChar | 70 | 6.222 | 6.214 | 1.001287416
>> ArrayCopyUnalignedDst.testChar | 150 | 7.367 | 8.125 | 0.906707692
>> ArrayCopyUnalignedDst.testChar | 300 | 10.204 | 10.082 | 1.012100774
>> ArrayCopyUnalignedDst.testChar | 600 | 16.978 | 17.135 | 0.990837467
>> ArrayCopyUnalignedDst.testChar | 1200 | 32.351 | 31.996 | 1.011095137
>> ArrayCopyUnalignedSrc.testByte | 1 | 5.414 | 2.696 | 2.008160237
>> ArrayCopyUnalignedSrc.testByte | 3 | 5.494 | 2.637 | 2.083428138
>> ArrayCopyUnalignedSrc.testByte | 5 | 5.431 | 2.637 | 2.059537353
>> ArrayCopyUnalignedSrc.testByte | 10 | 5.344 | 2.703 | 1.977062523
>> ArrayCopyUnalignedSrc.testByte | 20 | 5.834 | 2.696 | 2.163946588
>> ArrayCopyUnalignedSrc.testByte | 70 | 5.883 | 6.009 | 0.979031453
>> ArrayCopyUnalignedSrc.testByte | 150 | 6.729 | 6.87 | 0.979475983
>> ArrayCopyUnalignedSrc.testByte | 300 | 7.603 | 7.97 | 0.953952321
>> ArrayCopyUnalignedSrc.testByte | 600 | 12.004 | 12.16 | 0.987171053
>> ArrayCopyUnalignedSrc.testByte | 1200 | 16.534 | 16.643 | 0.9934507
>> ArrayCopyUnalignedSrc.testChar | 1 | 5.021 | 2.762 | 1.81788559
>> ArrayCopyUnalignedSrc.testChar | 3 | 5.13 | 2.762 | 1.857349747
>> ArrayCopyUnalignedSrc.testChar | 5 | 5.042 | 2.827 | 1.783516095
>> ArrayCopyUnalignedSrc.testChar | 10 | 5.726 | 2.761 | 2.073886273
>> ArrayCopyUnalignedSrc.testChar | 20 | 5.112 | 5.401 | 0.94649139
>> ArrayCopyUnalignedSrc.testChar | 70 | 6.113 | 6.227 | 0.981692629
>> ArrayCopyUnalignedSrc.testChar | 150 | 7.493 | 7.888 | 0.949923935
>> ArrayCopyUnalignedSrc.testChar | 300 | 10.234 | 10.501 | 0.97457385
>> ArrayCopyUnalignedSrc.testChar | 600 | 17.175 | 17.142 | 1.001925096
>> ArrayCopyUnalignedSrc.testChar | 1200 | 31.926 | 31.987 | 0.998092975
>>
>> Detailed Reports:
>> Baseline :
>> [http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt)
>> WithOpt : >> [http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt) > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now > contains four commits: > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848 > - Replacing explicit type checks with existing type checking routines > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848 > - 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions. There is a regression after the 8252847 changes: 8254890. It should be fixed before we proceed with these changes. ------------- Changes requested by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/302 From kvn at openjdk.java.net Mon Oct 19 18:40:10 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 19 Oct 2020 18:40:10 GMT Subject: RFR: 8254965: Remove unused Matcher::_ruleName and make ruleName non-product In-Reply-To: References: <6c09kl39yOXIoFO93ETZaCH9rwqvtra86CP07QSMjJ8=.532deaae-a7e3-4d98-a391-8306256c4598@github.com> Message-ID: On Mon, 19 Oct 2020 18:28:02 GMT, Claes Redestad wrote: > > _must_clone[] and _swallowed[] (which should be boolean) > > Yeah, I was surprised to see `char[]` rather than `bool[]` too (but apparently `sizeof(bool) >= sizeof(char)`). Perhaps > we could profitably generate these boolean arrays as constexpr'd `BitMap`s at some point. > > Looks like ruleName[] is not used in product but again it is useful for debugging product VM. > > Ok, I'll repurpose this RFE to remove the `Matcher::_ruleName` field then, shall I? Sorry, I meant regName[]. ruleName[] is what you want to remove now.
------------- PR: https://git.openjdk.java.net/jdk/pull/722 From rcastanedalo at openjdk.java.net Mon Oct 19 18:49:18 2020 From: rcastanedalo at openjdk.java.net (Roberto Castañeda Lozano) Date: Mon, 19 Oct 2020 18:49:18 GMT Subject: RFR: 8254317: C2: Resource consumption of ConvI2LNode::Ideal() grows exponentially In-Reply-To: References: <9FJMNUDr4xvtcIlGtEk2Y_7tA17Us29ImExhFpzs87s=.66c146c6-7629-4e1b-a62b-d68714636f32@github.com> Message-ID: On Mon, 19 Oct 2020 15:56:57 GMT, Vladimir Ivanov wrote: > 2 concerns with the proposed fix on my side: > > * I'm not persuaded that covering "x == y" is enough to completely eliminate the issue; > > * The issue demonstrates there's still a chance to introduce very deep recursion involving `Compile::constrained_convI2L` > and `PhaseIterGVN` which can cause a crash. > > > IMO the root cause is an eager transformation happening top-down on `ConvI2L` nodes and it defeats the memoization GVN > naturally provides, so it causes a combinatorial explosion. If subsequent `Compile::constrained_convI2L()` calls could > share the same `ConvI2L` node for the same input, it would be a more reliable fix for the problem. Otherwise, the > transformation may be extracted from GVN and turned into a separate pass (take a look at > `Compile::optimize_logic_cones` as an example). Some comments on the tests: (1) please group the individual test > cases into a single test class; and (2) I suggest turning the benchmark into a test case which fails with a timeout when > the fix is absent. Thanks Vladimir for the thorough review! I will explore your suggestions to generalize the fix and see what can be done.
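As a rough illustration of why losing memoization leads to a combinatorial explosion, the sketch below walks a toy expression DAG in which every node uses the node beneath it twice. The types and function names (`Node`, `make_chain`, `visit_eager`, `visit_memo`) are hypothetical and are not C2 code; the eager walk stands in for a top-down transformation that reprocesses shared inputs, the memoized walk for one that reuses earlier results.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy expression DAG: node i has two inputs, both pointing at node i - 1
// (think x + x, then (x + x) + (x + x), and so on). Node 0 is a leaf.
struct Node {
  int in1;  // index of the first input, or -1 for a leaf
  int in2;  // index of the second input, or -1 for a leaf
};

std::vector<Node> make_chain(int depth) {
  std::vector<Node> g;
  g.push_back({-1, -1});
  for (int i = 1; i <= depth; ++i) {
    g.push_back({i - 1, i - 1});
  }
  return g;
}

// Eager top-down walk: every use of a shared input is processed again.
// Returns the number of node visits performed.
long visit_eager(const std::vector<Node>& g, int n) {
  if (g[n].in1 < 0) return 1;
  return 1 + visit_eager(g, g[n].in1) + visit_eager(g, g[n].in2);
}

// Memoized walk: a node already processed through another use is skipped.
// Returns the number of nodes actually processed.
long visit_memo(const std::vector<Node>& g, int n, std::vector<bool>& done) {
  if (done[n]) return 0;
  done[n] = true;
  if (g[n].in1 < 0) return 1;
  return 1 + visit_memo(g, g[n].in1, done) + visit_memo(g, g[n].in2, done);
}
```

For a chain of depth n the eager walk performs 2^(n+1) - 1 visits while the memoized walk processes n + 1 nodes, which is the gap that sharing one result per input is meant to close.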
------------- PR: https://git.openjdk.java.net/jdk/pull/727 From dnsimon at openjdk.java.net Mon Oct 19 19:04:09 2020 From: dnsimon at openjdk.java.net (Doug Simon) Date: Mon, 19 Oct 2020 19:04:09 GMT Subject: RFR: 8255004: [JVMCI] expose JVM_ACC_FIELD_INITIALIZED_FINAL_UPDATE In-Reply-To: <5OmI5O49TP9ZVdqnn-FPwLlsS7xZaAIw9kELjKtrpyE=.f254fd61-e0ad-4e69-834c-95a837470570@github.com> References: <3b5xjSpzOrKfNnpaz0sOe0FGFNP6zQYcWRI6G_dMG7M=.6d5abd66-9746-4896-ba27-5614f86f1789@github.com> <5OmI5O49TP9ZVdqnn-FPwLlsS7xZaAIw9kELjKtrpyE=.f254fd61-e0ad-4e69-834c-95a837470570@github.com> Message-ID: <7M17k42OL6Jnl5-gRpQcZ3XVqFZlzKz8Hzqj4-cMlyI=.5b45e332-bd74-40f6-b9dc-33fa353c6743@github.com> On Mon, 19 Oct 2020 18:27:12 GMT, Vladimir Kozlov wrote: >> This PR exposes to Graal the JVM_ACC_FIELD_INITIALIZED_FINAL_UPDATE access flag added in >> https://bugs.openjdk.java.net/browse/JDK-8157181. This allows Graal to take the correct action for final fields that >> are updated outside of `` and `` methods. > > What about adding it to HotSpotVMConfig.java ? I see other flags there. It will only be used from jdk.internal.vm.compiler (i.e. in `GraalHotSpotVMConfig`). The jdk.internal.vm.ci module has no need to access it. 
------------- PR: https://git.openjdk.java.net/jdk/pull/737 From kvn at openjdk.java.net Mon Oct 19 19:34:09 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 19 Oct 2020 19:34:09 GMT Subject: RFR: 8255004: [JVMCI] expose JVM_ACC_FIELD_INITIALIZED_FINAL_UPDATE In-Reply-To: <3b5xjSpzOrKfNnpaz0sOe0FGFNP6zQYcWRI6G_dMG7M=.6d5abd66-9746-4896-ba27-5614f86f1789@github.com> References: <3b5xjSpzOrKfNnpaz0sOe0FGFNP6zQYcWRI6G_dMG7M=.6d5abd66-9746-4896-ba27-5614f86f1789@github.com> Message-ID: <23A21wGrAafykq5FwH4SJSlyu4ykbJ4sFW9K0rNiDV8=.6c9b40d6-e595-41c3-8af3-e6f4b4db9c14@github.com> On Mon, 19 Oct 2020 13:18:55 GMT, Doug Simon wrote: > This PR exposes to Graal the JVM_ACC_FIELD_INITIALIZED_FINAL_UPDATE access flag added in > https://bugs.openjdk.java.net/browse/JDK-8157181. This allows Graal to take the correct action for final fields that > are updated outside of `` and `` methods. Okay. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/737 From dnsimon at openjdk.java.net Mon Oct 19 19:37:16 2020 From: dnsimon at openjdk.java.net (Doug Simon) Date: Mon, 19 Oct 2020 19:37:16 GMT Subject: RFR: 8255004: [JVMCI] expose JVM_ACC_FIELD_INITIALIZED_FINAL_UPDATE In-Reply-To: <23A21wGrAafykq5FwH4SJSlyu4ykbJ4sFW9K0rNiDV8=.6c9b40d6-e595-41c3-8af3-e6f4b4db9c14@github.com> References: <3b5xjSpzOrKfNnpaz0sOe0FGFNP6zQYcWRI6G_dMG7M=.6d5abd66-9746-4896-ba27-5614f86f1789@github.com> <23A21wGrAafykq5FwH4SJSlyu4ykbJ4sFW9K0rNiDV8=.6c9b40d6-e595-41c3-8af3-e6f4b4db9c14@github.com> Message-ID: On Mon, 19 Oct 2020 19:31:25 GMT, Vladimir Kozlov wrote: >> This PR exposes to Graal the JVM_ACC_FIELD_INITIALIZED_FINAL_UPDATE access flag added in >> https://bugs.openjdk.java.net/browse/JDK-8157181. This allows Graal to take the correct action for final fields that >> are updated outside of `` and `` methods. > > Okay. Thanks for the reviews. 
------------- PR: https://git.openjdk.java.net/jdk/pull/737 From kvn at openjdk.java.net Mon Oct 19 19:39:10 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 19 Oct 2020 19:39:10 GMT Subject: RFR: 8254998: C2: assert(!n->as_Loop()->is_transformed_long_loop()) failure with -XX:StressLongCountedLoop=1 In-Reply-To: References: Message-ID: <58S2Wlk9b9PJkOMe6CnL8AtkxAzc8KTlYwZXwf9O5G4=.3db4ab58-3233-4883-8073-a05135227bd7@github.com> On Mon, 19 Oct 2020 12:07:46 GMT, Roland Westrelin wrote: > Transformation of a long counted loop into a loop nest with a counted > loop inner loop happens in 2 passes of loop opts: first the loop nest > is created and then the inner loop is transformed into a counted > loop. The assert fires because the second step doesn't have a chance > to run given the first step was executed as the last loop opts pass > before running out of allowed loop opts passes. The fix I propose for > this corner case is to relax the assert to account for this. Good. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/735 From dnsimon at openjdk.java.net Mon Oct 19 19:42:16 2020 From: dnsimon at openjdk.java.net (Doug Simon) Date: Mon, 19 Oct 2020 19:42:16 GMT Subject: Integrated: 8255004: [JVMCI] expose JVM_ACC_FIELD_INITIALIZED_FINAL_UPDATE In-Reply-To: <3b5xjSpzOrKfNnpaz0sOe0FGFNP6zQYcWRI6G_dMG7M=.6d5abd66-9746-4896-ba27-5614f86f1789@github.com> References: <3b5xjSpzOrKfNnpaz0sOe0FGFNP6zQYcWRI6G_dMG7M=.6d5abd66-9746-4896-ba27-5614f86f1789@github.com> Message-ID: On Mon, 19 Oct 2020 13:18:55 GMT, Doug Simon wrote: > This PR exposes to Graal the JVM_ACC_FIELD_INITIALIZED_FINAL_UPDATE access flag added in > https://bugs.openjdk.java.net/browse/JDK-8157181. This allows Graal to take the correct action for final fields that > are updated outside of `` and `` methods. This pull request has now been integrated. 
Changeset: 14e1e174 Author: Doug Simon URL: https://git.openjdk.java.net/jdk/commit/14e1e174 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod 8255004: [JVMCI] expose JVM_ACC_FIELD_INITIALIZED_FINAL_UPDATE Reviewed-by: never, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/737 From github.com+8022952+nhat-nguyen at openjdk.java.net Mon Oct 19 20:06:29 2020 From: github.com+8022952+nhat-nguyen at openjdk.java.net (Nhat Nguyen) Date: Mon, 19 Oct 2020 20:06:29 GMT Subject: RFR: 8251271: C2: Compile::_for_igvn list is corrupted after RenumberLiveNodes [v2] In-Reply-To: <3lcmFjOqS5m9qn0sqq0p6VVHAp16nAlapeVUAFeBiOI=.d100117c-a727-473c-82a4-80fc98b17895@github.com> References: <3lcmFjOqS5m9qn0sqq0p6VVHAp16nAlapeVUAFeBiOI=.d100117c-a727-473c-82a4-80fc98b17895@github.com> Message-ID: > I'm following up on this thread http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-September/039910.html > I'm clearing `for_igvn` before restoring as suggested by Xin Nhat Nguyen has updated the pull request incrementally with one additional commit since the last revision: Clear for_igvn in PhaseRenumberLive ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/713/files - new: https://git.openjdk.java.net/jdk/pull/713/files/4ab0abc7..35835f81 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=713&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=713&range=00-01 Stats: 10 lines in 2 files changed: 4 ins; 5 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/713.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/713/head:pull/713 PR: https://git.openjdk.java.net/jdk/pull/713 From github.com+8022952+nhat-nguyen at openjdk.java.net Mon Oct 19 20:06:30 2020 From: github.com+8022952+nhat-nguyen at openjdk.java.net (Nhat Nguyen) Date: Mon, 19 Oct 2020 20:06:30 GMT Subject: RFR: 8251271: C2: Compile::_for_igvn list is corrupted after RenumberLiveNodes [v2] In-Reply-To: References: 
<3lcmFjOqS5m9qn0sqq0p6VVHAp16nAlapeVUAFeBiOI=.d100117c-a727-473c-82a4-80fc98b17895@github.com> Message-ID: On Sat, 17 Oct 2020 21:38:35 GMT, Xin Liu wrote: >> src/hotspot/share/opto/compile.cpp line 2115: >> >>> 2113: // Clear the for_igvn list because it may have irrelevant nodes >>> 2114: // from the previous PhaseRenumberLive run. >>> 2115: save_for_igvn->clear(); >> >> Since `PhaseRenumberLive::PhaseRenumberLive` moves nodes from `for_igvn()` to `new_worklist`, does it make more sense >> to drain original `for_igvn()` worklist there? > > +1 Thank you for the suggestion. I have updated the PR. ------------- PR: https://git.openjdk.java.net/jdk/pull/713 From vlivanov at openjdk.java.net Mon Oct 19 20:09:17 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Mon, 19 Oct 2020 20:09:17 GMT Subject: RFR: 8255026: C2: Miscellaneous cleanups in Compile and PhaseIdealLoop code Message-ID: Miscellaneous cleanups in Compile and PhaseIdealLoop code. Compile: - inline node lists - remove empty Compile::register_library_intrinsics() - introduce `Compile::remove_useless_nodes(GrowableArray& node_list, Unique_Node_List& useful)` PhaseIdealLoop: - unify logging and IGVN logic - refactor logging code Testing: tier1 - tier5 ------------- Commit messages: - 8255026: C2: Miscellaneous cleanups in Compile and PhaseIdealLoop code Changes: https://git.openjdk.java.net/jdk/pull/749/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=749&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8255026 Stats: 189 lines in 6 files changed: 33 ins; 72 del; 84 mod Patch: https://git.openjdk.java.net/jdk/pull/749.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/749/head:pull/749 PR: https://git.openjdk.java.net/jdk/pull/749 From dnsimon at openjdk.java.net Mon Oct 19 20:11:20 2020 From: dnsimon at openjdk.java.net (Doug Simon) Date: Mon, 19 Oct 2020 20:11:20 GMT Subject: RFR: 8254842: [JVMCI] copy thread name when attaching libgraal thread to HotSpot [v3] 
In-Reply-To: References: Message-ID: <4vCNMX142vG-Zk_QvZup4fzXUY7bW_W477GN7MdSDac=.765e270f-29a3-4c9e-ad85-fbff47f7432c@github.com> > This PR modifies `HotSpotJVMCIRuntime.attachCurrentThread` when it is called from within libgraal so that the name of > the thread in the libgraal heap is used as the name of the peer thread created in HotSpot. > This is useful when viewing output such as `-XX:JVMCITraceLevel=1`. For example, here's sample output without this PR: > JVMCITrace-1[Thread-0]: initializing JVMCI runtime -1 > JVMCITrace-1[Thread-0]: initialized JVMCI runtime -1 > and then with: > JVMCITrace-1[LibGraalHotSpotGraalManagementInitialization]: initializing JVMCI runtime -1 > JVMCITrace-1[LibGraalHotSpotGraalManagementInitialization]: initialized JVMCI runtime -1 Doug Simon has updated the pull request incrementally with one additional commit since the last revision: fixed whitespace error ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/684/files - new: https://git.openjdk.java.net/jdk/pull/684/files/726faca5..83dc8df6 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=684&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=684&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/684.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/684/head:pull/684 PR: https://git.openjdk.java.net/jdk/pull/684 From kvn at openjdk.java.net Mon Oct 19 20:19:14 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 19 Oct 2020 20:19:14 GMT Subject: RFR: 8254994: [x86] C1 StubAssembler::call_RT, "call_offset might not be initialized" In-Reply-To: <8NE6tuVKcgl7wBq7dSmsKmeLWxdYr4oPAPnIfXW2Ml0=.050ad557-d640-495c-88cc-e35e2eae1513@github.com> References: <8NE6tuVKcgl7wBq7dSmsKmeLWxdYr4oPAPnIfXW2Ml0=.050ad557-d640-495c-88cc-e35e2eae1513@github.com> Message-ID: On Mon, 19 Oct 2020 10:14:03 GMT, Aleksey Shipilev wrote: > Static analyzers complain `call_offset` might not be
initialized in `StubAssembler::call_RT` in `c1_Runtime1_x86.cpp`. > The way I see it, it depends on `align_stack` value, and it is set whether it `align_stack` is `true` or `false`. But > we can probably make it cleaner so that future errors would be clearly detectable. Since it is initialized off > `offset()`, it should not be negative. Testing: > - [x] Linux x86_64 tier1 Good. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/730 From kvn at openjdk.java.net Mon Oct 19 20:27:29 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 19 Oct 2020 20:27:29 GMT Subject: RFR: 8255027: Problem list for Graal test gc/stress/TestStressG1Humongous.java Message-ID: TestStressG1Humongous.java fails sporadically with Graal. Problem list it until JDK-8218176 is resolved. The test was on ProblemList-graal.txt but it was removed from it by JDK-8218173. ------------- Commit messages: - 8255027: Problem list for Graal test gc/stress/TestStressG1Humongous.java Changes: https://git.openjdk.java.net/jdk/pull/750/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=750&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8255027 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/750.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/750/head:pull/750 PR: https://git.openjdk.java.net/jdk/pull/750 From kvn at openjdk.java.net Mon Oct 19 20:29:18 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 19 Oct 2020 20:29:18 GMT Subject: RFR: 8252204: AArch64: Implement SHA3 accelerator/intrinsic [v9] In-Reply-To: References: Message-ID: On Mon, 19 Oct 2020 11:14:22 GMT, Fei Yang wrote: >> Contributed-by: ard.biesheuvel at linaro.org, dongbo4 at huawei.com >> >> This added an intrinsic for SHA3 using aarch64 v8.2 SHA3 Crypto Extensions. 
>> Reference implementation for core SHA-3 transform using ARMv8.2 Crypto Extensions: >> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/arch/arm64/crypto/sha3-ce-core.S?h=v5.4.52 >> >> Trivial adaptation in SHA3. implCompress is needed for the purpose of adding the intrinsic. >> For SHA3, we need to pass one extra parameter "digestLength" to the stub for the calculation of block size. >> "digestLength" is also used in for the EOR loop before keccak to differentiate different SHA3 variants. >> >> We added jtreg tests for SHA3 and used QEMU system emulator which supports SHA3 instructions to test the functionality. >> Patch passed jtreg tier1-3 tests with QEMU system emulator. >> Also verified with jtreg tier1-3 tests without SHA3 instructions on aarch64-linux-gnu and x86_64-linux-gnu, to make >> sure that there's no regression. >> We used one existing JMH test for performance test: test/micro/org/openjdk/bench/java/security/MessageDigests.java >> We measured the performance benefit with an aarch64 cycle-accurate simulator. >> Patch delivers 20% - 40% performance improvement depending on specific SHA3 digest length and size of the message. >> >> For now, this feature will not be enabled automatically for aarch64. We can auto-enable this when it is fully tested on >> real hardware. But for the above testing purposes, this is auto-enabled when the corresponding hardware feature is >> detected. > > Fei Yang has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains > 11 commits: > - Merge master > - Remove unnecessary code changes in vm_version_aarch64.cpp > - Merge master > - Merge master > - Merge master > - Merge master > - Add sha3 instructions to cpu/aarch64/aarch64-asmtest.py and regenerate the test in assembler_aarch64.cpp:asm_check > - Rebase > - Merge master > - Fix trailing whitespace issue > - ... 
and 1 more: https://git.openjdk.java.net/jdk/compare/e9be2db7...05551701 Always run graalunit testing with new intrinsics. You need to adjust Graal test: src/jdk.internal.vm.compiler/share/classes/org.graalvm.compiler.hotspot.test/src/org/graalvm/compiler/hotspot/test/CheckGraalIntrinsics.java ------------- Changes requested by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/207 From vlivanov at openjdk.java.net Mon Oct 19 20:32:18 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Mon, 19 Oct 2020 20:32:18 GMT Subject: RFR: 8251271: C2: Compile::_for_igvn list is corrupted after RenumberLiveNodes [v2] In-Reply-To: References: <3lcmFjOqS5m9qn0sqq0p6VVHAp16nAlapeVUAFeBiOI=.d100117c-a727-473c-82a4-80fc98b17895@github.com> Message-ID: On Mon, 19 Oct 2020 20:06:29 GMT, Nhat Nguyen wrote: >> I'm following up on this thread http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-September/039910.html >> I'm clearing `for_igvn` before restoring as suggested by Xin > > Nhat Nguyen has updated the pull request incrementally with one additional commit since the last revision: > > Clear for_igvn in PhaseRenumberLive Looks good. ------------- Marked as reviewed by vlivanov (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/713 From redestad at openjdk.java.net Mon Oct 19 22:04:09 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Mon, 19 Oct 2020 22:04:09 GMT Subject: Withdrawn: 8254965: Remove unused Matcher::_ruleName and make ruleName non-product In-Reply-To: References: Message-ID: <-5PsMk4eCEH9ZBGZQt6H1VeBFledgIWOErlQIKntUjI=.897d1fa8-dbfd-4b43-8bc3-a98679005909@github.com> On Sun, 18 Oct 2020 09:22:45 GMT, Claes Redestad wrote: > - Matcher::_ruleName is unused - remove > - ruleName, defined via ad, is only used by a non-product routine. #ifdef it accordingly > > This reduces the JVM size (-63Kb on linux-x64, or about 0.25%) This pull request has been closed without being integrated. 
------------- PR: https://git.openjdk.java.net/jdk/pull/722 From dlong at openjdk.java.net Mon Oct 19 22:37:09 2020 From: dlong at openjdk.java.net (Dean Long) Date: Mon, 19 Oct 2020 22:37:09 GMT Subject: RFR: 8255027: Problem list for Graal test gc/stress/TestStressG1Humongous.java In-Reply-To: References: Message-ID: <0MXicXFtsEpbOqB0_-XPFo2MtYpOPVDexD3Ds9M5Aj0=.19c2fd68-b686-40c2-9169-035eda5078cb@github.com> On Mon, 19 Oct 2020 20:10:33 GMT, Vladimir Kozlov wrote: > TestStressG1Humongous.java fails sporadically with Graal. Problem list it until JDK-8218176 is resolved. > The test was on ProblemList-graal.txt but it was removed from it by JDK-8218173. Marked as reviewed by dlong (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/750 From never at openjdk.java.net Mon Oct 19 23:38:18 2020 From: never at openjdk.java.net (Tom Rodriguez) Date: Mon, 19 Oct 2020 23:38:18 GMT Subject: RFR: 8254842: [JVMCI] copy thread name when attaching libgraal thread to HotSpot [v3] In-Reply-To: <4vCNMX142vG-Zk_QvZup4fzXUY7bW_W477GN7MdSDac=.765e270f-29a3-4c9e-ad85-fbff47f7432c@github.com> References: <4vCNMX142vG-Zk_QvZup4fzXUY7bW_W477GN7MdSDac=.765e270f-29a3-4c9e-ad85-fbff47f7432c@github.com> Message-ID: On Mon, 19 Oct 2020 20:11:20 GMT, Doug Simon wrote: >> This PR modifies `HotSpotJVMCIRuntime.attachCurrentThread` when it is called from within libgraal so that the name of >> the thread in the libgraal heap is used as the name of the peer thread created in HotSpot. >> This useful when viewing output such as `-XX:JVMCITraceLevel=1`. 
For example, here's sample output without this PR: >> JVMCITrace-1[Thread-0]: initializing JVMCI runtime -1 >> JVMCITrace-1[Thread-0]: initialized JVMCI runtime -1 >> and then with: >> JVMCITrace-1[LibGraalHotSpotGraalManagementInitialization]: initializing JVMCI runtime -1 >> JVMCITrace-1[LibGraalHotSpotGraalManagementInitialization]: initialized JVMCI runtime -1 > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > fixed whitespace error Marked as reviewed by never (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/684 From xliu at openjdk.java.net Mon Oct 19 23:49:21 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Mon, 19 Oct 2020 23:49:21 GMT Subject: RFR: 8251271: C2: Compile::_for_igvn list is corrupted after RenumberLiveNodes [v2] In-Reply-To: References: <3lcmFjOqS5m9qn0sqq0p6VVHAp16nAlapeVUAFeBiOI=.d100117c-a727-473c-82a4-80fc98b17895@github.com> Message-ID: On Mon, 19 Oct 2020 19:57:11 GMT, Nhat Nguyen wrote: >> +1 > > Thank you for the suggestion. I have updated the PR. LGTM ------------- PR: https://git.openjdk.java.net/jdk/pull/713 From kvn at openjdk.java.net Tue Oct 20 00:16:10 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 20 Oct 2020 00:16:10 GMT Subject: Integrated: 8255027: Problem list for Graal test gc/stress/TestStressG1Humongous.java In-Reply-To: References: Message-ID: On Mon, 19 Oct 2020 20:10:33 GMT, Vladimir Kozlov wrote: > TestStressG1Humongous.java fails sporadically with Graal. Problem list it until JDK-8218176 is resolved. > The test was on ProblemList-graal.txt but it was removed from it by JDK-8218173. This pull request has now been integrated. 
Changeset: 7a580ca8 Author: Vladimir Kozlov URL: https://git.openjdk.java.net/jdk/commit/7a580ca8 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod 8255027: Problem list for Graal test gc/stress/TestStressG1Humongous.java Reviewed-by: dlong ------------- PR: https://git.openjdk.java.net/jdk/pull/750 From sandhya.viswanathan at intel.com Tue Oct 20 00:38:24 2020 From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya) Date: Tue, 20 Oct 2020 00:38:24 +0000 Subject: Howto replicate failure of 8254790? In-Reply-To: References: <4bd7e9f73ea24ae09f1bb0f1808ce5a7@EX13D46EUB003.ant.amazon.com> <661485ab-7de7-26cb-b2b1-3a4f643125eb@oracle.com> <617f010e-629d-7329-ac72-dce797bf3075@oracle.com> Message-ID: Hi Jason, I think I found the problem by looking at the error log from Vladimir Kozlov. In the stringL_indexof_char() function, the following snippet is the cause of the problem:

    bind(FOUND_CHAR);
    if (UseAVX >= 2) {
      vpmovmskb(tmp, vec3);
    } else {
      pmovmskb(tmp, vec3);
    }
    bsfl(ch, tmp);
    addl(result, ch); <==== The problem is here

    bind(FOUND_SEQ_CHAR);
    subptr(result, str1);

The line addl(result, ch) should have been addptr(result, ch). The same problem exists in the Unicode string index of char intrinsic as well and needs to be fixed. Hope this helps. Best Regards, Sandhya -----Original Message----- From: hotspot-compiler-dev On Behalf Of Vladimir Kozlov Sent: Thursday, October 15, 2020 3:59 PM To: Tatton, Jason ; David Holmes ; hotspot-compiler-dev at openjdk.java.net; core-libs-dev at openjdk.java.net Subject: Re: Howto replicate failure of 8254790? Hi Jason, I added a dump of the surrounding instructions from the hs_err file we have so you can reconstruct the x86 assembler from it. If you look at si_addr: 0x00000000e41d2718, which caused the memory map failure, it looks like R8 = 0x00000007e41d2700 is an oop: [B with upper 32-bits zeroed. It seems the upper 32 bits of the address were cut. But I don't see how it can happen in stringL_indexof_char() sub.
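The upper-32-bit truncation discussed in this thread can be illustrated with a small stand-alone sketch. It is not HotSpot code: `add32` and `add64` are hypothetical helpers that only model the arithmetic of a 32-bit add (as `addl` performs on x86-64, where the result is zero-extended into the 64-bit register) versus a pointer-sized add (as `addptr` performs on a 64-bit VM). The base address is the R8 value quoted in the thread; the offset 0x18 is an assumption, taken as the difference between the quoted si_addr and R8.

```cpp
#include <cstdint>

// Hypothetical model of a 32-bit add on x86-64: only the low 32 bits of
// both operands take part, and the 32-bit sum is zero-extended to 64 bits.
constexpr std::uint64_t add32(std::uint64_t dst, std::uint64_t src) {
    return static_cast<std::uint32_t>(static_cast<std::uint32_t>(dst) +
                                      static_cast<std::uint32_t>(src));
}

// Hypothetical model of a pointer-sized (64-bit) add.
constexpr std::uint64_t add64(std::uint64_t dst, std::uint64_t src) {
    return dst + src;
}

// R8 value quoted in the thread; the offset is an assumed value (si_addr - R8).
constexpr std::uint64_t kBase = 0x00000007e41d2700ULL;
constexpr std::uint64_t kOffset = 0x18;

// The 32-bit add loses the upper half of the address...
static_assert(add32(kBase, kOffset) == 0x00000000e41d2718ULL,
              "32-bit add zeroes the upper 32 bits");
// ...while the pointer-sized add keeps it.
static_assert(add64(kBase, kOffset) == 0x00000007e41d2718ULL,
              "64-bit add preserves the full address");
```

Under these assumptions the 32-bit variant yields exactly the si_addr value quoted in the thread, which is consistent with the observation that the upper 32 bits of the address were cut.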
You correctly used movptr() and addptr() instructions. Vladimir K On 10/15/20 2:10 PM, Tatton, Jason wrote: > Thanks Vladimir and David, I have access to a new macbook with an Intel i7-9750H (supports AVX2) so I will try on that. > > -----Original Message----- > From: Vladimir Kozlov > Sent: 15 October 2020 20:25 > To: David Holmes ; Tatton, Jason > ; hotspot-compiler-dev at openjdk.java.net; > core-libs-dev at openjdk.java.net > Subject: RE: [EXTERNAL] Howto replicate failure of 8254790? > > Note, we have old Mac machines in our testing env: > cx8, cmov, fxsr, ht, mmx, 3dnowpref, sse, sse2, sse3, ssse3, sse4.1, > sse4.2, popcnt, lzcnt, tsc, tscinvbit, avx, avx2, aes, erms, clmul, > bmi1, bmi2, rtm, adx, fma, vzeroupper, clflush, clflushopt > > Use -XX:UseAVX=2 > > But I was not able to reproduce the failure on my Skylake Linux machine even with -XX:UseAVX=2. Maybe there are other factors on MacOS. > > Regards, > Vladimir K > > On 10/14/20 5:48 PM, David Holmes wrote: >> Hi Jason, >> >> On 15/10/2020 10:42 am, Tatton, Jason wrote: >>> Hi all, >>> >>> I am trying to replicate the failure of the tier2 test mentioned in >>> 8254790 but I am >>> only seeing it pass under an x86 linux machine. Are there any specific architectural constraints under which this test should be run in order to make it fail? >> >> It failed on a Mac, not Linux. >> >> Cheers, >> David >> >>> >>> I am running the test via: make test TEST="test/jdk/javax/xml/crypto/dsig/GenerationTests.java" >>> >>> Note that I am running the test against master without the commit: >>> "8254792: Disable intrinsic StringLatin1.indexOf until 8254790 is fixed" which disables the intrinsic that is causing the test to fail.
>>> >>> >>> >>> Thanks >>> -- >>> Jason >>> From shade at openjdk.java.net Tue Oct 20 05:35:08 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 20 Oct 2020 05:35:08 GMT Subject: Integrated: 8254994: [x86] C1 StubAssembler::call_RT, "call_offset might not be initialized" In-Reply-To: <8NE6tuVKcgl7wBq7dSmsKmeLWxdYr4oPAPnIfXW2Ml0=.050ad557-d640-495c-88cc-e35e2eae1513@github.com> References: <8NE6tuVKcgl7wBq7dSmsKmeLWxdYr4oPAPnIfXW2Ml0=.050ad557-d640-495c-88cc-e35e2eae1513@github.com> Message-ID: <_UFd-PUlKs5sayD0aQVriHzUSjTd42r4A_EigHfaE88=.4e184b71-ee15-4316-9271-1136c5b6ee04@github.com> On Mon, 19 Oct 2020 10:14:03 GMT, Aleksey Shipilev wrote: > Static analyzers complain `call_offset` might not be initialized in `StubAssembler::call_RT` in `c1_Runtime1_x86.cpp`. > The way I see it, it depends on `align_stack` value, and it is set whether it `align_stack` is `true` or `false`. But > we can probably make it cleaner so that future errors would be clearly detectable. Since it is initialized off > `offset()`, it should not be negative. Testing: > - [x] Linux x86_64 tier1 This pull request has now been integrated. 
Changeset: 355f44dd Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/355f44dd Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod 8254994: [x86] C1 StubAssembler::call_RT, "call_offset might not be initialized" Reviewed-by: chagedorn, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/730 From thartmann at openjdk.java.net Tue Oct 20 05:42:12 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 20 Oct 2020 05:42:12 GMT Subject: RFR: 8254805: compiler/debug/TestStressCM.java is still failing In-Reply-To: References: Message-ID: On Fri, 16 Oct 2020 06:57:33 GMT, Roberto Castañeda Lozano wrote: > Use the code motion trace produced by `TraceOptoPipelining` (excluding traces of stubs) to assert that two compilations > with the same seed cause `StressLCM` and `StressGCM` to take the same randomized decisions. Previously, the entire > output produced by `PrintOptoStatistics` was used instead, which has shown to be too fragile. Also, disable inlining in > both `TestStressCM.java` and the similar `TestStressIGVN.java` to prevent flaky behavior, and run both tests for ten > different seeds to improve coverage. Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/693 From thartmann at openjdk.java.net Tue Oct 20 05:48:15 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 20 Oct 2020 05:48:15 GMT Subject: RFR: 8254998: C2: assert(!n->as_Loop()->is_transformed_long_loop()) failure with -XX:StressLongCountedLoop=1 In-Reply-To: References: Message-ID: On Mon, 19 Oct 2020 12:07:46 GMT, Roland Westrelin wrote: > Transformation of a long counted loop into a loop nest with a counted > loop inner loop happens in 2 passes of loop opts: first the loop nest > is created and then the inner loop is transformed into a counted > loop.
The assert fires because the second step doesn't have a chance > to run given the first step was executed as the last loop opts pass > before running out of allowed loop opts passes. The fix I propose for > this corner case is to relax the assert to account for this. Looks good to me. test/hotspot/jtreg/compiler/longcountedloops/TestTooManyLoopOpts.java line 26: > 24: /** > 25: * @test > 26: * @bug JDK-8254998 I didn't know that the bug ID format with "JDK-" is accepted as well but I would suggest to remove it for consistency with other tests. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/735 From rcastanedalo at openjdk.java.net Tue Oct 20 05:50:11 2020 From: rcastanedalo at openjdk.java.net (Roberto Castañeda Lozano) Date: Tue, 20 Oct 2020 05:50:11 GMT Subject: RFR: 8254805: compiler/debug/TestStressCM.java is still failing In-Reply-To: References: Message-ID: <6PuaInJA-Bzej5CZOpoJT51VMiCgaLmxH4rrr3epe_4=.2e023a87-164c-4d12-9b1e-5472d2134e76@github.com> On Tue, 20 Oct 2020 05:39:44 GMT, Tobias Hartmann wrote: > Looks good to me. Thanks Tobias! ------------- PR: https://git.openjdk.java.net/jdk/pull/693 From rcastanedalo at openjdk.java.net Tue Oct 20 06:12:14 2020 From: rcastanedalo at openjdk.java.net (Roberto Castañeda Lozano) Date: Tue, 20 Oct 2020 06:12:14 GMT Subject: Integrated: 8254805: compiler/debug/TestStressCM.java is still failing In-Reply-To: References: Message-ID: On Fri, 16 Oct 2020 06:57:33 GMT, Roberto Castañeda Lozano wrote: > Use the code motion trace produced by `TraceOptoPipelining` (excluding traces of stubs) to assert that two compilations > with the same seed cause `StressLCM` and `StressGCM` to take the same randomized decisions. Previously, the entire > output produced by `PrintOptoStatistics` was used instead, which has shown to be too fragile.
Also, disable inlining in > both `TestStressCM.java` and the similar `TestStressIGVN.java` to prevent flaky behavior, and run both tests for ten > different seeds to improve coverage. This pull request has now been integrated. Changeset: 98ec4a67 Author: Roberto Castañeda Lozano Committer: Tobias Hartmann URL: https://git.openjdk.java.net/jdk/commit/98ec4a67 Stats: 34 lines in 2 files changed: 21 ins; 3 del; 10 mod 8254805: compiler/debug/TestStressCM.java is still failing Use the code motion trace produced by TraceOptoPipelining (excluding traces of stubs) to assert that two compilations with the same seed cause StressLCM and StressGCM to take the same randomized decisions. Previously, the entire output produced by PrintOptoStatistics was used instead, which has shown to be too fragile. Also, disable inlining in both TestStressCM.java and the similar TestStressIGVN.java to prevent flaky behavior, and run both tests for ten different seeds to improve coverage. Reviewed-by: kvn, thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/693 From thartmann at openjdk.java.net Tue Oct 20 06:15:15 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 20 Oct 2020 06:15:15 GMT Subject: RFR: 8255026: C2: Miscellaneous cleanups in Compile and PhaseIdealLoop code In-Reply-To: References: Message-ID: On Mon, 19 Oct 2020 19:55:58 GMT, Vladimir Ivanov wrote: > Miscellaneous cleanups in Compile and PhaseIdealLoop code. > > Compile: > > - inline node lists > > - remove empty Compile::register_library_intrinsics() > > - introduce `Compile::remove_useless_nodes(GrowableArray& node_list, Unique_Node_List& useful)` > > > PhaseIdealLoop: > > - unify logging and IGVN logic > > - refactor logging code > > Testing: tier1 - tier5 Nice cleanup! Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/749 From thartmann at openjdk.java.net Tue Oct 20 06:19:10 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 20 Oct 2020 06:19:10 GMT Subject: RFR: 8251271: C2: Compile::_for_igvn list is corrupted after RenumberLiveNodes [v2] In-Reply-To: References: <3lcmFjOqS5m9qn0sqq0p6VVHAp16nAlapeVUAFeBiOI=.d100117c-a727-473c-82a4-80fc98b17895@github.com> Message-ID: <0d9K5KZv01DQZbTtTrSI544KBTvTx-y0R-VYbOEqQ6w=.8c975ae5-32d7-4727-a5e1-a5a04b00e0e3@github.com> On Mon, 19 Oct 2020 20:06:29 GMT, Nhat Nguyen wrote: >> I'm following up on this thread http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-September/039910.html >> I'm clearing `for_igvn` before restoring as suggested by Xin > > Nhat Nguyen has updated the pull request incrementally with one additional commit since the last revision: > > Clear for_igvn in PhaseRenumberLive Marked as reviewed by thartmann (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/713 From github.com+8022952+nhat-nguyen at openjdk.java.net Tue Oct 20 06:22:16 2020 From: github.com+8022952+nhat-nguyen at openjdk.java.net (Nhat Nguyen) Date: Tue, 20 Oct 2020 06:22:16 GMT Subject: Integrated: 8251271: C2: Compile::_for_igvn list is corrupted after RenumberLiveNodes In-Reply-To: <3lcmFjOqS5m9qn0sqq0p6VVHAp16nAlapeVUAFeBiOI=.d100117c-a727-473c-82a4-80fc98b17895@github.com> References: <3lcmFjOqS5m9qn0sqq0p6VVHAp16nAlapeVUAFeBiOI=.d100117c-a727-473c-82a4-80fc98b17895@github.com> Message-ID: On Fri, 16 Oct 2020 23:02:46 GMT, Nhat Nguyen wrote: > I'm following up on this thread http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-September/039910.html > I'm clearing `for_igvn` before restoring as suggested by Xin This pull request has now been integrated. 
Changeset: 5fedfa70 Author: Nhat Nguyen Committer: Tobias Hartmann URL: https://git.openjdk.java.net/jdk/commit/5fedfa70 Stats: 7 lines in 2 files changed: 6 ins; 0 del; 1 mod 8251271: C2: Compile::_for_igvn list is corrupted after RenumberLiveNodes Reviewed-by: vlivanov, thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/713 From thartmann at openjdk.java.net Tue Oct 20 06:27:19 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 20 Oct 2020 06:27:19 GMT Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v4] In-Reply-To: References: Message-ID: On Mon, 19 Oct 2020 18:33:22 GMT, Vladimir Kozlov wrote: >> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now >> contains four commits: >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848 >> - Replacing explicit type checks with existing type checking routines >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848 >> - 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions. > > There is regression after 8252847 changes: 8254890. > It should be fixed before we proceed with these changes. [JDK-8254890](https://bugs.openjdk.java.net/browse/JDK-8254890) is a closed bug because it contains confidential information. I've filed [JDK-8255039](https://bugs.openjdk.java.net/browse/JDK-8255039). 
------------- PR: https://git.openjdk.java.net/jdk/pull/302 From bulasevich at openjdk.java.net Tue Oct 20 06:55:23 2020 From: bulasevich at openjdk.java.net (Boris Ulasevich) Date: Tue, 20 Oct 2020 06:55:23 GMT Subject: RFR: 8249893: AARCH64: optimize the construction of the value from the bits of the other two In-Reply-To: References: <5n3SJE02oD_SW_psT84VEJh22lomGJfJtARdyjf0Kcw=.acff1dc7-3dbd-4c8d-8889-f434570e6da2@github.com> Message-ID: On Sun, 11 Oct 2020 15:07:54 GMT, Andrew Haley wrote: > I've often seen ((a & 0xff) << 8) + (b & 0xff) Good idea. I will add AddI -> OrI transformation for the case. ------------- PR: https://git.openjdk.java.net/jdk/pull/511 From shade at openjdk.java.net Tue Oct 20 07:00:24 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 20 Oct 2020 07:00:24 GMT Subject: RFR: 8254913: Increase InlineSmallCode default from 2000 to 2500 for x64 In-Reply-To: <-eTclEV5zK_DIJgV8kYWFRBlUNtEqzq5H_fNwmbVJ7A=.ee694b8b-2a1e-43a0-9df6-52958e8f1d23@github.com> References: <-eTclEV5zK_DIJgV8kYWFRBlUNtEqzq5H_fNwmbVJ7A=.ee694b8b-2a1e-43a0-9df6-52958e8f1d23@github.com> Message-ID: On Fri, 16 Oct 2020 17:31:58 GMT, Eric Caspole wrote: > We have seen some specific benefits to increasing InlineSmallCode to 2500 from 2000, and across the whole promo build > perf test collection the change is neutral to slightly positive, where the tests are run on modern OCI systems. > Passed tier1 testing, some ad-hoc perf testing and more compiler related parts of the weekly promo performance set. > > JBS: https://bugs.openjdk.java.net/browse/JDK-8254913 > Thanks, > Eric That is OK, looks good then. ------------- Marked as reviewed by shade (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/705 From roland at openjdk.java.net Tue Oct 20 07:11:30 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Tue, 20 Oct 2020 07:11:30 GMT Subject: RFR: 8254998: C2: assert(!n->as_Loop()->is_transformed_long_loop()) failure with -XX:StressLongCountedLoop=1 [v2] In-Reply-To: References: Message-ID: > Transformation of a long counted loop into a loop nest with a counted > loop inner loop happens in 2 passes of loop opts: first the loop nest > is created and then the inner loop is transformed into a counted > loop. The assert fires because the second step doesn't have a chance > to run given the first step was executed as the last loop opts pass > before running out of allowed loop opts passes. The fix I propose for > this corner case is to relax the assert to account for this. Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: test fix ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/735/files - new: https://git.openjdk.java.net/jdk/pull/735/files/3e6d0269..913cd877 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=735&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=735&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/735.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/735/head:pull/735 PR: https://git.openjdk.java.net/jdk/pull/735 From roland at openjdk.java.net Tue Oct 20 07:14:19 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Tue, 20 Oct 2020 07:14:19 GMT Subject: RFR: 8254998: C2: assert(!n->as_Loop()->is_transformed_long_loop()) failure with -XX:StressLongCountedLoop=1 [v2] In-Reply-To: References: Message-ID: On Tue, 20 Oct 2020 05:45:10 GMT, Tobias Hartmann wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> test fix > > 
test/hotspot/jtreg/compiler/longcountedloops/TestTooManyLoopOpts.java line 26: > >> 24: /** >> 25: * @test >> 26: * @bug JDK-8254998 > > I didn't know that the bug ID format with "JDK-" is accepted as well but I would suggest to remove it for consistency > with other tests. I didn't include "JDK-" on purpose. I've just fixed it. ------------- PR: https://git.openjdk.java.net/jdk/pull/735 From stuefe at openjdk.java.net Tue Oct 20 07:19:30 2020 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Tue, 20 Oct 2020 07:19:30 GMT Subject: RFR: 8254966: Remove unused code from Matcher In-Reply-To: References: Message-ID: On Sun, 18 Oct 2020 12:11:41 GMT, Claes Redestad wrote: > These are unused: > > interpreter_method_reg* > compiler_method_reg* > pd_implicit_null_fixup (only existed to workaround some Win95 issues) > > Quite some code in the .ad files shook loose when removing these, so I will need some help verifying that the s390 and > ppc changes are fine. Not a full review. Tested on s390, ppc64. Builds fine, tests show no errors. Changes looks good to me. ------------- PR: https://git.openjdk.java.net/jdk/pull/723 From thartmann at openjdk.java.net Tue Oct 20 07:21:39 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 20 Oct 2020 07:21:39 GMT Subject: RFR: 8254998: C2: assert(!n->as_Loop()->is_transformed_long_loop()) failure with -XX:StressLongCountedLoop=1 [v2] In-Reply-To: References: Message-ID: <_eG2HLR8vKzdW_8O5RTpcuQVH2ukJcdQnerDv0MgRNM=.d6f379db-a34b-4d31-b316-6c1251c890ac@github.com> On Tue, 20 Oct 2020 07:11:30 GMT, Roland Westrelin wrote: >> Transformation of a long counted loop into a loop nest with a counted >> loop inner loop happens in 2 passes of loop opts: first the loop nest >> is created and then the inner loop is transformed into a counted >> loop. 
The assert fires because the second step doesn't have a chance >> to run given the first step was executed as the last loop opts pass >> before running out of allowed loop opts passes. The fix I propose for >> this corner case is to relax the assert to account for this. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > test fix Marked as reviewed by thartmann (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/735 From roland at openjdk.java.net Tue Oct 20 07:25:23 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Tue, 20 Oct 2020 07:25:23 GMT Subject: RFR: 8254998: C2: assert(!n->as_Loop()->is_transformed_long_loop()) failure with -XX:StressLongCountedLoop=1 [v2] In-Reply-To: References: Message-ID: On Mon, 19 Oct 2020 17:25:58 GMT, Vladimir Ivanov wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> test fix > > Looks good. @iwanowww @vnkozlov @TobiHartmann thanks for the review. ------------- PR: https://git.openjdk.java.net/jdk/pull/735 From njian at openjdk.java.net Tue Oct 20 07:42:36 2020 From: njian at openjdk.java.net (Ningsheng Jian) Date: Tue, 20 Oct 2020 07:42:36 GMT Subject: RFR: 8254884: Make sure jvm does not crash with Arm SVE and Vector API [v3] In-Reply-To: References: Message-ID: <8YkOZxIkMzhxzWM2QOyYDZ6YDgJZEkqk69bsRkDqjDo=.b167c79c-ca3c-4537-936c-22f619317191@github.com> > Currently we have not implemented all Arm SVE code generation for Vector API specific nodes. To make sure hotspot does > not crash with bad AD file (as NEON has implemented them), we simply add those OPs to unsupported op list. > This is the port and minor cleanup of JDK-8253211 in repo-panama: https://github.com/openjdk/panama-vector/pull/7 with > Op_VectorUnbox (not for codegen) and Op_VectorMaskWrapper (actually unused node. dead code?) 
removed from the > unsupported op list and Op_VectorLoadConst added. Test: tier1-3 on AArch64 and x86_64 as well as Vector API tests on > AArch64 SVE. Ningsheng Jian has updated the pull request incrementally with one additional commit since the last revision: Address review comments from Andrew Dinn ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/726/files - new: https://git.openjdk.java.net/jdk/pull/726/files/9bfa188c..a4caf630 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=726&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=726&range=01-02 Stats: 7 lines in 2 files changed: 0 ins; 1 del; 6 mod Patch: https://git.openjdk.java.net/jdk/pull/726.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/726/head:pull/726 PR: https://git.openjdk.java.net/jdk/pull/726 From njian at openjdk.java.net Tue Oct 20 07:46:16 2020 From: njian at openjdk.java.net (Ningsheng Jian) Date: Tue, 20 Oct 2020 07:46:16 GMT Subject: RFR: 8254884: Make sure jvm does not crash with Arm SVE and Vector API [v2] In-Reply-To: References: Message-ID: On Mon, 19 Oct 2020 13:01:27 GMT, Andrew Dinn wrote: >> Ningsheng Jian has updated the pull request incrementally with one additional commit since the last revision: >> >> Address review comments from Vladimir. >> >> Change-Id: Ia484dc59f41d265972f1253193d1d7d7cb032a12 > > src/hotspot/cpu/aarch64/aarch64_sve.ad line 897: > >> 895: predicate(UseSVE > 0 && n->in(2)->bottom_type()->is_vect()->length_in_bytes() >= 16 && >> 896: (n->in(2)->bottom_type()->is_vect()->element_basic_type() == T_SHORT || >> 897: (n->in(2)->bottom_type()->is_vect()->element_basic_type() == T_CHAR))); > > Is it appropriate to do this reduction when element_basic_type() == T_CHAR? Fixed. I don't think we have that support in vectorizer and vector api. Also checked NEON and x86. 
------------- PR: https://git.openjdk.java.net/jdk/pull/726 From neliasso at openjdk.java.net Tue Oct 20 07:59:11 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Tue, 20 Oct 2020 07:59:11 GMT Subject: RFR: 8255026: C2: Miscellaneous cleanups in Compile and PhaseIdealLoop code In-Reply-To: References: Message-ID: <5vQDGKqDxEl1kvG-A0DIRO7YjOmSZzT9HFc-O8jGxm4=.f95427f6-8980-46c0-873c-d67faffd9e94@github.com> On Mon, 19 Oct 2020 19:55:58 GMT, Vladimir Ivanov wrote: > Miscellaneous cleanups in Compile and PhaseIdealLoop code. > > Compile: > > - inline node lists > > - remove empty Compile::register_library_intrinsics() > > - introduce `Compile::remove_useless_nodes(GrowableArray& node_list, Unique_Node_List& useful)` > > > PhaseIdealLoop: > > - unify logging and IGVN logic > > - refactor logging code > > Testing: tier1 - tier5 Looks good. ------------- Marked as reviewed by neliasso (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/749 From fyang at openjdk.java.net Tue Oct 20 08:07:24 2020 From: fyang at openjdk.java.net (Fei Yang) Date: Tue, 20 Oct 2020 08:07:24 GMT Subject: RFR: 8252204: AArch64: Implement SHA3 accelerator/intrinsic [v9] In-Reply-To: References: Message-ID: On Mon, 19 Oct 2020 20:26:22 GMT, Vladimir Kozlov wrote: > Always run graalunit testing with new intrinsics. You need to adjust Graal test: > src/jdk.internal.vm.compiler/share/classes/org.graalvm.compiler.hotspot.test/src/org/graalvm/compiler/hotspot/test/CheckGraalIntrinsics.java Thanks for looking at this. 
We did run graalunit testing and added the following change in our first commit:

diff --git a/src/jdk.internal.vm.compiler/share/classes/org.graalvm.compiler.hotspot.test/src/org/graalvm/compiler/hotspot/test/CheckGraalIntrinsics.java b/src/jdk.internal.vm.compiler/share/classes/org.graalvm.compiler.hotspot.test/src/org/graalvm/compiler/hotspot/test/CheckGraalIntrinsics.java
index f0e17947460..8f3f4ed9323 100644
--- a/src/jdk.internal.vm.compiler/share/classes/org.graalvm.compiler.hotspot.test/src/org/graalvm/compiler/hotspot/test/CheckGraalIntrinsics.java
+++ b/src/jdk.internal.vm.compiler/share/classes/org.graalvm.compiler.hotspot.test/src/org/graalvm/compiler/hotspot/test/CheckGraalIntrinsics.java
@@ -601,6 +601,7 @@ public class CheckGraalIntrinsics extends GraalTest {
             if (!config.useSHA512Intrinsics()) {
                 add(ignore, "sun/security/provider/SHA5." + shaCompressName + "([BI)V");
             }
+            add(toBeInvestigated, "sun/security/provider/SHA3." + shaCompressName + "([BI)V");
         }

-------------

PR: https://git.openjdk.java.net/jdk/pull/207

From davleopo at openjdk.java.net Tue Oct 20 08:19:21 2020
From: davleopo at openjdk.java.net (David Leopoldseder)
Date: Tue, 20 Oct 2020 08:19:21 GMT
Subject: RFR: 8253964: [Graal] UnschedulableGraphTest#test01fails with expected:<4> but was:<3>
Message-ID: <82Qy_vWMGH7UWTXYapYYXN6gQ2Xkyns399Mj7VjBKLU=.983ab385-de53-473f-8a29-69e8cfbd5434@github.com>

This PR fixes a Graal unit test failure in the presence of -Xcomp. The assertion in the test fails with -Xcomp because RemoveNeverExecutedCode triggers, since we don't have proper profiles with -Xcomp. The fix is already tested and integrated in tip graal: https://github.com/oracle/graal/commit/287dbdf63ec3bfcce74e910d66c21dccf8e9cc46 .

-------------

Commit messages:
 - 8253964: [Graal] unschedulable graph test: avoid optimistic opts to run correctly in the presence of -Xcomp.
Changes: https://git.openjdk.java.net/jdk/pull/756/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=756&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8253964
  Stats: 11 lines in 1 file changed: 11 ins; 0 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/756.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/756/head:pull/756

PR: https://git.openjdk.java.net/jdk/pull/756

From dnsimon at openjdk.java.net Tue Oct 20 08:41:14 2020
From: dnsimon at openjdk.java.net (Doug Simon)
Date: Tue, 20 Oct 2020 08:41:14 GMT
Subject: Integrated: 8254842: [JVMCI] copy thread name when attaching libgraal thread to HotSpot
In-Reply-To:
References:
Message-ID: <55Kigxh_H5zhzsa3YvTHPzoZ0LjfE5kQmOhAUBpm26k=.f8e74f1d-0ec9-4e29-9f7e-555980b010a5@github.com>

On Thu, 15 Oct 2020 13:29:39 GMT, Doug Simon wrote:

> This PR modifies `HotSpotJVMCIRuntime.attachCurrentThread` when it is called from within libgraal so that the name of
> the thread in the libgraal heap is used as the name of the peer thread created in HotSpot.
> This is useful when viewing output such as `-XX:JVMCITraceLevel=1`. For example, here's sample output without this PR:
> JVMCITrace-1[Thread-0]: initializing JVMCI runtime -1
> JVMCITrace-1[Thread-0]: initialized JVMCI runtime -1
> and then with:
> JVMCITrace-1[LibGraalHotSpotGraalManagementInitialization]: initializing JVMCI runtime -1
> JVMCITrace-1[LibGraalHotSpotGraalManagementInitialization]: initialized JVMCI runtime -1

This pull request has now been integrated.
Changeset: 017d151e Author: Doug Simon URL: https://git.openjdk.java.net/jdk/commit/017d151e Stats: 20 lines in 3 files changed: 14 ins; 0 del; 6 mod 8254842: [JVMCI] copy thread name when attaching libgraal thread to HotSpot Reviewed-by: kvn, never ------------- PR: https://git.openjdk.java.net/jdk/pull/684 From redestad at openjdk.java.net Tue Oct 20 09:27:14 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Tue, 20 Oct 2020 09:27:14 GMT Subject: RFR: 8254966: Remove unused code from Matcher In-Reply-To: References: Message-ID: On Tue, 20 Oct 2020 07:16:26 GMT, Thomas Stuefe wrote: >> These are unused: >> >> interpreter_method_reg* >> compiler_method_reg* >> pd_implicit_null_fixup (only existed to workaround some Win95 issues) >> >> Quite some code in the .ad files shook loose when removing these, so I will need some help verifying that the s390 and >> ppc changes are fine. > > Not a full review. > > Tested on s390, ppc64. Builds fine, tests show no errors. Changes looks good to me. @tstuefe: thanks for checking! @vnkozlov @AzeemJiva @neliasso: thanks for reviewing! ------------- PR: https://git.openjdk.java.net/jdk/pull/723 From redestad at openjdk.java.net Tue Oct 20 09:31:17 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Tue, 20 Oct 2020 09:31:17 GMT Subject: Integrated: 8254966: Remove unused code from Matcher In-Reply-To: References: Message-ID: On Sun, 18 Oct 2020 12:11:41 GMT, Claes Redestad wrote: > These are unused: > > interpreter_method_reg* > compiler_method_reg* > pd_implicit_null_fixup (only existed to workaround some Win95 issues) > > Quite some code in the .ad files shook loose when removing these, so I will need some help verifying that the s390 and > ppc changes are fine. This pull request has now been integrated. 
Changeset: 3f9c8a39
Author:    Claes Redestad
URL:       https://git.openjdk.java.net/jdk/commit/3f9c8a39
Stats:     273 lines in 11 files changed: 1 ins; 259 del; 13 mod

8254966: Remove unused code from Matcher

Reviewed-by: neliasso, kvn

-------------

PR: https://git.openjdk.java.net/jdk/pull/723

From redestad at openjdk.java.net Tue Oct 20 10:00:18 2020
From: redestad at openjdk.java.net (Claes Redestad)
Date: Tue, 20 Oct 2020 10:00:18 GMT
Subject: RFR: 8255038: Adjust adapter_code_size to account for -Xlog:methodhandles in debug builds
Message-ID: <19Hi-XHXaOXOKT6ykZmCy0VX_StOn4eT21EUJFgR2JE=.514b4345-9acf-4866-be84-d40148bbc8ab@github.com>

https://bugs.openjdk.java.net/browse/JDK-8254955 trimmed the space allocated for MethodHandleAdapterBlob, but my analysis failed to account for added trace code information being generated when running with -Xlog:methodhandles (on top of the extra tracing/debugging when running with -XX:+VerifyMethodHandles etc.). This caused a failure in tier3 Windows ZGC tests. Adjusting the debug size up a notch ensures we stay within bounds.
Testing: tier3

-------------

Commit messages:
 - Debug builds need a bit more size when -Xlog:methodhandles is used

Changes: https://git.openjdk.java.net/jdk/pull/761/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=761&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8255038
  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/761.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/761/head:pull/761

PR: https://git.openjdk.java.net/jdk/pull/761

From shade at openjdk.java.net Tue Oct 20 10:05:10 2020
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Tue, 20 Oct 2020 10:05:10 GMT
Subject: RFR: 8255038: Adjust adapter_code_size to account for -Xlog:methodhandles in debug builds
In-Reply-To: <19Hi-XHXaOXOKT6ykZmCy0VX_StOn4eT21EUJFgR2JE=.514b4345-9acf-4866-be84-d40148bbc8ab@github.com>
References: <19Hi-XHXaOXOKT6ykZmCy0VX_StOn4eT21EUJFgR2JE=.514b4345-9acf-4866-be84-d40148bbc8ab@github.com>
Message-ID:

On Tue, 20 Oct 2020 09:51:58 GMT, Claes Redestad wrote:

> https://bugs.openjdk.java.net/browse/JDK-8254955 trimmed the space allocated for MethodHandleAdapterBlob, but my
> analysis failed to account for added trace code information being generated when running with -Xlog:methodhandles (on
> top of the extra tracing/debugging when running with -XX:+VerifyMethodHandles etc.). This caused a failure in tier3
> Windows ZGC tests. Adjusting the debug size up a notch ensures we stay within bounds.
> Testing: tier3

Marked as reviewed by shade (Reviewer).
-------------

PR: https://git.openjdk.java.net/jdk/pull/761

From dnsimon at openjdk.java.net Tue Oct 20 10:05:13 2020
From: dnsimon at openjdk.java.net (Doug Simon)
Date: Tue, 20 Oct 2020 10:05:13 GMT
Subject: RFR: 8254827: JVMCI: Enable it for Windows+AArch64
In-Reply-To:
References:
Message-ID: <5JyhO3Fss6If1zk0PGi5mv7xWvMnEaAqumwpDEfBVnM=.e5416a89-1325-434f-8b2d-7a6dd0d889cb@github.com>

On Thu, 15 Oct 2020 15:00:47 GMT, Bernhard Urban-Forster wrote:

> Use r18 as allocatable register on Linux only.
>
> A bootstrap works now (it has been crashing before due to r18 being allocated):
> $ ./windows-aarch64-server-fastdebug/bin/java.exe -XX:+UnlockExperimentalVMOptions -XX:+UseJVMCICompiler -XX:+BootstrapJVMCI -version
> Bootstrapping JVMCI................................. in 17990 ms (compiled 3330 methods)
> openjdk version "16-internal" 2021-03-16
> OpenJDK Runtime Environment (fastdebug build 16-internal+0-adhoc.NORTHAMERICAbeurba.openjdk-jdk)
> OpenJDK 64-Bit Server VM (fastdebug build 16-internal+0-adhoc.NORTHAMERICAbeurba.openjdk-jdk, mixed mode)
>
> Jtreg tests `test/hotspot/jtreg/compiler/jvmci` are passing as well.

src/jdk.internal.vm.ci/share/classes/jdk.vm.ci.hotspot.aarch64/src/jdk/vm/ci/hotspot/aarch64/AArch64HotSpotRegisterConfig.java line 126:

> 124:     public static final Register metaspaceMethodRegister = r12;
> 125:
> 126:     public static final Register platformRegister = r18;

There should be a comment here as "platform register" is rather ambiguous.

src/jdk.internal.vm.ci/share/classes/jdk.vm.ci.hotspot.aarch64/src/jdk/vm/ci/hotspot/aarch64/AArch64HotSpotRegisterConfig.java line 133:

> 131:     private static final RegisterArray reservedRegisters = new RegisterArray(rscratch1, rscratch2, threadRegister, fp, lr, r31, zr, sp);
> 132:
> 133:     private static RegisterArray initAllocatable(Architecture arch, boolean reserveForHeapBase, boolean linuxOs) {

Instead of `linuxOs`, `canUsePlatformRegister` is a better name.
The logic of which OS does what belongs more in AArch64HotSpotJVMCIBackendFactory. ------------- PR: https://git.openjdk.java.net/jdk/pull/685 From shade at openjdk.java.net Tue Oct 20 10:19:21 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 20 Oct 2020 10:19:21 GMT Subject: RFR: 8254785: compiler/graalunit/HotspotTest.java failed with "missing Graal intrinsics for: java/lang/StringLatin1.indexOfChar([BIII)I" Message-ID: <34A5RLFOBxsUm8uZd5w17MT-_SR9fQ6IIUkrl5170uk=.2aed028d-5ee4-4f74-8727-c1d554241e24@github.com> This test is problem-listed, and thus misses new intrinsics completely. The new `StringLatin1.indexOf` intrinsic can just be added to `toBeInvestigated` list in `CheckGraalIntrinsics.java`, instead of problem-listing the entire test and thus missing even more intrinsics. Testing: - [x] Linux x86_64 `compiler/graalunit` tests ------------- Commit messages: - 8254785: compiler/graalunit/HotspotTest.java failed with "missing Graal intrinsics for: java/lang/StringLatin1.indexOfChar([BIII)I" Changes: https://git.openjdk.java.net/jdk/pull/762/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=762&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8254785 Stats: 8 lines in 2 files changed: 7 ins; 1 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/762.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/762/head:pull/762 PR: https://git.openjdk.java.net/jdk/pull/762 From adinn at openjdk.java.net Tue Oct 20 10:49:19 2020 From: adinn at openjdk.java.net (Andrew Dinn) Date: Tue, 20 Oct 2020 10:49:19 GMT Subject: RFR: 8254884: Make sure jvm does not crash with Arm SVE and Vector API [v3] In-Reply-To: <8YkOZxIkMzhxzWM2QOyYDZ6YDgJZEkqk69bsRkDqjDo=.b167c79c-ca3c-4537-936c-22f619317191@github.com> References: <8YkOZxIkMzhxzWM2QOyYDZ6YDgJZEkqk69bsRkDqjDo=.b167c79c-ca3c-4537-936c-22f619317191@github.com> Message-ID: On Tue, 20 Oct 2020 07:42:36 GMT, Ningsheng Jian wrote: >> Currently we have not implemented all Arm SVE code 
generation for Vector API specific nodes. To make sure hotspot does >> not crash with bad AD file (as NEON has implemented them), we simply add those OPs to unsupported op list. >> This is the port and minor cleanup of JDK-8253211 in repo-panama: https://github.com/openjdk/panama-vector/pull/7 with >> Op_VectorUnbox (not for codegen) and Op_VectorMaskWrapper (actually unused node. dead code?) removed from the >> unsupported op list and Op_VectorLoadConst added. Test: tier1-3 on AArch64 and x86_64 as well as Vector API tests on >> AArch64 SVE. > > Ningsheng Jian has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments from Andrew Dinn Hotspot changes look good. ------------- Marked as reviewed by adinn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/726 From roland at openjdk.java.net Tue Oct 20 11:57:24 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Tue, 20 Oct 2020 11:57:24 GMT Subject: Integrated: 8254998: C2: assert(!n->as_Loop()->is_transformed_long_loop()) failure with -XX:StressLongCountedLoop=1 In-Reply-To: References: Message-ID: On Mon, 19 Oct 2020 12:07:46 GMT, Roland Westrelin wrote: > Transformation of a long counted loop into a loop nest with a counted > loop inner loop happens in 2 passes of loop opts: first the loop nest > is created and then the inner loop is transformed into a counted > loop. The assert fires because the second step doesn't have a chance > to run given the first step was executed as the last loop opts pass > before running out of allowed loop opts passes. The fix I propose for > this corner case is to relax the assert to account for this. This pull request has now been integrated. 
Changeset: 294e0705 Author: Roland Westrelin URL: https://git.openjdk.java.net/jdk/commit/294e0705 Stats: 56 lines in 2 files changed: 55 ins; 0 del; 1 mod 8254998: C2: assert(!n->as_Loop()->is_transformed_long_loop()) failure with -XX:StressLongCountedLoop=1 Reviewed-by: vlivanov, kvn, thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/735 From redestad at openjdk.java.net Tue Oct 20 12:18:21 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Tue, 20 Oct 2020 12:18:21 GMT Subject: RFR: 8255026: C2: Miscellaneous cleanups in Compile and PhaseIdealLoop code In-Reply-To: References: Message-ID: On Mon, 19 Oct 2020 19:55:58 GMT, Vladimir Ivanov wrote: > Miscellaneous cleanups in Compile and PhaseIdealLoop code. > > Compile: > > - inline node lists > > - remove empty Compile::register_library_intrinsics() > > - introduce `Compile::remove_useless_nodes(GrowableArray& node_list, Unique_Node_List& useful)` > > > PhaseIdealLoop: > > - unify logging and IGVN logic > > - refactor logging code > > Testing: tier1 - tier5 Marked as reviewed by redestad (Reviewer). 
-------------

PR: https://git.openjdk.java.net/jdk/pull/749

From neliasso at openjdk.java.net Tue Oct 20 12:42:08 2020
From: neliasso at openjdk.java.net (Nils Eliasson)
Date: Tue, 20 Oct 2020 12:42:08 GMT
Subject: RFR: 8255038: Adjust adapter_code_size to account for -Xlog:methodhandles in debug builds
In-Reply-To: <19Hi-XHXaOXOKT6ykZmCy0VX_StOn4eT21EUJFgR2JE=.514b4345-9acf-4866-be84-d40148bbc8ab@github.com>
References: <19Hi-XHXaOXOKT6ykZmCy0VX_StOn4eT21EUJFgR2JE=.514b4345-9acf-4866-be84-d40148bbc8ab@github.com>
Message-ID:

On Tue, 20 Oct 2020 09:51:58 GMT, Claes Redestad wrote:

> https://bugs.openjdk.java.net/browse/JDK-8254955 trimmed the space allocated for MethodHandleAdapterBlob, but my
> analysis failed to account for added trace code information being generated when running with -Xlog:methodhandles (on
> top of the extra tracing/debugging when running with -XX:+VerifyMethodHandles etc.). This caused a failure in tier3
> Windows ZGC tests. Adjusting the debug size up a notch ensures we stay within bounds.
> Testing: tier3

Looks good and trivial

-------------

Marked as reviewed by neliasso (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/761

From redestad at openjdk.java.net Tue Oct 20 12:42:09 2020
From: redestad at openjdk.java.net (Claes Redestad)
Date: Tue, 20 Oct 2020 12:42:09 GMT
Subject: RFR: 8255038: Adjust adapter_code_size to account for -Xlog:methodhandles in debug builds
In-Reply-To:
References: <19Hi-XHXaOXOKT6ykZmCy0VX_StOn4eT21EUJFgR2JE=.514b4345-9acf-4866-be84-d40148bbc8ab@github.com>
Message-ID:

On Tue, 20 Oct 2020 12:38:02 GMT, Nils Eliasson wrote:

>> https://bugs.openjdk.java.net/browse/JDK-8254955 trimmed the space allocated for MethodHandleAdapterBlob, but my
>> analysis failed to account for added trace code information being generated when running with -Xlog:methodhandles (on
>> top of the extra tracing/debugging when running with -XX:+VerifyMethodHandles etc.). This caused a failure in tier3
>> Windows ZGC tests.
Adjusting the debug size up a notch ensures we stay within bounds. >> Testing: tier3 > > Looks good and trivial @shipilev @neliasso: thank you for reviewing! ------------- PR: https://git.openjdk.java.net/jdk/pull/761 From fyang at openjdk.java.net Tue Oct 20 13:42:27 2020 From: fyang at openjdk.java.net (Fei Yang) Date: Tue, 20 Oct 2020 13:42:27 GMT Subject: RFR: 8252204: AArch64: Implement SHA3 accelerator/intrinsic [v10] In-Reply-To: References: Message-ID: > Contributed-by: ard.biesheuvel at linaro.org, dongbo4 at huawei.com > > This added an intrinsic for SHA3 using aarch64 v8.2 SHA3 Crypto Extensions. > Reference implementation for core SHA-3 transform using ARMv8.2 Crypto Extensions: > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/arch/arm64/crypto/sha3-ce-core.S?h=v5.4.52 > > Trivial adaptation in SHA3. implCompress is needed for the purpose of adding the intrinsic. > For SHA3, we need to pass one extra parameter "digestLength" to the stub for the calculation of block size. > "digestLength" is also used in for the EOR loop before keccak to differentiate different SHA3 variants. > > We added jtreg tests for SHA3 and used QEMU system emulator which supports SHA3 instructions to test the functionality. > Patch passed jtreg tier1-3 tests with QEMU system emulator. > Also verified with jtreg tier1-3 tests without SHA3 instructions on aarch64-linux-gnu and x86_64-linux-gnu, to make > sure that there's no regression. > We used one existing JMH test for performance test: test/micro/org/openjdk/bench/java/security/MessageDigests.java > We measured the performance benefit with an aarch64 cycle-accurate simulator. > Patch delivers 20% - 40% performance improvement depending on specific SHA3 digest length and size of the message. > > For now, this feature will not be enabled automatically for aarch64. We can auto-enable this when it is fully tested on > real hardware. 
But for the above testing purposes, this is auto-enabled when the corresponding hardware feature is
> detected.

Fei Yang has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 13 commits:

 - Fix trailing whitespace issue reported by jcheck
 - Merge master
 - Merge master
 - Remove unnecessary code changes in vm_version_aarch64.cpp
 - Merge master
 - Merge master
 - Merge master
 - Merge master
 - Add sha3 instructions to cpu/aarch64/aarch64-asmtest.py and regenerate the test in assembler_aarch64.cpp:asm_check
 - Rebase
 - ... and 3 more: https://git.openjdk.java.net/jdk/compare/cdc8c401...d32c8ad7

-------------

Changes: https://git.openjdk.java.net/jdk/pull/207/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=207&range=09
  Stats: 1262 lines in 36 files changed: 1007 ins; 22 del; 233 mod
  Patch: https://git.openjdk.java.net/jdk/pull/207.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/207/head:pull/207

PR: https://git.openjdk.java.net/jdk/pull/207

From burban at openjdk.java.net Tue Oct 20 13:54:29 2020
From: burban at openjdk.java.net (Bernhard Urban-Forster)
Date: Tue, 20 Oct 2020 13:54:29 GMT
Subject: RFR: 8254827: JVMCI: Enable it for Windows+AArch64 [v2]
In-Reply-To:
References:
Message-ID:

On Mon, 19 Oct 2020 11:03:46 GMT, Magnus Ihse Bursie wrote:

>> Bernhard Urban-Forster has updated the pull request incrementally with two additional commits since the last revision:
>>
>> - rename argument to canUsePlatformRegister
>> - comment for platformRegister
>
> Build changes look good, but you'll need a review on the hotspot part as well.

Thank you for your comments @dougxc
-------------

PR: https://git.openjdk.java.net/jdk/pull/685

From burban at openjdk.java.net Tue Oct 20 13:54:29 2020
From: burban at openjdk.java.net (Bernhard Urban-Forster)
Date: Tue, 20 Oct 2020 13:54:29 GMT
Subject: RFR: 8254827: JVMCI: Enable it for Windows+AArch64 [v2]
In-Reply-To:
References:
Message-ID:

> Use r18 as allocatable register on Linux only.
>
> A bootstrap works now (it has been crashing before due to r18 being allocated):
> $ ./windows-aarch64-server-fastdebug/bin/java.exe -XX:+UnlockExperimentalVMOptions -XX:+UseJVMCICompiler -XX:+BootstrapJVMCI -version
> Bootstrapping JVMCI................................. in 17990 ms (compiled 3330 methods)
> openjdk version "16-internal" 2021-03-16
> OpenJDK Runtime Environment (fastdebug build 16-internal+0-adhoc.NORTHAMERICAbeurba.openjdk-jdk)
> OpenJDK 64-Bit Server VM (fastdebug build 16-internal+0-adhoc.NORTHAMERICAbeurba.openjdk-jdk, mixed mode)
>
> Jtreg tests `test/hotspot/jtreg/compiler/jvmci` are passing as well.

Bernhard Urban-Forster has updated the pull request incrementally with two additional commits since the last revision:

 - rename argument to canUsePlatformRegister
 - comment for platformRegister

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/685/files
  - new: https://git.openjdk.java.net/jdk/pull/685/files/593dfdd6..28dcf572

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=685&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=685&range=00-01

  Stats: 18 lines in 2 files changed: 9 ins; 3 del; 6 mod
  Patch: https://git.openjdk.java.net/jdk/pull/685.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/685/head:pull/685

PR: https://git.openjdk.java.net/jdk/pull/685

From redestad at openjdk.java.net Tue Oct 20 14:26:15 2020
From: redestad at openjdk.java.net (Claes Redestad)
Date: Tue, 20 Oct 2020 14:26:15 GMT
Subject: Integrated: 8255038: Adjust adapter_code_size to account for -Xlog:methodhandles in debug builds
In-Reply-To: <19Hi-XHXaOXOKT6ykZmCy0VX_StOn4eT21EUJFgR2JE=.514b4345-9acf-4866-be84-d40148bbc8ab@github.com>
References: <19Hi-XHXaOXOKT6ykZmCy0VX_StOn4eT21EUJFgR2JE=.514b4345-9acf-4866-be84-d40148bbc8ab@github.com>
Message-ID:

On Tue, 20 Oct 2020 09:51:58 GMT, Claes Redestad wrote:

> https://bugs.openjdk.java.net/browse/JDK-8254955 trimmed the space allocated for MethodHandleAdapterBlob, but my
> analysis failed to account for added trace code information being generated when running with -Xlog:methodhandles (on
> top of the extra tracing/debugging when running with -XX:+VerifyMethodHandles etc.). This caused a failure in tier3
> Windows ZGC tests. Adjusting the debug size up a notch ensures we stay within bounds.
> Testing: tier3

This pull request has now been integrated.

Changeset: 76fdd7fc
Author:    Claes Redestad
URL:       https://git.openjdk.java.net/jdk/commit/76fdd7fc
Stats:     1 line in 1 file changed: 0 ins; 0 del; 1 mod

8255038: Adjust adapter_code_size to account for -Xlog:methodhandles in debug builds

Reviewed-by: shade, neliasso

-------------

PR: https://git.openjdk.java.net/jdk/pull/761

From burban at openjdk.java.net Tue Oct 20 15:46:36 2020
From: burban at openjdk.java.net (Bernhard Urban-Forster)
Date: Tue, 20 Oct 2020 15:46:36 GMT
Subject: RFR: 8254827: JVMCI: Enable it for Windows+AArch64 [v3]
In-Reply-To:
References:
Message-ID:

> Use r18 as allocatable register on Linux only.
>
> A bootstrap works now (it has been crashing before due to r18 being allocated):
> $ ./windows-aarch64-server-fastdebug/bin/java.exe -XX:+UnlockExperimentalVMOptions -XX:+UseJVMCICompiler -XX:+BootstrapJVMCI -version
> Bootstrapping JVMCI................................. in 17990 ms (compiled 3330 methods)
> openjdk version "16-internal" 2021-03-16
> OpenJDK Runtime Environment (fastdebug build 16-internal+0-adhoc.NORTHAMERICAbeurba.openjdk-jdk)
> OpenJDK 64-Bit Server VM (fastdebug build 16-internal+0-adhoc.NORTHAMERICAbeurba.openjdk-jdk, mixed mode)
>
> Jtreg tests `test/hotspot/jtreg/compiler/jvmci` are passing as well.

Bernhard Urban-Forster has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision:

 - add missing precompiled.hpp include
 - Merge remote-tracking branch 'upstream/master' into 8254827-enable-jvmci-win-aarch64
 - rename argument to canUsePlatformRegister
 - comment for platformRegister
 - 8254827: JVMCI: Enable it for Windows+AArch64

   Use r18 as allocatable register on Linux only.

   A bootstrap works now (it has been crashing before due to r18 being allocated):
   ```console
   $ ./windows-aarch64-server-fastdebug/bin/java.exe -XX:+UnlockExperimentalVMOptions -XX:+UseJVMCICompiler -XX:+BootstrapJVMCI -version
   Bootstrapping JVMCI................................. in 17990 ms (compiled 3330 methods)
   openjdk version "16-internal" 2021-03-16
   OpenJDK Runtime Environment (fastdebug build 16-internal+0-adhoc.NORTHAMERICAbeurba.openjdk-jdk)
   OpenJDK 64-Bit Server VM (fastdebug build 16-internal+0-adhoc.NORTHAMERICAbeurba.openjdk-jdk, mixed mode)
   ```

   Jtreg tests `test/hotspot/jtreg/compiler/jvmci` are passing as well.
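
The commit message above makes r18 allocatable on Linux only, and the review renamed the controlling parameter to `canUsePlatformRegister`. A minimal sketch of that filtering idea follows. It is a simplified model under stated assumptions, not the code from the PR: registers are plain strings instead of JVMCI `Register` objects, and the class name, register lists and method signature are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified model only: registers are strings, not JVMCI Register objects.
// Class name, lists and signature are assumptions made for illustration.
final class AllocatableRegistersSketch {
    // r18 is the register the thread calls the "platform register".
    static final String PLATFORM_REGISTER = "r18";

    // Returns every register that is neither reserved nor, when the OS does
    // not permit its use, the platform register.
    static List<String> initAllocatable(List<String> all,
                                        List<String> reserved,
                                        boolean canUsePlatformRegister) {
        List<String> allocatable = new ArrayList<>();
        for (String reg : all) {
            if (reserved.contains(reg)) {
                continue; // never hand out reserved registers
            }
            if (!canUsePlatformRegister && reg.equals(PLATFORM_REGISTER)) {
                continue; // keep r18 away from the register allocator
            }
            allocatable.add(reg);
        }
        return allocatable;
    }

    public static void main(String[] args) {
        List<String> all = Arrays.asList("r16", "r17", "r18", "r19", "fp", "lr");
        List<String> reserved = Arrays.asList("fp", "lr");
        // Linux-like configuration: r18 may be allocated.
        System.out.println(initAllocatable(all, reserved, true));
        // Windows-like configuration: r18 is excluded.
        System.out.println(initAllocatable(all, reserved, false));
    }
}
```

Passing a capability flag instead of an OS flag keeps the decision about which OS may use r18 with the caller, which is the direction the review comment about AArch64HotSpotJVMCIBackendFactory suggests.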
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/685/files - new: https://git.openjdk.java.net/jdk/pull/685/files/28dcf572..7e6cb739 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=685&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=685&range=01-02 Stats: 29566 lines in 423 files changed: 18920 ins; 8788 del; 1858 mod Patch: https://git.openjdk.java.net/jdk/pull/685.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/685/head:pull/685 PR: https://git.openjdk.java.net/jdk/pull/685 From neliasso at openjdk.java.net Tue Oct 20 16:04:14 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Tue, 20 Oct 2020 16:04:14 GMT Subject: RFR: 8255000: C2: Unify IGVN processing when loop opts are over In-Reply-To: <5hxd0p94QlH5wKSuHZOGXBfPxeuFUxTdLtIEH7Lxxxo=.e8fbd981-e866-486e-9e08-0c3376a2c529@github.com> References: <5hxd0p94QlH5wKSuHZOGXBfPxeuFUxTdLtIEH7Lxxxo=.e8fbd981-e866-486e-9e08-0c3376a2c529@github.com> Message-ID: On Mon, 19 Oct 2020 20:27:40 GMT, Vladimir Ivanov wrote: > There is a number of use cases when ideal nodes (either individual or even the whole classes) delay some > transformations until loop optimizations are over. > Unfortunately, there are multiple solutions in the code base with their pros and cons: either a custom per-node class > logic (e.g., for range check dependent `CastII` or `Opaque4` nodes) or `Compile::major_progress() == 0` as a signal > that loop opts are over. I propose to introduce a unified approach to reliably process nodes that require (or may > benefit from) IGVN pass once loop opts are finally over and migrate existing use cases to it. > After some experimentation, I decided not to rely on `Compile::major_progress()` because: > * it's hard to reason about its properties (there are many places where it is adjusted); > * attempts to verify its monotonicity using asserts triggered too many sporadic failures. 
> > So, I wasn't persuaded that `Compile::major_progress() == 0` is reliable enough and introduced a dedicated flag > (`Compile::post_loop_opts_phase()`) which signals that loop opts are over. > (The patch - 69a93d4 commit - is on top of 8255026 cleanup which is reviewed separately.) > > Testing: tier1-tier5 There is a lot to like in this cleanup. Thanks! Reviewed. ------------- Marked as reviewed by neliasso (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/751 From psandoz at openjdk.java.net Tue Oct 20 16:17:16 2020 From: psandoz at openjdk.java.net (Paul Sandoz) Date: Tue, 20 Oct 2020 16:17:16 GMT Subject: RFR: 8254785: compiler/graalunit/HotspotTest.java failed with "missing Graal intrinsics for: java/lang/StringLatin1.indexOfChar([BIII)I" In-Reply-To: <34A5RLFOBxsUm8uZd5w17MT-_SR9fQ6IIUkrl5170uk=.2aed028d-5ee4-4f74-8727-c1d554241e24@github.com> References: <34A5RLFOBxsUm8uZd5w17MT-_SR9fQ6IIUkrl5170uk=.2aed028d-5ee4-4f74-8727-c1d554241e24@github.com> Message-ID: <_mFXOtMbjdI6MRolq7COr2EjBDneK_3tnUKxr_HptEk=.3dd2c815-77d5-4624-9e70-ff5fd805655a@github.com> On Tue, 20 Oct 2020 10:09:18 GMT, Aleksey Shipilev wrote: > This test is problem-listed, and thus misses new intrinsics completely. The new `StringLatin1.indexOf` intrinsic can > just be added to `toBeInvestigated` list in `CheckGraalIntrinsics.java`, instead of problem-listing the entire test and > thus missing even more intrinsics. Testing: > - [x] Linux x86_64 `compiler/graalunit` tests Marked as reviewed by psandoz (Reviewer). 
-------------

PR: https://git.openjdk.java.net/jdk/pull/762

From iignatyev at openjdk.java.net Tue Oct 20 16:29:12 2020
From: iignatyev at openjdk.java.net (Igor Ignatyev)
Date: Tue, 20 Oct 2020 16:29:12 GMT
Subject: RFR: 8254785: compiler/graalunit/HotspotTest.java failed with "missing Graal intrinsics for: java/lang/StringLatin1.indexOfChar([BIII)I"
In-Reply-To: <34A5RLFOBxsUm8uZd5w17MT-_SR9fQ6IIUkrl5170uk=.2aed028d-5ee4-4f74-8727-c1d554241e24@github.com>
References: <34A5RLFOBxsUm8uZd5w17MT-_SR9fQ6IIUkrl5170uk=.2aed028d-5ee4-4f74-8727-c1d554241e24@github.com>
Message-ID:

On Tue, 20 Oct 2020 10:09:18 GMT, Aleksey Shipilev wrote:

> This test is problem-listed, and thus misses new intrinsics completely. The new `StringLatin1.indexOf` intrinsic can
> just be added to `toBeInvestigated` list in `CheckGraalIntrinsics.java`, instead of problem-listing the entire test and
> thus missing even more intrinsics. Testing:
> - [x] Linux x86_64 `compiler/graalunit` tests

I assume you also need to upstream that to [oracle/graal](https://github.com/oracle/graal).

-------------

Marked as reviewed by iignatyev (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/762

From vladimir.kozlov at oracle.com Tue Oct 20 17:01:12 2020
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 20 Oct 2020 10:01:12 -0700
Subject: Howto replicate failure of 8254790?
In-Reply-To:
References: <4bd7e9f73ea24ae09f1bb0f1808ce5a7@EX13D46EUB003.ant.amazon.com> <661485ab-7de7-26cb-b2b1-3a4f643125eb@oracle.com> <617f010e-629d-7329-ac72-dce797bf3075@oracle.com>
Message-ID: <0a0481a0-5f2d-e594-dd71-aa84103bb4d4@oracle.com>

Yes, I saw it too but I was not sure because we never hit the issue with the Unicode string index intrinsic. Another thing is that we see the failure only on MacOS. I also want someone to decode the asm dump I provided in the bug to see the actual instructions where it happened.
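
The quoted analysis in this thread attributes the crash to a 32-bit `addl` being used where a pointer-width `addptr` was needed. A minimal sketch of that truncation follows, assuming the usual x86-64 behaviour that a 32-bit add leaves its result zero-extended in the 64-bit register. The Java methods `addl` and `addptr` are hypothetical stand-ins for the assembler instructions, not HotSpot API. The base value is the R8 value quoted from the hs_err log; the offset 0x18 is an assumption chosen so that the sum lands on the reported si_addr.

```java
// Sketch only: models the two add widths with plain long arithmetic.
// addl/addptr are hypothetical stand-ins, not the HotSpot MacroAssembler.
final class AddlTruncationSketch {
    // 32-bit add: only the low 32 bits survive, upper 32 bits become zero.
    static long addl(long dst, long src) {
        return (dst + src) & 0xFFFFFFFFL;
    }

    // Pointer-width add on a 64-bit VM: all 64 bits are kept.
    static long addptr(long dst, long src) {
        return dst + src;
    }

    public static void main(String[] args) {
        long result = 0x00000007e41d2700L; // R8 value quoted in the thread
        long ch = 0x18L;                   // assumed offset, for illustration
        System.out.printf("addl   -> 0x%016x%n", addl(result, ch));
        System.out.printf("addptr -> 0x%016x%n", addptr(result, ch));
    }
}
```

With these inputs the 32-bit variant yields 0x00000000e41d2718, the si_addr reported in the thread, while the 64-bit variant keeps the upper bits and yields 0x00000007e41d2718.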

Vladimir K

On 10/19/20 5:38 PM, Viswanathan, Sandhya wrote:
> Hi Jason,
>
> I think I found the problem looking at the error log from Vladimir Kozlov. In stringL_indexof_char() function, the following snippet is the cause of problem:
>
> 2807   bind(FOUND_CHAR);
> 2808   if (UseAVX >= 2) {
> 2809     vpmovmskb(tmp, vec3);
> 2810   } else {
> 2811     pmovmskb(tmp, vec3);
> 2812   }
> 2813   bsfl(ch, tmp);
> 2814   addl(result, ch);   <==== The problem is here
> 2815
> 2816   bind(FOUND_SEQ_CHAR);
> 2817   subptr(result, str1);
>
> The line addl(result, ch) should have been addptr(result, ch).
>
> The same problem exists in the Unicode string index of char intrinsic as well and need to be fixed.
>
> Hope this helps.
>
> Best Regards,
> Sandhya
>
> -----Original Message-----
> From: hotspot-compiler-dev On Behalf Of Vladimir Kozlov
> Sent: Thursday, October 15, 2020 3:59 PM
> To: Tatton, Jason ; David Holmes ; hotspot-compiler-dev at openjdk.java.net; core-libs-dev at openjdk.java.net
> Subject: Re: Howto replicate failure of 8254790?
>
> Hi Jason,
>
> I added surrounding instructions dump from hs_err file we have so you can reconstruct x86 assembler from it.
>
> If you look on si_addr: 0x00000000e41d2718 which case memory map failure, it looks like R8 =0x00000007e41d2700 is an
> oop: [B with upper 32-bits zeroed. It seems uppers 32-bits of address were cut.
>
> But I don't see it can happens in stringL_indexof_char() sub. You correctly used movptr() and addptr() instructions.
>
> Vladimir K
>
> On 10/15/20 2:10 PM, Tatton, Jason wrote:
>> Thanks Vladimir and David, I have access to a new macbook with an Intel i7-9750H (supports AVX2) so I will try on that.
>>
>> -----Original Message-----
>> From: Vladimir Kozlov
>> Sent: 15 October 2020 20:25
>> To: David Holmes ; Tatton, Jason
>> ; hotspot-compiler-dev at openjdk.java.net;
>> core-libs-dev at openjdk.java.net
>> Subject: RE: [EXTERNAL] Howto replicate failure of 8254790?
>>
>> CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.
>>
>> Note, we have old Mac machines in our testing env:
>> cx8, cmov, fxsr, ht, mmx, 3dnowpref, sse, sse2, sse3, ssse3, sse4.1,
>> sse4.2, popcnt, lzcnt, tsc, tscinvbit, avx, avx2, aes, erms, clmul,
>> bmi1, bmi2, rtm, adx, fma, vzeroupper, clflush, clflushopt
>>
>> Use -XX:UseAVX=2
>>
>> But I was not able reproduce failure on my Skylake Linux machine even with -XX:UseAVX=2. Maybe there are other factors on MacOS.
>>
>> Regards,
>> Vladimir K
>>
>> On 10/14/20 5:48 PM, David Holmes wrote:
>>> Hi Jason,
>>>
>>> On 15/10/2020 10:42 am, Tatton, Jason wrote:
>>>> Hi all,
>>>>
>>>> I am trying to replicate the failure of the tier2 test mentioned in
>>>> 8254790 but I am
>>>> only seeing it pass under an x86 linux machine. Are there any specific architectural constraints under which this test should be run in order to make it fail?
>>>
>>> It failed on a Mac, not Linux.
>>>
>>> Cheers,
>>> David
>>>
>>>> I am running the test via: make test TEST="test/jdk/javax/xml/crypto/dsig/GenerationTests.java"
>>>>
>>>> Note that I am running the test against master without the commit:
>>>> "8254792: Disable intrinsic StringLatin1.indexOf until 8254790 is fixed" which disables the intrinsic that is causing the test to fail.
>>>>
>>>> Thanks
>>>> --
>>>> Jason
>>>>

From sandhya.viswanathan at intel.com Tue Oct 20 17:27:10 2020
From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya)
Date: Tue, 20 Oct 2020 17:27:10 +0000
Subject: Howto replicate failure of 8254790?
In-Reply-To: <0a0481a0-5f2d-e594-dd71-aa84103bb4d4@oracle.com>
References: <4bd7e9f73ea24ae09f1bb0f1808ce5a7@EX13D46EUB003.ant.amazon.com>
 <661485ab-7de7-26cb-b2b1-3a4f643125eb@oracle.com>
 <617f010e-629d-7329-ac72-dce797bf3075@oracle.com>
 <0a0481a0-5f2d-e594-dd71-aa84103bb4d4@oracle.com>
Message-ID:

Hi Vladimir,

I analyzed the instruction dump yesterday to find out where the issue is. I have attached it to the bug report as 8254790.asm:
https://bugs.openjdk.java.net/browse/JDK-8254790

The crash is reported at:
 100: 450FB64C1810    movzx r9d, byte ptr [r8+rbx*1+0x10]

Which is just after the intrinsics and uses the rbx register (containing the index of char from the intrinsic).

RBX has the large value 0xfffffff900000008 instead of 8. The length of the string is 34 bytes. The match is found in first 32 bytes at index 8.
After doing the 32 bytes with the following instructions:
  6b: C5FE6F13        vmovdqu ymm2, ymmword ptr [rbx]
  6f: C5ED74D1        vpcmpeqb ymm2, ymm2, ymm1
  73: C4E27D17C2      vptest ymm0, ymm2
  78: 0F8369000000    jnb 0xe7
The control goes to 0xe7.

The code snippet at 0xe7 is:
  e7: C5FDD7CA        vpmovmskb ecx, ymm2
  eb: 0FBCC1          bsf eax, ecx
  ee: 03D8            add ebx, eax
  f0: 482BDF          sub rbx, rdi
  f3: 0F1F4000        nop dword ptr [rax], eax
  f7: 413BDB          cmp ebx, r11d
  fa: 0F83DF290000    jnb 0x2adf
 100: 450FB64C1810    movzx r9d, byte ptr [r8+rbx*1+0x10]

After vpmovmskb, the bit mask in ecx is 0x1100, showing the match at 8th and 9th byte.
The register rbx at this point must be holding address to the base of array: 0x00000007e41d2700 same as rdi.
Bsf puts 8 in eax.
Then 8 is added to ebx instead of rbx using 32-bit add, making upper 32 bits as 0, resulting in rbx = 0xe41d2708.
If the add was 64-bit add, everything would have worked well.
Then sub rbx, rdi results in 0xe41d2708 - 0x00000007e41d2700 = 0xFFFFFFF900000008 being loaded in rbx.
This is the value we see at crash.
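
The arithmetic in this analysis can be replayed outside the JVM. The following is only an illustrative sketch in Java, not HotSpot code: the class name `AddlTruncationDemo` and the helpers `add32` and `add64` are hypothetical names introduced here. It assumes the register values quoted in the message (array base 0x00000007e41d2700 in rdi and rbx, mask 0x1100 in ecx) and models the fact that, in 64-bit mode, a 32-bit `add` zero-extends its result into the full register.

```java
public class AddlTruncationDemo {
    /** Models `add ebx, eax`: a 32-bit add whose result is zero-extended into the 64-bit register. */
    static long add32(long reg, long value) {
        return (reg + value) & 0xFFFFFFFFL;
    }

    /** Models `add rbx, rax`: a full 64-bit add that keeps the upper half of the address. */
    static long add64(long reg, long value) {
        return reg + value;
    }

    public static void main(String[] args) {
        long base = 0x00000007e41d2700L;  // array base address (rdi, and rbx before the add)
        int mask = 0x1100;                // ecx after vpmovmskb, as quoted in the analysis

        // `bsf eax, ecx` returns the position of the lowest set bit of the mask.
        long offset = Integer.numberOfTrailingZeros(mask);

        // Buggy sequence: add ebx, eax ; sub rbx, rdi
        long badIndex = add32(base, offset) - base;
        // Intended sequence: 64-bit add ; sub rbx, rdi
        long goodIndex = add64(base, offset) - base;

        System.out.printf("offset     = %d%n", offset);
        System.out.printf("32-bit add = 0x%016x%n", badIndex);
        System.out.printf("64-bit add = 0x%016x%n", goodIndex);
    }
}
```

With these inputs the 32-bit path yields 0xfffffff900000008, the RBX value seen in the crash, while the 64-bit path yields the expected index 8.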

Best Regards,
Sandhya

From mchung at openjdk.java.net Tue Oct 20 18:19:17 2020
From: mchung at openjdk.java.net (Mandy Chung)
Date: Tue, 20 Oct 2020 18:19:17 GMT
Subject: RFR: 8254975: lambda proxy fails to access a protected member inherited from a split package
Message-ID: <8QLuT7jhXUHtB8Ce1Eb15D4LuEZkAEGWPr4k7LetMcI=.bbda143d-7c36-4da9-8e04-f359b19a7aed@github.com>

It's a bug in determining if a protected member inherited from a superclass is in a split package as its host class
that it only checks on the package name. The fix is simple: compare the runtime package of the lambda class (which is
in the same runtime package as the host class) with that of the declaring class of the protected member being accessed.

-------------

Commit messages:
 - Merge branch 'master' of https://github.com/openjdk/jdk into hidden-classes
 - Fix copyright year
 - 8254975: lambda proxy fails to access a protected member inherited from a split package

Changes: https://git.openjdk.java.net/jdk/pull/767/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=767&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8254975
  Stats: 86 lines in 2 files changed: 77 ins; 1 del; 8 mod
  Patch: https://git.openjdk.java.net/jdk/pull/767.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/767/head:pull/767

PR: https://git.openjdk.java.net/jdk/pull/767

From never at openjdk.java.net Tue Oct 20 18:24:20 2020
From: never at openjdk.java.net (Tom Rodriguez)
Date: Tue, 20 Oct 2020 18:24:20 GMT
Subject: RFR: 8255068: [JVMCI] errors during compiler creation can be hidden
Message-ID:

This improves the handling of errors during compiler creation.
@dougxc @vnkozlov

-------------

Commit messages:
 - 8255068: [JVMCI] errors during compiler creation can be hidden

Changes: https://git.openjdk.java.net/jdk/pull/768/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=768&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8255068
  Stats: 25 lines in 1 file changed: 23 ins; 0 del; 2 mod
  Patch: https://git.openjdk.java.net/jdk/pull/768.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/768/head:pull/768

PR: https://git.openjdk.java.net/jdk/pull/768

From kvn at openjdk.java.net Tue Oct 20 18:33:14 2020
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Tue, 20 Oct 2020 18:33:14 GMT
Subject: RFR: 8255068: [JVMCI] errors during compiler creation can be hidden
In-Reply-To:
References:
Message-ID:

On Tue, 20 Oct 2020 18:18:14 GMT, Tom Rodriguez wrote:

> This improves the handling of errors during compiler creation. @dougxc @vnkozlov

Good.

-------------

Marked as reviewed by kvn (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/768

From kvn at openjdk.java.net Tue Oct 20 19:21:18 2020
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Tue, 20 Oct 2020 19:21:18 GMT
Subject: RFR: 8253964: [Graal] UnschedulableGraphTest#test01fails with expected:<4> but was:<3>
In-Reply-To: <82Qy_vWMGH7UWTXYapYYXN6gQ2Xkyns399Mj7VjBKLU=.983ab385-de53-473f-8a29-69e8cfbd5434@github.com>
References: <82Qy_vWMGH7UWTXYapYYXN6gQ2Xkyns399Mj7VjBKLU=.983ab385-de53-473f-8a29-69e8cfbd5434@github.com>
Message-ID:

On Tue, 20 Oct 2020 07:37:26 GMT, David Leopoldseder wrote:

> This PR fixes a Graal unit test failure in the presence of -Xcomp. The assertion in the test fails with -Xcomp as
> RemoveNeverExecutedCode triggers since we dont have proper profiles with Xcomp there.
> The fix is already tested and integrated in tip graal
> https://github.com/oracle/graal/commit/287dbdf63ec3bfcce74e910d66c21dccf8e9cc46 .

Good.

-------------

Marked as reviewed by kvn (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/756

From vlivanov at openjdk.java.net Tue Oct 20 19:21:26 2020
From: vlivanov at openjdk.java.net (Vladimir Ivanov)
Date: Tue, 20 Oct 2020 19:21:26 GMT
Subject: RFR: 8255067: Restore Copyright line in file modified by 8253191
Message-ID: <1TqKsX8MsupinIh8g6bkCZJXKnR56ziAMJVmNkocOeI=.1b1c94e5-c015-4233-8d20-4ff997c05f89@github.com>

Make it crystal clear that 8253191 test case doesn't have anything in common with the original version of the test case for 8204479:
* restore original test case
* add 8253191 test case as a separate file

-------------

Commit messages:
 - New test case
 - Restore original test case

Changes: https://git.openjdk.java.net/jdk/pull/771/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=771&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8255067
  Stats: 71 lines in 2 files changed: 24 ins; 19 del; 28 mod
  Patch: https://git.openjdk.java.net/jdk/pull/771.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/771/head:pull/771

PR: https://git.openjdk.java.net/jdk/pull/771

From jbhateja at openjdk.java.net Tue Oct 20 19:22:25 2020
From: jbhateja at openjdk.java.net (Jatin Bhateja)
Date: Tue, 20 Oct 2020 19:22:25 GMT
Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v5]
In-Reply-To:
References:
Message-ID:

> Summary:
>
> 1) Partial in-lining technique avoids call overhead penalty for small array copy operations with size less than 32 bytes.
> 2) At runtime, a conditional check based on copy length either calls an array-copy stub or executes an optimized instruction sequence using AVX-512 masked instructions emitted at the call site.
> 3) New runtime flag ArrayCopyPartialInlineSize=0/32(default)/64 bytes determines the maximum size for partial in-lining.
> 4) Based on the perf results seen in benchmarks currently partial in-lining is performed only for arraycopy involving sub-word types (bool/byte/char/short). Once PR-61 gets integrated we can extend this patch to cover all the primitive types.
>
> Performance Results:
> System : CascadeLake Server, Intel(R) Xeon(R) Platinum 8280L CPU @ 2.70GHz
> Micros : test/micro/org/openjdk/bench/java/lang/ArrayCopy*.java
> ArrayCopyPartialInlineSize : 32
>
> JMH | Block Size | Baseline (ns/op) | Partial Inlining (ns/op) | Gain
> -- | -- | -- | -- | --
> ArrayCopyAligned.testByte | 1 | 5.417 | 2.696 | 2.009272997
> ArrayCopyAligned.testByte | 3 | 5.494 | 2.702 | 2.03330866
> ArrayCopyAligned.testByte | 5 | 5.417 | 2.637 | 2.05422829
> ArrayCopyAligned.testByte | 10 | 5.343 | 2.703 | 1.976692564
> ArrayCopyAligned.testByte | 20 | 5.837 | 2.636 | 2.214339909
> ArrayCopyAligned.testByte | 70 | 5.86 | 6 | 0.976666667
> ArrayCopyAligned.testByte | 150 | 6.766 | 6.906 | 0.979727773
> ArrayCopyAligned.testByte | 300 | 7.605 | 7.952 | 0.956363179
> ArrayCopyAligned.testByte | 600 | 11.989 | 12.007 | 0.998500874
> ArrayCopyAligned.testByte | 1200 | 16.447 | 16.585 | 0.991679228
> ArrayCopyAligned.testChar | 1 | 5.02 | 2.828 | 1.775106082
> ArrayCopyAligned.testChar | 3 | 5.129 | 2.762 | 1.85698769
> ArrayCopyAligned.testChar | 5 | 5.041 | 2.762 | 1.82512672
> ArrayCopyAligned.testChar | 10 | 5.716 | 2.762 | 2.069514844
> ArrayCopyAligned.testChar | 20 | 5.111 | 5.399 | 0.946656788
> ArrayCopyAligned.testChar | 70 | 6.271 | 6.242 | 1.004645947
> ArrayCopyAligned.testChar | 150 | 7.45 | 7.599 | 0.980392157
> ArrayCopyAligned.testChar | 300 | 9.904 | 10.112 | 0.97943038
> ArrayCopyAligned.testChar | 600 | 17.131 | 17.167 | 0.997902953
> ArrayCopyAligned.testChar | 1200 | 29.556 | 29.851 | 0.990117584
> ArrayCopyUnalignedBoth.testByte | 1 | 5.419 | 2.702 | 2.005551443
> ArrayCopyUnalignedBoth.testByte | 3 | 5.558 | 2.636 | 2.108497724
> ArrayCopyUnalignedBoth.testByte | 5 | 5.43 | 2.636 | 2.059939302
> ArrayCopyUnalignedBoth.testByte | 10 | 5.378 | 2.637 | 2.039438756
> ArrayCopyUnalignedBoth.testByte | 20 | 5.914 | 2.636 | 2.243550836
> ArrayCopyUnalignedBoth.testByte | 70 | 5.882 | 5.954 | 0.987907289
> ArrayCopyUnalignedBoth.testByte | 150 | 6.784 | 6.88 | 0.986046512
> ArrayCopyUnalignedBoth.testByte | 300 | 7.635 | 7.968 | 0.958207831
> ArrayCopyUnalignedBoth.testByte | 600 | 12.226 | 12.129 | 1.007997362
> ArrayCopyUnalignedBoth.testByte | 1200 | 16.992 | 20.717 | 0.820195974
> ArrayCopyUnalignedBoth.testChar | 1 | 5.019 | 2.828 | 1.774752475
> ArrayCopyUnalignedBoth.testChar | 3 | 5.163 | 2.763 | 1.868621064
> ArrayCopyUnalignedBoth.testChar | 5 | 5.042 | 2.827 | 1.783516095
> ArrayCopyUnalignedBoth.testChar | 10 | 5.718 | 2.828 | 2.021923621
> ArrayCopyUnalignedBoth.testChar | 20 | 5.111 | 5.404 | 0.945780903
> ArrayCopyUnalignedBoth.testChar | 70 | 6.367 | 6.235 | 1.02117081
> ArrayCopyUnalignedBoth.testChar | 150 | 7.367 | 8.269 | 0.890917886
> ArrayCopyUnalignedBoth.testChar | 300 | 10.358 | 10.642 | 0.973313287
> ArrayCopyUnalignedBoth.testChar | 600 | 20.84 | 17.522 | 1.189361945
> ArrayCopyUnalignedBoth.testChar | 1200 | 31.895 | 31.892 | 1.000094067
> ArrayCopyUnalignedDst.testByte | 1 | 5.455 | 2.637 | 2.068638604
> ArrayCopyUnalignedDst.testByte | 3 | 5.562 | 2.702 | 2.058475204
> ArrayCopyUnalignedDst.testByte | 5 | 5.427 | 2.702 | 2.008512213
> ArrayCopyUnalignedDst.testByte | 10 | 5.367 | 2.696 | 1.990727003
> ArrayCopyUnalignedDst.testByte | 20 | 5.839 | 2.637 | 2.214258627
> ArrayCopyUnalignedDst.testByte | 70 | 5.888 | 5.968 | 0.986595174
> ArrayCopyUnalignedDst.testByte | 150 | 6.785 | 6.773 | 1.001771741
> ArrayCopyUnalignedDst.testByte | 300 | 7.606 | 7.972 | 0.954089313
> ArrayCopyUnalignedDst.testByte | 600 | 11.986 | 21.195 | 0.565510734
> ArrayCopyUnalignedDst.testByte | 1200 | 16.54 | 16.784 | 0.985462345
> ArrayCopyUnalignedDst.testChar | 1 | 5.02 | 2.827 | 1.775733994
> ArrayCopyUnalignedDst.testChar | 3 | 5.131 | 2.762 | 1.857711803
> ArrayCopyUnalignedDst.testChar | 5 | 5.038 | 2.762 | 1.82404055
> ArrayCopyUnalignedDst.testChar | 10 | 5.718 | 2.762 | 2.070238957
> ArrayCopyUnalignedDst.testChar | 20 | 5.113 | 5.401 | 0.946676541
> ArrayCopyUnalignedDst.testChar | 70 | 6.222 | 6.214 | 1.001287416
> ArrayCopyUnalignedDst.testChar | 150 | 7.367 | 8.125 | 0.906707692
> ArrayCopyUnalignedDst.testChar | 300 | 10.204 | 10.082 | 1.012100774
> ArrayCopyUnalignedDst.testChar | 600 | 16.978 | 17.135 | 0.990837467
> ArrayCopyUnalignedDst.testChar | 1200 | 32.351 | 31.996 | 1.011095137
> ArrayCopyUnalignedSrc.testByte | 1 | 5.414 | 2.696 | 2.008160237
> ArrayCopyUnalignedSrc.testByte | 3 | 5.494 | 2.637 | 2.083428138
> ArrayCopyUnalignedSrc.testByte | 5 | 5.431 | 2.637 | 2.059537353
> ArrayCopyUnalignedSrc.testByte | 10 | 5.344 | 2.703 | 1.977062523
> ArrayCopyUnalignedSrc.testByte | 20 | 5.834 | 2.696 | 2.163946588
> ArrayCopyUnalignedSrc.testByte | 70 | 5.883 | 6.009 | 0.979031453
> ArrayCopyUnalignedSrc.testByte | 150 | 6.729 | 6.87 | 0.979475983
> ArrayCopyUnalignedSrc.testByte | 300 | 7.603 | 7.97 | 0.953952321
> ArrayCopyUnalignedSrc.testByte | 600 | 12.004 | 12.16 | 0.987171053
> ArrayCopyUnalignedSrc.testByte | 1200 | 16.534 | 16.643 | 0.9934507
> ArrayCopyUnalignedSrc.testChar | 1 | 5.021 | 2.762 | 1.81788559
> ArrayCopyUnalignedSrc.testChar | 3 | 5.13 | 2.762 | 1.857349747
> ArrayCopyUnalignedSrc.testChar | 5 | 5.042 | 2.827 | 1.783516095
> ArrayCopyUnalignedSrc.testChar | 10 | 5.726 | 2.761 | 2.073886273
> ArrayCopyUnalignedSrc.testChar | 20 | 5.112 | 5.401 | 0.94649139
> ArrayCopyUnalignedSrc.testChar | 70 | 6.113 | 6.227 | 0.981692629
> ArrayCopyUnalignedSrc.testChar | 150 | 7.493 | 7.888 | 0.949923935
> ArrayCopyUnalignedSrc.testChar | 300 | 10.234 | 10.501 | 0.97457385
> ArrayCopyUnalignedSrc.testChar | 600 | 17.175 | 17.142 | 1.001925096
> ArrayCopyUnalignedSrc.testChar | 1200 | 31.926 | 31.987 | 0.998092975
>
> Detailed Reports:
> Baseline :
> [http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt)
> WithOpt :
> [http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt)

Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits:

 - Merge remote-tracking branch 'upstream' into JDK-8252848
 - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848
 - Replacing explicit type checks with existing type checking routines
 - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848
 - 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions.

-------------

Changes: https://git.openjdk.java.net/jdk/pull/302/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=302&range=04
  Stats: 518 lines in 23 files changed: 494 ins; 0 del; 24 mod
  Patch: https://git.openjdk.java.net/jdk/pull/302.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/302/head:pull/302

PR: https://git.openjdk.java.net/jdk/pull/302

From dnsimon at openjdk.java.net Tue Oct 20 20:27:18 2020
From: dnsimon at openjdk.java.net (Doug Simon)
Date: Tue, 20 Oct 2020 20:27:18 GMT
Subject: RFR: 8255068: [JVMCI] errors during compiler creation can be hidden
In-Reply-To:
References:
Message-ID:

On Tue, 20 Oct 2020 18:30:34 GMT, Vladimir Kozlov wrote:

>> This improves the handling of errors during compiler creation. @dougxc @vnkozlov
>
> Good.

Good.

-------------

PR: https://git.openjdk.java.net/jdk/pull/768

From sviswanathan at openjdk.java.net Tue Oct 20 20:35:16 2020
From: sviswanathan at openjdk.java.net (Sandhya Viswanathan)
Date: Tue, 20 Oct 2020 20:35:16 GMT
Subject: RFR: 8254790: javax/xml/crypto/dsig/GenerationTests.java failed with SIGSEGV in C2 frame
Message-ID:

The problem was caused by a 32-bit arithmetic instruction being used on a 64-bit address in string_indexof_char and
stringL_indexof_char: "addl(result, ch)". This patch replaces the addl instruction with addptr and also re-enables the
stringL_indexof_char intrinsic.

-------------

Commit messages:
 - 8254790: javax/xml/crypto/dsig/GenerationTests.java failed with SIGSEGV in C2 frame

Changes: https://git.openjdk.java.net/jdk/pull/772/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=772&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8254790
  Stats: 3 lines in 2 files changed: 0 ins; 0 del; 3 mod
  Patch: https://git.openjdk.java.net/jdk/pull/772.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/772/head:pull/772

PR: https://git.openjdk.java.net/jdk/pull/772

From dlong at openjdk.java.net Tue Oct 20 20:44:09 2020
From: dlong at openjdk.java.net (Dean Long)
Date: Tue, 20 Oct 2020 20:44:09 GMT
Subject: RFR: 8253964: [Graal] UnschedulableGraphTest#test01fails with expected:<4> but was:<3>
In-Reply-To: <82Qy_vWMGH7UWTXYapYYXN6gQ2Xkyns399Mj7VjBKLU=.983ab385-de53-473f-8a29-69e8cfbd5434@github.com>
References: <82Qy_vWMGH7UWTXYapYYXN6gQ2Xkyns399Mj7VjBKLU=.983ab385-de53-473f-8a29-69e8cfbd5434@github.com>
Message-ID:

On Tue, 20 Oct 2020 07:37:26 GMT, David Leopoldseder wrote:

> This PR fixes a Graal unit test failure in the presence of -Xcomp. The assertion in the test fails with -Xcomp as
> RemoveNeverExecutedCode triggers since we dont have proper profiles with Xcomp there.
> The fix is already tested and integrated in tip graal
> https://github.com/oracle/graal/commit/287dbdf63ec3bfcce74e910d66c21dccf8e9cc46 .

Marked as reviewed by dlong (Reviewer).

-------------

PR: https://git.openjdk.java.net/jdk/pull/756

From vladimir.kozlov at oracle.com Tue Oct 20 21:39:52 2020
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 20 Oct 2020 14:39:52 -0700
Subject: Howto replicate failure of 8254790?
In-Reply-To:
References: <4bd7e9f73ea24ae09f1bb0f1808ce5a7@EX13D46EUB003.ant.amazon.com>
 <661485ab-7de7-26cb-b2b1-3a4f643125eb@oracle.com>
 <617f010e-629d-7329-ac72-dce797bf3075@oracle.com>
 <0a0481a0-5f2d-e594-dd71-aa84103bb4d4@oracle.com>
Message-ID: <98c245f6-6f58-4aff-97f0-fa2ef122c3b2@oracle.com>

Thank you, Sandhya

Very nice analysis.

I just finished running dsig/GenerationTests.java test multiple runs (to be sure) on our systems and confirmed your proposed fix:

       bsfl(ch, tmp);
+    if (UseNewCode) {
+      addptr(result, ch);
+    } else {
       addl(result, ch);
+    }

It always fails with addl() and always passed with addptr().

I will assign bug to me and file PR now. I will also fix Unicode string index intrinsic code.

Thanks,
Vladimir

From sandhya.viswanathan at intel.com Tue Oct 20 21:44:10 2020
From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya)
Date: Tue, 20 Oct 2020 21:44:10 +0000
Subject: Howto replicate failure of 8254790?
In-Reply-To: <98c245f6-6f58-4aff-97f0-fa2ef122c3b2@oracle.com> References: <4bd7e9f73ea24ae09f1bb0f1808ce5a7@EX13D46EUB003.ant.amazon.com> <661485ab-7de7-26cb-b2b1-3a4f643125eb@oracle.com> <617f010e-629d-7329-ac72-dce797bf3075@oracle.com> <0a0481a0-5f2d-e594-dd71-aa84103bb4d4@oracle.com> <98c245f6-6f58-4aff-97f0-fa2ef122c3b2@oracle.com> Message-ID: Hi Vladimir, I submitted a pull request an hour or so ago as this was a P1 bug, feel free to use that or ignore. https://git.openjdk.java.net/jdk/pull/772 Best Regards, Sandhya -----Original Message----- From: Vladimir Kozlov Sent: Tuesday, October 20, 2020 2:40 PM To: Viswanathan, Sandhya ; Tatton, Jason ; David Holmes ; hotspot-compiler-dev at openjdk.java.net; core-libs-dev at openjdk.java.net; Hohensee, Paul Subject: Re: Howto replicate failure of 8254790? Thank you, Sandhya Very nice analysis. I just finished running dsig/GenerationTests.java test multiply runs (to besure) on our systems and confirmed your proposed fix: bsfl(ch, tmp); + if (UseNewCode) { + addptr(result, ch); + } else { addl(result, ch); + } It always fails with addl() and always passed with addptr(). I will assign bug to me and file PR now. I will also fix Unicode string index instrinsic code. Thanks, Vladimir On 10/20/20 10:27 AM, Viswanathan, Sandhya wrote: > Hi Vladimir, > > I analyzed the instruction dump yesterday to find out where the issue is. I have attached it to the bug report as 8254790.asm: > https://bugs.openjdk.java.net/browse/JDK-8254790 > > The crash is reported at: > 100: 450FB64C1810 movzx r9d, byte ptr [r8+rbx*1+0x10] > > Which is just after the intrinsics and uses the rbx register (containing the index of char from the intrinsic). > > RBX has the large value 0xfffffff900000008 instead of 8. The length of the string is 34 bytes. The match is found in first 32 bytes at index 8. 
> After doing the 32 bytes with the following instructions: > 6b: C5FE6F13 vmovdqu ymm2, ymmword ptr [rbx] > 6f: C5ED74D1 vpcmpeqb ymm2, ymm2, ymm1 > 73: C4E27D17C2 vptest ymm0, ymm2 > 78: 0F8369000000 jnb 0xe7 > The control goes to 0xe7. > > The code snippet at 0xe7 is: > e7: C5FDD7CA vpmovmskb ecx, ymm2 > eb: 0FBCC1 bsf eax, ecx > ee: 03D8 add ebx, eax > f0: 482BDF sub rbx, rdi > f3: 0F1F4000 nop dword ptr [rax], eax > f7: 413BDB cmp ebx, r11d > fa: 0F83DF290000 jnb 0x2adf > 100: 450FB64C1810 movzx r9d, byte ptr [r8+rbx*1+0x10] > > After vpmovmskb, the bit mask in ecx is 0x1100, showing the match at 8th and 9th byte. > The register rbx at this point must be holding address to the base of array: 0x00000007e41d2700 same as rdi. > Bsf puts 8 in eax. > Then 8 is added to ebx instead of rbx using 32-bit add, making upper 32 bits as 0, resulting in rbx = 0xe41d2708. > If the add was 64-bit add, everything would have worked well. > Then sub rbx, rdi results in 0xe41d2708 - 0x00000007e41d2700 = 0xFFFFFFF900000008 being loaded in rbx. > This is the value we see at crash. > > Best Regards, > Sandhya > > > -----Original Message----- > From: Vladimir Kozlov > Sent: Tuesday, October 20, 2020 10:01 AM > To: Viswanathan, Sandhya ; Tatton, > Jason ; David Holmes ; > hotspot-compiler-dev at openjdk.java.net; core-libs-dev at openjdk.java.net; > Hohensee, Paul > Subject: Re: Howto replicate failure of 8254790? > > Yes, I saw it too but I was not sure because we never hit the issue with Unicode string index intrinsic. > An other thing is we see the failure only on MacOS. > > I also want someone to decode asm dump I provided in bug to see actual instructions where it happened. > > Vladimir K > > On 10/19/20 5:38 PM, Viswanathan, Sandhya wrote: >> Hi Jason, >> >> I think I found the problem looking at the error log from Vladimir Kozlov. 
In stringL_indexof_char() function, the following snippet is the cause of problem: >> >> 2807 bind(FOUND_CHAR); >> 2808 if (UseAVX >= 2) { >> 2809 vpmovmskb(tmp, vec3); >> 2810 } else { >> 2811 pmovmskb(tmp, vec3); >> 2812 } >> 2813 bsfl(ch, tmp); >> 2814 addl(result, ch); <==== The problem is here >> 2815 >> 2816 bind(FOUND_SEQ_CHAR); >> 2817 subptr(result, str1); >> >> The line addl(result, ch) should have been addptr(result, ch). >> >> The same problem exists in the Unicode string index of char intrinsic as well and need to be fixed. >> >> Hope this helps. >> >> Best Regards, >> Sandhya >> >> -----Original Message----- >> From: hotspot-compiler-dev >> On Behalf Of Vladimir >> Kozlov >> Sent: Thursday, October 15, 2020 3:59 PM >> To: Tatton, Jason ; David Holmes >> ; hotspot-compiler-dev at openjdk.java.net; >> core-libs-dev at openjdk.java.net >> Subject: Re: Howto replicate failure of 8254790? >> >> Hi Jason, >> >> I added surrounding instructions dump from hs_err file we have so you can reconstruct x86 assembler from it. >> >> If you look on si_addr: 0x00000000e41d2718 which case memory map >> failure, it looks like R8 =0x00000007e41d2700 is an >> oop: [B with upper 32-bits zeroed. It seems uppers 32-bits of address were cut. >> >> But I don't see it can happens in stringL_indexof_char() sub. You correctly used movptr() and addptr() instructions. >> >> Vladimir K >> >> On 10/15/20 2:10 PM, Tatton, Jason wrote: >>> Thanks Vladimir and David, I have access to a new macbook with an Intel i7-9750H (supports AVX2) so I will try on that. >>> >>> -----Original Message----- >>> From: Vladimir Kozlov >>> Sent: 15 October 2020 20:25 >>> To: David Holmes ; Tatton, Jason >>> ; hotspot-compiler-dev at openjdk.java.net; >>> core-libs-dev at openjdk.java.net >>> Subject: RE: [EXTERNAL] Howto replicate failure of 8254790? >>> >>> CAUTION: This email originated from outside of the organization. 
Do not click links or open attachments unless you can confirm the sender and know the content is safe. >>> >>> >>> >>> Note, we have old Mac machines in our testing env: >>> cx8, cmov, fxsr, ht, mmx, 3dnowpref, sse, sse2, sse3, ssse3, sse4.1, >>> sse4.2, popcnt, lzcnt, tsc, tscinvbit, avx, avx2, aes, erms, clmul, >>> bmi1, bmi2, rtm, adx, fma, vzeroupper, clflush, clflushopt >>> >>> Use -XX:UseAVX=2 >>> >>> But I was not able reproduce failure on my Skylake Linux machine even with -XX:UseAVX=2. Maybe there are other factors on MacOS. >>> >>> Regards, >>> Vladimir K >>> >>> On 10/14/20 5:48 PM, David Holmes wrote: >>>> Hi Jason, >>>> >>>> On 15/10/2020 10:42 am, Tatton, Jason wrote: >>>>> Hi all, >>>>> >>>>> >>>>> >>>>> I am trying to replicate the failure of the tier2 test mentioned >>>>> in 8254790 but I >>>>> am only seeing it pass under an x86 linux machine. Are there any specific architectural constraints under which this test should be run in order to make it fail? >>>> >>>> It failed on a Mac, not Linux. >>>> >>>> Cheers, >>>> David >>>> >>>>> >>>>> >>>>> I am running the test via: make test TEST="test/jdk/javax/xml/crypto/dsig/GenerationTests.java" >>>>> >>>>> >>>>> >>>>> Note that I am running the test against master without the commit: >>>>> "8254792: Disable intrinsic StringLatin1.indexOf until 8254790 is fixed" which disables the intrinsic that is causing the test to fail. >>>>> >>>>> >>>>> >>>>> Thanks >>>>> -- >>>>> Jason >>>>> From kvn at openjdk.java.net Tue Oct 20 21:55:12 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 20 Oct 2020 21:55:12 GMT Subject: RFR: 8254790: javax/xml/crypto/dsig/GenerationTests.java failed with SIGSEGV in C2 frame In-Reply-To: References: Message-ID: On Tue, 20 Oct 2020 20:23:23 GMT, Sandhya Viswanathan wrote: > The problem was due to 32 bit arithmetic instruction used for 64-bit address in > string_indexof_char and stringL_indexof_char: "addl(result, ch)". 
> > This patch replaces the addl instruction with addptr and also enables the stringL_indexof_char intrinsic. Looks good. Let me finish testing before integration. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/772 From vladimir.kozlov at oracle.com Tue Oct 20 21:55:40 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 20 Oct 2020 14:55:40 -0700 Subject: Howto replicate failure of 8254790? In-Reply-To: References: <4bd7e9f73ea24ae09f1bb0f1808ce5a7@EX13D46EUB003.ant.amazon.com> <661485ab-7de7-26cb-b2b1-3a4f643125eb@oracle.com> <617f010e-629d-7329-ac72-dce797bf3075@oracle.com> <0a0481a0-5f2d-e594-dd71-aa84103bb4d4@oracle.com> <98c245f6-6f58-4aff-97f0-fa2ef122c3b2@oracle.com> Message-ID: <8c90a8e9-3901-c58f-3c8a-069ade732deb@oracle.com> Perfect - exactly as my local fix. I assigned bug to you and will do review of your PR. I am running tier1-3 testing and let you know results. Please, wait before integration. Thanks, Vladimir K On 10/20/20 2:44 PM, Viswanathan, Sandhya wrote: > Hi Vladimir, > > I submitted a pull request an hour or so ago as this was a P1 bug, feel free to use that or ignore. > https://git.openjdk.java.net/jdk/pull/772 > > Best Regards, > Sandhya > > -----Original Message----- > From: Vladimir Kozlov > Sent: Tuesday, October 20, 2020 2:40 PM > To: Viswanathan, Sandhya ; Tatton, Jason ; David Holmes ; hotspot-compiler-dev at openjdk.java.net; core-libs-dev at openjdk.java.net; Hohensee, Paul > Subject: Re: Howto replicate failure of 8254790? > > Thank you, Sandhya > > Very nice analysis. > > I just finished running dsig/GenerationTests.java test multiply runs (to besure) on our systems and confirmed your proposed fix: > > bsfl(ch, tmp); > + if (UseNewCode) { > + addptr(result, ch); > + } else { > addl(result, ch); > + } > > It always fails with addl() and always passed with addptr(). I will assign bug to me and file PR now. > I will also fix Unicode string index instrinsic code. 
> > Thanks, > Vladimir > > > On 10/20/20 10:27 AM, Viswanathan, Sandhya wrote: >> Hi Vladimir, >> >> I analyzed the instruction dump yesterday to find out where the issue is. I have attached it to the bug report as 8254790.asm: >> https://bugs.openjdk.java.net/browse/JDK-8254790 >> >> The crash is reported at: >> 100: 450FB64C1810 movzx r9d, byte ptr [r8+rbx*1+0x10] >> >> Which is just after the intrinsics and uses the rbx register (containing the index of char from the intrinsic). >> >> RBX has the large value 0xfffffff900000008 instead of 8. The length of the string is 34 bytes. The match is found in first 32 bytes at index 8. >> After doing the 32 bytes with the following instructions: >> 6b: C5FE6F13 vmovdqu ymm2, ymmword ptr [rbx] >> 6f: C5ED74D1 vpcmpeqb ymm2, ymm2, ymm1 >> 73: C4E27D17C2 vptest ymm0, ymm2 >> 78: 0F8369000000 jnb 0xe7 >> The control goes to 0xe7. >> >> The code snippet at 0xe7 is: >> e7: C5FDD7CA vpmovmskb ecx, ymm2 >> eb: 0FBCC1 bsf eax, ecx >> ee: 03D8 add ebx, eax >> f0: 482BDF sub rbx, rdi >> f3: 0F1F4000 nop dword ptr [rax], eax >> f7: 413BDB cmp ebx, r11d >> fa: 0F83DF290000 jnb 0x2adf >> 100: 450FB64C1810 movzx r9d, byte ptr [r8+rbx*1+0x10] >> >> After vpmovmskb, the bit mask in ecx is 0x1100, showing the match at 8th and 9th byte. >> The register rbx at this point must be holding address to the base of array: 0x00000007e41d2700 same as rdi. >> Bsf puts 8 in eax. >> Then 8 is added to ebx instead of rbx using 32-bit add, making upper 32 bits as 0, resulting in rbx = 0xe41d2708. >> If the add was 64-bit add, everything would have worked well. >> Then sub rbx, rdi results in 0xe41d2708 - 0x00000007e41d2700 = 0xFFFFFFF900000008 being loaded in rbx. >> This is the value we see at crash. 
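The arithmetic in the analysis above can be replayed outside the VM. The following is a minimal sketch, not HotSpot code: the class and method names are hypothetical, and `add32` only models the x86-64 rule that a 32-bit `add` writes the low 32 bits and zero-extends into the full register. The base address and offset are the values quoted from the crash dump.

```java
// Hypothetical illustration of the addl vs. addptr difference; not part of HotSpot.
class AddlTruncationDemo {

    // Models `add ebx, eax`: only the low 32 bits are added, and the
    // result is zero-extended into the 64-bit register.
    static long add32(long reg, long value) {
        return (reg + value) & 0xFFFFFFFFL;
    }

    // Models a full 64-bit add, which is what addptr emits on a 64-bit VM.
    static long add64(long reg, long value) {
        return reg + value;
    }

    public static void main(String[] args) {
        long base = 0x00000007e41d2700L; // array base address from the dump (rdi)
        long offset = 8;                 // bit index produced by bsf

        // `sub rbx, rdi` follows the add in both cases.
        long bad = add32(base, offset) - base;
        long good = add64(base, offset) - base;

        System.out.printf("32-bit add: 0x%016x%n", bad);  // 0xfffffff900000008
        System.out.printf("64-bit add: 0x%016x%n", good); // 0x0000000000000008
    }
}
```

The 32-bit variant only goes wrong when the array base lies above 4 GB, since otherwise there are no upper bits to lose.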
>> >> Best Regards, >> Sandhya >> >> >> -----Original Message----- >> From: Vladimir Kozlov >> Sent: Tuesday, October 20, 2020 10:01 AM >> To: Viswanathan, Sandhya ; Tatton, >> Jason ; David Holmes ; >> hotspot-compiler-dev at openjdk.java.net; core-libs-dev at openjdk.java.net; >> Hohensee, Paul >> Subject: Re: Howto replicate failure of 8254790? >> >> Yes, I saw it too but I was not sure because we never hit the issue with Unicode string index intrinsic. >> An other thing is we see the failure only on MacOS. >> >> I also want someone to decode asm dump I provided in bug to see actual instructions where it happened. >> >> Vladimir K >> >> On 10/19/20 5:38 PM, Viswanathan, Sandhya wrote: >>> Hi Jason, >>> >>> I think I found the problem looking at the error log from Vladimir Kozlov. In stringL_indexof_char() function, the following snippet is the cause of problem: >>> >>> 2807 bind(FOUND_CHAR); >>> 2808 if (UseAVX >= 2) { >>> 2809 vpmovmskb(tmp, vec3); >>> 2810 } else { >>> 2811 pmovmskb(tmp, vec3); >>> 2812 } >>> 2813 bsfl(ch, tmp); >>> 2814 addl(result, ch); <==== The problem is here >>> 2815 >>> 2816 bind(FOUND_SEQ_CHAR); >>> 2817 subptr(result, str1); >>> >>> The line addl(result, ch) should have been addptr(result, ch). >>> >>> The same problem exists in the Unicode string index of char intrinsic as well and need to be fixed. >>> >>> Hope this helps. >>> >>> Best Regards, >>> Sandhya >>> >>> -----Original Message----- >>> From: hotspot-compiler-dev >>> On Behalf Of Vladimir >>> Kozlov >>> Sent: Thursday, October 15, 2020 3:59 PM >>> To: Tatton, Jason ; David Holmes >>> ; hotspot-compiler-dev at openjdk.java.net; >>> core-libs-dev at openjdk.java.net >>> Subject: Re: Howto replicate failure of 8254790? >>> >>> Hi Jason, >>> >>> I added surrounding instructions dump from hs_err file we have so you can reconstruct x86 assembler from it. 
>>> >>> If you look on si_addr: 0x00000000e41d2718 which case memory map >>> failure, it looks like R8 =0x00000007e41d2700 is an >>> oop: [B with upper 32-bits zeroed. It seems uppers 32-bits of address were cut. >>> >>> But I don't see it can happens in stringL_indexof_char() sub. You correctly used movptr() and addptr() instructions. >>> >>> Vladimir K >>> >>> On 10/15/20 2:10 PM, Tatton, Jason wrote: >>>> Thanks Vladimir and David, I have access to a new macbook with an Intel i7-9750H (supports AVX2) so I will try on that. >>>> >>>> -----Original Message----- >>>> From: Vladimir Kozlov >>>> Sent: 15 October 2020 20:25 >>>> To: David Holmes ; Tatton, Jason >>>> ; hotspot-compiler-dev at openjdk.java.net; >>>> core-libs-dev at openjdk.java.net >>>> Subject: RE: [EXTERNAL] Howto replicate failure of 8254790? >>>> >>>> CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe. >>>> >>>> >>>> >>>> Note, we have old Mac machines in our testing env: >>>> cx8, cmov, fxsr, ht, mmx, 3dnowpref, sse, sse2, sse3, ssse3, sse4.1, >>>> sse4.2, popcnt, lzcnt, tsc, tscinvbit, avx, avx2, aes, erms, clmul, >>>> bmi1, bmi2, rtm, adx, fma, vzeroupper, clflush, clflushopt >>>> >>>> Use -XX:UseAVX=2 >>>> >>>> But I was not able reproduce failure on my Skylake Linux machine even with -XX:UseAVX=2. Maybe there are other factors on MacOS. >>>> >>>> Regards, >>>> Vladimir K >>>> >>>> On 10/14/20 5:48 PM, David Holmes wrote: >>>>> Hi Jason, >>>>> >>>>> On 15/10/2020 10:42 am, Tatton, Jason wrote: >>>>>> Hi all, >>>>>> >>>>>> >>>>>> >>>>>> I am trying to replicate the failure of the tier2 test mentioned >>>>>> in 8254790 but I >>>>>> am only seeing it pass under an x86 linux machine. Are there any specific architectural constraints under which this test should be run in order to make it fail? >>>>> >>>>> It failed on a Mac, not Linux. 
>>>>> >>>>> Cheers, >>>>> David >>>>> >>>>>> >>>>>> >>>>>> I am running the test via: make test TEST="test/jdk/javax/xml/crypto/dsig/GenerationTests.java" >>>>>> >>>>>> >>>>>> >>>>>> Note that I am running the test against master without the commit: >>>>>> "8254792: Disable intrinsic StringLatin1.indexOf until 8254790 is fixed" which disables the intrinsic that is causing the test to fail. >>>>>> >>>>>> >>>>>> >>>>>> Thanks >>>>>> -- >>>>>> Jason >>>>>> From kvn at openjdk.java.net Tue Oct 20 22:14:15 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 20 Oct 2020 22:14:15 GMT Subject: RFR: 8255067: Restore Copyright line in file modified by 8253191 In-Reply-To: <1TqKsX8MsupinIh8g6bkCZJXKnR56ziAMJVmNkocOeI=.1b1c94e5-c015-4233-8d20-4ff997c05f89@github.com> References: <1TqKsX8MsupinIh8g6bkCZJXKnR56ziAMJVmNkocOeI=.1b1c94e5-c015-4233-8d20-4ff997c05f89@github.com> Message-ID: On Tue, 20 Oct 2020 19:10:17 GMT, Vladimir Ivanov wrote: > Make it crystal clear that 8253191 test case doesn't have anything in common with the original version of the test > case for 8204479: > * restore original test case > * add 8253191 test case as a separate file Good. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/771 From kvn at openjdk.java.net Tue Oct 20 22:18:10 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 20 Oct 2020 22:18:10 GMT Subject: RFR: 8254785: compiler/graalunit/HotspotTest.java failed with "missing Graal intrinsics for: java/lang/StringLatin1.indexOfChar([BIII)I" In-Reply-To: <34A5RLFOBxsUm8uZd5w17MT-_SR9fQ6IIUkrl5170uk=.2aed028d-5ee4-4f74-8727-c1d554241e24@github.com> References: <34A5RLFOBxsUm8uZd5w17MT-_SR9fQ6IIUkrl5170uk=.2aed028d-5ee4-4f74-8727-c1d554241e24@github.com> Message-ID: On Tue, 20 Oct 2020 10:09:18 GMT, Aleksey Shipilev wrote: > This test is problem-listed, and thus misses new intrinsics completely. 
The new `StringLatin1.indexOf` intrinsic can > just be added to `toBeInvestigated` list in `CheckGraalIntrinsics.java`, instead of problem-listing the entire test and > thus missing even more intrinsics. Testing: > - [x] Linux x86_64 `compiler/graalunit` tests Marked as reviewed by kvn (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/762 From kvn at openjdk.java.net Tue Oct 20 23:11:19 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 20 Oct 2020 23:11:19 GMT Subject: RFR: 8252204: AArch64: Implement SHA3 accelerator/intrinsic [v10] In-Reply-To: References: Message-ID: On Tue, 20 Oct 2020 13:42:27 GMT, Fei Yang wrote: >> Contributed-by: ard.biesheuvel at linaro.org, dongbo4 at huawei.com >> >> This added an intrinsic for SHA3 using aarch64 v8.2 SHA3 Crypto Extensions. >> Reference implementation for core SHA-3 transform using ARMv8.2 Crypto Extensions: >> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/arch/arm64/crypto/sha3-ce-core.S?h=v5.4.52 >> >> Trivial adaptation in SHA3. implCompress is needed for the purpose of adding the intrinsic. >> For SHA3, we need to pass one extra parameter "digestLength" to the stub for the calculation of block size. >> "digestLength" is also used in for the EOR loop before keccak to differentiate different SHA3 variants. >> >> We added jtreg tests for SHA3 and used QEMU system emulator which supports SHA3 instructions to test the functionality. >> Patch passed jtreg tier1-3 tests with QEMU system emulator. >> Also verified with jtreg tier1-3 tests without SHA3 instructions on aarch64-linux-gnu and x86_64-linux-gnu, to make >> sure that there's no regression. >> We used one existing JMH test for performance test: test/micro/org/openjdk/bench/java/security/MessageDigests.java >> We measured the performance benefit with an aarch64 cycle-accurate simulator. >> Patch delivers 20% - 40% performance improvement depending on specific SHA3 digest length and size of the message. 
>>
>> For now, this feature will not be enabled automatically for aarch64. We can auto-enable this when it is fully tested on
>> real hardware. But for the above testing purposes, this is auto-enabled when the corresponding hardware feature is
>> detected.
>
> Fei Yang has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains
> 13 commits:
>  - Fix trailing whitespace issue reported by jcheck
>  - Merge master
>  - Merge master
>  - Remove unnecessary code changes in vm_version_aarch64.cpp
>  - Merge master
>  - Merge master
>  - Merge master
>  - Merge master
>  - Add sha3 instructions to cpu/aarch64/aarch64-asmtest.py and regenerate the test in assembler_aarch64.cpp:asm_check
>  - Rebase
>  - ... and 3 more: https://git.openjdk.java.net/jdk/compare/cdc8c401...d32c8ad7

Someone at Oracle has to run tier1-tier3 testing with these changes to make sure nothing is broken. I don't want to repeat 8254790.

src/jdk.internal.vm.compiler/share/classes/org.graalvm.compiler.hotspot.test/src/org/graalvm/compiler/hotspot/test/CheckGraalIntrinsics.java line 604:

> 602:             add(ignore, "sun/security/provider/SHA5." + shaCompressName + "([BI)V");
> 603:         }
> 604:         add(toBeInvestigated, "sun/security/provider/SHA3." + shaCompressName + "([BI)V");

This should be under an `if (isJDK16OrHigher())` check. Something like this:
https://github.com/openjdk/jdk/pull/650/files#diff-d1f378fc1b7fe041309e854d40b3a95a91e63fdecf0ecd9826b7c95eaeba314eR527

You can wait until Aleksey pushes it and then update your changes.

-------------

Changes requested by kvn (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/207 From iveresov at openjdk.java.net Wed Oct 21 00:28:12 2020 From: iveresov at openjdk.java.net (Igor Veresov) Date: Wed, 21 Oct 2020 00:28:12 GMT Subject: RFR: 8255000: C2: Unify IGVN processing when loop opts are over In-Reply-To: <5hxd0p94QlH5wKSuHZOGXBfPxeuFUxTdLtIEH7Lxxxo=.e8fbd981-e866-486e-9e08-0c3376a2c529@github.com> References: <5hxd0p94QlH5wKSuHZOGXBfPxeuFUxTdLtIEH7Lxxxo=.e8fbd981-e866-486e-9e08-0c3376a2c529@github.com> Message-ID: On Mon, 19 Oct 2020 20:27:40 GMT, Vladimir Ivanov wrote: > There is a number of use cases when ideal nodes (either individual or even the whole classes) delay some > transformations until loop optimizations are over. > Unfortunately, there are multiple solutions in the code base with their pros and cons: either a custom per-node class > logic (e.g., for range check dependent `CastII` or `Opaque4` nodes) or `Compile::major_progress() == 0` as a signal > that loop opts are over. I propose to introduce a unified approach to reliably process nodes that require (or may > benefit from) IGVN pass once loop opts are finally over and migrate existing use cases to it. > After some experimentation, I decided not to rely on `Compile::major_progress()` because: > * it's hard to reason about its properties (there are many places where it is adjusted); > * attempts to verify its monotonicity using asserts triggered too many sporadic failures. > > So, I wasn't persuaded that `Compile::major_progress() == 0` is reliable enough and introduced a dedicated flag > (`Compile::post_loop_opts_phase()`) which signals that loop opts are over. > (The patch - 69a93d4 commit - is on top of 8255026 cleanup which is reviewed separately.) > > Testing: tier1-tier5 Marked as reviewed by iveresov (Reviewer). 
------------- PR: https://git.openjdk.java.net/jdk/pull/751 From kvn at openjdk.java.net Wed Oct 21 00:56:16 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 21 Oct 2020 00:56:16 GMT Subject: RFR: 8255000: C2: Unify IGVN processing when loop opts are over In-Reply-To: <5hxd0p94QlH5wKSuHZOGXBfPxeuFUxTdLtIEH7Lxxxo=.e8fbd981-e866-486e-9e08-0c3376a2c529@github.com> References: <5hxd0p94QlH5wKSuHZOGXBfPxeuFUxTdLtIEH7Lxxxo=.e8fbd981-e866-486e-9e08-0c3376a2c529@github.com> Message-ID: On Mon, 19 Oct 2020 20:27:40 GMT, Vladimir Ivanov wrote: > There is a number of use cases when ideal nodes (either individual or even the whole classes) delay some > transformations until loop optimizations are over. > Unfortunately, there are multiple solutions in the code base with their pros and cons: either a custom per-node class > logic (e.g., for range check dependent `CastII` or `Opaque4` nodes) or `Compile::major_progress() == 0` as a signal > that loop opts are over. I propose to introduce a unified approach to reliably process nodes that require (or may > benefit from) IGVN pass once loop opts are finally over and migrate existing use cases to it. > After some experimentation, I decided not to rely on `Compile::major_progress()` because: > * it's hard to reason about its properties (there are many places where it is adjusted); > * attempts to verify its monotonicity using asserts triggered too many sporadic failures. > > So, I wasn't persuaded that `Compile::major_progress() == 0` is reliable enough and introduced a dedicated flag > (`Compile::post_loop_opts_phase()`) which signals that loop opts are over. > (The patch - 69a93d4 commit - is on top of 8255026 cleanup which is reviewed separately.) > > Testing: tier1-tier5 Nice clean up. ------------- Marked as reviewed by kvn (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/751 From njian at openjdk.java.net Wed Oct 21 01:22:10 2020 From: njian at openjdk.java.net (Ningsheng Jian) Date: Wed, 21 Oct 2020 01:22:10 GMT Subject: RFR: 8254884: Make sure jvm does not crash with Arm SVE and Vector API [v3] In-Reply-To: References: <8YkOZxIkMzhxzWM2QOyYDZ6YDgJZEkqk69bsRkDqjDo=.b167c79c-ca3c-4537-936c-22f619317191@github.com> Message-ID: <62Q9_-ZWsvlxn3KOp_sJKXwNu2o7EJhM6wcP-TMAT0E=.ee6c5d2c-711c-4cae-a86a-23647d9c2318@github.com> On Tue, 20 Oct 2020 10:46:42 GMT, Andrew Dinn wrote: >> Ningsheng Jian has updated the pull request incrementally with one additional commit since the last revision: >> >> Address review comments from Andrew Dinn > > Hotspot changes look good. Thank you @iwanowww @adinn for the review. My local tests passed with updated patch. ------------- PR: https://git.openjdk.java.net/jdk/pull/726 From njian at openjdk.java.net Wed Oct 21 01:22:11 2020 From: njian at openjdk.java.net (Ningsheng Jian) Date: Wed, 21 Oct 2020 01:22:11 GMT Subject: Integrated: 8254884: Make sure jvm does not crash with Arm SVE and Vector API In-Reply-To: References: Message-ID: On Mon, 19 Oct 2020 08:31:38 GMT, Ningsheng Jian wrote: > Currently we have not implemented all Arm SVE code generation for Vector API specific nodes. To make sure hotspot does > not crash with bad AD file (as NEON has implemented them), we simply add those OPs to unsupported op list. > This is the port and minor cleanup of JDK-8253211 in repo-panama: https://github.com/openjdk/panama-vector/pull/7 with > Op_VectorUnbox (not for codegen) and Op_VectorMaskWrapper (actually unused node. dead code?) removed from the > unsupported op list and Op_VectorLoadConst added. Test: tier1-3 on AArch64 and x86_64 as well as Vector API tests on > AArch64 SVE. This pull request has now been integrated. 
Changeset: 42a6eadb
Author: Ningsheng Jian
URL: https://git.openjdk.java.net/jdk/commit/42a6eadb
Stats: 132 lines in 4 files changed: 116 ins; 0 del; 16 mod

8254884: Make sure jvm does not crash with Arm SVE and Vector API

Reviewed-by: vlivanov, adinn

-------------

PR: https://git.openjdk.java.net/jdk/pull/726

From kvn at openjdk.java.net Wed Oct 21 03:41:12 2020
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Wed, 21 Oct 2020 03:41:12 GMT
Subject: RFR: 8254790: javax/xml/crypto/dsig/GenerationTests.java failed with SIGSEGV in C2 frame
In-Reply-To: References: Message-ID:

On Tue, 20 Oct 2020 21:52:30 GMT, Vladimir Kozlov wrote:

>> The problem was due to 32 bit arithmetic instruction used for 64-bit address in
>> string_indexof_char and stringL_indexof_char: "addl(result, ch)".
>>
>> This patch replaces the addl instruction with addptr and also enables the stringL_indexof_char intrinsic.
>
> Looks good. Let me finish testing before integration.

tier1-3 testing passed on x64 (all OSs).

-------------

PR: https://git.openjdk.java.net/jdk/pull/772

From njian at openjdk.java.net Wed Oct 21 05:39:21 2020
From: njian at openjdk.java.net (Ningsheng Jian)
Date: Wed, 21 Oct 2020 05:39:21 GMT
Subject: RFR: 8254670: SVE test uses linux-specific api
Message-ID:

The SVE JNI test uses a Linux-specific API in native code, which is invalid now that the Windows/AArch64 and macOS/AArch64 ports are in. Fixed by simply restricting the test to Linux only.
------------- Commit messages: - 8254670: SVE test uses linux-specific api Changes: https://git.openjdk.java.net/jdk/pull/778/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=778&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8254670 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/778.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/778/head:pull/778 PR: https://git.openjdk.java.net/jdk/pull/778 From jbhateja at openjdk.java.net Wed Oct 21 06:12:26 2020 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Wed, 21 Oct 2020 06:12:26 GMT Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v6] In-Reply-To: References: Message-ID: > Summary: > > 1) Partial in-lining technique avoids call overhead penalty for small array copy operations with size less than 32 > bytes. 2) At runtime, a conditional check based on copy length either calls an array-copy stub or executes an optimized > instruction sequence using AVX-512 masked instructions emitted at the call site. 3) New runtime flag > ArrayCopyPartialInlineSize=0/32(default)/64 bytes determines the maximum size for partial in-lining. 4) Based on the > perf results seen in benchmarks currently partial in-lining is performed only for arraycopy involving sub-word types > (bool/byte/char/short). Once PR-61 gets integrated we can extend this patch to cover all the primitive types. 
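The length-based dispatch described in the summary above can be pictured with a small Java model. This is only a sketch under stated assumptions, not the C2 implementation: the class name, the `PARTIAL_INLINE_SIZE` constant and the `<=` comparison are hypothetical, the temporary `lanes` array merely stands in for a masked AVX-512 load followed by a masked store, and `System.arraycopy` stands in for the call to the array-copy stub.

```java
// Hypothetical model of partial inlining for small byte-array copies.
class PartialInlineCopySketch {

    // Assumed threshold, mirroring the default of 32 bytes mentioned in the summary.
    static final int PARTIAL_INLINE_SIZE = 32;

    static void copy(byte[] src, int srcPos, byte[] dst, int dstPos, int length) {
        if (length <= PARTIAL_INLINE_SIZE) {
            // Inline path: load all selected bytes first, then store them,
            // the way a single masked vector load/store pair would.
            byte[] lanes = new byte[length];
            for (int i = 0; i < length; i++) {
                lanes[i] = src[srcPos + i];
            }
            for (int i = 0; i < length; i++) {
                dst[dstPos + i] = lanes[i];
            }
        } else {
            // Out-of-line path: fall back to the general copy routine.
            System.arraycopy(src, srcPos, dst, dstPos, length);
        }
    }

    public static void main(String[] args) {
        byte[] src = new byte[100];
        for (int i = 0; i < src.length; i++) {
            src[i] = (byte) i;
        }
        byte[] small = new byte[10];
        byte[] large = new byte[70];
        copy(src, 5, small, 0, 10); // takes the inline path
        copy(src, 5, large, 0, 70); // takes the stub path
        System.out.println(small[0] + " " + large[69]);
    }
}
```

Loading every byte before storing keeps the inline path correct for overlapping copies within one array, which is also what a load-then-store vector sequence gives.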
> Performance Results:
> System : CascadeLake Server, Intel(R) Xeon(R) Platinum 8280L CPU @ 2.70GHz
> Micros : test/micro/org/openjdk/bench/java/lang/ArrayCopy*.java
> ArrayCopyPartialInlineSize : 32
>
> JMH | Block Size | Baseline (ns/op) | Partial Inlining (ns/op) | Gain
> -- | -- | -- | -- | --
> ArrayCopyAligned.testByte | 1 | 5.417 | 2.696 | 2.009272997
> ArrayCopyAligned.testByte | 3 | 5.494 | 2.702 | 2.03330866
> ArrayCopyAligned.testByte | 5 | 5.417 | 2.637 | 2.05422829
> ArrayCopyAligned.testByte | 10 | 5.343 | 2.703 | 1.976692564
> ArrayCopyAligned.testByte | 20 | 5.837 | 2.636 | 2.214339909
> ArrayCopyAligned.testByte | 70 | 5.86 | 6 | 0.976666667
> ArrayCopyAligned.testByte | 150 | 6.766 | 6.906 | 0.979727773
> ArrayCopyAligned.testByte | 300 | 7.605 | 7.952 | 0.956363179
> ArrayCopyAligned.testByte | 600 | 11.989 | 12.007 | 0.998500874
> ArrayCopyAligned.testByte | 1200 | 16.447 | 16.585 | 0.991679228
> ArrayCopyAligned.testChar | 1 | 5.02 | 2.828 | 1.775106082
> ArrayCopyAligned.testChar | 3 | 5.129 | 2.762 | 1.85698769
> ArrayCopyAligned.testChar | 5 | 5.041 | 2.762 | 1.82512672
> ArrayCopyAligned.testChar | 10 | 5.716 | 2.762 | 2.069514844
> ArrayCopyAligned.testChar | 20 | 5.111 | 5.399 | 0.946656788
> ArrayCopyAligned.testChar | 70 | 6.271 | 6.242 | 1.004645947
> ArrayCopyAligned.testChar | 150 | 7.45 | 7.599 | 0.980392157
> ArrayCopyAligned.testChar | 300 | 9.904 | 10.112 | 0.97943038
> ArrayCopyAligned.testChar | 600 | 17.131 | 17.167 | 0.997902953
> ArrayCopyAligned.testChar | 1200 | 29.556 | 29.851 | 0.990117584
> ArrayCopyUnalignedBoth.testByte | 1 | 5.419 | 2.702 | 2.005551443
> ArrayCopyUnalignedBoth.testByte | 3 | 5.558 | 2.636 | 2.108497724
> ArrayCopyUnalignedBoth.testByte | 5 | 5.43 | 2.636 | 2.059939302
> ArrayCopyUnalignedBoth.testByte | 10 | 5.378 | 2.637 | 2.039438756
> ArrayCopyUnalignedBoth.testByte | 20 | 5.914 | 2.636 | 2.243550836
> ArrayCopyUnalignedBoth.testByte | 70 | 5.882 | 5.954 | 0.987907289
> ArrayCopyUnalignedBoth.testByte | 150 | 6.784 | 6.88 | 0.986046512
> ArrayCopyUnalignedBoth.testByte | 300 | 7.635 | 7.968 | 0.958207831
> ArrayCopyUnalignedBoth.testByte | 600 | 12.226 | 12.129 | 1.007997362
> ArrayCopyUnalignedBoth.testByte | 1200 | 16.992 | 20.717 | 0.820195974
> ArrayCopyUnalignedBoth.testChar | 1 | 5.019 | 2.828 | 1.774752475
> ArrayCopyUnalignedBoth.testChar | 3 | 5.163 | 2.763 | 1.868621064
> ArrayCopyUnalignedBoth.testChar | 5 | 5.042 | 2.827 | 1.783516095
> ArrayCopyUnalignedBoth.testChar | 10 | 5.718 | 2.828 | 2.021923621
> ArrayCopyUnalignedBoth.testChar | 20 | 5.111 | 5.404 | 0.945780903
> ArrayCopyUnalignedBoth.testChar | 70 | 6.367 | 6.235 | 1.02117081
> ArrayCopyUnalignedBoth.testChar | 150 | 7.367 | 8.269 | 0.890917886
> ArrayCopyUnalignedBoth.testChar | 300 | 10.358 | 10.642 | 0.973313287
> ArrayCopyUnalignedBoth.testChar | 600 | 20.84 | 17.522 | 1.189361945
> ArrayCopyUnalignedBoth.testChar | 1200 | 31.895 | 31.892 | 1.000094067
> ArrayCopyUnalignedDst.testByte | 1 | 5.455 | 2.637 | 2.068638604
> ArrayCopyUnalignedDst.testByte | 3 | 5.562 | 2.702 | 2.058475204
> ArrayCopyUnalignedDst.testByte | 5 | 5.427 | 2.702 | 2.008512213
> ArrayCopyUnalignedDst.testByte | 10 | 5.367 | 2.696 | 1.990727003
> ArrayCopyUnalignedDst.testByte | 20 | 5.839 | 2.637 | 2.214258627
> ArrayCopyUnalignedDst.testByte | 70 | 5.888 | 5.968 | 0.986595174
> ArrayCopyUnalignedDst.testByte | 150 | 6.785 | 6.773 | 1.001771741
> ArrayCopyUnalignedDst.testByte | 300 | 7.606 | 7.972 | 0.954089313
> ArrayCopyUnalignedDst.testByte | 600 | 11.986 | 21.195 | 0.565510734
> ArrayCopyUnalignedDst.testByte | 1200 | 16.54 | 16.784 | 0.985462345
> ArrayCopyUnalignedDst.testChar | 1 | 5.02 | 2.827 | 1.775733994
> ArrayCopyUnalignedDst.testChar | 3 | 5.131 | 2.762 | 1.857711803
> ArrayCopyUnalignedDst.testChar | 5 | 5.038 | 2.762 | 1.82404055
> ArrayCopyUnalignedDst.testChar | 10 | 5.718 | 2.762 | 2.070238957
> ArrayCopyUnalignedDst.testChar | 20 | 5.113 | 5.401 | 0.946676541
> ArrayCopyUnalignedDst.testChar | 70 | 6.222 | 6.214 | 1.001287416
> ArrayCopyUnalignedDst.testChar | 150 | 7.367 | 8.125 | 0.906707692
> ArrayCopyUnalignedDst.testChar | 300 | 10.204 | 10.082 | 1.012100774
> ArrayCopyUnalignedDst.testChar | 600 | 16.978 | 17.135 | 0.990837467
> ArrayCopyUnalignedDst.testChar | 1200 | 32.351 | 31.996 | 1.011095137
> ArrayCopyUnalignedSrc.testByte | 1 | 5.414 | 2.696 | 2.008160237
> ArrayCopyUnalignedSrc.testByte | 3 | 5.494 | 2.637 | 2.083428138
> ArrayCopyUnalignedSrc.testByte | 5 | 5.431 | 2.637 | 2.059537353
> ArrayCopyUnalignedSrc.testByte | 10 | 5.344 | 2.703 | 1.977062523
> ArrayCopyUnalignedSrc.testByte | 20 | 5.834 | 2.696 | 2.163946588
> ArrayCopyUnalignedSrc.testByte | 70 | 5.883 | 6.009 | 0.979031453
> ArrayCopyUnalignedSrc.testByte | 150 | 6.729 | 6.87 | 0.979475983
> ArrayCopyUnalignedSrc.testByte | 300 | 7.603 | 7.97 | 0.953952321
> ArrayCopyUnalignedSrc.testByte | 600 | 12.004 | 12.16 | 0.987171053
> ArrayCopyUnalignedSrc.testByte | 1200 | 16.534 | 16.643 | 0.9934507
> ArrayCopyUnalignedSrc.testChar | 1 | 5.021 | 2.762 | 1.81788559
> ArrayCopyUnalignedSrc.testChar | 3 | 5.13 | 2.762 | 1.857349747
> ArrayCopyUnalignedSrc.testChar | 5 | 5.042 | 2.827 | 1.783516095
> ArrayCopyUnalignedSrc.testChar | 10 | 5.726 | 2.761 | 2.073886273
> ArrayCopyUnalignedSrc.testChar | 20 | 5.112 | 5.401 | 0.94649139
> ArrayCopyUnalignedSrc.testChar | 70 | 6.113 | 6.227 | 0.981692629
> ArrayCopyUnalignedSrc.testChar | 150 | 7.493 | 7.888 | 0.949923935
> ArrayCopyUnalignedSrc.testChar | 300 | 10.234 | 10.501 | 0.97457385
> ArrayCopyUnalignedSrc.testChar | 600 | 17.175 | 17.142 | 1.001925096
> ArrayCopyUnalignedSrc.testChar | 1200 | 31.926 | 31.987 | 0.998092975
>
> Detailed Reports:
> Baseline :
> [http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt)
> WithOpt :
> [http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt)

Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:

  JDK-8252848 : Review comments resolution.

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/302/files
  - new: https://git.openjdk.java.net/jdk/pull/302/files/3ff64896..a5d6c5de

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=302&range=05
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=302&range=04-05

  Stats: 161 lines in 16 files changed: 24 ins; 88 del; 49 mod
  Patch: https://git.openjdk.java.net/jdk/pull/302.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/302/head:pull/302

PR: https://git.openjdk.java.net/jdk/pull/302

From shade at openjdk.java.net Wed Oct 21 06:16:15 2020
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Wed, 21 Oct 2020 06:16:15 GMT
Subject: Integrated: 8254785: compiler/graalunit/HotspotTest.java failed with "missing Graal intrinsics for: java/lang/StringLatin1.indexOfChar([BIII)I"
In-Reply-To: <34A5RLFOBxsUm8uZd5w17MT-_SR9fQ6IIUkrl5170uk=.2aed028d-5ee4-4f74-8727-c1d554241e24@github.com>
References: <34A5RLFOBxsUm8uZd5w17MT-_SR9fQ6IIUkrl5170uk=.2aed028d-5ee4-4f74-8727-c1d554241e24@github.com>
Message-ID: <8EZdv6SV4ObnV7gZuh8k05P6JlhQaSK-n8k7jYryb8A=.abf6b1fc-5c4f-42e1-8d37-59fe289e9095@github.com>

On Tue, 20 Oct 2020 10:09:18 GMT, Aleksey Shipilev wrote:

> This test is problem-listed, and thus misses new intrinsics completely. The new `StringLatin1.indexOf` intrinsic can
> just be added to `toBeInvestigated` list in `CheckGraalIntrinsics.java`, instead of problem-listing the entire test and
> thus missing even more intrinsics. Testing:
> - [x] Linux x86_64 `compiler/graalunit` tests

This pull request has now been integrated.
Changeset: 2a063350 Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/2a063350 Stats: 8 lines in 2 files changed: 7 ins; 1 del; 0 mod 8254785: compiler/graalunit/HotspotTest.java failed with "missing Graal intrinsics for: java/lang/StringLatin1.indexOfChar([BIII)I" Reviewed-by: psandoz, iignatyev, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/762 From thartmann at openjdk.java.net Wed Oct 21 06:23:08 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Wed, 21 Oct 2020 06:23:08 GMT Subject: RFR: 8254790: javax/xml/crypto/dsig/GenerationTests.java failed with SIGSEGV in C2 frame In-Reply-To: References: Message-ID: On Tue, 20 Oct 2020 20:23:23 GMT, Sandhya Viswanathan wrote: > The problem was due to 32 bit arithmetic instruction used for 64-bit address in > string_indexof_char and stringL_indexof_char: "addl(result, ch)". > > This patch replaces the addl instruction with addptr and also enables the stringL_indexof_char intrinsic. Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/772 From shade at openjdk.java.net Wed Oct 21 06:46:11 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 21 Oct 2020 06:46:11 GMT Subject: RFR: 8255067: Restore Copyright line in file modified by 8253191 In-Reply-To: <1TqKsX8MsupinIh8g6bkCZJXKnR56ziAMJVmNkocOeI=.1b1c94e5-c015-4233-8d20-4ff997c05f89@github.com> References: <1TqKsX8MsupinIh8g6bkCZJXKnR56ziAMJVmNkocOeI=.1b1c94e5-c015-4233-8d20-4ff997c05f89@github.com> Message-ID: <_BfHoYVPoznYEOwtGh6ywQKjv3C2wnHZq1WCHSVP3Io=.229bf420-e433-4192-a12a-6a3e542fe5b5@github.com> On Tue, 20 Oct 2020 19:10:17 GMT, Vladimir Ivanov wrote: > Make it crystal clear that 8253191 test case doesn't have anything in common with the original version of the test > case for 8204479: > * restore original test case > * add 8253191 test case as a separate file Looks good, but maybe call the new test `TestUnsignedByteCompare2`? 
The other test is kinda "first". ------------- Marked as reviewed by shade (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/771 From vladimir.x.ivanov at oracle.com Wed Oct 21 07:10:39 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 21 Oct 2020 10:10:39 +0300 Subject: RFR: 8255067: Restore Copyright line in file modified by 8253191 In-Reply-To: <_BfHoYVPoznYEOwtGh6ywQKjv3C2wnHZq1WCHSVP3Io=.229bf420-e433-4192-a12a-6a3e542fe5b5@github.com> References: <1TqKsX8MsupinIh8g6bkCZJXKnR56ziAMJVmNkocOeI=.1b1c94e5-c015-4233-8d20-4ff997c05f89@github.com> <_BfHoYVPoznYEOwtGh6ywQKjv3C2wnHZq1WCHSVP3Io=.229bf420-e433-4192-a12a-6a3e542fe5b5@github.com> Message-ID: <14f2be4d-1ff5-6f90-ba46-3bd3b68458aa@oracle.com> Thanks for the reviews, Vladimir and Aleksey. >> Make it crystal clear that 8253191 test case doesn't have anything in common with the original version of the test >> case for 8204479: >> * restore original test case >> * add 8253191 test case as a separate file > > Looks good, but maybe call the new test `TestUnsignedByteCompare2`? The other test is kinda "first". It depends on how you number things: sometimes "first" is referred to as 0th ;-) Best regards, Vladimir Ivanov From neliasso at openjdk.java.net Wed Oct 21 07:21:08 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Wed, 21 Oct 2020 07:21:08 GMT Subject: RFR: 8254790: javax/xml/crypto/dsig/GenerationTests.java failed with SIGSEGV in C2 frame In-Reply-To: References: Message-ID: On Wed, 21 Oct 2020 06:20:38 GMT, Tobias Hartmann wrote: >> The problem was due to 32 bit arithmetic instruction used for 64-bit address in >> string_indexof_char and stringL_indexof_char: "addl(result, ch)". >> >> This patch replaces the addl instruction with addptr and also enables the stringL_indexof_char intrinsic. > > Looks good to me. Please change the bug title to something that describes the problem. 
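
As an illustration of the defect described in the quoted text above (a 32-bit `addl` applied to a register holding a 64-bit address), here is a minimal Java sketch. It is not HotSpot code: `addl` and `addptr` below are hypothetical Java stand-ins that only model the arithmetic, and the addresses are made-up values. It relies on the x86-64 rule that a 32-bit operation zero-extends its result into the full 64-bit register.

```java
public class AddlVsAddptr {
    // Models x86-64 `addl`: 32-bit add, result zero-extended to 64 bits.
    static long addl(long dst, long src) {
        return Integer.toUnsignedLong((int) dst + (int) src);
    }

    // Models `addptr` on a 64-bit VM: full 64-bit add.
    static long addptr(long dst, long src) {
        return dst + src;
    }

    public static void main(String[] args) {
        long lowAddress = 0x0000_0000_1000_0000L;  // hypothetical address below 4 GiB
        long highAddress = 0x0000_7f12_3456_7000L; // hypothetical address above 4 GiB
        long offset = 0x10L;

        // Below 4 GiB both variants agree, which is why such a bug can hide.
        System.out.printf("low:  addl=0x%x addptr=0x%x%n",
                addl(lowAddress, offset), addptr(lowAddress, offset));
        // Above 4 GiB the 32-bit add drops the upper half of the address.
        System.out.printf("high: addl=0x%x addptr=0x%x%n",
                addl(highAddress, offset), addptr(highAddress, offset));
    }
}
```

In this model the two variants only diverge once the address no longer fits in 32 bits, which may explain why such a defect shows up as a sporadic SIGSEGV rather than a deterministic failure.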
-------------

PR: https://git.openjdk.java.net/jdk/pull/772

From vlivanov at openjdk.java.net Wed Oct 21 07:35:12 2020
From: vlivanov at openjdk.java.net (Vladimir Ivanov)
Date: Wed, 21 Oct 2020 07:35:12 GMT
Subject: RFR: 8255026: C2: Miscellaneous cleanups in Compile and PhaseIdealLoop code
In-Reply-To:
References:
Message-ID:

On Tue, 20 Oct 2020 12:15:13 GMT, Claes Redestad wrote:

>> Miscellaneous cleanups in Compile and PhaseIdealLoop code.
>>
>> Compile:
>>
>> - inline node lists
>>
>> - remove empty Compile::register_library_intrinsics()
>>
>> - introduce `Compile::remove_useless_nodes(GrowableArray& node_list, Unique_Node_List& useful)`
>>
>>
>> PhaseIdealLoop:
>>
>> - unify logging and IGVN logic
>>
>> - refactor logging code
>>
>> Testing: tier1 - tier5
>
> Marked as reviewed by redestad (Reviewer).

Thanks for the reviews, Tobias, Nils, and Claes.

-------------

PR: https://git.openjdk.java.net/jdk/pull/749

From davleopo at openjdk.java.net Wed Oct 21 07:37:12 2020
From: davleopo at openjdk.java.net (David Leopoldseder)
Date: Wed, 21 Oct 2020 07:37:12 GMT
Subject: Integrated: 8253964: [Graal] UnschedulableGraphTest#test01fails with expected:<4> but was:<3>
In-Reply-To: <82Qy_vWMGH7UWTXYapYYXN6gQ2Xkyns399Mj7VjBKLU=.983ab385-de53-473f-8a29-69e8cfbd5434@github.com>
References: <82Qy_vWMGH7UWTXYapYYXN6gQ2Xkyns399Mj7VjBKLU=.983ab385-de53-473f-8a29-69e8cfbd5434@github.com>
Message-ID:

On Tue, 20 Oct 2020 07:37:26 GMT, David Leopoldseder wrote:

> This PR fixes a Graal unit test failure in the presence of -Xcomp. The assertion in the test fails with -Xcomp as
> RemoveNeverExecutedCode triggers since we don't have proper profiles with -Xcomp there.
> The fix is already tested and integrated in tip graal
> https://github.com/oracle/graal/commit/287dbdf63ec3bfcce74e910d66c21dccf8e9cc46 .

This pull request has now been integrated.
Changeset: c107178b Author: David Leopoldseder Committer: Doug Simon URL: https://git.openjdk.java.net/jdk/commit/c107178b Stats: 11 lines in 1 file changed: 11 ins; 0 del; 0 mod 8253964: [Graal] UnschedulableGraphTest#test01fails with expected:<4> but was:<3> Reviewed-by: kvn, dlong ------------- PR: https://git.openjdk.java.net/jdk/pull/756 From vlivanov at openjdk.java.net Wed Oct 21 07:41:18 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Wed, 21 Oct 2020 07:41:18 GMT Subject: Integrated: 8255026: C2: Miscellaneous cleanups in Compile and PhaseIdealLoop code In-Reply-To: References: Message-ID: On Mon, 19 Oct 2020 19:55:58 GMT, Vladimir Ivanov wrote: > Miscellaneous cleanups in Compile and PhaseIdealLoop code. > > Compile: > > - inline node lists > > - remove empty Compile::register_library_intrinsics() > > - introduce `Compile::remove_useless_nodes(GrowableArray& node_list, Unique_Node_List& useful)` > > > PhaseIdealLoop: > > - unify logging and IGVN logic > > - refactor logging code > > Testing: tier1 - tier5 This pull request has now been integrated. Changeset: 27230fae Author: Vladimir Ivanov URL: https://git.openjdk.java.net/jdk/commit/27230fae Stats: 189 lines in 6 files changed: 33 ins; 72 del; 84 mod 8255026: C2: Miscellaneous cleanups in Compile and PhaseIdealLoop code Reviewed-by: thartmann, neliasso, redestad ------------- PR: https://git.openjdk.java.net/jdk/pull/749 From vlivanov at openjdk.java.net Wed Oct 21 07:43:10 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Wed, 21 Oct 2020 07:43:10 GMT Subject: RFR: 8255000: C2: Unify IGVN processing when loop opts are over In-Reply-To: References: <5hxd0p94QlH5wKSuHZOGXBfPxeuFUxTdLtIEH7Lxxxo=.e8fbd981-e866-486e-9e08-0c3376a2c529@github.com> Message-ID: On Wed, 21 Oct 2020 00:53:01 GMT, Vladimir Kozlov wrote: >> There is a number of use cases when ideal nodes (either individual or even the whole classes) delay some >> transformations until loop optimizations are over. 
>> Unfortunately, there are multiple solutions in the code base with their pros and cons: either a custom per-node class >> logic (e.g., for range check dependent `CastII` or `Opaque4` nodes) or `Compile::major_progress() == 0` as a signal >> that loop opts are over. I propose to introduce a unified approach to reliably process nodes that require (or may >> benefit from) IGVN pass once loop opts are finally over and migrate existing use cases to it. >> After some experimentation, I decided not to rely on `Compile::major_progress()` because: >> * it's hard to reason about its properties (there are many places where it is adjusted); >> * attempts to verify its monotonicity using asserts triggered too many sporadic failures. >> >> So, I wasn't persuaded that `Compile::major_progress() == 0` is reliable enough and introduced a dedicated flag >> (`Compile::post_loop_opts_phase()`) which signals that loop opts are over. >> (The patch - 69a93d4 commit - is on top of 8255026 cleanup which is reviewed separately.) >> >> Testing: tier1-tier5 > > Nice clean up. Thanks for the reviews, Nils, Igor, and Vladimir. ------------- PR: https://git.openjdk.java.net/jdk/pull/751 From vlivanov at openjdk.java.net Wed Oct 21 08:14:30 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Wed, 21 Oct 2020 08:14:30 GMT Subject: RFR: 8255000: C2: Unify IGVN processing when loop opts are over [v2] In-Reply-To: <5hxd0p94QlH5wKSuHZOGXBfPxeuFUxTdLtIEH7Lxxxo=.e8fbd981-e866-486e-9e08-0c3376a2c529@github.com> References: <5hxd0p94QlH5wKSuHZOGXBfPxeuFUxTdLtIEH7Lxxxo=.e8fbd981-e866-486e-9e08-0c3376a2c529@github.com> Message-ID: > There is a number of use cases when ideal nodes (either individual or even the whole classes) delay some > transformations until loop optimizations are over. 
> Unfortunately, there are multiple solutions in the code base with their pros and cons: either a custom per-node class > logic (e.g., for range check dependent `CastII` or `Opaque4` nodes) or `Compile::major_progress() == 0` as a signal > that loop opts are over. I propose to introduce a unified approach to reliably process nodes that require (or may > benefit from) IGVN pass once loop opts are finally over and migrate existing use cases to it. > After some experimentation, I decided not to rely on `Compile::major_progress()` because: > * it's hard to reason about its properties (there are many places where it is adjusted); > * attempts to verify its monotonicity using asserts triggered too many sporadic failures. > > So, I wasn't persuaded that `Compile::major_progress() == 0` is reliable enough and introduced a dedicated flag > (`Compile::post_loop_opts_phase()`) which signals that loop opts are over. > (The patch - 69a93d4 commit - is on top of 8255026 cleanup which is reviewed separately.) > > Testing: tier1-tier5 Vladimir Ivanov has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: - Merge branch 'master' into 8255000+8255026 - Unify post loop opts IGVN - 8255026: C2: Miscellaneous cleanups in Compile and PhaseIdealLoop code ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/751/files - new: https://git.openjdk.java.net/jdk/pull/751/files/69a93d4d..590a6493 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=751&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=751&range=00-01 Stats: 35020 lines in 471 files changed: 24314 ins; 8566 del; 2140 mod Patch: https://git.openjdk.java.net/jdk/pull/751.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/751/head:pull/751 PR: https://git.openjdk.java.net/jdk/pull/751 From vlivanov at openjdk.java.net Wed Oct 21 08:14:31 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Wed, 21 Oct 2020 08:14:31 GMT Subject: Integrated: 8255000: C2: Unify IGVN processing when loop opts are over In-Reply-To: <5hxd0p94QlH5wKSuHZOGXBfPxeuFUxTdLtIEH7Lxxxo=.e8fbd981-e866-486e-9e08-0c3376a2c529@github.com> References: <5hxd0p94QlH5wKSuHZOGXBfPxeuFUxTdLtIEH7Lxxxo=.e8fbd981-e866-486e-9e08-0c3376a2c529@github.com> Message-ID: On Mon, 19 Oct 2020 20:27:40 GMT, Vladimir Ivanov wrote: > There is a number of use cases when ideal nodes (either individual or even the whole classes) delay some > transformations until loop optimizations are over. > Unfortunately, there are multiple solutions in the code base with their pros and cons: either a custom per-node class > logic (e.g., for range check dependent `CastII` or `Opaque4` nodes) or `Compile::major_progress() == 0` as a signal > that loop opts are over. I propose to introduce a unified approach to reliably process nodes that require (or may > benefit from) IGVN pass once loop opts are finally over and migrate existing use cases to it. 
> After some experimentation, I decided not to rely on `Compile::major_progress()` because:
> * it's hard to reason about its properties (there are many places where it is adjusted);
> * attempts to verify its monotonicity using asserts triggered too many sporadic failures.
>
> So, I wasn't persuaded that `Compile::major_progress() == 0` is reliable enough and introduced a dedicated flag
> (`Compile::post_loop_opts_phase()`) which signals that loop opts are over.
> (The patch - 69a93d4 commit - is on top of 8255026 cleanup which is reviewed separately.)
>
> Testing: tier1-tier5

This pull request has now been integrated.

Changeset: 7e264043
Author: Vladimir Ivanov
URL: https://git.openjdk.java.net/jdk/commit/7e264043
Stats: 274 lines in 13 files changed: 74 ins; 82 del; 118 mod

8255000: C2: Unify IGVN processing when loop opts are over

Reviewed-by: neliasso, iveresov, kvn

-------------

PR: https://git.openjdk.java.net/jdk/pull/751

From jbhateja at openjdk.java.net Wed Oct 21 08:39:49 2020
From: jbhateja at openjdk.java.net (Jatin Bhateja)
Date: Wed, 21 Oct 2020 08:39:49 GMT
Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v4]
In-Reply-To:
References:
Message-ID: <1alTposTHTu9m3Tyy0ero3fHinyJPrPC67z-iAZasF0=.387b7be1-0513-43aa-bd1b-07b9bf933246@github.com>

On Tue, 20 Oct 2020 06:24:18 GMT, Tobias Hartmann wrote:

>> There is regression after 8252847 changes: 8254890.
>> It should be fixed before we proceed with these changes.
>
> [JDK-8254890](https://bugs.openjdk.java.net/browse/JDK-8254890) is a closed bug because it contains confidential information. I've filed [JDK-8255039](https://bugs.openjdk.java.net/browse/JDK-8255039).

> Hi Jatin,
>
> I'm ready to approve it, but I would like to kick it through some performance testing first.
>
> Best regards,
> Nils Eliasson

Hi Nils,

I have incorporated your review feedback. Kindly share your performance results.
Best Regards, Jatin ------------- PR: https://git.openjdk.java.net/jdk/pull/302 From njian at openjdk.java.net Wed Oct 21 08:39:59 2020 From: njian at openjdk.java.net (Ningsheng Jian) Date: Wed, 21 Oct 2020 08:39:59 GMT Subject: RFR: 8254670: SVE test uses linux-specific api In-Reply-To: References: Message-ID: On Wed, 21 Oct 2020 05:32:20 GMT, Ningsheng Jian wrote: > The SVE JNI test uses linux-specific api in native code, which is invalid with Windows/AArch64 and macOS/AArch64 ports in. Fixed by simply reducing the test to Linux only. Checked the failed test task log and it seems to fail at aot test. I don't know how my trivial patch could be related. And the newly created PR https://github.com/openjdk/jdk/pull/779 failed as well with the same log... ------------- PR: https://git.openjdk.java.net/jdk/pull/778 From fyang at openjdk.java.net Wed Oct 21 09:10:57 2020 From: fyang at openjdk.java.net (Fei Yang) Date: Wed, 21 Oct 2020 09:10:57 GMT Subject: RFR: 8252204: AArch64: Implement SHA3 accelerator/intrinsic [v10] In-Reply-To: References: Message-ID: On Tue, 20 Oct 2020 23:06:41 GMT, Vladimir Kozlov wrote: >> Fei Yang has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 13 commits: >> >> - Fix trailing whitespace issue reported by jcheck >> - Merge master >> - Merge master >> - Remove unnecessary code changes in vm_version_aarch64.cpp >> - Merge master >> - Merge master >> - Merge master >> - Merge master >> - Add sha3 instructions to cpu/aarch64/aarch64-asmtest.py and regenerate the test in assembler_aarch64.cpp:asm_check >> - Rebase >> - ... and 3 more: https://git.openjdk.java.net/jdk/compare/cdc8c401...d32c8ad7 > > src/jdk.internal.vm.compiler/share/classes/org.graalvm.compiler.hotspot.test/src/org/graalvm/compiler/hotspot/test/CheckGraalIntrinsics.java line 604: > >> 602: add(ignore, "sun/security/provider/SHA5." 
+ shaCompressName + "([BI)V");
>> 603: }
>> 604: add(toBeInvestigated, "sun/security/provider/SHA3." + shaCompressName + "([BI)V");
>
> This should be under `if (isJDK16OrHigher())` check. Something like this:
> https://github.com/openjdk/jdk/pull/650/files#diff-d1f378fc1b7fe041309e854d40b3a95a91e63fdecf0ecd9826b7c95eaeba314eR527
> You can wait when Aleksey push it and update your changes

OK. Will update with the following change after Aleksey's PR is integrated:

--- a/src/jdk.internal.vm.compiler/share/classes/org.graalvm.compiler.hotspot.test/src/org/graalvm/compiler/hotspot/test/CheckGraalIntrinsics.java
+++ b/src/jdk.internal.vm.compiler/share/classes/org.graalvm.compiler.hotspot.test/src/org/graalvm/compiler/hotspot/test/CheckGraalIntrinsics.java
@@ -608,6 +608,10 @@ public class CheckGraalIntrinsics extends GraalTest {
             if (!config.useSHA512Intrinsics()) {
                 add(ignore, "sun/security/provider/SHA5." + shaCompressName + "([BI)V");
             }
+
+            if (isJDK16OrHigher()) {
+                add(toBeInvestigated, "sun/security/provider/SHA3." + shaCompressName + "([BI)V");
+            }
         }

-------------

PR: https://git.openjdk.java.net/jdk/pull/207

From fyang at openjdk.java.net Wed Oct 21 09:23:57 2020
From: fyang at openjdk.java.net (Fei Yang)
Date: Wed, 21 Oct 2020 09:23:57 GMT
Subject: RFR: 8252204: AArch64: Implement SHA3 accelerator/intrinsic [v10]
In-Reply-To:
References:
Message-ID:

On Tue, 20 Oct 2020 23:08:22 GMT, Vladimir Kozlov wrote:

> Someone in Oracle have to run tier1-tier3 testing with these changes to make sure nothing is broken. I don't want to repeat 8254790.

That's appreciated. On my side, I ran tier1-tier3 on both aarch64 Linux and x86_64 Linux. The test results on these two platforms look good for the latest changes.
-------------

PR: https://git.openjdk.java.net/jdk/pull/207

From rcastanedalo at openjdk.java.net Wed Oct 21 10:04:14 2020
From: rcastanedalo at openjdk.java.net (Roberto Castañeda Lozano)
Date: Wed, 21 Oct 2020 10:04:14 GMT
Subject: RFR: 8254317: C2: Resource consumption of ConvI2LNode::Ideal() grows exponentially [v2]
In-Reply-To: <9FJMNUDr4xvtcIlGtEk2Y_7tA17Us29ImExhFpzs87s=.66c146c6-7629-4e1b-a62b-d68714636f32@github.com>
References: <9FJMNUDr4xvtcIlGtEk2Y_7tA17Us29ImExhFpzs87s=.66c146c6-7629-4e1b-a62b-d68714636f32@github.com>
Message-ID:

> In the optimization ConvI2L(AddI(x, y)) -> AddL(ConvI2L(x), ConvI2L(y)) within `ConvI2LNode::Ideal()`, handle the special case x = y by feeding both inputs of AddL from a single ConvI2L node rather than creating two semantically equivalent ConvI2L nodes. This avoids an exponential number of calls to `ConvI2LNode::Ideal()` when dealing with long chains of AddI nodes. Disable the optimization for the pattern ConvI2L(SubI(x, x)), which is simplified to zero during parsing anyway. Add a set of regression tests for the transformation that cover different shapes of AddI subgraphs. Also add a microbenchmark that exercises the special case, for performance regression testing.

Roberto Castañeda Lozano has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision:

 - Merge master
 - Generalize the fix to handle any input where AddIs are used multiple times by other AddIs, which could also lead to an exponential number of calls to ConvI2LNode::Ideal(). This is achieved by (1) reusing existing ConvI2Ls if possible rather than eagerly creating new ones and (2) postponing the optimization of newly created ConvI2Ls. Remove "hook" node solution introduced in JDK-8217359 since this is subsumed by (2).
Test that ConvI2LNode::Ideal() is called within iterative GVN using phase->is_IterGVN() rather than can_reshape, for clarity. Merge all tests into a single class. Reimplement the microbenchmark as a test case that should time out in case of a combinatorial explosion. Add a second similar microbenchmark that demonstrates the need for this generalization. - Merge master - 8254317: C2: Resource consumption of ConvI2LNode::Ideal() grows exponentially In the optimization ConvI2L(AddI(x, y)) -> AddL(ConvI2L(x), ConvI2L(y)) within ConvI2LNode::Ideal(), handle the special case x = y by feeding both inputs of AddL from a single ConvI2L node rather than creating two semantically equivalent ConvI2L nodes. This avoids an exponential number of calls to ConvI2LNode::Ideal() when dealing with long chains of AddI nodes. Disable the optimization for the pattern ConvI2L(SubI(x, x)), which is simplified to zero during parsing anyway. Add a set of regression tests for the transformation that cover different shapes of AddI subgraphs. Also add a microbenchmark that exercises the special case, for performance regression testing. 
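
The blow-up described in the commit message above can be sketched with a toy model. This is not C2 code: the `Node` class, the `chain` helper and both counting functions are hypothetical names introduced purely for illustration. Each level of the chain is an AddI whose two inputs are the same node; pushing a conversion through both inputs separately visits the shared input twice per level, while reusing the conversion already made for an input visits each node once.

```java
import java.util.IdentityHashMap;
import java.util.Map;

public class ConvI2LBlowup {
    // Toy expression node: a leaf (no inputs) or an AddI with two inputs.
    static final class Node {
        final Node in1, in2;
        Node(Node in1, Node in2) { this.in1 = in1; this.in2 = in2; }
        boolean isLeaf() { return in1 == null; }
    }

    // Builds x, AddI(x, x), AddI(AddI(x, x), AddI(x, x)), ... with shared inputs.
    static Node chain(int depth) {
        Node cur = new Node(null, null);
        for (int i = 0; i < depth; i++) {
            cur = new Node(cur, cur);
        }
        return cur;
    }

    // Eager push-down: one conversion per use, even if both uses are the same node.
    static long countNaive(Node n) {
        if (n.isLeaf()) return 1;
        return 1 + countNaive(n.in1) + countNaive(n.in2);
    }

    // Push-down that reuses the conversion already created for a given input.
    static long countShared(Node n, Map<Node, Boolean> seen) {
        if (seen.put(n, Boolean.TRUE) != null) return 0; // already converted
        if (n.isLeaf()) return 1;
        return 1 + countShared(n.in1, seen) + countShared(n.in2, seen);
    }

    public static void main(String[] args) {
        Node root = chain(20);
        System.out.println("naive:  " + countNaive(root));
        System.out.println("shared: " + countShared(root, new IdentityHashMap<>()));
    }
}
```

For a chain of depth d the naive count is 2^(d+1) - 1, whereas the shared count is d + 1. That gap is the kind of combinatorial explosion the regression tests are meant to catch by timing out.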
-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/727/files
  - new: https://git.openjdk.java.net/jdk/pull/727/files/60eaec25..b5cf7aab

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=727&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=727&range=00-01

  Stats: 37628 lines in 545 files changed: 25644 ins; 9509 del; 2475 mod
  Patch: https://git.openjdk.java.net/jdk/pull/727.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/727/head:pull/727

PR: https://git.openjdk.java.net/jdk/pull/727

From rcastanedalo at openjdk.java.net Wed Oct 21 10:32:12 2020
From: rcastanedalo at openjdk.java.net (Roberto Castañeda Lozano)
Date: Wed, 21 Oct 2020 10:32:12 GMT
Subject: RFR: 8254317: C2: Resource consumption of ConvI2LNode::Ideal() grows exponentially [v2]
In-Reply-To:
References: <9FJMNUDr4xvtcIlGtEk2Y_7tA17Us29ImExhFpzs87s=.66c146c6-7629-4e1b-a62b-d68714636f32@github.com>
Message-ID:

On Mon, 19 Oct 2020 18:46:20 GMT, Roberto Castañeda Lozano wrote:

>> 2 concerns with the proposed fix on my side:
>>
>> - I'm not persuaded that covering "x == y" is enough to completely eliminate the issue;
>>
>> - The issue demonstrates there's still a chance to introduce very deep recursion involving `Compile::constrained_convI2L` and `PhaseIterGVN` which can cause a crash.
>>
>> IMO the root cause is an eager transformation happening top-down on `ConvI2L` nodes and it defeats memoization GVN naturally provides, so it causes a combinatorial explosion. If subsequent `Compile::constrained_convI2L()` calls could share the same `ConvI2L` node for the same input, it would be a more reliable fix for the problem.
>>
>> Otherwise, the transformation may be extracted from GVN and turned into a separate pass (take a look at `Compile::optimize_logic_cones` as an example).
>>
>> Some comments on the tests: (1) please, group the individual test cases into a single test class; and (2) I suggest to turn the benchmark into a test case which fails with timeout when fix is absent.
>
>> 2 concerns with the proposed fix on my side:
>>
>> * I'm not persuaded that covering "x == y" is enough to completely eliminate the issue;
>>
>> * The issue demonstrates there's still a chance to introduce very deep recursion involving `Compile::constrained_convI2L` and `PhaseIterGVN` which can cause a crash.
>>
>>
>> IMO the root cause is an eager transformation happening top-down on `ConvI2L` nodes and it defeats memoization GVN naturally provides, so it causes a combinatorial explosion. If subsequent `Compile::constrained_convI2L()` calls could share the same `ConvI2L` node for the same input, it would be a more reliable fix for the problem.
>>
>> Otherwise, the transformation may be extracted from GVN and turned into a separate pass (take a look at `Compile::optimize_logic_cones` as an example).
>>
>> Some comments on the tests: (1) please, group the individual test cases into a single test class; and (2) I suggest to turn the benchmark into a test case which fails with timeout when fix is absent.
>
> Thanks Vladimir for the thorough review! I will explore your suggestions to generalize the fix and see what can be done.

Prevent exponential number of calls to ConvI2LNode::Ideal() when AddIs are used multiple times by other AddIs in the optimization ConvI2L(AddI(x, y)) -> AddL(ConvI2L(x), ConvI2L(y)). This is achieved by (1) reusing existing ConvI2Ls if possible rather than eagerly creating new ones and (2) postponing the optimization of newly created ConvI2Ls. Remove hook node solution introduced in 8217359, since this is subsumed by (2). Use phase->is_IterGVN() rather than can_reshape to check if ConvI2LNode::Ideal() is called within iterative GVN, for clarity.
Add regression tests that cover different shapes and sizes of AddI subgraphs, implicitly checking (by not timing out) that there is no combinatorial explosion.

-------------

PR: https://git.openjdk.java.net/jdk/pull/727

From rcastanedalo at openjdk.java.net Wed Oct 21 10:35:18 2020
From: rcastanedalo at openjdk.java.net (Roberto Castañeda Lozano)
Date: Wed, 21 Oct 2020 10:35:18 GMT
Subject: RFR: 8254317: C2: Resource consumption of ConvI2LNode::Ideal() grows exponentially [v2]
In-Reply-To:
References: <9FJMNUDr4xvtcIlGtEk2Y_7tA17Us29ImExhFpzs87s=.66c146c6-7629-4e1b-a62b-d68714636f32@github.com>
Message-ID:

On Wed, 21 Oct 2020 10:29:10 GMT, Roberto Castañeda Lozano wrote:

>>> 2 concerns with the proposed fix on my side:
>>>
>>> * I'm not persuaded that covering "x == y" is enough to completely eliminate the issue;
>>>
>>> * The issue demonstrates there's still a chance to introduce very deep recursion involving `Compile::constrained_convI2L` and `PhaseIterGVN` which can cause a crash.
>>>
>>>
>>> IMO the root cause is an eager transformation happening top-down on `ConvI2L` nodes and it defeats memoization GVN naturally provides, so it causes a combinatorial explosion. If subsequent `Compile::constrained_convI2L()` calls could share the same `ConvI2L` node for the same input, it would be a more reliable fix for the problem.
>>>
>>> Otherwise, the transformation may be extracted from GVN and turned into a separate pass (take a look at `Compile::optimize_logic_cones` as an example).
>>>
>>> Some comments on the tests: (1) please, group the individual test cases into a single test class; and (2) I suggest to turn the benchmark into a test case which fails with timeout when fix is absent.
>>
>> Thanks Vladimir for the thorough review! I will explore your suggestions to generalize the fix and see what can be done.
> > Prevent exponential number of calls to ConvI2LNode::Ideal() when AddIs are used > multiple times by other AddIs in the optimization ConvI2L(AddI(x, y)) -> > AddL(ConvI2L(x), ConvI2L(y)). This is achieved by (1) reusing existing ConvI2Ls > if possible rather than eagerly creating new ones and (2) postponing the > optimization of newly created ConvI2Ls. Remove hook node solution introduced in > 8217359, since this is subsumed by (2). Use phase->is_IterGVN() rather than > can_reshape to check if ConvI2LNode::Ideal() is called within iterative GVN, for > clarity. Add regression tests that cover different shapes and sizes of AddI > subgraphs, implicitly checking (by not timing out) that there is no > combinatorial explosion. Re-tested on tier1-3 and by running the new tests 10 times on all platforms (windows-x64, linux-x64, linux-aarch64, and macosx-x64) in both release and debug mode, to check that the timeout is sufficiently long. ------------- PR: https://git.openjdk.java.net/jdk/pull/727 From martin.doerr at sap.com Wed Oct 21 11:10:45 2020 From: martin.doerr at sap.com (Doerr, Martin) Date: Wed, 21 Oct 2020 11:10:45 +0000 Subject: Vector API: stack overflow on Big Endian Message-ID: Hi, I noticed stack overflow issues because of endless recursion on Big Endian platforms (s390 and PPC64) in several tests: E.g. 
Int64VectorLoadStoreTests:
    at jdk.incubator.vector/jdk.incubator.vector.IntVector.maybeSwap(IntVector.java:3330)
    at jdk.incubator.vector/jdk.incubator.vector.IntVector.intoByteBuffer(IntVector.java:3151)
    at jdk.incubator.vector/jdk.incubator.vector.AbstractVector.defaultReinterpret(AbstractVector.java:505)
    at java.base/jdk.internal.vm.vector.VectorSupport.convert(VectorSupport.java:441)
    at jdk.incubator.vector/jdk.incubator.vector.AbstractVector.convert0(AbstractVector.java:686)
    at jdk.incubator.vector/jdk.incubator.vector.AbstractVector.asVectorRawTemplate(AbstractVector.java:173)
    at jdk.incubator.vector/jdk.incubator.vector.AbstractVector.asByteVectorRawTemplate(AbstractVector.java:179)
    at jdk.incubator.vector/jdk.incubator.vector.Int64Vector.asByteVectorRaw(Int64Vector.java:177)
    at jdk.incubator.vector/jdk.incubator.vector.Int64Vector.asByteVectorRaw(Int64Vector.java:41)
    at jdk.incubator.vector/jdk.incubator.vector.IntVector.reinterpretAsBytes(IntVector.java:3366)
    at jdk.incubator.vector/jdk.incubator.vector.IntVector.maybeSwap(IntVector.java:3330)

Note that maybeSwap is endianness sensitive:

    IntVector maybeSwap(ByteOrder bo) {
        if (bo != NATIVE_ENDIAN) {
            return this.reinterpretAsBytes()
                .rearrange(swapBytesShuffle())
                .reinterpretAsInts();
        }
        return this;
    }

How is this supposed to work? Is anything platform specific missing?

Thanks and best regards,
Martin

From jbhateja at openjdk.java.net Wed Oct 21 11:55:26 2020
From: jbhateja at openjdk.java.net (Jatin Bhateja)
Date: Wed, 21 Oct 2020 11:55:26 GMT
Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v7]
In-Reply-To:
References:
Message-ID:

> Summary:
>
> 1) Partial in-lining technique avoids call overhead penalty for small array copy operations with size less than 32 bytes.
> 2) At runtime, a conditional check based on copy length either calls an array-copy stub or executes an optimized instruction sequence using AVX-512 masked instructions emitted at the call site.
> 3) New runtime flag ArrayCopyPartialInlineSize=0/32(default)/64 bytes determines the maximum size for partial in-lining.
> 4) Based on the perf results seen in benchmarks currently partial in-lining is performed only for arraycopy involving sub-word types (bool/byte/char/short). Once PR-61 gets integrated we can extend this patch to cover all the primitive types.
>
> Performance Results:
> System : CascadeLake Server, Intel(R) Xeon(R) Platinum 8280L CPU @ 2.70GHz
> Micros : test/micro/org/openjdk/bench/java/lang/ArrayCopy*.java
> ArrayCopyPartialInlineSize : 32
>
> JMH | Block Size | Baseline (ns/op) | Partial Inlining (ns/op) | Gain (Baseline / Partial Inlining)
> -- | -- | -- | -- | --
> ArrayCopyAligned.testByte | 1 | 5.417 | 2.696 | 2.009272997
> ArrayCopyAligned.testByte | 3 | 5.494 | 2.702 | 2.03330866
> ArrayCopyAligned.testByte | 5 | 5.417 | 2.637 | 2.05422829
> ArrayCopyAligned.testByte | 10 | 5.343 | 2.703 | 1.976692564
> ArrayCopyAligned.testByte | 20 | 5.837 | 2.636 | 2.214339909
> ArrayCopyAligned.testByte | 70 | 5.86 | 6 | 0.976666667
> ArrayCopyAligned.testByte | 150 | 6.766 | 6.906 | 0.979727773
> ArrayCopyAligned.testByte | 300 | 7.605 | 7.952 | 0.956363179
> ArrayCopyAligned.testByte | 600 | 11.989 | 12.007 | 0.998500874
> ArrayCopyAligned.testByte | 1200 | 16.447 | 16.585 | 0.991679228
> ArrayCopyAligned.testChar | 1 | 5.02 | 2.828 | 1.775106082
> ArrayCopyAligned.testChar | 3 | 5.129 | 2.762 | 1.85698769
> ArrayCopyAligned.testChar | 5 | 5.041 | 2.762 | 1.82512672
> ArrayCopyAligned.testChar | 10 | 5.716 | 2.762 | 2.069514844
> ArrayCopyAligned.testChar | 20 | 5.111 | 5.399 | 0.946656788
> ArrayCopyAligned.testChar | 70 | 6.271 | 6.242 | 1.004645947
> ArrayCopyAligned.testChar | 150 | 7.45 | 7.599 | 0.980392157
> ArrayCopyAligned.testChar | 300 | 9.904 | 10.112 | 0.97943038
> ArrayCopyAligned.testChar | 600 | 17.131 | 17.167 | 0.997902953
> ArrayCopyAligned.testChar | 1200 | 29.556 | 29.851 | 0.990117584
> ArrayCopyUnalignedBoth.testByte | 1 | 5.419 | 2.702 | 2.005551443
> ArrayCopyUnalignedBoth.testByte | 3 | 5.558 | 2.636 | 2.108497724
> ArrayCopyUnalignedBoth.testByte | 5 | 5.43 | 2.636 | 2.059939302
> ArrayCopyUnalignedBoth.testByte | 10 | 5.378 | 2.637 | 2.039438756
> ArrayCopyUnalignedBoth.testByte | 20 | 5.914 | 2.636 | 2.243550836
> ArrayCopyUnalignedBoth.testByte | 70 | 5.882 | 5.954 | 0.987907289
> ArrayCopyUnalignedBoth.testByte | 150 | 6.784 | 6.88 | 0.986046512
> ArrayCopyUnalignedBoth.testByte | 300 | 7.635 | 7.968 | 0.958207831
> ArrayCopyUnalignedBoth.testByte | 600 | 12.226 | 12.129 | 1.007997362
> ArrayCopyUnalignedBoth.testByte | 1200 | 16.992 | 20.717 | 0.820195974
> ArrayCopyUnalignedBoth.testChar | 1 | 5.019 | 2.828 | 1.774752475
> ArrayCopyUnalignedBoth.testChar | 3 | 5.163 | 2.763 | 1.868621064
> ArrayCopyUnalignedBoth.testChar | 5 | 5.042 | 2.827 | 1.783516095
> ArrayCopyUnalignedBoth.testChar | 10 | 5.718 | 2.828 | 2.021923621
> ArrayCopyUnalignedBoth.testChar | 20 | 5.111 | 5.404 | 0.945780903
> ArrayCopyUnalignedBoth.testChar | 70 | 6.367 | 6.235 | 1.02117081
> ArrayCopyUnalignedBoth.testChar | 150 | 7.367 | 8.269 | 0.890917886
> ArrayCopyUnalignedBoth.testChar | 300 | 10.358 | 10.642 | 0.973313287
> ArrayCopyUnalignedBoth.testChar | 600 | 20.84 | 17.522 | 1.189361945
> ArrayCopyUnalignedBoth.testChar | 1200 | 31.895 | 31.892 | 1.000094067
> ArrayCopyUnalignedDst.testByte | 1 | 5.455 | 2.637 | 2.068638604
> ArrayCopyUnalignedDst.testByte | 3 | 5.562 | 2.702 | 2.058475204
> ArrayCopyUnalignedDst.testByte | 5 | 5.427 | 2.702 | 2.008512213
> ArrayCopyUnalignedDst.testByte | 10 | 5.367 | 2.696 | 1.990727003
> ArrayCopyUnalignedDst.testByte | 20 | 5.839 | 2.637 | 2.214258627
> ArrayCopyUnalignedDst.testByte | 70 | 5.888 | 5.968 | 0.986595174
> ArrayCopyUnalignedDst.testByte | 150 | 6.785 | 6.773 | 1.001771741
> ArrayCopyUnalignedDst.testByte | 300 | 7.606 | 7.972 | 0.954089313
> ArrayCopyUnalignedDst.testByte | 600 | 11.986 | 21.195 | 0.565510734
> ArrayCopyUnalignedDst.testByte | 1200 | 16.54 | 16.784 | 0.985462345
> ArrayCopyUnalignedDst.testChar | 1 | 5.02 | 2.827 | 1.775733994
> ArrayCopyUnalignedDst.testChar | 3 | 5.131 | 2.762 | 1.857711803
> ArrayCopyUnalignedDst.testChar | 5 | 5.038 | 2.762 | 1.82404055
> ArrayCopyUnalignedDst.testChar | 10 | 5.718 | 2.762 | 2.070238957
> ArrayCopyUnalignedDst.testChar | 20 | 5.113 | 5.401 | 0.946676541
> ArrayCopyUnalignedDst.testChar | 70 | 6.222 | 6.214 | 1.001287416
> ArrayCopyUnalignedDst.testChar | 150 | 7.367 | 8.125 | 0.906707692
> ArrayCopyUnalignedDst.testChar | 300 | 10.204 | 10.082 | 1.012100774
> ArrayCopyUnalignedDst.testChar | 600 | 16.978 | 17.135 | 0.990837467
> ArrayCopyUnalignedDst.testChar | 1200 | 32.351 | 31.996 | 1.011095137
> ArrayCopyUnalignedSrc.testByte | 1 | 5.414 | 2.696 | 2.008160237
> ArrayCopyUnalignedSrc.testByte | 3 | 5.494 | 2.637 | 2.083428138
> ArrayCopyUnalignedSrc.testByte | 5 | 5.431 | 2.637 | 2.059537353
> ArrayCopyUnalignedSrc.testByte | 10 | 5.344 | 2.703 | 1.977062523
> ArrayCopyUnalignedSrc.testByte | 20 | 5.834 | 2.696 | 2.163946588
> ArrayCopyUnalignedSrc.testByte | 70 | 5.883 | 6.009 | 0.979031453
> ArrayCopyUnalignedSrc.testByte | 150 | 6.729 | 6.87 | 0.979475983
> ArrayCopyUnalignedSrc.testByte | 300 | 7.603 | 7.97 | 0.953952321
> ArrayCopyUnalignedSrc.testByte | 600 | 12.004 | 12.16 | 0.987171053
> ArrayCopyUnalignedSrc.testByte | 1200 | 16.534 | 16.643 | 0.9934507
> ArrayCopyUnalignedSrc.testChar | 1 | 5.021 | 2.762 | 1.81788559
> ArrayCopyUnalignedSrc.testChar | 3 | 5.13 | 2.762 | 1.857349747
> ArrayCopyUnalignedSrc.testChar | 5 | 5.042 | 2.827 | 1.783516095
> ArrayCopyUnalignedSrc.testChar | 10 | 5.726 | 2.761 | 2.073886273
> ArrayCopyUnalignedSrc.testChar | 20 | 5.112 | 5.401 | 0.94649139
> ArrayCopyUnalignedSrc.testChar | 70 | 6.113 | 6.227 | 0.981692629
> ArrayCopyUnalignedSrc.testChar | 150 | 7.493 | 7.888 | 0.949923935
> ArrayCopyUnalignedSrc.testChar | 300 | 10.234 | 10.501 | 0.97457385
> ArrayCopyUnalignedSrc.testChar | 600 | 17.175 | 17.142 | 1.001925096
> ArrayCopyUnalignedSrc.testChar | 1200 | 31.926 | 31.987 | 0.998092975
>
> Detailed Reports:
> Baseline : [http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt)
> WithOpt : [http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt)

Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 44 commits:

 - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848
 - 8253474: Javadoc clean up in HttpsExchange, HttpsParameters, and HttpsServer
   Reviewed-by: dfuchs, michaelm
 - 8255000: C2: Unify IGVN processing when loop opts are over
   Reviewed-by: neliasso, iveresov, kvn
 - 8255026: C2: Miscellaneous cleanups in Compile and PhaseIdealLoop code
   Reviewed-by: thartmann, neliasso, redestad
 - 8253964: [Graal] UnschedulableGraphTest#test01fails with expected:<4> but was:<3>
   Reviewed-by: kvn, dlong
 - 8255065: Zero: accessor_entry misses the IRIW case
   Reviewed-by: mdoerr
 - 8254785: compiler/graalunit/HotspotTest.java failed with "missing Graal intrinsics for: java/lang/StringLatin1.indexOfChar([BIII)I"
   Reviewed-by: psandoz, iignatyev, kvn
 - 8254976: Re-enable swing jtreg tests which were broken due to samevm mode
   Reviewed-by: serb
 - 8255043: Incorrectly styled copyright text
   Reviewed-by: dholmes, trebari, jdv
 - 8255074: sun.nio.fs.WindowsPath::getPathForWin32Calls synchronizes on String object
   Reviewed-by: bpb
 - ... and 34 more: https://git.openjdk.java.net/jdk/compare/da97ab5c...67b5b9e0

-------------

Changes: https://git.openjdk.java.net/jdk/pull/302/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=302&range=06
  Stats: 454 lines in 23 files changed: 430 ins; 0 del; 24 mod
  Patch: https://git.openjdk.java.net/jdk/pull/302.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/302/head:pull/302

PR: https://git.openjdk.java.net/jdk/pull/302

From jbhateja at openjdk.java.net Wed Oct 21 12:01:28 2020
From: jbhateja at openjdk.java.net (Jatin Bhateja)
Date: Wed, 21 Oct 2020 12:01:28 GMT
Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v8]
In-Reply-To: References:
Message-ID:

> Summary:
>
> 1) Partial in-lining technique avoids call overhead penalty for small array copy operations with size less than 32 bytes.
> 2) At runtime, a conditional check based on copy length either calls an array-copy stub or executes an optimized instruction sequence using AVX-512 masked instructions emitted at the call site.
> 3) New runtime flag ArrayCopyPartialInlineSize=0/32(default)/64 bytes determines the maximum size for partial in-lining.
> 4) Based on the perf results seen in benchmarks currently partial in-lining is performed only for arraycopy involving sub-word types (bool/byte/char/short). Once PR-61 gets integrated we can extend this patch to cover all the primitive types.
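As a rough illustration of points 1) and 2) in the summary, the following Java sketch models the length-based dispatch. It is not the patch itself: the real change emits AVX-512 masked instructions from C2 at the call site, which plain Java cannot express, so a simple loop stands in for the masked copy. The class name, the method names, the `PARTIAL_INLINE_SIZE_BYTES` constant, and the choice of an inclusive size check are all assumptions made for the example.

```java
class PartialInlineSketch {
    // Hypothetical stand-in for the ArrayCopyPartialInlineSize flag (32 is the default named in the summary).
    static final int PARTIAL_INLINE_SIZE_BYTES = 32;

    // Records which path the last call took; only here so the example can be observed.
    static boolean lastCopyWasInlined;

    static void copyBytes(byte[] src, int srcPos, byte[] dst, int dstPos, int length) {
        boolean small = length >= 0 && length <= PARTIAL_INLINE_SIZE_BYTES;
        boolean inBounds = srcPos >= 0 && dstPos >= 0
                && srcPos <= src.length - length
                && dstPos <= dst.length - length;
        if (small && inBounds && src != dst) {
            // "Inlined" path: in the real patch this would be a masked vector load/store pair.
            for (int i = 0; i < length; i++) {
                dst[dstPos + i] = src[srcPos + i];
            }
            lastCopyWasInlined = true;
        } else {
            // Everything else (large, overlapping, or invalid requests) goes to the general routine,
            // which plays the role of the array-copy stub.
            System.arraycopy(src, srcPos, dst, dstPos, length);
            lastCopyWasInlined = false;
        }
    }

    public static void main(String[] args) {
        byte[] small = new byte[20];
        byte[] large = new byte[70];
        copyBytes(small, 0, new byte[20], 0, small.length);
        System.out.println("20 bytes inlined: " + lastCopyWasInlined);
        copyBytes(large, 0, new byte[70], 0, large.length);
        System.out.println("70 bytes inlined: " + lastCopyWasInlined);
    }
}
```

The sketch only shows the shape of the decision: one length comparison selects between a short in-place sequence and the general copy routine, which is where the saved call overhead for small copies would come from.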
>
> Performance Results:
> System : CascadeLake Server, Intel(R) Xeon(R) Platinum 8280L CPU @ 2.70GHz
> Micros : test/micro/org/openjdk/bench/java/lang/ArrayCopy*.java
> ArrayCopyPartialInlineSize : 32
>
> Detailed Reports:
> Baseline : [http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt)
> WithOpt :
[http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt)

Jatin Bhateja has updated the pull request incrementally with two additional commits since the last revision:

 - Merge branch 'JDK-8252848' of http://github.com/jatin-bhateja/jdk into JDK-8252848
 - Merge remote-tracking branch 'upstream' into JDK-8252848

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/302/files
  - new: https://git.openjdk.java.net/jdk/pull/302/files/67b5b9e0..08724c33

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=302&range=07
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=302&range=06-07

  Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/302.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/302/head:pull/302

PR: https://git.openjdk.java.net/jdk/pull/302

From jbhateja at openjdk.java.net Wed Oct 21 12:13:27 2020
From: jbhateja at openjdk.java.net (Jatin Bhateja)
Date: Wed, 21 Oct 2020 12:13:27 GMT
Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v9]
In-Reply-To: References:
Message-ID:

> Summary:
>
> 1) Partial in-lining technique avoids call overhead penalty for small array copy operations with size less than 32 bytes.
> 2) At runtime, a conditional check based on copy length either calls an array-copy stub or executes an optimized instruction sequence using AVX-512 masked instructions emitted at the call site.
> 3) New runtime flag ArrayCopyPartialInlineSize=0/32(default)/64 bytes determines the maximum size for partial in-lining.
> 4) Based on the perf results seen in benchmarks currently partial in-lining is performed only for arraycopy involving sub-word types (bool/byte/char/short). Once PR-61 gets integrated we can extend this patch to cover all the primitive types.
>
> Performance Results:
> System : CascadeLake Server, Intel(R) Xeon(R) Platinum 8280L CPU @ 2.70GHz
> Micros : test/micro/org/openjdk/bench/java/lang/ArrayCopy*.java
> ArrayCopyPartialInlineSize : 32
>
> Detailed Reports:
> Baseline : [http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt)
> WithOpt :
[http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt)

Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision:

  Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/302/files
  - new: https://git.openjdk.java.net/jdk/pull/302/files/08724c33..12a7820e

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=302&range=08
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=302&range=07-08

  Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/302.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/302/head:pull/302

PR: https://git.openjdk.java.net/jdk/pull/302

From mdoerr at openjdk.java.net Wed Oct 21 13:35:20 2020
From: mdoerr at openjdk.java.net (Martin Doerr)
Date: Wed, 21 Oct 2020 13:35:20 GMT
Subject: RFR: 8255129: [PPC64, s390] Check vector_size_supported and add VectorReinterpret node
Message-ID:

match_rule_supported_vector on PPC64 and s390 needs to check vector_size_supported. In addition, an implementation for VectorReinterpret is needed. Its implementation can be empty when src and dst use the same register. Note that these 2 platforms support only one vector size at a time, so there's no need for moves between different sizes.

I'd like to clean up match_rule_supported, too. Cases for which we return true don't need to be checked explicitly because true is the default. And we don't need to check the SpecialString... flags because they are handled by "disabled_by_jvm_flags". So they can still get disabled (e.g. by jdk/bin/java -XX:-TieredCompilation -XX:-SpecialStringIndexOf -XX:+PrintCompilation -XX:+PrintInlining TestString|grep StringLatin1::indexOf).

-------------

Commit messages:
 - 8255129: [PPC64, s390] Check vector_size_supported and add VectorReinterpret node

Changes: https://git.openjdk.java.net/jdk/pull/783/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=783&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8255129
  Stats: 158 lines in 2 files changed: 29 ins; 56 del; 73 mod
  Patch: https://git.openjdk.java.net/jdk/pull/783.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/783/head:pull/783

PR: https://git.openjdk.java.net/jdk/pull/783

From sandhya.viswanathan at intel.com Wed Oct 21 16:08:58 2020
From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya)
Date: Wed, 21 Oct 2020 16:08:58 +0000
Subject: RFR: 8254790: javax/xml/crypto/dsig/GenerationTests.java failed with SIGSEGV in C2 frame
In-Reply-To: References:
Message-ID:

Thanks a lot Vladimir!

-----Original Message-----
From: hotspot-compiler-dev On Behalf Of Vladimir Kozlov
Sent: Tuesday, October 20, 2020 8:41 PM
To: hotspot-compiler-dev at openjdk.java.net
Subject: Re: RFR: 8254790: javax/xml/crypto/dsig/GenerationTests.java failed with SIGSEGV in C2 frame

On Tue, 20 Oct 2020 21:52:30 GMT, Vladimir Kozlov wrote:

>> The problem was due to 32 bit arithmetic instruction used for 64-bit
>> address in string_indexof_char and stringL_indexof_char: "addl(result, ch)".
>>
>> This patch replaces the addl instruction with addptr and also enables the stringL_indexof_char intrinsic.
>
> Looks good. Let me finish testing before integration.

tier1-3 testing passed on x64 (all OSs).
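The failure mode named in the quoted description, a 32-bit add applied to a 64-bit address, can be modeled in a few lines of Java. This is only a sketch under stated assumptions: the two methods imitate the arithmetic of x86-64 `addl` (32-bit add, result zero-extended into the 64-bit register) and of a full-width pointer add such as `addptr` on a 64-bit VM, and the address value is made up for the example. It does not reproduce the intrinsic itself.

```java
class AddlVersusAddptr {
    // Models a 32-bit add on a 64-bit register: only the low 32 bits survive, the upper half becomes zero.
    static long addl(long dst, long src) {
        return (dst + src) & 0xFFFF_FFFFL;
    }

    // Models a full-width (pointer-sized) add on a 64-bit VM.
    static long addptr(long dst, long src) {
        return dst + src;
    }

    public static void main(String[] args) {
        long address = 0x0000_7F3A_1234_5678L; // hypothetical address above 4 GB
        long offset = 0x10;
        System.out.printf("addl   -> 0x%016x%n", addl(address, offset));
        System.out.printf("addptr -> 0x%016x%n", addptr(address, offset));
    }
}
```

For addresses below 4 GB both variants agree, which may be why a bug of this kind can go unnoticed until memory happens to be mapped higher.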
-------------

PR: https://git.openjdk.java.net/jdk/pull/772

From sviswanathan at openjdk.java.net Wed Oct 21 16:23:15 2020
From: sviswanathan at openjdk.java.net (Sandhya Viswanathan)
Date: Wed, 21 Oct 2020 16:23:15 GMT
Subject: RFR: 8254790: SIGSEGV in string_indexof_char and stringL_indexof_char intrinsics
In-Reply-To: References:
Message-ID:

On Wed, 21 Oct 2020 07:18:53 GMT, Nils Eliasson wrote:

>> Looks good to me.
>
> Please change the bug title to something that describes the problem.

Thanks a lot @TobiHartmann for the review.
@neliasso I have changed the bug title to "SIGSEGV in string_indexof_char and stringL_indexof_char intrinsics".

-------------

PR: https://git.openjdk.java.net/jdk/pull/772

From sviswanathan at openjdk.java.net Wed Oct 21 16:30:23 2020
From: sviswanathan at openjdk.java.net (Sandhya Viswanathan)
Date: Wed, 21 Oct 2020 16:30:23 GMT
Subject: Integrated: 8254790: SIGSEGV in string_indexof_char and stringL_indexof_char intrinsics
In-Reply-To: References:
Message-ID:

On Tue, 20 Oct 2020 20:23:23 GMT, Sandhya Viswanathan wrote:

> The problem was due to 32 bit arithmetic instruction used for 64-bit address in
> string_indexof_char and stringL_indexof_char: "addl(result, ch)".
>
> This patch replaces the addl instruction with addptr and also enables the stringL_indexof_char intrinsic.

This pull request has now been integrated.
Changeset: 365f19c8
Author:    Sandhya Viswanathan
URL:       https://git.openjdk.java.net/jdk/commit/365f19c8
Stats:     3 lines in 2 files changed: 0 ins; 0 del; 3 mod

8254790: SIGSEGV in string_indexof_char and stringL_indexof_char intrinsics

Reviewed-by: kvn, thartmann

-------------

PR: https://git.openjdk.java.net/jdk/pull/772

From kvn at openjdk.java.net Wed Oct 21 18:01:24 2020
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Wed, 21 Oct 2020 18:01:24 GMT
Subject: RFR: 8254317: C2: Resource consumption of ConvI2LNode::Ideal() grows exponentially [v2]
In-Reply-To: References: <9FJMNUDr4xvtcIlGtEk2Y_7tA17Us29ImExhFpzs87s=.66c146c6-7629-4e1b-a62b-d68714636f32@github.com>
Message-ID:

On Wed, 21 Oct 2020 10:04:14 GMT, Roberto Castañeda Lozano wrote:

>> Prevent exponential number of calls to `ConvI2LNode::Ideal()` when AddIs are used multiple times by other AddIs in the optimization ConvI2L(AddI(x, y)) -> AddL(ConvI2L(x), ConvI2L(y)). This is achieved by (1) reusing existing ConvI2Ls if possible rather than eagerly creating new ones and (2) postponing the optimization of newly created ConvI2Ls. Remove hook node solution introduced in [8217359](https://github.com/openjdk/jdk/commit/cf554816d1952f722143e9d03ec669e80f955adf), since this is subsumed by (2). Use `phase->is_IterGVN()` rather than `can_reshape` to check if `ConvI2LNode::Ideal()` is called within iterative GVN, for clarity. Add regression tests that cover different shapes and sizes of AddI subgraphs, implicitly checking (by not timing out) that there is no combinatorial explosion.
>
> Roberto Castañeda Lozano has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision:
>
>  - Merge master
>  - Generalize the fix to handle any input where AddIs are used multiple times by
>    other AddIs, which could also lead to an exponential number of calls to
>    ConvI2LNode::Ideal(). This is achieved by (1) reusing existing ConvI2Ls if
>    possible rather than eagerly creating new ones and (2) postponing the
>    optimization of newly created ConvI2Ls. Remove "hook" node solution introduced
>    in JDK-8217359 since this is subsumed by (2). Test that ConvI2LNode::Ideal() is
>    called within iterative GVN using phase->is_IterGVN() rather than can_reshape,
>    for clarity.
>
>    Merge all tests into a single class. Reimplement the microbenchmark as a test
>    case that should time out in case of a combinatorial explosion. Add a second
>    similar microbenchmark that demonstrates the need for this generalization.
>  - Merge master
>  - 8254317: C2: Resource consumption of ConvI2LNode::Ideal() grows exponentially
>
>    In the optimization ConvI2L(AddI(x, y)) -> AddL(ConvI2L(x), ConvI2L(y)) within
>    ConvI2LNode::Ideal(), handle the special case x = y by feeding both inputs of
>    AddL from a single ConvI2L node rather than creating two semantically equivalent
>    ConvI2L nodes. This avoids an exponential number of calls to
>    ConvI2LNode::Ideal() when dealing with long chains of AddI nodes. Disable the
>    optimization for the pattern ConvI2L(SubI(x, x)), which is simplified to zero
>    during parsing anyway. Add a set of regression tests for the transformation that
>    cover different shapes of AddI subgraphs. Also add a microbenchmark that
>    exercises the special case, for performance regression testing.

Changes requested by kvn (Reviewer).
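The blow-up described in the quoted commit messages can be illustrated with a small Java model. This is a sketch under assumptions, not C2 code: `Node` is a hypothetical two-input node type standing in for AddI, `naiveCalls` counts the visits made when a conversion is pushed through both inputs every time, and `reusingCalls` counts the visits when an already converted node is reused (modeled here with an identity-based map). Postponing the optimization of new nodes, point (2) in the description, is not modeled.

```java
import java.util.IdentityHashMap;
import java.util.Map;

class ConvI2LBlowupSketch {
    // Hypothetical stand-in for an AddI node; a leaf has no inputs.
    static final class Node {
        final Node left, right;
        Node(Node left, Node right) { this.left = left; this.right = right; }
        boolean isLeaf() { return left == null; }
    }

    // Builds x0, x1 = x0 + x0, x2 = x1 + x1, ... so every node is used twice by its successor.
    static Node chain(int depth) {
        Node n = new Node(null, null);
        for (int i = 0; i < depth; i++) {
            n = new Node(n, n);
        }
        return n;
    }

    // Eager variant: converts both inputs separately, even when they are the same node.
    static long naiveCalls(Node n) {
        return n.isLeaf() ? 1 : 1 + naiveCalls(n.left) + naiveCalls(n.right);
    }

    // Reusing variant: a node that has already been converted is not visited again.
    static long reusingCalls(Node n, Map<Node, Boolean> converted) {
        if (converted.putIfAbsent(n, Boolean.TRUE) != null) {
            return 0;
        }
        return n.isLeaf() ? 1 : 1 + reusingCalls(n.left, converted) + reusingCalls(n.right, converted);
    }

    public static void main(String[] args) {
        Node root = chain(20);
        System.out.println("naive visits:   " + naiveCalls(root));
        System.out.println("reusing visits: " + reusingCalls(root, new IdentityHashMap<>()));
    }
}
```

For a chain of depth d the eager variant makes 2^(d+1) - 1 visits while the reusing variant makes d + 1, which is the kind of gap a regression test with a short timeout would be expected to expose.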
test/hotspot/jtreg/compiler/conversions/TestMoveConvI2LThroughAddIs.java line 41:

> 39: * -XX:CompileOnly=::testChain,::testTree,::testDAG
> 40: * compiler.conversions.TestMoveConvI2LThroughAddIs basic
> 41: * @run main/othervm/timeout=1 -Xcomp -XX:-TieredCompilation -XX:-Inline

By default the timeout is 120 sec. Why did you set it to 1 sec? The same applies to the second @run.

test/hotspot/jtreg/compiler/conversions/TestMoveConvI2LThroughAddIs.java line 38:

> 36: * the short specified timeout.
> 37: * @library /test/lib /
> 38: * @run main/othervm -Xcomp -XX:-TieredCompilation -XX:-Inline

For future test writing: in general we should avoid using -Xcomp because it does not provide profiling information to C2, and it may generate unexpected code because branch frequencies are not known. The preferable way is to warm up the test method by calling it in a loop enough times to make sure it is compiled, and then to verify the results outside the test method. Then you don't need the -XX:CompileOnly option.

-------------

PR: https://git.openjdk.java.net/jdk/pull/727

From never at openjdk.java.net Wed Oct 21 18:27:29 2020
From: never at openjdk.java.net (Tom Rodriguez)
Date: Wed, 21 Oct 2020 18:27:29 GMT
Subject: RFR: 8255068: [JVMCI] errors during compiler creation can be hidden [v2]
In-Reply-To: References:
Message-ID:

> This improves the handling of errors during compiler creation. @dougxc @vnkozlov

Tom Rodriguez has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR.
The pull request contains one new commit since the last revision: 8255068: [JVMCI] errors during compiler creation can be hidden ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/768/files - new: https://git.openjdk.java.net/jdk/pull/768/files/a5a6278e..2ab7aa9a Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=768&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=768&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/768.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/768/head:pull/768 PR: https://git.openjdk.java.net/jdk/pull/768 From kvn at openjdk.java.net Wed Oct 21 19:24:13 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 21 Oct 2020 19:24:13 GMT Subject: RFR: 8252204: AArch64: Implement SHA3 accelerator/intrinsic [v10] In-Reply-To: References: Message-ID: <06kvIM_3abB-35pPdgFfbvwCND6oe9QCBqXBQ8iIrZ4=.64ae7da4-be02-46cc-afde-ffeb9ec9d703@github.com> On Wed, 21 Oct 2020 09:19:57 GMT, Fei Yang wrote: > > Someone in Oracle have to run tier1-tier3 testing with these changes to make sure nothing is broken. I don't want to repeat 8254790. > > That's appreciated. > On my side, I run tier1-tier3 both on aarch64 linux and x86_64 linux. > The test result on these two platforms looks good for the latest changes. I started testing of 09: version. >> src/jdk.internal.vm.compiler/share/classes/org.graalvm.compiler.hotspot.test/src/org/graalvm/compiler/hotspot/test/CheckGraalIntrinsics.java line 604: >> >>> 602: add(ignore, "sun/security/provider/SHA5." + shaCompressName + "([BI)V"); >>> 603: } >>> 604: add(toBeInvestigated, "sun/security/provider/SHA3." + shaCompressName + "([BI)V"); >> >> This should be under `if (isJDK16OrHigher())` check. 
Something like this: >> https://github.com/openjdk/jdk/pull/650/files#diff-d1f378fc1b7fe041309e854d40b3a95a91e63fdecf0ecd9826b7c95eaeba314eR527 >> You can wait when Aleksey push it and update your changes > > OK. Will update with the following change after Aleksey's PR is integrated: > > --- a/src/jdk.internal.vm.compiler/share/classes/org.graalvm.compiler.hotspot.test/src/org/graalvm/compiler/hotspot/test/CheckGraalIntrinsics.java > +++ b/src/jdk.internal.vm.compiler/share/classes/org.graalvm.compiler.hotspot.test/src/org/graalvm/compiler/hotspot/test/CheckGraalIntrinsics.java > @@ -608,6 +608,10 @@ public class CheckGraalIntrinsics extends GraalTest { > if (!config.useSHA512Intrinsics()) { > add(ignore, "sun/security/provider/SHA5." + shaCompressName + "([BI)V"); > } > + > + if (isJDK16OrHigher()) { > + add(toBeInvestigated, "sun/security/provider/SHA3." + shaCompressName + "([BI)V"); > + } > } Yes, please, do that. ------------- PR: https://git.openjdk.java.net/jdk/pull/207 From ecaspole at openjdk.java.net Wed Oct 21 19:34:10 2020 From: ecaspole at openjdk.java.net (Eric Caspole) Date: Wed, 21 Oct 2020 19:34:10 GMT Subject: RFR: 8254913: Increase InlineSmallCode default from 2000 to 2500 for x64 In-Reply-To: References: <-eTclEV5zK_DIJgV8kYWFRBlUNtEqzq5H_fNwmbVJ7A=.ee694b8b-2a1e-43a0-9df6-52958e8f1d23@github.com> Message-ID: On Tue, 20 Oct 2020 06:56:44 GMT, Aleksey Shipilev wrote: >> We have seen some specific benefits to increasing InlineSmallCode to 2500 from 2000, and across the whole promo build perf test collection the change is neutral to slightly positive, where the tests are run on modern OCI systems. >> >> Passed tier1 testing, some ad-hoc perf testing and more compiler related parts of the weekly promo performance set. >> >> JBS: https://bugs.openjdk.java.net/browse/JDK-8254913 >> Thanks, >> Eric > > That is OK, looks good then. 
The motivation for this change is, among other things, that we discovered a big regression in this micro in 11 vs 8:

https://github.com/openjdk/jdk/blob/master/test/micro/org/openjdk/bench/java/lang/NewInstance.java

jdk1.8.0_271:

Benchmark                                 Mode  Cnt    Score    Error  Units
NewInstance.threeDifferentProtected       avgt    5   61.413 ±  0.078  ns/op
NewInstance.threeDifferentPublic          avgt    5   56.705 ±  0.096  ns/op
NewInstance.threeDifferentPublicConstant  avgt    5   43.316 ±  0.094  ns/op
NewInstance.threeDifferentPublicFinal     avgt    5   56.586 ±  0.140  ns/op
NewInstance.threeSameProtected            avgt    5   46.714 ±  0.045  ns/op
NewInstance.threeSamePublic               avgt    5   44.565 ±  0.058  ns/op

jdk-11.0.9:

Benchmark                                 Mode  Cnt    Score    Error  Units
NewInstance.threeDifferentProtected       avgt    5  851.560 ± 44.299  ns/op
NewInstance.threeDifferentPublic          avgt    5  854.080 ±  7.619  ns/op
NewInstance.threeDifferentPublicConstant  avgt    5  911.749 ± 33.695  ns/op
NewInstance.threeDifferentPublicFinal     avgt    5  830.804 ± 21.219  ns/op
NewInstance.threeSameProtected            avgt    5  785.063 ±  2.580  ns/op
NewInstance.threeSamePublic               avgt    5  792.167 ±  0.538  ns/op

jdk-11.0.9 w/ -XX:InlineSmallCode=2500:

Benchmark                                 Mode  Cnt    Score    Error  Units
NewInstance.threeDifferentProtected       avgt    5   58.091 ±  0.012  ns/op
NewInstance.threeDifferentPublic          avgt    5   55.514 ±  0.062  ns/op
NewInstance.threeDifferentPublicConstant  avgt    5   43.233 ±  0.079  ns/op
NewInstance.threeDifferentPublicFinal     avgt    5   54.955 ±  0.103  ns/op
NewInstance.threeSameProtected            avgt    5   44.216 ±  0.013  ns/op
NewInstance.threeSamePublic               avgt    5   44.214 ±  0.009  ns/op

This carries on into 14, then other fixes in 15 make it go bi-modal where it may or may not get inlined run to run.

Also, increasing InlineSmallCode for x64 to 2500 makes it now equal to ARM64, which we think is sensible.

Here are some results on a recent internal testing build of jdk-16 default vs -XX:InlineSmallCode=2500.
All these results were statistically insignificant.

Name                                Pct-Diff
==============
"SPECjvm2008-Compress-G1",             0.889,
"SPECjvm2008-Crypto.aes-G1",           2.884,
"SPECjvm2008-Crypto.rsa-G1",           0.230,
"SPECjvm2008-Crypto.signverify-G1",    0.824,
"SPECjvm2008-Derby-ParGC",             0.878,
"SPECjvm2008-FFT.large-G1",            1.626,
"SPECjvm2008-FFT.small-G1",            2.023,
"SPECjvm2008-LU.large-ParGC",        -14.624,
"SPECjvm2008-LU.small-ParGC",         -0.633,
"SPECjvm2008-MPEG-ParGC",              0.326,
"SPECjvm2008-MonteCarlo-G1",           3.054,
"SPECjvm2008-MonteCarlo-ZGC",         -2.247,
"SPECjvm2008-SOR.large-ParGC",        10.729,
"SPECjvm2008-SOR.small-ParGC",         0.127,
"SPECjvm2008-Serial-ParGC",            0.909,
"SPECjvm2008-Sparse.large-G1",         0.204,
"SPECjvm2008-Sparse.small-G1",         0.594,
"SPECjvm2008-XML.transform-G1",        0.349,
"SPECjvm2008-XML.validation-G1",       0.378,

These tests are run on the generally available OCI BM2.52 platform. See https://docs.cloud.oracle.com/en-us/iaas/Content/Compute/References/computeshapes.htm

-------------

PR: https://git.openjdk.java.net/jdk/pull/705

From never at openjdk.java.net Wed Oct 21 19:37:13 2020
From: never at openjdk.java.net (Tom Rodriguez)
Date: Wed, 21 Oct 2020 19:37:13 GMT
Subject: RFR: 8255068: [JVMCI] errors during compiler creation can be hidden [v2]
In-Reply-To:
References:
Message-ID:

On Tue, 20 Oct 2020 20:22:55 GMT, Doug Simon wrote:

>> Good.
>
> Good.

There was an extra blank line in the formatting which I just fixed.

-------------

PR: https://git.openjdk.java.net/jdk/pull/768

From never at openjdk.java.net Wed Oct 21 19:43:11 2020
From: never at openjdk.java.net (Tom Rodriguez)
Date: Wed, 21 Oct 2020 19:43:11 GMT
Subject: Integrated: 8255068: [JVMCI] errors during compiler creation can be hidden
In-Reply-To:
References:
Message-ID:

On Tue, 20 Oct 2020 18:18:14 GMT, Tom Rodriguez wrote:

> This improves the handling of errors during compiler creation. @dougxc @vnkozlov

This pull request has now been integrated.
Changeset: 60209915 Author: Tom Rodriguez URL: https://git.openjdk.java.net/jdk/commit/60209915 Stats: 24 lines in 1 file changed: 22 ins; 0 del; 2 mod 8255068: [JVMCI] errors during compiler creation can be hidden Reviewed-by: kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/768 From azeemj at openjdk.java.net Wed Oct 21 19:45:17 2020 From: azeemj at openjdk.java.net (Azeem Jiva) Date: Wed, 21 Oct 2020 19:45:17 GMT Subject: RFR: 8254913: Increase InlineSmallCode default from 2000 to 2500 for x64 In-Reply-To: <-eTclEV5zK_DIJgV8kYWFRBlUNtEqzq5H_fNwmbVJ7A=.ee694b8b-2a1e-43a0-9df6-52958e8f1d23@github.com> References: <-eTclEV5zK_DIJgV8kYWFRBlUNtEqzq5H_fNwmbVJ7A=.ee694b8b-2a1e-43a0-9df6-52958e8f1d23@github.com> Message-ID: <6T57pMA-1ew8Yo7YpVFzSRXMpOPFPCNa529l0-ip8xs=.e15a392e-c9ee-4132-8a99-a6bb58ad1b24@github.com> On Fri, 16 Oct 2020 17:31:58 GMT, Eric Caspole wrote: > We have seen some specific benefits to increasing InlineSmallCode to 2500 from 2000, and across the whole promo build perf test collection the change is neutral to slightly positive, where the tests are run on modern OCI systems. > > Passed tier1 testing, some ad-hoc perf testing and more compiler related parts of the weekly promo performance set. > > JBS: https://bugs.openjdk.java.net/browse/JDK-8254913 > Thanks, > Eric Marked as reviewed by azeemj (Author). ------------- PR: https://git.openjdk.java.net/jdk/pull/705 From azeemj at openjdk.java.net Wed Oct 21 19:45:17 2020 From: azeemj at openjdk.java.net (Azeem Jiva) Date: Wed, 21 Oct 2020 19:45:17 GMT Subject: RFR: 8254913: Increase InlineSmallCode default from 2000 to 2500 for x64 In-Reply-To: References: <-eTclEV5zK_DIJgV8kYWFRBlUNtEqzq5H_fNwmbVJ7A=.ee694b8b-2a1e-43a0-9df6-52958e8f1d23@github.com> Message-ID: On Wed, 21 Oct 2020 19:31:47 GMT, Eric Caspole wrote: >> That is OK, looks good then. 
>
> The motivation for this change is, among other things, that we discovered a big regression in this micro in 11 vs 8:
>
> https://github.com/openjdk/jdk/blob/master/test/micro/org/openjdk/bench/java/lang/NewInstance.java
>
> jdk1.8.0_271:
>
> Benchmark                                 Mode  Cnt    Score    Error  Units
> NewInstance.threeDifferentProtected       avgt    5   61.413 ±  0.078  ns/op
> NewInstance.threeDifferentPublic          avgt    5   56.705 ±  0.096  ns/op
> NewInstance.threeDifferentPublicConstant  avgt    5   43.316 ±  0.094  ns/op
> NewInstance.threeDifferentPublicFinal     avgt    5   56.586 ±  0.140  ns/op
> NewInstance.threeSameProtected            avgt    5   46.714 ±  0.045  ns/op
> NewInstance.threeSamePublic               avgt    5   44.565 ±  0.058  ns/op
>
> jdk-11.0.9:
>
> Benchmark                                 Mode  Cnt    Score    Error  Units
> NewInstance.threeDifferentProtected       avgt    5  851.560 ± 44.299  ns/op
> NewInstance.threeDifferentPublic          avgt    5  854.080 ±  7.619  ns/op
> NewInstance.threeDifferentPublicConstant  avgt    5  911.749 ± 33.695  ns/op
> NewInstance.threeDifferentPublicFinal     avgt    5  830.804 ± 21.219  ns/op
> NewInstance.threeSameProtected            avgt    5  785.063 ±  2.580  ns/op
> NewInstance.threeSamePublic               avgt    5  792.167 ±  0.538  ns/op
>
> jdk-11.0.9 w/ -XX:InlineSmallCode=2500:
>
> Benchmark                                 Mode  Cnt    Score    Error  Units
> NewInstance.threeDifferentProtected       avgt    5   58.091 ±  0.012  ns/op
> NewInstance.threeDifferentPublic          avgt    5   55.514 ±  0.062  ns/op
> NewInstance.threeDifferentPublicConstant  avgt    5   43.233 ±  0.079  ns/op
> NewInstance.threeDifferentPublicFinal     avgt    5   54.955 ±  0.103  ns/op
> NewInstance.threeSameProtected            avgt    5   44.216 ±  0.013  ns/op
> NewInstance.threeSamePublic               avgt    5   44.214 ±  0.009  ns/op
>
> This carries on into 14, then other fixes in 15 make it go bi-modal where it may or may not get inlined run to run.
>
> Also, increasing InlineSmallCode for x64 to 2500 makes it now equal to ARM64, which we think is sensible.
>
> Here are some results on a recent internal testing build of jdk-16 default vs -XX:InlineSmallCode=2500.
> All these results were statistically insignificant.
>
> Name                                Pct-Diff
> ==============
> "SPECjvm2008-Compress-G1",             0.889,
> "SPECjvm2008-Crypto.aes-G1",           2.884,
> "SPECjvm2008-Crypto.rsa-G1",           0.230,
> "SPECjvm2008-Crypto.signverify-G1",    0.824,
> "SPECjvm2008-Derby-ParGC",             0.878,
> "SPECjvm2008-FFT.large-G1",            1.626,
> "SPECjvm2008-FFT.small-G1",            2.023,
> "SPECjvm2008-LU.large-ParGC",        -14.624,
> "SPECjvm2008-LU.small-ParGC",         -0.633,
> "SPECjvm2008-MPEG-ParGC",              0.326,
> "SPECjvm2008-MonteCarlo-G1",           3.054,
> "SPECjvm2008-MonteCarlo-ZGC",         -2.247,
> "SPECjvm2008-SOR.large-ParGC",        10.729,
> "SPECjvm2008-SOR.small-ParGC",         0.127,
> "SPECjvm2008-Serial-ParGC",            0.909,
> "SPECjvm2008-Sparse.large-G1",         0.204,
> "SPECjvm2008-Sparse.small-G1",         0.594,
> "SPECjvm2008-XML.transform-G1",        0.349,
> "SPECjvm2008-XML.validation-G1",       0.378,
>
> These tests are run on the generally available OCI BM2.52 platform. See https://docs.cloud.oracle.com/en-us/iaas/Content/Compute/References/computeshapes.htm

Should this be backported to 11 then?

-------------

PR: https://git.openjdk.java.net/jdk/pull/705

From ecaspole at openjdk.java.net Wed Oct 21 20:41:12 2020
From: ecaspole at openjdk.java.net (Eric Caspole)
Date: Wed, 21 Oct 2020 20:41:12 GMT
Subject: Integrated: 8254913: Increase InlineSmallCode default from 2000 to 2500 for x64
In-Reply-To: <-eTclEV5zK_DIJgV8kYWFRBlUNtEqzq5H_fNwmbVJ7A=.ee694b8b-2a1e-43a0-9df6-52958e8f1d23@github.com>
References: <-eTclEV5zK_DIJgV8kYWFRBlUNtEqzq5H_fNwmbVJ7A=.ee694b8b-2a1e-43a0-9df6-52958e8f1d23@github.com>
Message-ID:

On Fri, 16 Oct 2020 17:31:58 GMT, Eric Caspole wrote:

> We have seen some specific benefits to increasing InlineSmallCode to 2500 from 2000, and across the whole promo build perf test collection the change is neutral to slightly positive, where the tests are run on modern OCI systems.
>
> Passed tier1 testing, some ad-hoc perf testing and more compiler related parts of the weekly promo performance set.
> > JBS: https://bugs.openjdk.java.net/browse/JDK-8254913 > Thanks, > Eric This pull request has now been integrated. Changeset: 85a8949c Author: Eric Caspole URL: https://git.openjdk.java.net/jdk/commit/85a8949c Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8254913: Increase InlineSmallCode default from 2000 to 2500 for x64 Reviewed-by: redestad, shade, azeemj ------------- PR: https://git.openjdk.java.net/jdk/pull/705 From kvn at openjdk.java.net Wed Oct 21 20:51:11 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 21 Oct 2020 20:51:11 GMT Subject: RFR: 8254913: Increase InlineSmallCode default from 2000 to 2500 for x64 In-Reply-To: <-eTclEV5zK_DIJgV8kYWFRBlUNtEqzq5H_fNwmbVJ7A=.ee694b8b-2a1e-43a0-9df6-52958e8f1d23@github.com> References: <-eTclEV5zK_DIJgV8kYWFRBlUNtEqzq5H_fNwmbVJ7A=.ee694b8b-2a1e-43a0-9df6-52958e8f1d23@github.com> Message-ID: On Fri, 16 Oct 2020 17:31:58 GMT, Eric Caspole wrote: > We have seen some specific benefits to increasing InlineSmallCode to 2500 from 2000, and across the whole promo build perf test collection the change is neutral to slightly positive, where the tests are run on modern OCI systems. > > Passed tier1 testing, some ad-hoc perf testing and more compiler related parts of the weekly promo performance set. > > JBS: https://bugs.openjdk.java.net/browse/JDK-8254913 > Thanks, > Eric Good. ------------- PR: https://git.openjdk.java.net/jdk/pull/705 From sandhya.viswanathan at intel.com Wed Oct 21 22:39:45 2020 From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya) Date: Wed, 21 Oct 2020 22:39:45 +0000 Subject: [11u] RFR 8254790: SIGSEGV in string_indexof_char and stringL_indexof_char intrinsics Message-ID: The fix for 8254790 needs to be backported to JDK11u. The fix is one-line change in string_indexof_char intrinsic. The intrinsic is moved to c2_MacroAssembler_x86.cpp since jdk15 and was in macroAssembler_x86.cpp in JDK 11u so I am sending a separate webrev review request. 
The other stringL_indexof_char intrinsic didn't exist in JDK11u so that fix is omitted in this webrev. JBS: https://bugs.openjdk.java.net/browse/JDK-8254790 Webrev: http://cr.openjdk.java.net/~sviswanathan/8254790/webrev.00/ Best Regards, Sandhya From vladimir.kozlov at oracle.com Wed Oct 21 22:42:02 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 21 Oct 2020 15:42:02 -0700 Subject: [11u] RFR 8254790: SIGSEGV in string_indexof_char and stringL_indexof_char intrinsics In-Reply-To: References: Message-ID: Looks good. On 10/21/20 3:39 PM, Viswanathan, Sandhya wrote: > The fix for 8254790 needs to be backported to JDK11u. > The fix is one-line change in string_indexof_char intrinsic. > The intrinsic is moved to c2_MacroAssembler_x86.cpp since jdk15 and was in macroAssembler_x86.cpp in JDK 11u so I am sending a separate webrev review request. > The other stringL_indexof_char intrinsic didn't exist in JDK11u so that fix is omitted in this webrev. Intrinsic stringL_indexof_char was added in JDK 16. So JDK 15u backport will be similar to 11u. Thanks, Vladimir K > > JBS: https://bugs.openjdk.java.net/browse/JDK-8254790 > Webrev: http://cr.openjdk.java.net/~sviswanathan/8254790/webrev.00/ > > Best Regards, > Sandhya > > From fyang at openjdk.java.net Wed Oct 21 23:42:33 2020 From: fyang at openjdk.java.net (Fei Yang) Date: Wed, 21 Oct 2020 23:42:33 GMT Subject: RFR: 8252204: AArch64: Implement SHA3 accelerator/intrinsic [v11] In-Reply-To: References: Message-ID: > Contributed-by: ard.biesheuvel at linaro.org, dongbo4 at huawei.com > > This added an intrinsic for SHA3 using aarch64 v8.2 SHA3 Crypto Extensions. > Reference implementation for core SHA-3 transform using ARMv8.2 Crypto Extensions: > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/arch/arm64/crypto/sha3-ce-core.S?h=v5.4.52 > > Trivial adaptation in SHA3. implCompress is needed for the purpose of adding the intrinsic. 
> For SHA3, we need to pass one extra parameter "digestLength" to the stub for the calculation of block size. > "digestLength" is also used in for the EOR loop before keccak to differentiate different SHA3 variants. > > We added jtreg tests for SHA3 and used QEMU system emulator which supports SHA3 instructions to test the functionality. > Patch passed jtreg tier1-3 tests with QEMU system emulator. > Also verified with jtreg tier1-3 tests without SHA3 instructions on aarch64-linux-gnu and x86_64-linux-gnu, to make sure that there's no regression. > > We used one existing JMH test for performance test: test/micro/org/openjdk/bench/java/security/MessageDigests.java > We measured the performance benefit with an aarch64 cycle-accurate simulator. > Patch delivers 20% - 40% performance improvement depending on specific SHA3 digest length and size of the message. > > For now, this feature will not be enabled automatically for aarch64. We can auto-enable this when it is fully tested on real hardware. But for the above testing purposes, this is auto-enabled when the corresponding hardware feature is detected. 
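As a small illustration of why the digest length alone is enough to derive the block size, here is a sketch in Java rather than in the AArch64 stub code. It assumes the standard SHA-3 parameters (a 1600-bit Keccak state whose capacity is twice the digest length); the class and method names are made up for this example and do not come from the patch.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class Sha3BlockSizeSketch {
    /** Width of the Keccak-f[1600] state in bytes. */
    static final int STATE_BYTES = 200;

    /** Block size (rate) in bytes for a SHA-3 variant with the given digest length in bytes. */
    static int blockSize(int digestLength) {
        return STATE_BYTES - 2 * digestLength;
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        for (String name : new String[] {"SHA3-224", "SHA3-256", "SHA3-384", "SHA3-512"}) {
            MessageDigest md = MessageDigest.getInstance(name);
            int digestLength = md.getDigestLength();
            System.out.println(name + ": digest " + digestLength
                    + " bytes, block " + blockSize(digestLength) + " bytes");
        }
    }
}
```

Under that assumption, a longer digest means a smaller block, so each variant absorbs a different number of bytes per permutation call.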

Fei Yang has updated the pull request incrementally with one additional commit since the last revision:

  Add if (isJDK16OrHigher()) check for SHA3 in CheckGraalIntrinsics.java

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/207/files
  - new: https://git.openjdk.java.net/jdk/pull/207/files/d32c8ad7..b43f9197

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=207&range=10
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=207&range=09-10

  Stats: 4 lines in 1 file changed: 3 ins; 0 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/207.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/207/head:pull/207

PR: https://git.openjdk.java.net/jdk/pull/207

From paul.sandoz at oracle.com Thu Oct 22 00:26:51 2020
From: paul.sandoz at oracle.com (Paul Sandoz)
Date: Wed, 21 Oct 2020 17:26:51 -0700
Subject: Vector API: stack overflow on Big Endian
In-Reply-To:
References:
Message-ID: <08399669-DCB9-4CB4-A23F-A6B64439D1DA@oracle.com>

Hi Martin,

We definitely have far less exposure by default to BE platforms now that SPARC is not a thing, it's easy to unintentionally hardcode a bias to LE.

Would it be possible to try running the tests on your BE platforms with the following modification?

--- a/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/AbstractVector.java
+++ b/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/AbstractVector.java
@@ -500,7 +500,7 @@ abstract class AbstractVector extends Vector {
     final AbstractVector defaultReinterpret(AbstractSpecies rsp) {
         int blen = Math.max(this.bitSize(), rsp.vectorBitSize()) / Byte.SIZE;
-        ByteOrder bo = ByteOrder.LITTLE_ENDIAN;
+        ByteOrder bo = ByteOrder.nativeOrder();//LITTLE_ENDIAN;
         ByteBuffer bb = ByteBuffer.allocate(blen);
         this.intoByteBuffer(bb, 0, bo);
         VectorMask m = rsp.maskAll(true);

Paul.
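
For readers less familiar with the byte-order issue, here is a scalar sketch of the same idea that uses only `java.nio`, not the Vector API. The `maybeSwap` below is a hypothetical `int`-sized analogue of the vector method with that name: it swaps bytes only when the requested order differs from the platform's native order. Asking for `ByteOrder.nativeOrder()` therefore never takes the swapping path on any platform, whereas a hard-coded `LITTLE_ENDIAN` takes it on every big-endian machine.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class ByteOrderSketch {
    static final ByteOrder NATIVE_ENDIAN = ByteOrder.nativeOrder();

    /** Swaps the bytes of value only if the requested order is not the native one. */
    static int maybeSwap(int value, ByteOrder bo) {
        return (bo != NATIVE_ENDIAN) ? Integer.reverseBytes(value) : value;
    }

    /** Stores value into a fresh buffer using the given order and returns the raw bytes. */
    static byte[] store(int value, ByteOrder bo) {
        ByteBuffer bb = ByteBuffer.allocate(Integer.BYTES).order(bo);
        bb.putInt(0, value);
        return bb.array();
    }

    public static void main(String[] args) {
        int v = 0x01020304;
        System.out.println("native order: " + NATIVE_ENDIAN);
        System.out.println("swap needed for LITTLE_ENDIAN: "
                + (maybeSwap(v, ByteOrder.LITTLE_ENDIAN) != v));
        System.out.println("swap needed for native order:  "
                + (maybeSwap(v, NATIVE_ENDIAN) != v));
    }
}
```

In the vector code the swapping path itself goes through a reinterpretation, so a default reinterpretation that always requests a non-native order can plausibly re-enter that path; the scalar version has no such cycle and only shows when a swap is requested.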

> On Oct 21, 2020, at 4:10 AM, Doerr, Martin wrote:
>
> Hi,
>
> I noticed stack overflow issues because of endless recursion on Big Endian platforms (s390 and PPC64) in several tests:
>
> E.g. Int64VectorLoadStoreTests:
>
>         at jdk.incubator.vector/jdk.incubator.vector.IntVector.maybeSwap(IntVector.java:3330)
>         at jdk.incubator.vector/jdk.incubator.vector.IntVector.intoByteBuffer(IntVector.java:3151)
>         at jdk.incubator.vector/jdk.incubator.vector.AbstractVector.defaultReinterpret(AbstractVector.java:505)
>         at java.base/jdk.internal.vm.vector.VectorSupport.convert(VectorSupport.java:441)
>         at jdk.incubator.vector/jdk.incubator.vector.AbstractVector.convert0(AbstractVector.java:686)
>         at jdk.incubator.vector/jdk.incubator.vector.AbstractVector.asVectorRawTemplate(AbstractVector.java:173)
>         at jdk.incubator.vector/jdk.incubator.vector.AbstractVector.asByteVectorRawTemplate(AbstractVector.java:179)
>         at jdk.incubator.vector/jdk.incubator.vector.Int64Vector.asByteVectorRaw(Int64Vector.java:177)
>         at jdk.incubator.vector/jdk.incubator.vector.Int64Vector.asByteVectorRaw(Int64Vector.java:41)
>         at jdk.incubator.vector/jdk.incubator.vector.IntVector.reinterpretAsBytes(IntVector.java:3366)
>         at jdk.incubator.vector/jdk.incubator.vector.IntVector.maybeSwap(IntVector.java:3330)
>
> Note that maybeSwap is endianness sensitive:
>     IntVector maybeSwap(ByteOrder bo) {
>         if (bo != NATIVE_ENDIAN) {
>             return this.reinterpretAsBytes()
>                 .rearrange(swapBytesShuffle())
>                 .reinterpretAsInts();
>         }
>         return this;
>     }
>
> How is this supposed to work?
> Is anything platform specific missing?
> > Thanks and best regards, > Martin > From fyang at openjdk.java.net Thu Oct 22 00:49:12 2020 From: fyang at openjdk.java.net (Fei Yang) Date: Thu, 22 Oct 2020 00:49:12 GMT Subject: RFR: 8252204: AArch64: Implement SHA3 accelerator/intrinsic [v10] In-Reply-To: <06kvIM_3abB-35pPdgFfbvwCND6oe9QCBqXBQ8iIrZ4=.64ae7da4-be02-46cc-afde-ffeb9ec9d703@github.com> References: <06kvIM_3abB-35pPdgFfbvwCND6oe9QCBqXBQ8iIrZ4=.64ae7da4-be02-46cc-afde-ffeb9ec9d703@github.com> Message-ID: On Wed, 21 Oct 2020 19:20:28 GMT, Vladimir Kozlov wrote: >> OK. Will update with the following change after Aleksey's PR is integrated: >> >> --- a/src/jdk.internal.vm.compiler/share/classes/org.graalvm.compiler.hotspot.test/src/org/graalvm/compiler/hotspot/test/CheckGraalIntrinsics.java >> +++ b/src/jdk.internal.vm.compiler/share/classes/org.graalvm.compiler.hotspot.test/src/org/graalvm/compiler/hotspot/test/CheckGraalIntrinsics.java >> @@ -608,6 +608,10 @@ public class CheckGraalIntrinsics extends GraalTest { >> if (!config.useSHA512Intrinsics()) { >> add(ignore, "sun/security/provider/SHA5." + shaCompressName + "([BI)V"); >> } >> + >> + if (isJDK16OrHigher()) { >> + add(toBeInvestigated, "sun/security/provider/SHA3." + shaCompressName + "([BI)V"); >> + } >> } > > Yes, please, do that. Done. Commit: https://github.com/openjdk/jdk/pull/207/commits/b43f91970d44e6e0c1b3b4ef452ec388ecbecb83 I think this will not conflict with Aleksey's PR as we modify in different places of CheckGraalIntrinsics.java ------------- PR: https://git.openjdk.java.net/jdk/pull/207 From jiefu at openjdk.java.net Thu Oct 22 03:19:16 2020 From: jiefu at openjdk.java.net (Jie Fu) Date: Thu, 22 Oct 2020 03:19:16 GMT Subject: RFR: 8255210: [Vector API] jdk/incubator/vector/Int256VectorTests.java crashes on AVX512 machines Message-ID: Hi all, Please review the fix of an AVX512 crash for Vector API. The reason is that reductionI in x86.ad didn't use legVec for code generation, which is required by Assembler::vphaddd. 

Testing:
  - test/jdk/jdk/incubator/vector all passed on both AVX256 and AVX512 machines

Thanks.
Best regards,
Jie

-------------

Commit messages:
 - 8255210: [Vector API] jdk/incubator/vector/Int256VectorTests.java crashes on AVX512 machines

Changes: https://git.openjdk.java.net/jdk/pull/791/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=791&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8255210
  Stats: 22 lines in 1 file changed: 0 ins; 20 del; 2 mod
  Patch: https://git.openjdk.java.net/jdk/pull/791.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/791/head:pull/791

PR: https://git.openjdk.java.net/jdk/pull/791

From github.com+10482586+erik1iu at openjdk.java.net Thu Oct 22 03:35:12 2020
From: github.com+10482586+erik1iu at openjdk.java.net (eric.1iu)
Date: Thu, 22 Oct 2020 03:35:12 GMT
Subject: RFR: 8254317: C2: Resource consumption of ConvI2LNode::Ideal() grows exponentially [v2]
In-Reply-To:
References: <9FJMNUDr4xvtcIlGtEk2Y_7tA17Us29ImExhFpzs87s=.66c146c6-7629-4e1b-a62b-d68714636f32@github.com>
Message-ID: <-j2ap4vnF04oKqLtf5JqGeQJ80PiuwOcbt6XOKGzg7k=.2b195e0e-21f2-425d-a0fe-8d92cd5e44d4@github.com>

On Wed, 21 Oct 2020 17:56:04 GMT, Vladimir Kozlov wrote:

>> Roberto Castañeda Lozano has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision:
>>
>> - Merge master
>> - Generalize the fix to handle any input where AddIs are used multiple times by
>> other AddIs, which could also lead to an exponential number of calls to
>> ConvI2LNode::Ideal(). This is achieved by (1) reusing existing ConvI2Ls if
>> possible rather than eagerly creating new ones and (2) postponing the
>> optimization of newly created ConvI2Ls. Remove "hook" node solution introduced
>> in JDK-8217359 since this is subsumed by (2).
Test that ConvI2LNode::Ideal() is
>> called within iterative GVN using phase->is_IterGVN() rather than can_reshape,
>> for clarity.
>>
>> Merge all tests into a single class. Reimplement the microbenchmark as a test
>> case that should time out in case of a combinatorial explosion. Add a second
>> similar microbenchmark that demonstrates the need for this generalization.
>> - Merge master
>> - 8254317: C2: Resource consumption of ConvI2LNode::Ideal() grows exponentially
>>
>> In the optimization ConvI2L(AddI(x, y)) -> AddL(ConvI2L(x), ConvI2L(y)) within
>> ConvI2LNode::Ideal(), handle the special case x = y by feeding both inputs of
>> AddL from a single ConvI2L node rather than creating two semantically equivalent
>> ConvI2L nodes. This avoids an exponential number of calls to
>> ConvI2LNode::Ideal() when dealing with long chains of AddI nodes. Disable the
>> optimization for the pattern ConvI2L(SubI(x, x)), which is simplified to zero
>> during parsing anyway. Add a set of regression tests for the transformation that
>> cover different shapes of AddI subgraphs. Also add a microbenchmark that
>> exercises the special case, for performance regression testing.
>
> test/hotspot/jtreg/compiler/conversions/TestMoveConvI2LThroughAddIs.java line 38:
>
>> 36: * the short specified timeout.
>> 37: * @library /test/lib /
>> 38: * @run main/othervm -Xcomp -XX:-TieredCompilation -XX:-Inline
>
> For future tests writing.
> In general we should avoid using -Xcomp because it does not provide profiling information to C2 and it may generate unexpected code because branches frequencies are not known.
> Preferable way is warm up test method by calling it in a loop enough times to make sure it is compiled and then verify results outside test method.
> Then you don't need -XX:CompileOnly command.

Without -Xcomp, the final code would not be the same as expected: lots of branches would be optimized using profiling data, which we **do** normally use.
I wonder whether it's feasible to create a test case that grows exponentially even with profiling data. I think that might describe the issue more realistically.

-------------

PR: https://git.openjdk.java.net/jdk/pull/727

From kvn at openjdk.java.net Thu Oct 22 04:02:15 2020
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Thu, 22 Oct 2020 04:02:15 GMT
Subject: RFR: 8252204: AArch64: Implement SHA3 accelerator/intrinsic [v11]
In-Reply-To:
References:
Message-ID:

On Wed, 21 Oct 2020 23:42:33 GMT, Fei Yang wrote:

>> Contributed-by: ard.biesheuvel at linaro.org, dongbo4 at huawei.com
>>
>> This added an intrinsic for SHA3 using aarch64 v8.2 SHA3 Crypto Extensions.
>> Reference implementation for core SHA-3 transform using ARMv8.2 Crypto Extensions:
>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/arch/arm64/crypto/sha3-ce-core.S?h=v5.4.52
>>
>> Trivial adaptation in SHA3. implCompress is needed for the purpose of adding the intrinsic.
>> For SHA3, we need to pass one extra parameter "digestLength" to the stub for the calculation of block size.
>> "digestLength" is also used in for the EOR loop before keccak to differentiate different SHA3 variants.
>>
>> We added jtreg tests for SHA3 and used QEMU system emulator which supports SHA3 instructions to test the functionality.
>> Patch passed jtreg tier1-3 tests with QEMU system emulator.
>> Also verified with jtreg tier1-3 tests without SHA3 instructions on aarch64-linux-gnu and x86_64-linux-gnu, to make sure that there's no regression.
>>
>> We used one existing JMH test for performance test: test/micro/org/openjdk/bench/java/security/MessageDigests.java
>> We measured the performance benefit with an aarch64 cycle-accurate simulator.
>> Patch delivers 20% - 40% performance improvement depending on specific SHA3 digest length and size of the message.
>>
>> For now, this feature will not be enabled automatically for aarch64. We can auto-enable this when it is fully tested on real hardware.
But for the above testing purposes, this is auto-enabled when the corresponding hardware feature is detected.
>
> Fei Yang has updated the pull request incrementally with one additional commit since the last revision:
>
> Add if (isJDK16OrHigher()) check for SHA3 in CheckGraalIntrinsics.java

tier1,2,3 passed. I verified that new SHA3 tests were run and passed.
But because SHA3 is not enabled for now (even on aarch64), it does not test asm code.
At least testing verified that changes in shared code do not cause any issues.

-------------

Marked as reviewed by kvn (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/207

From fyang at openjdk.java.net Thu Oct 22 04:23:11 2020
From: fyang at openjdk.java.net (Fei Yang)
Date: Thu, 22 Oct 2020 04:23:11 GMT
Subject: RFR: 8252204: AArch64: Implement SHA3 accelerator/intrinsic [v11]
In-Reply-To:
References:
Message-ID:

On Thu, 22 Oct 2020 03:59:45 GMT, Vladimir Kozlov wrote:

> tier1,2,3 passed. I verified that new SHA3 tests were run and passed.
> But because SHA3 is not enabled for now (even on aarch64), it does not test asm code.
> At least testing verified that changes in shared code do not cause any issues.

Great to hear that :-) Thanks for the effort.
With that testing result and reviews from three reviewers, I think it's safe to integrate.

-------------

PR: https://git.openjdk.java.net/jdk/pull/207

From fyang at openjdk.java.net Thu Oct 22 04:44:21 2020
From: fyang at openjdk.java.net (Fei Yang)
Date: Thu, 22 Oct 2020 04:44:21 GMT
Subject: Integrated: 8252204: AArch64: Implement SHA3 accelerator/intrinsic
In-Reply-To:
References:
Message-ID:

On Wed, 16 Sep 2020 16:36:54 GMT, Fei Yang wrote:

> Contributed-by: ard.biesheuvel at linaro.org, dongbo4 at huawei.com
>
> This added an intrinsic for SHA3 using aarch64 v8.2 SHA3 Crypto Extensions.
> Reference implementation for core SHA-3 transform using ARMv8.2 Crypto Extensions: > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/arch/arm64/crypto/sha3-ce-core.S?h=v5.4.52 > > Trivial adaptation in SHA3. implCompress is needed for the purpose of adding the intrinsic. > For SHA3, we need to pass one extra parameter "digestLength" to the stub for the calculation of block size. > "digestLength" is also used in for the EOR loop before keccak to differentiate different SHA3 variants. > > We added jtreg tests for SHA3 and used QEMU system emulator which supports SHA3 instructions to test the functionality. > Patch passed jtreg tier1-3 tests with QEMU system emulator. > Also verified with jtreg tier1-3 tests without SHA3 instructions on aarch64-linux-gnu and x86_64-linux-gnu, to make sure that there's no regression. > > We used one existing JMH test for performance test: test/micro/org/openjdk/bench/java/security/MessageDigests.java > We measured the performance benefit with an aarch64 cycle-accurate simulator. > Patch delivers 20% - 40% performance improvement depending on specific SHA3 digest length and size of the message. > > For now, this feature will not be enabled automatically for aarch64. We can auto-enable this when it is fully tested on real hardware. But for the above testing purposes, this is auto-enabled when the corresponding hardware feature is detected. This pull request has now been integrated. 
Changeset: b25d8940 Author: Fei Yang URL: https://git.openjdk.java.net/jdk/commit/b25d8940 Stats: 1265 lines in 36 files changed: 1010 ins; 22 del; 233 mod 8252204: AArch64: Implement SHA3 accelerator/intrinsic Co-authored-by: Ard Biesheuvel Co-authored-by: Dong Bo Reviewed-by: aph, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/207 From ngasson at openjdk.java.net Thu Oct 22 07:09:12 2020 From: ngasson at openjdk.java.net (Nick Gasson) Date: Thu, 22 Oct 2020 07:09:12 GMT Subject: RFR: 8254723: add diagnostic command to write Linux perf map file [v2] In-Reply-To: References: <7T_M6C-3WpLwXYH3RuRCuDQUW0qMyKIWAs8RaPW7D0s=.d659e5a0-e8a2-4816-8f60-1dd7653f4c7b@github.com> Message-ID: On Thu, 22 Oct 2020 04:57:18 GMT, Thomas Stuefe wrote: >> Nick Gasson has updated the pull request incrementally with one additional commit since the last revision: >> >> Update for review comments > > src/hotspot/share/code/codeCache.hpp line 194: > >> 192: static void print_summary(outputStream* st, bool detailed = true); // Prints a summary of the code cache usage >> 193: static void log_state(outputStream* st); >> 194: static void write_perf_map(outputStream* st); > > Seems weird for this function to have an outputStream parameter only to write the dump to an unrelated file and ignore the stream for everything but the final message. > > I would either pass in the file name as an option - preferably configurable - and write the last message out here; or just write the whole perf dump to the outputstream itself, piping it to jcmd and let the caller do what he wants with it (e.g. just redirecting). The latter is what most commands do. Not sure how large the perf dump gets though, may be impractical. OK. I think I'll change it so `write_perf_map()` writes to the `outputStream` and then `PerfMapDCmd::execute()` handles redirecting it to a file. I don't think it makes sense to write it directly to the jcmd output though. 
> src/hotspot/share/code/codeCache.cpp line 1562: > >> 1560: } >> 1561: >> 1562: void CodeCache::write_perf_map(outputStream* st) { > > Could this whole function possibly live inside os/linux in an own file? Or would that expose too many code heap internals? Probably creates too much dependency between the os layer and the codeCache internals? I'll put it all in `#ifdef LINUX` though. ------------- PR: https://git.openjdk.java.net/jdk/pull/760 From ngasson at openjdk.java.net Thu Oct 22 07:09:13 2020 From: ngasson at openjdk.java.net (Nick Gasson) Date: Thu, 22 Oct 2020 07:09:13 GMT Subject: RFR: 8254723: add diagnostic command to write Linux perf map file [v2] In-Reply-To: References: <7T_M6C-3WpLwXYH3RuRCuDQUW0qMyKIWAs8RaPW7D0s=.d659e5a0-e8a2-4816-8f60-1dd7653f4c7b@github.com> Message-ID: <-nFOBWmq0LmrYYr-PdeUN20eZ1PXk9XC7wTMyuJ7QEA=.bc5132a8-8a2c-400e-93ad-a940040750d7@github.com> On Wed, 21 Oct 2020 17:57:46 GMT, Chris Plummer wrote: >> I'm not sure, I didn't want to add too much `#ifdef` mess. The code will compile on other platforms, it just won't be called. Better to add `#ifdef`s around all of it? > > Any reason not to have this dcmd supported on all platforms even though the output is really targeted for use with the perf tool on linux? Would a user ever have any other use for the output other than with the perf tool on linux? @plummercj I'm not sure: it's just a text file so could be easily consumed by tools other than perf. But I'm not aware of any tools on other platforms that could use it. 
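
As a rough illustration of how little a non-perf consumer would need, here is a minimal Java sketch of a reader for such a map. It assumes the line layout that perf documents for its map files (`START SIZE symbolname`, with START and SIZE in hex without a `0x` prefix and the symbol taking the rest of the line). The class name, method names and sample lines are made up for illustration and are not part of this PR.

```java
import java.util.ArrayList;
import java.util.List;

class PerfMapSketch {
    /** One line of a perf map file: a code range and the symbol covering it. */
    static final class Entry {
        final long start;
        final long size;
        final String symbol;

        Entry(long start, long size, String symbol) {
            this.start = start;
            this.size = size;
            this.symbol = symbol;
        }
    }

    /** Parses "START SIZE symbolname"; START and SIZE are hex without 0x. */
    static Entry parseLine(String line) {
        // Split into at most three fields so the symbol keeps embedded spaces.
        String[] fields = line.trim().split(" ", 3);
        if (fields.length != 3) {
            throw new IllegalArgumentException("malformed perf map line: " + line);
        }
        return new Entry(Long.parseUnsignedLong(fields[0], 16),
                         Long.parseUnsignedLong(fields[1], 16),
                         fields[2]);
    }

    /** Returns the symbol whose range contains pc, or null if none does. */
    static String symbolFor(List<Entry> entries, long pc) {
        for (Entry e : entries) {
            // Subtract first so the check also works for addresses with the
            // top bit set (treated as unsigned).
            if (Long.compareUnsigned(pc - e.start, e.size) < 0) {
                return e.symbol;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<Entry> entries = new ArrayList<>();
        // Made-up sample lines, not real JVM output.
        entries.add(parseLine("7f0a10000000 40 void example.Foo.bar()"));
        entries.add(parseLine("7f0a10000040 1c0 int example.Foo.baz(int)"));
        System.out.println(symbolFor(entries, 0x7f0a10000050L));
    }
}
```

A linear scan is enough for a sketch; a real tool would probably sort the entries by start address and use a binary search.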
------------- PR: https://git.openjdk.java.net/jdk/pull/760 From ngasson at openjdk.java.net Thu Oct 22 07:12:23 2020 From: ngasson at openjdk.java.net (Nick Gasson) Date: Thu, 22 Oct 2020 07:12:23 GMT Subject: RFR: 8254723: add diagnostic command to write Linux perf map file [v2] In-Reply-To: References: <7T_M6C-3WpLwXYH3RuRCuDQUW0qMyKIWAs8RaPW7D0s=.d659e5a0-e8a2-4816-8f60-1dd7653f4c7b@github.com> Message-ID: On Thu, 22 Oct 2020 05:00:10 GMT, Thomas Stuefe wrote: > > 1. Like Alexey, I would really wish for an print-at-exit switch. The common naming seems to be xxxAtExit (so not, OnExit). "PrintXxx" seems to be printing stuff out to tty, "DumpXxxx" for writing separate files (e.g. CDS map). So I would name it DumpPerfMapAtExit. > OK, makes sense. > 2. Dumping to /tmp is unexpected for me, I would prefer if the default were dumping to the current directory. That seems to be the default for other files too (cds map, hs-err file etc). > > 3. Not necessary but nice would be a an option to specify location of the dump file. > The `/tmp/perf-.map` is hardcoded into perf though ([see here](https://github.com/torvalds/linux/blob/master/tools/perf/util/map.c#L155)), so I don't think it's useful for the user to be able to change the location. 
------------- PR: https://git.openjdk.java.net/jdk/pull/760 From xliu at openjdk.java.net Thu Oct 22 07:38:24 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Thu, 22 Oct 2020 07:38:24 GMT Subject: RFR: 8241495: Make more compiler related flags available on a per method level Message-ID: 8241495: Make more compiler related flags available on a per method level ------------- Commit messages: - 8241495: Make more compiler related flags available on a per method level Changes: https://git.openjdk.java.net/jdk/pull/796/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=796&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8241495 Stats: 5 lines in 1 file changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/796.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/796/head:pull/796 PR: https://git.openjdk.java.net/jdk/pull/796 From xliu at openjdk.java.net Thu Oct 22 07:38:24 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Thu, 22 Oct 2020 07:38:24 GMT Subject: RFR: 8241495: Make more compiler related flags available on a per method level In-Reply-To: References: Message-ID: <7GqMpWDwb4FoQ2XxPw_tXDT3ev0T-onvviYAi4nH0Mk=.474e4c3b-164f-4f9e-9202-16eb2f4adacc@github.com> On Thu, 22 Oct 2020 07:28:01 GMT, Xin Liu wrote: > 8241495: Make more compiler related flags available on a per method level eg. -XX:CompileCommand=option,java.lang.String::startsWith,BreakAtCompile directs JIT compilers to hit BREAKPOINT when they compile the method java.lang.String::startsWith. 
-------------

PR: https://git.openjdk.java.net/jdk/pull/796

From github.com+890289+plokhotnyuk at openjdk.java.net Thu Oct 22 08:05:13 2020
From: github.com+890289+plokhotnyuk at openjdk.java.net (Andriy Plokhotnyuk)
Date: Thu, 22 Oct 2020 08:05:13 GMT
Subject: RFR: 8254913: Increase InlineSmallCode default from 2000 to 2500 for x64
In-Reply-To:
References: <-eTclEV5zK_DIJgV8kYWFRBlUNtEqzq5H_fNwmbVJ7A=.ee694b8b-2a1e-43a0-9df6-52958e8f1d23@github.com>
Message-ID:

On Wed, 21 Oct 2020 20:49:06 GMT, Vladimir Kozlov wrote:

>> We have seen some specific benefits to increasing InlineSmallCode to 2500 from 2000, and across the whole promo build perf test collection the change is neutral to slightly positive, where the tests are run on modern OCI systems.
>>
>> Passed tier1 testing, some ad-hoc perf testing and more compiler related parts of the weekly promo performance set.
>>
>> JBS: https://bugs.openjdk.java.net/browse/JDK-8254913
>> Thanks,
>> Eric
>
> Good.

I got a ~10% regression after adding `-XX:InlineSmallCode=2500` in some JSON parsing benchmarks where huge methods were generated for some data structures with a large number of fields.

Steps to reproduce:
1. Clone the jsoniter-scala repo:
   git clone --depth 1 git@github.com:plokhotnyuk/jsoniter-scala.git
2. Run benchmarks with default options:
   sbt -java-home /usr/lib/jvm/openjdk-16 'jsoniter-scala-benchmarkJVM/jmh:run -wi 10 -i 10 TwitterAPIReading.jsoniterScala'
3. 
Run benchmarks with an additional `-XX:InlineSmallCode=2500` option: sbt -java-home /usr/lib/jvm/openjdk-16 'jsoniter-scala-benchmarkJVM/jmh:run -wi 10 -i 10 -jvmArgsAppend "-XX:InlineSmallCode=2500" TwitterAPIReading.jsoniterScala' ------------- PR: https://git.openjdk.java.net/jdk/pull/705 From lucy at openjdk.java.net Thu Oct 22 08:14:10 2020 From: lucy at openjdk.java.net (Lutz Schmidt) Date: Thu, 22 Oct 2020 08:14:10 GMT Subject: RFR: 8255129: [PPC64, s390] Check vector_size_supported and add VectorReinterpret node In-Reply-To: References: Message-ID: On Wed, 21 Oct 2020 13:28:38 GMT, Martin Doerr wrote: > match_rule_supported_vector on PPC64 and s390 need to check vector_size_supported. > In addition, an implementation for VectorReinterpret is needed. It can get implemented empty when using src = dst register. Note that these 2 platforms support only one vector size at a time, so there's no need for move between different sizes. > > I'd like to clean up match_rule_supported, too. Cases for which we return true don't need to get checked explicitely because true is default. And we don't need to check SpecialString... flags because they are handled by "disabled_by_jvm_flags". So they can still get disabled (e.g. by jdk/bin/java -XX:-TieredCompilation -XX:-SpecialStringIndexOf -XX:+PrintCompilation -XX:+PrintInlining TestString|grep StringLatin1::indexOf). Martin, these changes look good to me. There remains a lot to be implemented, though. :-) And thank you for cleaning up the code! Best, Lutz ------------- Marked as reviewed by lucy (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/783 From rcastanedalo at openjdk.java.net Thu Oct 22 08:46:15 2020 From: rcastanedalo at openjdk.java.net (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 22 Oct 2020 08:46:15 GMT Subject: RFR: 8254317: C2: Resource consumption of ConvI2LNode::Ideal() grows exponentially [v2] In-Reply-To: References: <9FJMNUDr4xvtcIlGtEk2Y_7tA17Us29ImExhFpzs87s=.66c146c6-7629-4e1b-a62b-d68714636f32@github.com> Message-ID: On Wed, 21 Oct 2020 17:49:24 GMT, Vladimir Kozlov wrote: >> Roberto Casta?eda Lozano has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: >> >> - Merge master >> - Generalize the fix to handle any input where AddIs are used multiple times by >> other AddIs, which could also lead to an exponential number of calls to >> ConvI2LNode::Ideal(). This is achieved by (1) reusing existing ConvI2Ls if >> possible rather than eagerly creating new ones and (2) postponing the >> optimization of newly created ConvI2Ls. Remove "hook" node solution introduced >> in JDK-8217359 since this is subsumed by (2). Test that ConvI2LNode::Ideal() is >> called within iterative GVN using phase->is_IterGVN() rather than can_reshape, >> for clarity. >> >> Merge all tests into a single class. Reimplement the microbenchmark as a test >> case that should time out in case of a combinatorial explosion. Add a second >> similar microbenchmark that demonstrates the need for this generalization. >> - Merge master >> - 8254317: C2: Resource consumption of ConvI2LNode::Ideal() grows exponentially >> >> In the optimization ConvI2L(AddI(x, y)) -> AddL(ConvI2L(x), ConvI2L(y)) within >> ConvI2LNode::Ideal(), handle the special case x = y by feeding both inputs of >> AddL from a single ConvI2L node rather than creating two semantically equivalent >> ConvI2L nodes. 
This avoids an exponential number of calls to >> ConvI2LNode::Ideal() when dealing with long chains of AddI nodes. Disable the >> optimization for the pattern ConvI2L(SubI(x, x)), which is simplified to zero >> during parsing anyway. Add a set of regression tests for the transformation that >> cover different shapes of AddI subgraphs. Also add a microbenchmark that >> exercises the special case, for performance regression testing. > > test/hotspot/jtreg/compiler/conversions/TestMoveConvI2LThroughAddIs.java line 41: > >> 39: * -XX:CompileOnly=::testChain,::testTree,::testDAG >> 40: * compiler.conversions.TestMoveConvI2LThroughAddIs basic >> 41: * @run main/othervm/timeout=1 -Xcomp -XX:-TieredCompilation -XX:-Inline > > By default timeout is 120 sec. Why you set it to 1 sec? > The same for second @run. These two runs (`stress1` and `stress2`) test that compilation time does not explode. When it does (before the proposed fix), the runs take > 10s to execute (but less than 120s, at least `stress1`), while after the fix both take just a few ms (all measurements from my local and our CI machines). I just deemed 1s to be a good value to balance the risk of false positives (reporting an nonexistent bug because the timeout is too short) and false negatives (not reporting a real bug because the timeout is too long). I will explore further increasing the load of these runs to see if we can increase the timeout without increasing the risk of false negatives. 
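
For readers who have not looked at the test file, a minimal sketch of the kind of method the stress runs are about: a chain of int additions in which every sum is used twice by the next one, followed by an int-to-long conversion. The class and method names below are hypothetical and this is not the code in TestMoveConvI2LThroughAddIs.java. Whether C2 actually pushes the conversion through the additions here depends on conditions that are not spelled out in this thread.

```java
class ConvI2LChainSketch {
    // Each sum feeds both inputs of the next addition, so the expression
    // graph is a chain of AddI nodes whose two inputs are the same node.
    static long testChain(int a) {
        int x1 = a + a;
        int x2 = x1 + x1;
        int x3 = x2 + x2;
        int x4 = x3 + x3;
        int x5 = x4 + x4;
        int x6 = x5 + x5;
        int x7 = x6 + x6;
        int x8 = x7 + x7;
        return x8; // implicit int-to-long conversion
    }

    public static void main(String[] args) {
        // Call the method often enough for the JIT to compile it and check
        // the result outside of the method under test.
        long expected = 3L * 256; // eight doublings of the input 3
        for (int i = 0; i < 20_000; i++) {
            if (testChain(3) != expected) {
                throw new AssertionError("unexpected result");
            }
        }
        System.out.println("ok");
    }
}
```

With eight doublings the work is trivial either way; the point is only the shape of the graph. A stress variant would use a much longer chain so that an exponential blow-up shows up as compilation time.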
------------- PR: https://git.openjdk.java.net/jdk/pull/727 From rcastanedalo at openjdk.java.net Thu Oct 22 09:07:15 2020 From: rcastanedalo at openjdk.java.net (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 22 Oct 2020 09:07:15 GMT Subject: RFR: 8254317: C2: Resource consumption of ConvI2LNode::Ideal() grows exponentially [v2] In-Reply-To: <-j2ap4vnF04oKqLtf5JqGeQJ80PiuwOcbt6XOKGzg7k=.2b195e0e-21f2-425d-a0fe-8d92cd5e44d4@github.com> References: <9FJMNUDr4xvtcIlGtEk2Y_7tA17Us29ImExhFpzs87s=.66c146c6-7629-4e1b-a62b-d68714636f32@github.com> <-j2ap4vnF04oKqLtf5JqGeQJ80PiuwOcbt6XOKGzg7k=.2b195e0e-21f2-425d-a0fe-8d92cd5e44d4@github.com> Message-ID: On Thu, 22 Oct 2020 03:32:48 GMT, eric.1iu wrote: >> test/hotspot/jtreg/compiler/conversions/TestMoveConvI2LThroughAddIs.java line 38: >> >>> 36: * the short specified timeout. >>> 37: * @library /test/lib / >>> 38: * @run main/othervm -Xcomp -XX:-TieredCompilation -XX:-Inline >> >> For future tests writing. >> In general we should avoid using -Xcomp because it does not provide profiling information to C2 and it may generate unexpected code because branches frequencies are not known. >> Preferable way is warm up test method by calling it in a loop enough times to make sure it is compiled and then verify results outside test method. >> Then you don't need -XX:CompileOnly command. > > Without -Xcomp, the final code would not the same as expected, lots of branches have been optimized with profiling data which we **do** normally used. > > I wonder whether it's feasible to create such a test case that grows exponentially with profiling data. I think that may decriable the issue more realistic. Thanks for the feedback! The goal of these test cases is to exercise the logic in `ConvI2LNode::Ideal()` in a way as isolated as possible from the rest of the JVM, for simplicity, reproducibility, and ease of debugging. I will rewrite them to verify the results outside the test method. 
I will see if I can get rid of `-Xcomp` and `-XX:CompileOnly` without making them too complex. If that does not work, I will try to construct another test case as suggested by @erik1iu . ------------- PR: https://git.openjdk.java.net/jdk/pull/727 From neliasso at openjdk.java.net Thu Oct 22 09:20:13 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Thu, 22 Oct 2020 09:20:13 GMT Subject: RFR: 8241495: Make more compiler related flags available on a per method level In-Reply-To: References: Message-ID: <_CXcBL6MckmEHVOCz5R6laZGzyknrIo9Y6jQ8y_LP5o=.9abf2f24-4c93-4e38-8bdd-f0ec86f153d4@github.com> On Thu, 22 Oct 2020 07:28:01 GMT, Xin Liu wrote: > 8241495: Make more compiler related flags available on a per method level Changes requested by neliasso (Reviewer). src/hotspot/share/compiler/compilerDirectives.hpp line 37: > 35: // Directives flag name, type, default value, compile command name > 36: #define compilerdirectives_common_flags(cflags) \ > 37: cflags(Enable, bool, false, Enable) \ This flag only has a meaning when working with directives files. Doesn't it just add confusion to have a default from CompileCommand? src/hotspot/share/compiler/compilerDirectives.hpp line 39: > 37: cflags(Enable, bool, false, Enable) \ > 38: cflags(Exclude, bool, false, Exclude) \ > 39: cflags(BreakAtExecute, bool, false, BreakAtExecute) \ The BreakAtFlags are missing defaults since CompileCommand uses the "break" option. If we are going to add them to the directives - we should go through all uses and make sure that this is the only flag is uses. src/hotspot/share/compiler/compilerDirectives.hpp line 38: > 36: #define compilerdirectives_common_flags(cflags) \ > 37: cflags(Enable, bool, false, Enable) \ > 38: cflags(Exclude, bool, false, Exclude) \ CompileCommand uses a different semantic than CompilerDirectives to exclude methods from compilation. Are you sure you will be preserving backwards compatibility? 
See compilerDirectives.cpp lines 333-360 to see how the backwards compatibility is preserved. ------------- PR: https://git.openjdk.java.net/jdk/pull/796 From njian at openjdk.java.net Thu Oct 22 10:44:12 2020 From: njian at openjdk.java.net (Ningsheng Jian) Date: Thu, 22 Oct 2020 10:44:12 GMT Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v9] In-Reply-To: References: Message-ID: <8Ryyxuf5P2D6WNyj4riYCTgN0U6WLrLpBmxhNbnmPpQ=.b2ed5660-99d0-49d1-83e0-8b2de518d7b8@github.com> On Wed, 21 Oct 2020 12:13:27 GMT, Jatin Bhateja wrote: >> Summary: >> >> 1) Partial in-lining technique avoids call overhead penalty for small array copy operations with size less than 32 bytes. >> 2) At runtime, a conditional check based on copy length either calls an array-copy stub or executes an optimized instruction sequence using AVX-512 masked instructions emitted at the call site. >> 3) New runtime flag ArrayCopyPartialInlineSize=0/32(default)/64 bytes determines the maximum size for partial in-lining. >> 4) Based on the perf results seen in benchmarks currently partial in-lining is performed only for arraycopy involving sub-word types (bool/byte/char/short). Once PR-61 gets integrated we can extend this patch to cover all the primitive types. 
>>
>> Performance Results:
>> System : CascadeLake Server, Intel(R) Xeon(R) Platinum 8280L CPU @ 2.70GHz
>> Micros : test/micro/org/openjdk/bench/java/lang/ArrayCopy*.java
>> ArrayCopyPartialInlineSize : 32
>>
>> JMH | Block Size | Baseline (ns/op) | Partial Inlining (ns/op) | Gain
>> -- | -- | -- | -- | --
>> ArrayCopyAligned.testByte | 1 | 5.417 | 2.696 | 2.009272997
>> ArrayCopyAligned.testByte | 3 | 5.494 | 2.702 | 2.03330866
>> ArrayCopyAligned.testByte | 5 | 5.417 | 2.637 | 2.05422829
>> ArrayCopyAligned.testByte | 10 | 5.343 | 2.703 | 1.976692564
>> ArrayCopyAligned.testByte | 20 | 5.837 | 2.636 | 2.214339909
>> ArrayCopyAligned.testByte | 70 | 5.86 | 6 | 0.976666667
>> ArrayCopyAligned.testByte | 150 | 6.766 | 6.906 | 0.979727773
>> ArrayCopyAligned.testByte | 300 | 7.605 | 7.952 | 0.956363179
>> ArrayCopyAligned.testByte | 600 | 11.989 | 12.007 | 0.998500874
>> ArrayCopyAligned.testByte | 1200 | 16.447 | 16.585 | 0.991679228
>> ArrayCopyAligned.testChar | 1 | 5.02 | 2.828 | 1.775106082
>> ArrayCopyAligned.testChar | 3 | 5.129 | 2.762 | 1.85698769
>> ArrayCopyAligned.testChar | 5 | 5.041 | 2.762 | 1.82512672
>> ArrayCopyAligned.testChar | 10 | 5.716 | 2.762 | 2.069514844
>> ArrayCopyAligned.testChar | 20 | 5.111 | 5.399 | 0.946656788
>> ArrayCopyAligned.testChar | 70 | 6.271 | 6.242 | 1.004645947
>> ArrayCopyAligned.testChar | 150 | 7.45 | 7.599 | 0.980392157
>> ArrayCopyAligned.testChar | 300 | 9.904 | 10.112 | 0.97943038
>> ArrayCopyAligned.testChar | 600 | 17.131 | 17.167 | 0.997902953
>> ArrayCopyAligned.testChar | 1200 | 29.556 | 29.851 | 0.990117584
>> ArrayCopyUnalignedBoth.testByte | 1 | 5.419 | 2.702 | 2.005551443
>> ArrayCopyUnalignedBoth.testByte | 3 | 5.558 | 2.636 | 2.108497724
>> ArrayCopyUnalignedBoth.testByte | 5 | 5.43 | 2.636 | 2.059939302
>> ArrayCopyUnalignedBoth.testByte | 10 | 5.378 | 2.637 | 2.039438756
>> ArrayCopyUnalignedBoth.testByte | 20 | 5.914 | 2.636 | 2.243550836
>> ArrayCopyUnalignedBoth.testByte | 70 | 5.882 | 5.954 | 0.987907289
>> ArrayCopyUnalignedBoth.testByte | 150 | 6.784 | 6.88 | 0.986046512
>> ArrayCopyUnalignedBoth.testByte | 300 | 7.635 | 7.968 | 0.958207831
>> ArrayCopyUnalignedBoth.testByte | 600 | 12.226 | 12.129 | 1.007997362
>> ArrayCopyUnalignedBoth.testByte | 1200 | 16.992 | 20.717 | 0.820195974
>> ArrayCopyUnalignedBoth.testChar | 1 | 5.019 | 2.828 | 1.774752475
>> ArrayCopyUnalignedBoth.testChar | 3 | 5.163 | 2.763 | 1.868621064
>> ArrayCopyUnalignedBoth.testChar | 5 | 5.042 | 2.827 | 1.783516095
>> ArrayCopyUnalignedBoth.testChar | 10 | 5.718 | 2.828 | 2.021923621
>> ArrayCopyUnalignedBoth.testChar | 20 | 5.111 | 5.404 | 0.945780903
>> ArrayCopyUnalignedBoth.testChar | 70 | 6.367 | 6.235 | 1.02117081
>> ArrayCopyUnalignedBoth.testChar | 150 | 7.367 | 8.269 | 0.890917886
>> ArrayCopyUnalignedBoth.testChar | 300 | 10.358 | 10.642 | 0.973313287
>> ArrayCopyUnalignedBoth.testChar | 600 | 20.84 | 17.522 | 1.189361945
>> ArrayCopyUnalignedBoth.testChar | 1200 | 31.895 | 31.892 | 1.000094067
>> ArrayCopyUnalignedDst.testByte | 1 | 5.455 | 2.637 | 2.068638604
>> ArrayCopyUnalignedDst.testByte | 3 | 5.562 | 2.702 | 2.058475204
>> ArrayCopyUnalignedDst.testByte | 5 | 5.427 | 2.702 | 2.008512213
>> ArrayCopyUnalignedDst.testByte | 10 | 5.367 | 2.696 | 1.990727003
>> ArrayCopyUnalignedDst.testByte | 20 | 5.839 | 2.637 | 2.214258627
>> ArrayCopyUnalignedDst.testByte | 70 | 5.888 | 5.968 | 0.986595174
>> ArrayCopyUnalignedDst.testByte | 150 | 6.785 | 6.773 | 1.001771741
>> ArrayCopyUnalignedDst.testByte | 300 | 7.606 | 7.972 | 0.954089313
>> ArrayCopyUnalignedDst.testByte | 600 | 11.986 | 21.195 | 0.565510734
>> ArrayCopyUnalignedDst.testByte | 1200 | 16.54 | 16.784 | 0.985462345
>> ArrayCopyUnalignedDst.testChar | 1 | 5.02 | 2.827 | 1.775733994
>> ArrayCopyUnalignedDst.testChar | 3 | 5.131 | 2.762 | 1.857711803
>> ArrayCopyUnalignedDst.testChar | 5 | 5.038 | 2.762 | 1.82404055
>> ArrayCopyUnalignedDst.testChar | 10 | 5.718 | 2.762 | 2.070238957
>> ArrayCopyUnalignedDst.testChar | 20 | 5.113 | 5.401 | 0.946676541
>> ArrayCopyUnalignedDst.testChar | 70 | 6.222 | 6.214 | 1.001287416
>> ArrayCopyUnalignedDst.testChar | 150 | 7.367 | 8.125 | 0.906707692
>> ArrayCopyUnalignedDst.testChar | 300 | 10.204 | 10.082 | 1.012100774
>> ArrayCopyUnalignedDst.testChar | 600 | 16.978 | 17.135 | 0.990837467
>> ArrayCopyUnalignedDst.testChar | 1200 | 32.351 | 31.996 | 1.011095137
>> ArrayCopyUnalignedSrc.testByte | 1 | 5.414 | 2.696 | 2.008160237
>> ArrayCopyUnalignedSrc.testByte | 3 | 5.494 | 2.637 | 2.083428138
>> ArrayCopyUnalignedSrc.testByte | 5 | 5.431 | 2.637 | 2.059537353
>> ArrayCopyUnalignedSrc.testByte | 10 | 5.344 | 2.703 | 1.977062523
>> ArrayCopyUnalignedSrc.testByte | 20 | 5.834 | 2.696 | 2.163946588
>> ArrayCopyUnalignedSrc.testByte | 70 | 5.883 | 6.009 | 0.979031453
>> ArrayCopyUnalignedSrc.testByte | 150 | 6.729 | 6.87 | 0.979475983
>> ArrayCopyUnalignedSrc.testByte | 300 | 7.603 | 7.97 | 0.953952321
>> ArrayCopyUnalignedSrc.testByte | 600 | 12.004 | 12.16 | 0.987171053
>> ArrayCopyUnalignedSrc.testByte | 1200 | 16.534 | 16.643 | 0.9934507
>> ArrayCopyUnalignedSrc.testChar | 1 | 5.021 | 2.762 | 1.81788559
>> ArrayCopyUnalignedSrc.testChar | 3 | 5.13 | 2.762 | 1.857349747
>> ArrayCopyUnalignedSrc.testChar | 5 | 5.042 | 2.827 | 1.783516095
>> ArrayCopyUnalignedSrc.testChar | 10 | 5.726 | 2.761 | 2.073886273
>> ArrayCopyUnalignedSrc.testChar | 20 | 5.112 | 5.401 | 0.94649139
>> ArrayCopyUnalignedSrc.testChar | 70 | 6.113 | 6.227 | 0.981692629
>> ArrayCopyUnalignedSrc.testChar | 150 | 7.493 | 7.888 | 0.949923935
>> ArrayCopyUnalignedSrc.testChar | 300 | 10.234 | 10.501 | 0.97457385
>> ArrayCopyUnalignedSrc.testChar | 600 | 17.175 | 17.142 | 1.001925096
>> ArrayCopyUnalignedSrc.testChar | 1200 | 31.926 | 31.987 | 0.998092975
>>
>> Detailed Reports:
>> Baseline : [http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt)
>> 
WithOpt : [http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt) > > Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848 Thanks for the impressive work Jatin! I believe it will also be helpful for our Arm SVE work. I just took a quick look and have some questions. src/hotspot/share/opto/vectornode.cpp line 775: > 773: VectorMaskGenNode* make(int opc, Node* src, const Type* ty, const Type* ety) { > 774: return new VectorMaskGenNode(src, ty, ety); > 775: } These are not used? src/hotspot/share/opto/vectornode.hpp line 835: > 833: static VectorMaskGenNode* make(int opc, Node* src, const Type* ty, const Type* ety); > 834: private: > 835: const Type* _elemType; Will an additional field in the node valid after some optimizations, i.e. clone()? I think I know the ety, but I don't know the usage of ty. If so, do you need to have a new type like what TypeVect does for mask? src/hotspot/share/opto/vectornode.hpp line 826: > 824: class VectorMaskGenNode : public TypeNode { > 825: public: > 826: VectorMaskGenNode(Node* src, const Type* ty, const Type* ety): TypeNode(ty, 2), _elemType(ety) { Sorry, I don't quite understand the arguments here. What does 'src' mean to the mask? 
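
To make the partial in-lining idea from the quoted summary more concrete, here is a scalar Java emulation of the decision taken at the call site: short copies are done under a lane mask derived from the length, longer ones fall back to the regular copy routine. All names are hypothetical, the threshold is only a stand-in for `ArrayCopyPartialInlineSize`, and the loops merely imitate what a masked vector load and store would do. The actual change emits AVX-512 instructions from C2 and is not shown here.

```java
class PartialInlineCopySketch {
    // Hypothetical stand-in for ArrayCopyPartialInlineSize, in bytes.
    static final int PARTIAL_INLINE_SIZE = 32;

    /** Builds a mask with the low len bits set, one bit per byte lane. */
    static long laneMask(int len) {
        return (len >= 64) ? -1L : (1L << len) - 1;
    }

    static void copy(byte[] src, int srcPos, byte[] dst, int dstPos, int len) {
        if (len < 0 || len > PARTIAL_INLINE_SIZE) {
            // Regular path; also rejects negative lengths with an exception.
            System.arraycopy(src, srcPos, dst, dstPos, len);
            return;
        }
        long mask = laneMask(len);
        byte[] lanes = new byte[PARTIAL_INLINE_SIZE];
        // Emulated masked load: only lanes selected by the mask touch memory.
        for (int lane = 0; lane < PARTIAL_INLINE_SIZE; lane++) {
            if (((mask >>> lane) & 1L) != 0) {
                lanes[lane] = src[srcPos + lane];
            }
        }
        // Emulated masked store: unselected lanes leave dst untouched.
        for (int lane = 0; lane < PARTIAL_INLINE_SIZE; lane++) {
            if (((mask >>> lane) & 1L) != 0) {
                dst[dstPos + lane] = lanes[lane];
            }
        }
    }

    public static void main(String[] args) {
        byte[] src = {1, 2, 3, 4, 5};
        byte[] dst = new byte[5];
        copy(src, 1, dst, 0, 3);
        System.out.println(java.util.Arrays.toString(dst));
    }
}
```

Loading all selected lanes before storing any of them keeps the result the same as `System.arraycopy` when source and destination overlap within one array.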
------------- PR: https://git.openjdk.java.net/jdk/pull/302 From jiefu at openjdk.java.net Thu Oct 22 11:14:11 2020 From: jiefu at openjdk.java.net (Jie Fu) Date: Thu, 22 Oct 2020 11:14:11 GMT Subject: RFR: 8255210: [Vector API] jdk/incubator/vector/Int256VectorTests.java crashes on AVX512 machines In-Reply-To: References: Message-ID: On Thu, 22 Oct 2020 03:12:31 GMT, Jie Fu wrote: > Hi all, > > Please review the fix of an AVX512 crash for Vector API. > The reason is that reductionI in x86.ad didn't use legVec for code generation, which is required by Assembler::vphaddd. > > Testing: > - test/jdk/jdk/incubator/vector all passed on both AVX256 and AVX512 machines > > Thanks. > Best regards, > Jie > Hi @DamonFool , similar problem also exists for reduction patterns for other primitive types (Long/Float/Double). Hi @jatin-bhateja , Thanks for your reminder. I've also noticed these problems. But I don't have a reproducer right now. And I need some time to construct one. So I plan to file another bug to fix them when the reproducer is ready. Thanks. ------------- PR: https://git.openjdk.java.net/jdk/pull/791 From redestad at openjdk.java.net Thu Oct 22 12:42:18 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Thu, 22 Oct 2020 12:42:18 GMT Subject: RFR: 8255049: Remove support for the hsdis decode_instructions entry point in hotspot Message-ID: This patch drops support in hotspot for hsdis plugins which only include the old decode_instructions endpoint. The decode_instructions entry point in hsdis was replaced by decode_instructions_virtual[1], with support later added to allow old hsdis plugins to work, at least for the duration of JDK 8. Dropping the backwards compatibility means you'll need a hsdis built from JDK 8 sources or later, which seems a reasonable requirement at this point. It's unclear if a CSR request is needed. 
[1] https://github.com/openjdk/jdk/commit/22544e7a7c72e8779355df963e49e846f9458ce4 [2] https://github.com/openjdk/jdk/commit/a9c40e9df41ee06adcd7fff951dd36b6c093a24b ------------- Commit messages: - Remove unused declarations - Merge branch 'master' into cleanup_hsdis_entrypoints - Remove _decode_instructions support Changes: https://git.openjdk.java.net/jdk/pull/807/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=807&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8255049 Stats: 39 lines in 2 files changed: 0 ins; 31 del; 8 mod Patch: https://git.openjdk.java.net/jdk/pull/807.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/807/head:pull/807 PR: https://git.openjdk.java.net/jdk/pull/807 From neliasso at openjdk.java.net Thu Oct 22 13:23:12 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Thu, 22 Oct 2020 13:23:12 GMT Subject: RFR: 8255049: Remove support for the hsdis decode_instructions entry point in hotspot In-Reply-To: References: Message-ID: On Thu, 22 Oct 2020 12:33:44 GMT, Claes Redestad wrote: > This patch drops support in hotspot for hsdis plugins which only include the old decode_instructions endpoint. > > The decode_instructions entry point in hsdis was replaced by decode_instructions_virtual[1], with support later added to allow old hsdis plugins to work, at least for the duration of JDK 8. Dropping the backwards compatibility means you'll need a hsdis built from JDK 8 sources or later, which seems a reasonable requirement at this point. > > It's unclear if a CSR request is needed. > > [1] https://github.com/openjdk/jdk/commit/22544e7a7c72e8779355df963e49e846f9458ce4 > [2] https://github.com/openjdk/jdk/commit/a9c40e9df41ee06adcd7fff951dd36b6c093a24b Nice clean up. Reviewed. ------------- Marked as reviewed by neliasso (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/807 From martin.doerr at sap.com Thu Oct 22 13:23:55 2020 From: martin.doerr at sap.com (Doerr, Martin) Date: Thu, 22 Oct 2020 13:23:55 +0000 Subject: Vector API: stack overflow on Big Endian In-Reply-To: <08399669-DCB9-4CB4-A23F-A6B64439D1DA@oracle.com> References: <08399669-DCB9-4CB4-A23F-A6B64439D1DA@oracle.com> Message-ID: Hi Paul, thanks a lot for your help. Your fix is working fine. With that, I can only see one test failing: VectorReshapeTests Failing with Species[byte, 16, S_128_BIT]->Species[short, 8, S_128_BIT] (lanewise), partLimit=2, block=8, part=0, origin=0 Failing with Species[byte, 32, S_256_BIT]->Species[short, 16, S_256_BIT] (lanewise), partLimit=2, block=16, part=0, origin=0 Failing with Species[byte, 64, S_512_BIT]->Species[short, 32, S_512_BIT] (lanewise), partLimit=2, block=32, part=0, origin=0 Failing with Species[byte, 8, S_64_BIT]->Species[short, 4, S_64_BIT] (lanewise), partLimit=2, block=4, part=0, origin=0 Failing with Species[byte, 8, S_Max_BIT]->Species[short, 4, S_Max_BIT] (lanewise), partLimit=2, block=4, part=0, origin=0 All of them fail because pairs are swapped like this: expect: [1, 0, 2, 0, 3, 0, 4, 0, 5, 0, 6, 0, 7, 0, 8, 0] output: [0, 1, 0, 2, 0, 3, 0, 4, 0, 5, 0, 6, 0, 7, 0, 8] Do you know if that's another endianness problem? Best regards, Martin > -----Original Message----- > From: Paul Sandoz > Sent: Donnerstag, 22. Oktober 2020 02:27 > To: Doerr, Martin > Cc: hotspot-compiler-dev at openjdk.java.net > Subject: Re: Vector API: stack overflow on Big Endian > > Hi Martin, > > We definitely have far less exposure by default to BE platforms now that > SPARC is not a thing, it's easy to unintentionally hardcode a bias to LE. > > Would it be possible to try running the tests on your BE platforms with the > following modification? 
>
> --- a/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/AbstractVector.java
> +++ b/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/AbstractVector.java
> @@ -500,7 +500,7 @@ abstract class AbstractVector extends Vector {
>      final
>      AbstractVector defaultReinterpret(AbstractSpecies rsp) {
>          int blen = Math.max(this.bitSize(), rsp.vectorBitSize()) / Byte.SIZE;
> -        ByteOrder bo = ByteOrder.LITTLE_ENDIAN;
> +        ByteOrder bo = ByteOrder.nativeOrder();//LITTLE_ENDIAN;
>          ByteBuffer bb = ByteBuffer.allocate(blen);
>          this.intoByteBuffer(bb, 0, bo);
>          VectorMask m = rsp.maskAll(true);
>
> Paul.
>
>
> > On Oct 21, 2020, at 4:10 AM, Doerr, Martin wrote:
> >
> > Hi,
> >
> > I noticed stack overflow issues because of endless recursion on Big Endian platforms (s390 and PPC64) in several tests:
> >
> > E.g. Int64VectorLoadStoreTests:
> >
> > at jdk.incubator.vector/jdk.incubator.vector.IntVector.maybeSwap(IntVector.java:3330)
> > at jdk.incubator.vector/jdk.incubator.vector.IntVector.intoByteBuffer(IntVector.java:3151)
> > at jdk.incubator.vector/jdk.incubator.vector.AbstractVector.defaultReinterpret(AbstractVector.java:505)
> > at java.base/jdk.internal.vm.vector.VectorSupport.convert(VectorSupport.java:441)
> > at jdk.incubator.vector/jdk.incubator.vector.AbstractVector.convert0(AbstractVector.java:686)
> > at jdk.incubator.vector/jdk.incubator.vector.AbstractVector.asVectorRawTemplate(AbstractVector.java:173)
> > at jdk.incubator.vector/jdk.incubator.vector.AbstractVector.asByteVectorRawTemplate(AbstractVector.java:179)
> > at jdk.incubator.vector/jdk.incubator.vector.Int64Vector.asByteVectorRaw(Int64Vector.java:177)
> > at jdk.incubator.vector/jdk.incubator.vector.Int64Vector.asByteVectorRaw(Int64Vector.java:41)
> > at jdk.incubator.vector/jdk.incubator.vector.IntVector.reinterpretAsBytes(IntVector.java:3366)
> > at jdk.incubator.vector/jdk.incubator.vector.IntVector.maybeSwap(IntVector.java:3330)
> >
> > Note that maybeSwap is endianness sensitive:
> > IntVector maybeSwap(ByteOrder bo) {
> >     if (bo != NATIVE_ENDIAN) {
> >         return this.reinterpretAsBytes()
> >             .rearrange(swapBytesShuffle())
> >             .reinterpretAsInts();
> >     }
> >     return this;
> > }
> >
> > How is this supposed to work?
> > Is anything platform specific missing?
> >
> > Thanks and best regards,
> > Martin
> >

From hohensee at amazon.com Thu Oct 22 13:28:05 2020
From: hohensee at amazon.com (Hohensee, Paul)
Date: Thu, 22 Oct 2020 13:28:05 +0000
Subject: [11u] RFR 8254790: SIGSEGV in string_indexof_char and stringL_indexof_char intrinsics
Message-ID:

Looks good.

Paul

On 10/21/20, 3:40 PM, "jdk-updates-dev on behalf of Viswanathan, Sandhya" wrote:

    The fix for 8254790 needs to be backported to JDK11u.
    The fix is a one-line change in the string_indexof_char intrinsic.
    The intrinsic has lived in c2_MacroAssembler_x86.cpp since JDK 15 but is in macroAssembler_x86.cpp in JDK 11u, so I am sending a separate webrev review request.
    The other intrinsic, stringL_indexof_char, didn't exist in JDK11u, so that part of the fix is omitted in this webrev.
    JBS: https://bugs.openjdk.java.net/browse/JDK-8254790
    Webrev: http://cr.openjdk.java.net/~sviswanathan/8254790/webrev.00/

    Best Regards,
    Sandhya

From roland at openjdk.java.net Thu Oct 22 15:11:15 2020
From: roland at openjdk.java.net (Roland Westrelin)
Date: Thu, 22 Oct 2020 15:11:15 GMT
Subject: RFR: 8255224: x86_32 tests fail with "bad AD file" after JDK-8223051
Message-ID:

x86_32 is missing an ad file rule for (CMoveL (Bool (CmpUL

-------------

Commit messages:
 - missing ad file match rule

Changes: https://git.openjdk.java.net/jdk/pull/811/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=811&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8255224
  Stats: 22 lines in 1 file changed: 22 ins; 0 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/811.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/811/head:pull/811

PR: https://git.openjdk.java.net/jdk/pull/811

From rcastanedalo at openjdk.java.net Thu Oct 22 16:33:32 2020
From: rcastanedalo at openjdk.java.net (Roberto Castañeda Lozano)
Date: Thu, 22 Oct 2020 16:33:32 GMT
Subject: RFR: 8254317: C2: Resource consumption of ConvI2LNode::Ideal() grows exponentially [v3]
In-Reply-To: <9FJMNUDr4xvtcIlGtEk2Y_7tA17Us29ImExhFpzs87s=.66c146c6-7629-4e1b-a62b-d68714636f32@github.com>
References: <9FJMNUDr4xvtcIlGtEk2Y_7tA17Us29ImExhFpzs87s=.66c146c6-7629-4e1b-a62b-d68714636f32@github.com>
Message-ID: <2zse5jr98dv4fKliUBNS-TKg1-XPe5zwBeJMhPXN3Ic=.aa4a0672-f863-4358-a8f8-3237b21575dc@github.com>

> Prevent exponential number of calls to `ConvI2LNode::Ideal()` when AddIs are used multiple times by other AddIs in the optimization ConvI2L(AddI(x, y)) -> AddL(ConvI2L(x), ConvI2L(y)). This is achieved by (1) reusing existing ConvI2Ls if possible rather than eagerly creating new ones and (2) postponing the optimization of newly created ConvI2Ls. Remove hook node solution introduced in [8217359](https://github.com/openjdk/jdk/commit/cf554816d1952f722143e9d03ec669e80f955adf), since this is subsumed by (2). Use `phase->is_IterGVN()` rather than `can_reshape` to check if `ConvI2LNode::Ideal()` is called within iterative GVN, for clarity. Add regression tests that cover different shapes and sizes of AddI subgraphs, implicitly checking (by not timing out) that there is no combinatorial explosion.

Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision:

  Update tests

  Simplify JVM arguments and run each test case 100000 times to still trigger
  C2. Use randomization to avoid constant propagation in C2. Increase the load of
  the stress tests and their timeout to 30s to further reduce the risk of false
  positives.

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/727/files
  - new: https://git.openjdk.java.net/jdk/pull/727/files/b5cf7aab..d5747965

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=727&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=727&range=01-02

  Stats: 93 lines in 1 file changed: 33 ins; 7 del; 53 mod
  Patch: https://git.openjdk.java.net/jdk/pull/727.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/727/head:pull/727

PR: https://git.openjdk.java.net/jdk/pull/727

From rcastanedalo at openjdk.java.net Thu Oct 22 16:33:33 2020
From: rcastanedalo at openjdk.java.net (Roberto Castañeda Lozano)
Date: Thu, 22 Oct 2020 16:33:33 GMT
Subject: RFR: 8254317: C2: Resource consumption of ConvI2LNode::Ideal() grows exponentially [v2]
In-Reply-To:
References: <9FJMNUDr4xvtcIlGtEk2Y_7tA17Us29ImExhFpzs87s=.66c146c6-7629-4e1b-a62b-d68714636f32@github.com>
Message-ID:

On Thu, 22 Oct 2020 08:43:09 GMT, Roberto Castañeda Lozano wrote:

>> test/hotspot/jtreg/compiler/conversions/TestMoveConvI2LThroughAddIs.java line 41:
>>
>>> 39:  * -XX:CompileOnly=::testChain,::testTree,::testDAG
>>> 40:  *
compiler.conversions.TestMoveConvI2LThroughAddIs basic
>>> 41:  * @run main/othervm/timeout=1 -Xcomp -XX:-TieredCompilation -XX:-Inline
>>
>> By default timeout is 120 sec. Why did you set it to 1 sec?
>> The same for the second @run.
>
> These two runs (`stress1` and `stress2`) test that compilation time does not explode. When it does (before the proposed fix), the runs take more than 10s to execute (but less than 120s, at least `stress1`), while after the fix both take just a few ms (all measurements from my local and our CI machines). I just deemed 1s to be a good value to balance the risk of false positives (reporting a nonexistent bug because the timeout is too short) and false negatives (not reporting a real bug because the timeout is too long). I will explore further increasing the load of these runs to see if we can increase the timeout without increasing the risk of false negatives.

I have addressed this in the latest update.

-------------

PR: https://git.openjdk.java.net/jdk/pull/727

From rcastanedalo at openjdk.java.net Thu Oct 22 16:33:33 2020
From: rcastanedalo at openjdk.java.net (Roberto Castañeda Lozano)
Date: Thu, 22 Oct 2020 16:33:33 GMT
Subject: RFR: 8254317: C2: Resource consumption of ConvI2LNode::Ideal() grows exponentially [v2]
In-Reply-To:
References: <9FJMNUDr4xvtcIlGtEk2Y_7tA17Us29ImExhFpzs87s=.66c146c6-7629-4e1b-a62b-d68714636f32@github.com> <-j2ap4vnF04oKqLtf5JqGeQJ80PiuwOcbt6XOKGzg7k=.2b195e0e-21f2-425d-a0fe-8d92cd5e44d4@github.com>
Message-ID:

On Thu, 22 Oct 2020 09:03:33 GMT, Roberto Castañeda Lozano wrote:

>> Without -Xcomp, the final code would not be the same as expected; lots of branches are optimized with profiling data, which we **do** normally use.
>>
>> I wonder whether it's feasible to create a test case that grows exponentially with profiling data. I think that may describe the issue more realistically.
>
> Thanks for the feedback!
The goal of these test cases is to exercise the logic in `ConvI2LNode::Ideal()` in a way as isolated as possible from the rest of the JVM, for simplicity, reproducibility, and ease of debugging. I will rewrite them to verify the results outside the test method. I will see if I can get rid of `-Xcomp` and `-XX:CompileOnly` without making them too complex. If that does not work, I will try to construct another test case as suggested by @erik1iu . I have addressed this in the latest update. ------------- PR: https://git.openjdk.java.net/jdk/pull/727 From psandoz at openjdk.java.net Thu Oct 22 16:51:11 2020 From: psandoz at openjdk.java.net (Paul Sandoz) Date: Thu, 22 Oct 2020 16:51:11 GMT Subject: RFR: 8255210: [Vector API] jdk/incubator/vector/Int256VectorTests.java crashes on AVX512 machines In-Reply-To: References: Message-ID: On Thu, 22 Oct 2020 03:12:31 GMT, Jie Fu wrote: > Hi all, > > Please review the fix of an AVX512 crash for Vector API. > The reason is that reductionI in x86.ad didn't use legVec for code generation, which is required by Assembler::vphaddd. > > Testing: > - test/jdk/jdk/incubator/vector all passed on both AVX256 and AVX512 machines > > Thanks. > Best regards, > Jie LGTM based on same changes made previously for short vectors. Good if Sandhya (@sviswa7) et. al. can review also. ------------- Marked as reviewed by psandoz (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/791 From shade at openjdk.java.net Thu Oct 22 16:58:11 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 22 Oct 2020 16:58:11 GMT Subject: RFR: 8255224: x86_32 tests fail with "bad AD file" after JDK-8223051 In-Reply-To: References: Message-ID: On Thu, 22 Oct 2020 15:05:23 GMT, Roland Westrelin wrote: > x86_32 is missing an ad file rule for (CMoveL (Bool (CmpUL So these are the copies of `cmovLL_reg_LEGT` and `cmovLL_mem_LEGT` matches above, but with `flagsReg_ulong_LEGT flags`? Looks fine to me. 
It passes almost all tests in tier1 for me (other failures look unrelated).

-------------

Marked as reviewed by shade (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/811

From kvn at openjdk.java.net Thu Oct 22 17:06:11 2020
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Thu, 22 Oct 2020 17:06:11 GMT
Subject: RFR: 8254317: C2: Resource consumption of ConvI2LNode::Ideal() grows exponentially [v3]
In-Reply-To: <2zse5jr98dv4fKliUBNS-TKg1-XPe5zwBeJMhPXN3Ic=.aa4a0672-f863-4358-a8f8-3237b21575dc@github.com>
References: <9FJMNUDr4xvtcIlGtEk2Y_7tA17Us29ImExhFpzs87s=.66c146c6-7629-4e1b-a62b-d68714636f32@github.com> <2zse5jr98dv4fKliUBNS-TKg1-XPe5zwBeJMhPXN3Ic=.aa4a0672-f863-4358-a8f8-3237b21575dc@github.com>
Message-ID:

On Thu, 22 Oct 2020 16:33:32 GMT, Roberto Castañeda Lozano wrote:

>> Prevent exponential number of calls to `ConvI2LNode::Ideal()` when AddIs are used multiple times by other AddIs in the optimization ConvI2L(AddI(x, y)) -> AddL(ConvI2L(x), ConvI2L(y)). This is achieved by (1) reusing existing ConvI2Ls if possible rather than eagerly creating new ones and (2) postponing the optimization of newly created ConvI2Ls. Remove hook node solution introduced in [8217359](https://github.com/openjdk/jdk/commit/cf554816d1952f722143e9d03ec669e80f955adf), since this is subsumed by (2). Use `phase->is_IterGVN()` rather than `can_reshape` to check if `ConvI2LNode::Ideal()` is called within iterative GVN, for clarity. Add regression tests that cover different shapes and sizes of AddI subgraphs, implicitly checking (by not timing out) that there is no combinatorial explosion.
>
> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision:
>
>   Update tests
>
>   Simplify JVM arguments and run each test case 100000 times to still trigger
>   C2. Use randomization to avoid constant propagation in C2. Increase the load of
>   the stress tests and their timeout to 30s to further reduce the risk of false
>   positives.
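
For illustration only, the graph shape described in the quoted summary (AddI nodes used more than once by later AddI nodes, followed by an int-to-long conversion) might look like the following minimal Java sketch. The class and method names are hypothetical and are not taken from the PR's regression test. The masking of the inputs is an assumption made here to keep the sums in a small range, since C2 is understood to push ConvI2L through an AddI only when it can rule out 32-bit overflow; the sketch shows the shape rather than a guaranteed reproducer.

```java
public class ConvI2LSharedAddsSketch {
    // Hypothetical example: each intermediate sum feeds both inputs of the
    // next addition, so the additions form a DAG with shared inputs rather
    // than a tree. Pushing a conversion through such additions without
    // reusing already-created conversions would visit each shared addition
    // once per path, which doubles the work at every level.
    static long sharedAdds(int x, int y) {
        // Masking keeps the values small so none of the sums can overflow.
        int a = (x & 0xFF) + (y & 0xFF);
        int b = a + a;
        int c = b + b;
        int d = c + c;
        return d; // implicit int-to-long widening
    }

    public static void main(String[] args) {
        long sum = 0;
        // Call the method many times so the JIT has a reason to compile it.
        for (int i = 0; i < 100_000; i++) {
            sum += sharedAdds(i, 1);
        }
        System.out.println(sum);
    }
}
```

Adding more levels to the chain lengthens every path through the shared additions, which is the kind of growth the regression tests in the PR are described as guarding against.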
Good. Thanks! ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/727 From kvn at openjdk.java.net Thu Oct 22 17:09:16 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 22 Oct 2020 17:09:16 GMT Subject: RFR: 8255224: x86_32 tests fail with "bad AD file" after JDK-8223051 In-Reply-To: References: Message-ID: On Thu, 22 Oct 2020 15:05:23 GMT, Roland Westrelin wrote: > x86_32 is missing an ad file rule for (CMoveL (Bool (CmpUL Marked as reviewed by kvn (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/811 From jjg at openjdk.java.net Thu Oct 22 17:29:21 2020 From: jjg at openjdk.java.net (Jonathan Gibbons) Date: Thu, 22 Oct 2020 17:29:21 GMT Subject: RFR: JDK-8255262: Remove use of legacy custom @spec tag Message-ID: The change is (just) to remove legacy usages of a JDK-private custom tag. ------------- Commit messages: - JDK-8255262: Remove use of legacy custom @spec tag Changes: https://git.openjdk.java.net/jdk/pull/814/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=814&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8255262 Stats: 209 lines in 69 files changed: 0 ins; 209 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/814.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/814/head:pull/814 PR: https://git.openjdk.java.net/jdk/pull/814 From lancea at openjdk.java.net Thu Oct 22 17:35:15 2020 From: lancea at openjdk.java.net (Lance Andersen) Date: Thu, 22 Oct 2020 17:35:15 GMT Subject: RFR: JDK-8255262: Remove use of legacy custom @spec tag In-Reply-To: References: Message-ID: On Thu, 22 Oct 2020 17:16:23 GMT, Jonathan Gibbons wrote: > The change is (just) to remove legacy usages of a JDK-private custom tag. looks fine ------------- Marked as reviewed by lancea (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/814 From iris at openjdk.java.net Thu Oct 22 17:46:16 2020 From: iris at openjdk.java.net (Iris Clark) Date: Thu, 22 Oct 2020 17:46:16 GMT Subject: RFR: JDK-8255262: Remove use of legacy custom @spec tag In-Reply-To: References: Message-ID: <-NCS5lF0mdeaDq1CyQYTrh0gGVozPafltjNYsEpO488=.d795b2ec-acc2-4c06-8434-757b2095e386@github.com> On Thu, 22 Oct 2020 17:16:23 GMT, Jonathan Gibbons wrote: > The change is (just) to remove legacy usages of a JDK-private custom tag. Nice clean-up. ------------- Marked as reviewed by iris (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/814 From mr at openjdk.java.net Thu Oct 22 17:46:15 2020 From: mr at openjdk.java.net (Mark Reinhold) Date: Thu, 22 Oct 2020 17:46:15 GMT Subject: RFR: JDK-8255262: Remove use of legacy custom @spec tag In-Reply-To: References: Message-ID: On Thu, 22 Oct 2020 17:16:23 GMT, Jonathan Gibbons wrote: > The change is (just) to remove legacy usages of a JDK-private custom tag. As the creator of these tags many moons ago, I approve this change. ------------- Marked as reviewed by mr (Lead). PR: https://git.openjdk.java.net/jdk/pull/814 From alanb at openjdk.java.net Thu Oct 22 17:46:17 2020 From: alanb at openjdk.java.net (Alan Bateman) Date: Thu, 22 Oct 2020 17:46:17 GMT Subject: RFR: JDK-8255262: Remove use of legacy custom @spec tag In-Reply-To: References: Message-ID: On Thu, 22 Oct 2020 17:16:23 GMT, Jonathan Gibbons wrote: > The change is (just) to remove legacy usages of a JDK-private custom tag. Marked as reviewed by alanb (Reviewer). 
------------- PR: https://git.openjdk.java.net/jdk/pull/814 From darcy at openjdk.java.net Thu Oct 22 17:46:18 2020 From: darcy at openjdk.java.net (Joe Darcy) Date: Thu, 22 Oct 2020 17:46:18 GMT Subject: RFR: JDK-8255262: Remove use of legacy custom @spec tag In-Reply-To: References: Message-ID: <5hlum1g6yK7pF3TkDMBvYSlK0Ca3bVh1VM3XJUVAvCk=.93e60ef2-400c-4775-a4c5-1f290e14ed57@github.com> On Thu, 22 Oct 2020 17:16:23 GMT, Jonathan Gibbons wrote: > The change is (just) to remove legacy usages of a JDK-private custom tag. Marked as reviewed by darcy (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/814 From kvn at openjdk.java.net Thu Oct 22 17:47:14 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 22 Oct 2020 17:47:14 GMT Subject: RFR: 8255049: Remove support for the hsdis decode_instructions entry point in hotspot In-Reply-To: References: Message-ID: On Thu, 22 Oct 2020 12:33:44 GMT, Claes Redestad wrote: > This patch drops support in hotspot for hsdis plugins which only include the old decode_instructions endpoint. > > The decode_instructions entry point in hsdis was replaced by decode_instructions_virtual[1], with support later added to allow old hsdis plugins to work, at least for the duration of JDK 8. Dropping the backwards compatibility means you'll need a hsdis built from JDK 8 sources or later, which seems a reasonable requirement at this point. > > It's unclear if a CSR request is needed. > > [1] https://github.com/openjdk/jdk/commit/22544e7a7c72e8779355df963e49e846f9458ce4 > [2] https://github.com/openjdk/jdk/commit/a9c40e9df41ee06adcd7fff951dd36b6c093a24b Good. ------------- Marked as reviewed by kvn (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/807

From mchung at openjdk.java.net Thu Oct 22 17:56:16 2020
From: mchung at openjdk.java.net (Mandy Chung)
Date: Thu, 22 Oct 2020 17:56:16 GMT
Subject: RFR: JDK-8255262: Remove use of legacy custom @spec tag
In-Reply-To:
References:
Message-ID:

On Thu, 22 Oct 2020 17:16:23 GMT, Jonathan Gibbons wrote:

> The change is (just) to remove legacy usages of a JDK-private custom tag.

Marked as reviewed by mchung (Reviewer).

-------------

PR: https://git.openjdk.java.net/jdk/pull/814

From paul.sandoz at oracle.com Thu Oct 22 18:11:11 2020
From: paul.sandoz at oracle.com (Paul Sandoz)
Date: Thu, 22 Oct 2020 11:11:11 -0700
Subject: Vector API: stack overflow on Big Endian
In-Reply-To:
References: <08399669-DCB9-4CB4-A23F-A6B64439D1DA@oracle.com>
Message-ID:

Hi Martin,

Can you provide more details on which test methods are failing? I would like to know the scope of the test failures and more precisely where they occur.

My suspicion is that there may be a problem with the test, specifically with the method castByteArrayData.

Paul.

> On Oct 22, 2020, at 6:23 AM, Doerr, Martin wrote:
>
> Hi Paul,
>
> thanks a lot for your help. Your fix is working fine.
>
> With that, I can only see one test failing:
> VectorReshapeTests
>
> Failing with Species[byte, 16, S_128_BIT]->Species[short, 8, S_128_BIT] (lanewise), partLimit=2, block=8, part=0, origin=0
> Failing with Species[byte, 32, S_256_BIT]->Species[short, 16, S_256_BIT] (lanewise), partLimit=2, block=16, part=0, origin=0
> Failing with Species[byte, 64, S_512_BIT]->Species[short, 32, S_512_BIT] (lanewise), partLimit=2, block=32, part=0, origin=0
> Failing with Species[byte, 8, S_64_BIT]->Species[short, 4, S_64_BIT] (lanewise), partLimit=2, block=4, part=0, origin=0
> Failing with Species[byte, 8, S_Max_BIT]->Species[short, 4, S_Max_BIT] (lanewise), partLimit=2, block=4, part=0, origin=0
>
> All of them fail because pairs are swapped like this:
> expect: [1, 0, 2, 0, 3, 0, 4, 0, 5, 0, 6, 0, 7, 0, 8, 0]
> output: [0, 1, 0, 2, 0, 3, 0, 4, 0, 5, 0, 6, 0, 7, 0, 8]
>
> Do you know if that's another endianness problem?
>
> Best regards,
> Martin
>
>
>> -----Original Message-----
>> From: Paul Sandoz
>> Sent: Thursday, 22 October 2020 02:27
>> To: Doerr, Martin
>> Cc: hotspot-compiler-dev at openjdk.java.net
>> Subject: Re: Vector API: stack overflow on Big Endian
>>
>> Hi Martin,
>>
>> We definitely have far less exposure by default to BE platforms now that SPARC is not a thing, it's easy to unintentionally hardcode a bias to LE.
>>
>> Would it be possible to try running the tests on your BE platforms with the following modification?
>>
>> --- a/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/AbstractVector.java
>> +++ b/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/AbstractVector.java
>> @@ -500,7 +500,7 @@ abstract class AbstractVector extends Vector {
>>      final
>>      AbstractVector defaultReinterpret(AbstractSpecies rsp) {
>>          int blen = Math.max(this.bitSize(), rsp.vectorBitSize()) / Byte.SIZE;
>> -        ByteOrder bo = ByteOrder.LITTLE_ENDIAN;
>> +        ByteOrder bo = ByteOrder.nativeOrder();//LITTLE_ENDIAN;
>>          ByteBuffer bb = ByteBuffer.allocate(blen);
>>          this.intoByteBuffer(bb, 0, bo);
>>          VectorMask m = rsp.maskAll(true);
>>
>> Paul.
>>
>>
>>> On Oct 21, 2020, at 4:10 AM, Doerr, Martin wrote:
>>>
>>> Hi,
>>>
>>> I noticed stack overflow issues because of endless recursion on Big Endian platforms (s390 and PPC64) in several tests:
>>>
>>> E.g. Int64VectorLoadStoreTests:
>>>
>>> at jdk.incubator.vector/jdk.incubator.vector.IntVector.maybeSwap(IntVector.java:3330)
>>> at jdk.incubator.vector/jdk.incubator.vector.IntVector.intoByteBuffer(IntVector.java:3151)
>>> at jdk.incubator.vector/jdk.incubator.vector.AbstractVector.defaultReinterpret(AbstractVector.java:505)
>>> at java.base/jdk.internal.vm.vector.VectorSupport.convert(VectorSupport.java:441)
>>> at jdk.incubator.vector/jdk.incubator.vector.AbstractVector.convert0(AbstractVector.java:686)
>>> at jdk.incubator.vector/jdk.incubator.vector.AbstractVector.asVectorRawTemplate(AbstractVector.java:173)
>>> at jdk.incubator.vector/jdk.incubator.vector.AbstractVector.asByteVectorRawTemplate(AbstractVector.java:179)
>>> at jdk.incubator.vector/jdk.incubator.vector.Int64Vector.asByteVectorRaw(Int64Vector.java:177)
>>> at jdk.incubator.vector/jdk.incubator.vector.Int64Vector.asByteVectorRaw(Int64Vector.java:41)
>>> at jdk.incubator.vector/jdk.incubator.vector.IntVector.reinterpretAsBytes(IntVector.java:3366)
>>> at jdk.incubator.vector/jdk.incubator.vector.IntVector.maybeSwap(IntVector.java:3330)
>>>
>>> Note that maybeSwap is endianness sensitive:
>>> IntVector maybeSwap(ByteOrder bo) {
>>>     if (bo != NATIVE_ENDIAN) {
>>>         return this.reinterpretAsBytes()
>>>             .rearrange(swapBytesShuffle())
>>>             .reinterpretAsInts();
>>>     }
>>>     return this;
>>> }
>>>
>>> How is this supposed to work?
>>> Is anything platform specific missing?
>>>
>>> Thanks and best regards,
>>> Martin
>>>
>

From martin.doerr at sap.com Thu Oct 22 18:36:17 2020
From: martin.doerr at sap.com (Doerr, Martin)
Date: Thu, 22 Oct 2020 18:36:17 +0000
Subject: Vector API: stack overflow on Big Endian
In-Reply-To:
References: <08399669-DCB9-4CB4-A23F-A6B64439D1DA@oracle.com>
Message-ID:

Hi Paul,

here's the snippet from the JTwork/jdk/incubator/vector/VectorReshapeTests.jtr from a run on an s390 machine.
Let me know if you need anything else.

Thanks and best regards,
Martin


[TestNG] Running:
  jdk/incubator/vector/VectorReshapeTests.java

test VectorReshapeTests.testCastFromByte(byte(i)): success
test VectorReshapeTests.testCastFromDouble(double(i)): success
test VectorReshapeTests.testCastFromFloat(float(i)): success
test VectorReshapeTests.testCastFromInt(int(i)): success
test VectorReshapeTests.testCastFromLong(long(i)): success
test VectorReshapeTests.testCastFromShort(short(i)): success
input: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
Failing with Species[byte, 16, S_128_BIT]->Species[short, 8, S_128_BIT] (lanewise), partLimit=2, block=8, part=0, origin=0
expect: [1, 0, 2, 0, 3, 0, 4, 0, 5, 0, 6, 0, 7, 0, 8, 0]
output: [0, 1, 0, 2, 0, 3, 0, 4, 0, 5, 0, 6, 0, 7, 0, 8]
test VectorReshapeTests.testRebracket128(byte(i)): failure
java.lang.AssertionError: arrays differ firstly at element [0]; expected value is <0> but was <1>.
at org.testng.Assert.fail(Assert.java:94) at org.testng.Assert.assertEquals(Assert.java:774) at org.testng.Assert.assertEquals(Assert.java:748) at VectorReshapeTests.checkPartialResult(VectorReshapeTests.java:419) at VectorReshapeTests.testVectorRebracket(VectorReshapeTests.java:746) at VectorReshapeTests.testVectorRebracketLanewise(VectorReshapeTests.java:706) at VectorReshapeTests.testVectorRebracket(VectorReshapeTests.java:701) at VectorReshapeTests.testRebracket128(VectorReshapeTests.java:875) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:564) at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:85) at org.testng.internal.Invoker.invokeMethod(Invoker.java:639) at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:821) at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:1131) at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:125) at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:108) at org.testng.TestRunner.privateRun(TestRunner.java:773) at org.testng.TestRunner.run(TestRunner.java:623) at org.testng.SuiteRunner.runTest(SuiteRunner.java:357) at org.testng.SuiteRunner.runSequentially(SuiteRunner.java:352) at org.testng.SuiteRunner.privateRun(SuiteRunner.java:310) at org.testng.SuiteRunner.run(SuiteRunner.java:259) at org.testng.SuiteRunnerWorker.runSuite(SuiteRunnerWorker.java:52) at org.testng.SuiteRunnerWorker.run(SuiteRunnerWorker.java:86) at org.testng.TestNG.runSuitesSequentially(TestNG.java:1185) at org.testng.TestNG.runSuitesLocally(TestNG.java:1110) at org.testng.TestNG.run(TestNG.java:1018) at 
com.sun.javatest.regtest.agent.TestNGRunner.main(TestNGRunner.java:94) at com.sun.javatest.regtest.agent.TestNGRunner.main(TestNGRunner.java:54) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:564) at com.sun.javatest.regtest.agent.MainWrapper$MainThread.run(MainWrapper.java:127) at java.base/java.lang.Thread.run(Thread.java:832) input: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32] Failing with Species[byte, 32, S_256_BIT]->Species[short, 16, S_256_BIT] (lanewise), partLimit=2, block=16, part=0, origin=0 expect: [1, 0, 2, 0, 3, 0, 4, 0, 5, 0, 6, 0, 7, 0, 8, 0, 9, 0, 10, 0, 11, 0, 12, 0, 13, 0, 14, 0, 15, 0, 16, 0] output: [0, 1, 0, 2, 0, 3, 0, 4, 0, 5, 0, 6, 0, 7, 0, 8, 0, 9, 0, 10, 0, 11, 0, 12, 0, 13, 0, 14, 0, 15, 0, 16] test VectorReshapeTests.testRebracket256(byte(i)): failure java.lang.AssertionError: arrays differ firstly at element [0]; expected value is <0> but was <1>. 
at org.testng.Assert.fail(Assert.java:94) at org.testng.Assert.assertEquals(Assert.java:774) at org.testng.Assert.assertEquals(Assert.java:748) at VectorReshapeTests.checkPartialResult(VectorReshapeTests.java:419) at VectorReshapeTests.testVectorRebracket(VectorReshapeTests.java:746) at VectorReshapeTests.testVectorRebracketLanewise(VectorReshapeTests.java:706) at VectorReshapeTests.testVectorRebracket(VectorReshapeTests.java:701) at VectorReshapeTests.testRebracket256(VectorReshapeTests.java:924) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:564) at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:85) at org.testng.internal.Invoker.invokeMethod(Invoker.java:639) at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:821) at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:1131) at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:125) at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:108) at org.testng.TestRunner.privateRun(TestRunner.java:773) at org.testng.TestRunner.run(TestRunner.java:623) at org.testng.SuiteRunner.runTest(SuiteRunner.java:357) at org.testng.SuiteRunner.runSequentially(SuiteRunner.java:352) at org.testng.SuiteRunner.privateRun(SuiteRunner.java:310) at org.testng.SuiteRunner.run(SuiteRunner.java:259) at org.testng.SuiteRunnerWorker.runSuite(SuiteRunnerWorker.java:52) at org.testng.SuiteRunnerWorker.run(SuiteRunnerWorker.java:86) at org.testng.TestNG.runSuitesSequentially(TestNG.java:1185) at org.testng.TestNG.runSuitesLocally(TestNG.java:1110) at org.testng.TestNG.run(TestNG.java:1018) at 
com.sun.javatest.regtest.agent.TestNGRunner.main(TestNGRunner.java:94) at com.sun.javatest.regtest.agent.TestNGRunner.main(TestNGRunner.java:54) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:564) at com.sun.javatest.regtest.agent.MainWrapper$MainThread.run(MainWrapper.java:127) at java.base/java.lang.Thread.run(Thread.java:832) input: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64] Failing with Species[byte, 64, S_512_BIT]->Species[short, 32, S_512_BIT] (lanewise), partLimit=2, block=32, part=0, origin=0 expect: [1, 0, 2, 0, 3, 0, 4, 0, 5, 0, 6, 0, 7, 0, 8, 0, 9, 0, 10, 0, 11, 0, 12, 0, 13, 0, 14, 0, 15, 0, 16, 0, 17, 0, 18, 0, 19, 0, 20, 0, 21, 0, 22, 0, 23, 0, 24, 0, 25, 0, 26, 0, 27, 0, 28, 0, 29, 0, 30, 0, 31, 0, 32, 0] output: [0, 1, 0, 2, 0, 3, 0, 4, 0, 5, 0, 6, 0, 7, 0, 8, 0, 9, 0, 10, 0, 11, 0, 12, 0, 13, 0, 14, 0, 15, 0, 16, 0, 17, 0, 18, 0, 19, 0, 20, 0, 21, 0, 22, 0, 23, 0, 24, 0, 25, 0, 26, 0, 27, 0, 28, 0, 29, 0, 30, 0, 31, 0, 32] test VectorReshapeTests.testRebracket512(byte(i)): failure java.lang.AssertionError: arrays differ firstly at element [0]; expected value is <0> but was <1>. 
at org.testng.Assert.fail(Assert.java:94) at org.testng.Assert.assertEquals(Assert.java:774) at org.testng.Assert.assertEquals(Assert.java:748) at VectorReshapeTests.checkPartialResult(VectorReshapeTests.java:419) at VectorReshapeTests.testVectorRebracket(VectorReshapeTests.java:746) at VectorReshapeTests.testVectorRebracketLanewise(VectorReshapeTests.java:706) at VectorReshapeTests.testVectorRebracket(VectorReshapeTests.java:701) at VectorReshapeTests.testRebracket512(VectorReshapeTests.java:973) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:564) at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:85) at org.testng.internal.Invoker.invokeMethod(Invoker.java:639) at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:821) at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:1131) at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:125) at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:108) at org.testng.TestRunner.privateRun(TestRunner.java:773) at org.testng.TestRunner.run(TestRunner.java:623) at org.testng.SuiteRunner.runTest(SuiteRunner.java:357) at org.testng.SuiteRunner.runSequentially(SuiteRunner.java:352) at org.testng.SuiteRunner.privateRun(SuiteRunner.java:310) at org.testng.SuiteRunner.run(SuiteRunner.java:259) at org.testng.SuiteRunnerWorker.runSuite(SuiteRunnerWorker.java:52) at org.testng.SuiteRunnerWorker.run(SuiteRunnerWorker.java:86) at org.testng.TestNG.runSuitesSequentially(TestNG.java:1185) at org.testng.TestNG.runSuitesLocally(TestNG.java:1110) at org.testng.TestNG.run(TestNG.java:1018) at 
com.sun.javatest.regtest.agent.TestNGRunner.main(TestNGRunner.java:94) at com.sun.javatest.regtest.agent.TestNGRunner.main(TestNGRunner.java:54) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:564) at com.sun.javatest.regtest.agent.MainWrapper$MainThread.run(MainWrapper.java:127) at java.base/java.lang.Thread.run(Thread.java:832) input: [1, 2, 3, 4, 5, 6, 7, 8] Failing with Species[byte, 8, S_64_BIT]->Species[short, 4, S_64_BIT] (lanewise), partLimit=2, block=4, part=0, origin=0 expect: [1, 0, 2, 0, 3, 0, 4, 0] output: [0, 1, 0, 2, 0, 3, 0, 4] test VectorReshapeTests.testRebracket64(byte(i)): failure java.lang.AssertionError: arrays differ firstly at element [0]; expected value is <0> but was <1>. 
at org.testng.Assert.fail(Assert.java:94) at org.testng.Assert.assertEquals(Assert.java:774) at org.testng.Assert.assertEquals(Assert.java:748) at VectorReshapeTests.checkPartialResult(VectorReshapeTests.java:419) at VectorReshapeTests.testVectorRebracket(VectorReshapeTests.java:746) at VectorReshapeTests.testVectorRebracketLanewise(VectorReshapeTests.java:706) at VectorReshapeTests.testVectorRebracket(VectorReshapeTests.java:701) at VectorReshapeTests.testRebracket64(VectorReshapeTests.java:826) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:564) at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:85) at org.testng.internal.Invoker.invokeMethod(Invoker.java:639) at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:821) at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:1131) at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:125) at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:108) at org.testng.TestRunner.privateRun(TestRunner.java:773) at org.testng.TestRunner.run(TestRunner.java:623) at org.testng.SuiteRunner.runTest(SuiteRunner.java:357) at org.testng.SuiteRunner.runSequentially(SuiteRunner.java:352) at org.testng.SuiteRunner.privateRun(SuiteRunner.java:310) at org.testng.SuiteRunner.run(SuiteRunner.java:259) at org.testng.SuiteRunnerWorker.runSuite(SuiteRunnerWorker.java:52) at org.testng.SuiteRunnerWorker.run(SuiteRunnerWorker.java:86) at org.testng.TestNG.runSuitesSequentially(TestNG.java:1185) at org.testng.TestNG.runSuitesLocally(TestNG.java:1110) at org.testng.TestNG.run(TestNG.java:1018) at com.sun.javatest.regtest.agent.TestNGRunner.main(TestNGRunner.java:94) 
at com.sun.javatest.regtest.agent.TestNGRunner.main(TestNGRunner.java:54) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:564) at com.sun.javatest.regtest.agent.MainWrapper$MainThread.run(MainWrapper.java:127) at java.base/java.lang.Thread.run(Thread.java:832) input: [1, 2, 3, 4, 5, 6, 7, 8] Failing with Species[byte, 8, S_Max_BIT]->Species[short, 4, S_Max_BIT] (lanewise), partLimit=2, block=4, part=0, origin=0 expect: [1, 0, 2, 0, 3, 0, 4, 0] output: [0, 1, 0, 2, 0, 3, 0, 4] test VectorReshapeTests.testRebracketMax(byte(i)): failure java.lang.AssertionError: arrays differ firstly at element [0]; expected value is <0> but was <1>. at org.testng.Assert.fail(Assert.java:94) at org.testng.Assert.assertEquals(Assert.java:774) at org.testng.Assert.assertEquals(Assert.java:748) at VectorReshapeTests.checkPartialResult(VectorReshapeTests.java:419) at VectorReshapeTests.testVectorRebracket(VectorReshapeTests.java:746) at VectorReshapeTests.testVectorRebracketLanewise(VectorReshapeTests.java:706) at VectorReshapeTests.testVectorRebracket(VectorReshapeTests.java:701) at VectorReshapeTests.testRebracketMax(VectorReshapeTests.java:1022) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:564) at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:85) at org.testng.internal.Invoker.invokeMethod(Invoker.java:639) at 
org.testng.internal.Invoker.invokeTestMethod(Invoker.java:821) at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:1131) at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:125) at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:108) at org.testng.TestRunner.privateRun(TestRunner.java:773) at org.testng.TestRunner.run(TestRunner.java:623) at org.testng.SuiteRunner.runTest(SuiteRunner.java:357) at org.testng.SuiteRunner.runSequentially(SuiteRunner.java:352) at org.testng.SuiteRunner.privateRun(SuiteRunner.java:310) at org.testng.SuiteRunner.run(SuiteRunner.java:259) at org.testng.SuiteRunnerWorker.runSuite(SuiteRunnerWorker.java:52) at org.testng.SuiteRunnerWorker.run(SuiteRunnerWorker.java:86) at org.testng.TestNG.runSuitesSequentially(TestNG.java:1185) at org.testng.TestNG.runSuitesLocally(TestNG.java:1110) at org.testng.TestNG.run(TestNG.java:1018) at com.sun.javatest.regtest.agent.TestNGRunner.main(TestNGRunner.java:94) at com.sun.javatest.regtest.agent.TestNGRunner.main(TestNGRunner.java:54) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:564) at com.sun.javatest.regtest.agent.MainWrapper$MainThread.run(MainWrapper.java:127) at java.base/java.lang.Thread.run(Thread.java:832) test VectorReshapeTests.testReshapeByte(byte(i)): success test VectorReshapeTests.testReshapeDouble(byte(i)): success test VectorReshapeTests.testReshapeFloat(byte(i)): success test VectorReshapeTests.testReshapeInt(byte(i)): success test VectorReshapeTests.testReshapeLong(byte(i)): success test VectorReshapeTests.testReshapeShort(byte(i)): success =============================================== jdk/incubator/vector/VectorReshapeTests.java Total 
tests run: 17, Failures: 5, Skips: 0 =============================================== > -----Original Message----- > From: Paul Sandoz > Sent: Donnerstag, 22. Oktober 2020 20:11 > To: Doerr, Martin > Cc: hotspot-compiler-dev at openjdk.java.net > Subject: Re: Vector API: stack overflow on Big Endian > > Hi Martin, > > Can you provide more details on what test methods are failing? > > I would like to know the scope of the test failures and more precisely where > they occur. > > My suspicion is there may be a problem with the test, specifically with the > method castByteArrayData. > > Paul. > > > On Oct 22, 2020, at 6:23 AM, Doerr, Martin > wrote: > > > > Hi Paul, > > > > thanks a lot for your help. Your fix is working fine. > > > > With that, I can only see one test failing: > > VectorReshapeTests > > > > Failing with Species[byte, 16, S_128_BIT]->Species[short, 8, S_128_BIT] > (lanewise), partLimit=2, block=8, part=0, origin=0 > > Failing with Species[byte, 32, S_256_BIT]->Species[short, 16, S_256_BIT] > (lanewise), partLimit=2, block=16, part=0, origin=0 > > Failing with Species[byte, 64, S_512_BIT]->Species[short, 32, S_512_BIT] > (lanewise), partLimit=2, block=32, part=0, origin=0 > > Failing with Species[byte, 8, S_64_BIT]->Species[short, 4, S_64_BIT] > (lanewise), partLimit=2, block=4, part=0, origin=0 > > Failing with Species[byte, 8, S_Max_BIT]->Species[short, 4, S_Max_BIT] > (lanewise), partLimit=2, block=4, part=0, origin=0 > > > > All of them fail because pairs are swapped like this: > > expect: [1, 0, 2, 0, 3, 0, 4, 0, 5, 0, 6, 0, 7, 0, 8, 0] > > output: [0, 1, 0, 2, 0, 3, 0, 4, 0, 5, 0, 6, 0, 7, 0, 8] > > > > Do you know if that's another endianness problem? > > > > Best regards, > > Martin > > > > > >> -----Original Message----- > >> From: Paul Sandoz > >> Sent: Donnerstag, 22. 
Oktober 2020 02:27 > >> To: Doerr, Martin > >> Cc: hotspot-compiler-dev at openjdk.java.net > >> Subject: Re: Vector API: stack overflow on Big Endian > >> > >> Hi Martin, > >> > >> We definitely have far less exposure by default to BE platforms now that > >> SPARC is not a thing, it's easy to unintentionally hardcode a bias to LE. > >> > >> Would it be possible to try running the tests on your BE platforms with the > >> following modification? > >> > >> --- > >> > a/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/AbstractVect > >> or.java > >> +++ > >> > b/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/AbstractVect > >> or.java > >> @@ -500,7 +500,7 @@ abstract class AbstractVector extends > Vector { > >> final > >> AbstractVector defaultReinterpret(AbstractSpecies rsp) { > >> int blen = Math.max(this.bitSize(), rsp.vectorBitSize()) / Byte.SIZE; > >> - ByteOrder bo = ByteOrder.LITTLE_ENDIAN; > >> + ByteOrder bo = ByteOrder.nativeOrder();//LITTLE_ENDIAN; > >> ByteBuffer bb = ByteBuffer.allocate(blen); > >> this.intoByteBuffer(bb, 0, bo); > >> VectorMask m = rsp.maskAll(true); > >> > >> Paul. > >> > >> > >>> On Oct 21, 2020, at 4:10 AM, Doerr, Martin > >> wrote: > >>> > >>> Hi, > >>> > >>> I noticed stack overflow issues because of endless recursion on Big > Endian > >> platforms (s390 and PPC64) in several tests: > >>> > >>> E.g. 
Int64VectorLoadStoreTests: > >>> > >>> at > >> > jdk.incubator.vector/jdk.incubator.vector.IntVector.maybeSwap(IntVector.j > >> ava:3330) > >>> at > >> > jdk.incubator.vector/jdk.incubator.vector.IntVector.intoByteBuffer(IntVecto > >> r.java:3151) > >>> at > >> > jdk.incubator.vector/jdk.incubator.vector.AbstractVector.defaultReinterpret > >> (AbstractVector.java:505) > >>> at > >> > java.base/jdk.internal.vm.vector.VectorSupport.convert(VectorSupport.java > >> :441) > >>> at > >> > jdk.incubator.vector/jdk.incubator.vector.AbstractVector.convert0(Abstract > >> Vector.java:686) > >>> at > >> > jdk.incubator.vector/jdk.incubator.vector.AbstractVector.asVectorRawTemp > >> late(AbstractVector.java:173) > >>> at > >> > jdk.incubator.vector/jdk.incubator.vector.AbstractVector.asByteVectorRawT > >> emplate(AbstractVector.java:179) > >>> at > >> > jdk.incubator.vector/jdk.incubator.vector.Int64Vector.asByteVectorRaw(Int > >> 64Vector.java:177) > >>> at > >> > jdk.incubator.vector/jdk.incubator.vector.Int64Vector.asByteVectorRaw(Int > >> 64Vector.java:41) > >>> at > >> > jdk.incubator.vector/jdk.incubator.vector.IntVector.reinterpretAsBytes(IntV > >> ector.java:3366) > >>> at > >> > jdk.incubator.vector/jdk.incubator.vector.IntVector.maybeSwap(IntVector.j > >> ava:3330) > >>> > >>> Note that maybeSwap is endianness sensitive: > >>> IntVector maybeSwap(ByteOrder bo) { > >>> if (bo != NATIVE_ENDIAN) { > >>> return this.reinterpretAsBytes() > >>> .rearrange(swapBytesShuffle()) > >>> .reinterpretAsInts(); > >>> } > >>> return this; > >>> } > >>> > >>> How is this supposed to work? > >>> Is anything platform specific missing? 
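The byte-order sensitivity behind both the maybeSwap question above and the swapped pairs reported for VectorReshapeTests in this thread can be reproduced without the Vector API. The following is a minimal sketch, assuming plain java.nio only; the class ByteOrderSketch and the helper widenToShortBytes are hypothetical names used for illustration and are not taken from the JDK sources or tests. It widens each byte to a short and shows the byte image of the result under each byte order.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

class ByteOrderSketch {
    // Hypothetical helper: widen every input byte to a short and return
    // the bytes of the resulting shorts as laid out in the given byte order.
    static byte[] widenToShortBytes(byte[] input, ByteOrder bo) {
        ByteBuffer bb = ByteBuffer.allocate(input.length * Short.BYTES).order(bo);
        for (byte b : input) {
            bb.putShort((short) b);
        }
        return bb.array();
    }

    public static void main(String[] args) {
        byte[] input = {1, 2, 3, 4};
        // Little-endian: low-order byte first.
        System.out.println(Arrays.toString(widenToShortBytes(input, ByteOrder.LITTLE_ENDIAN)));
        // prints [1, 0, 2, 0, 3, 0, 4, 0]
        // Big-endian: high-order byte first.
        System.out.println(Arrays.toString(widenToShortBytes(input, ByteOrder.BIG_ENDIAN)));
        // prints [0, 1, 0, 2, 0, 3, 0, 4]
        // The platform's own order; the value depends on the machine.
        System.out.println(ByteOrder.nativeOrder());
    }
}
```

The two printed arrays have the same shape as the expect/output pairs in the VectorReshapeTests log, which is consistent with a check that hardcodes little-endian layout being run on a big-endian machine.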
> >>> > >>> Thanks and best regards, > >>> Martin > >>> > > From jjg at openjdk.java.net Thu Oct 22 19:49:22 2020 From: jjg at openjdk.java.net (Jonathan Gibbons) Date: Thu, 22 Oct 2020 19:49:22 GMT Subject: Integrated: JDK-8255262: Remove use of legacy custom @spec tag In-Reply-To: References: Message-ID: On Thu, 22 Oct 2020 17:16:23 GMT, Jonathan Gibbons wrote: > The change is (just) to remove legacy usages of a JDK-private custom tag. This pull request has now been integrated. Changeset: 0aa3c925 Author: Jonathan Gibbons URL: https://git.openjdk.java.net/jdk/commit/0aa3c925 Stats: 209 lines in 69 files changed: 0 ins; 209 del; 0 mod 8255262: Remove use of legacy custom @spec tag Reviewed-by: lancea, mr, iris, alanb, darcy, mchung ------------- PR: https://git.openjdk.java.net/jdk/pull/814 From sviswanathan at openjdk.java.net Thu Oct 22 21:56:12 2020 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Thu, 22 Oct 2020 21:56:12 GMT Subject: RFR: 8255210: [Vector API] jdk/incubator/vector/Int256VectorTests.java crashes on AVX512 machines In-Reply-To: References: Message-ID: On Thu, 22 Oct 2020 16:48:54 GMT, Paul Sandoz wrote: >> Hi all, >> >> Please review the fix of an AVX512 crash for Vector API. >> The reason is that reductionI in x86.ad didn't use legVec for code generation, which is required by Assembler::vphaddd. >> >> Testing: >> - test/jdk/jdk/incubator/vector all passed on both AVX256 and AVX512 machines >> >> Thanks. >> Best regards, >> Jie > > LGTM based on same changes made previously for short vectors. > Good if Sandhya (@sviswa7) et. al. can review also. The fix looks good to me as well. 
------------- PR: https://git.openjdk.java.net/jdk/pull/791 From mdoerr at openjdk.java.net Thu Oct 22 22:07:22 2020 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Thu, 22 Oct 2020 22:07:22 GMT Subject: RFR: 8255274: [PPC64, s390] wrong StringLatin1.indexOf version matched Message-ID: <2lYn7oWZsxWQECjxx87FaTqyxpJknxSOeTV6qMqR7cc=.f8d911fd-a5ae-444b-90f8-4331d3b1418c@github.com> PPC64 and s390 currently match indexOfChar_U also for StrIntrinsicNode::L. This leads to incorrect results of StringLatin1.indexOf and already breaks builds: Optimizing the exploded image Error occurred during initialization of boot layer We need separate match rules for StrIntrinsicNode::U and StrIntrinsicNode::L. ------------- Commit messages: - 8255274: [PPC64, s390] wrong StringLatin1.indexOf version matched Changes: https://git.openjdk.java.net/jdk/pull/820/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=820&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8255274 Stats: 36 lines in 2 files changed: 36 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/820.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/820/head:pull/820 PR: https://git.openjdk.java.net/jdk/pull/820 From clanger at openjdk.java.net Thu Oct 22 22:07:22 2020 From: clanger at openjdk.java.net (Christoph Langer) Date: Thu, 22 Oct 2020 22:07:22 GMT Subject: RFR: 8255274: [PPC64, s390] wrong StringLatin1.indexOf version matched In-Reply-To: <2lYn7oWZsxWQECjxx87FaTqyxpJknxSOeTV6qMqR7cc=.f8d911fd-a5ae-444b-90f8-4331d3b1418c@github.com> References: <2lYn7oWZsxWQECjxx87FaTqyxpJknxSOeTV6qMqR7cc=.f8d911fd-a5ae-444b-90f8-4331d3b1418c@github.com> Message-ID: <89va-BvnrNBPNWGjd0DBhfW1UKwGLlD9KY0ESokUw6w=.82c71d5d-1b61-42af-91ef-27dbc69f6756@github.com> On Thu, 22 Oct 2020 21:56:54 GMT, Martin Doerr wrote: > PPC64 and s390 currently match indexOfChar_U also for StrIntrinsicNode::L. 
This leads to incorrect results of StringLatin1.indexOf and already breaks builds: > Optimizing the exploded image > Error occurred during initialization of boot layer > > We need separate match rules for StrIntrinsicNode::U and StrIntrinsicNode::L. Thanks for fixing this! ------------- Marked as reviewed by clanger (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/820 From paul.sandoz at oracle.com Thu Oct 22 22:17:42 2020 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Thu, 22 Oct 2020 15:17:42 -0700 Subject: Vector API: stack overflow on Big Endian In-Reply-To: References: <08399669-DCB9-4CB4-A23F-A6B64439D1DA@oracle.com> Message-ID: <6B33BFE4-351B-4CB4-A72B-94741CF135C6@oracle.com> Thanks, that is helpful, so it's only re-bracket tests for lanewise conversion that fail. This reinforces my suspicions that the actual output is fine, and the test assumes LE encoding in castByteArrayData, which I think takes a shortcut for casting just for positive integral types. Lanewise conversion skips floating point types and so is only partially tested. The method castByteArrayData needs to be reimplemented by loading into a ByteBuffer, reading values, casting them, and storing them into a resulting buffer. Not difficult to do, just a little laborious. Paul. > On Oct 22, 2020, at 11:36 AM, Doerr, Martin wrote: > > Hi Paul, > > here's the snippet from the JTwork/jdk/incubator/vector/VectorReshapeTests.jtr from a run on an s390 machine. > Let me know if you need anything else. > > Thanks and best regards, > Martin From njian at openjdk.java.net Fri Oct 23 02:12:12 2020 From: njian at openjdk.java.net (Ningsheng Jian) Date: Fri, 23 Oct 2020 02:12:12 GMT Subject: RFR: 8254670: SVE test uses linux-specific api In-Reply-To: References: Message-ID: On Wed, 21 Oct 2020 08:35:49 GMT, Ningsheng Jian wrote: >> The SVE JNI test uses linux-specific api in native code, which is invalid with Windows/AArch64 and macOS/AArch64 ports in. Fixed by simply reducing the test to Linux only. 
> > Checked the failed test task log and it seems to fail at aot test. I don't know how my trivial patch could be related. And the newly created PR https://github.com/openjdk/jdk/pull/779 failed as well with the same log... Hi @adinn @VladimirKempik , could you please help to take a look at this trivial fix? ------------- PR: https://git.openjdk.java.net/jdk/pull/778 From jbhateja at openjdk.java.net Fri Oct 23 02:54:12 2020 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Fri, 23 Oct 2020 02:54:12 GMT Subject: RFR: 8255210: [Vector API] jdk/incubator/vector/Int256VectorTests.java crashes on AVX512 machines In-Reply-To: References: Message-ID: On Thu, 22 Oct 2020 21:54:03 GMT, Sandhya Viswanathan wrote: >> LGTM based on same changes made previously for short vectors. >> Good if Sandhya (@sviswa7) et. al. can review also. > > The fix looks good to me as well. > > Hi @DamonFool , similar problem also exists for reduction patterns for other primitive types (Long/Float/Double). > > Hi @jatin-bhateja , > > Thanks for your reminder. > > I've also noticed these problems. > But I don't have a reproducer right now. > And I need some time to construct one. > > So I plan to file another bug to fix them when the reproducer is ready. > > Thanks. Sure , this fix looks good. ------------- PR: https://git.openjdk.java.net/jdk/pull/791 From jiefu at openjdk.java.net Fri Oct 23 03:46:13 2020 From: jiefu at openjdk.java.net (Jie Fu) Date: Fri, 23 Oct 2020 03:46:13 GMT Subject: RFR: 8255210: [Vector API] jdk/incubator/vector/Int256VectorTests.java crashes on AVX512 machines In-Reply-To: References: Message-ID: On Fri, 23 Oct 2020 02:51:46 GMT, Jatin Bhateja wrote: >> The fix looks good to me as well. > >> > Hi @DamonFool , similar problem also exists for reduction patterns for other primitive types (Long/Float/Double). >> >> Hi @jatin-bhateja , >> >> Thanks for your reminder. >> >> I've also noticed these problems. >> But I don't have a reproducer right now. 
>> And I need some time to construct one. >> >> So I plan to file another bug to fix them when the reproducer is ready. >> >> Thanks. > > Sure , this fix looks good. Thanks @PaulSandoz , @sviswa7 and @jatin-bhateja for your review. Will push it later. ------------- PR: https://git.openjdk.java.net/jdk/pull/791 From jiefu at openjdk.java.net Fri Oct 23 05:56:11 2020 From: jiefu at openjdk.java.net (Jie Fu) Date: Fri, 23 Oct 2020 05:56:11 GMT Subject: Integrated: 8255210: [Vector API] jdk/incubator/vector/Int256VectorTests.java crashes on AVX512 machines In-Reply-To: References: Message-ID: On Thu, 22 Oct 2020 03:12:31 GMT, Jie Fu wrote: > Hi all, > > Please review the fix of an AVX512 crash for Vector API. > The reason is that reductionI in x86.ad didn't use legVec for code generation, which is required by Assembler::vphaddd. > > Testing: > - test/jdk/jdk/incubator/vector all passed on both AVX256 and AVX512 machines > > Thanks. > Best regards, > Jie This pull request has now been integrated. Changeset: a824781b Author: Jie Fu URL: https://git.openjdk.java.net/jdk/commit/a824781b Stats: 22 lines in 1 file changed: 0 ins; 20 del; 2 mod 8255210: [Vector API] jdk/incubator/vector/Int256VectorTests.java crashes on AVX512 machines Reviewed-by: psandoz, sviswanathan, jbhateja ------------- PR: https://git.openjdk.java.net/jdk/pull/791 From vkempik at openjdk.java.net Fri Oct 23 06:54:51 2020 From: vkempik at openjdk.java.net (Vladimir Kempik) Date: Fri, 23 Oct 2020 06:54:51 GMT Subject: RFR: 8254670: SVE test uses linux-specific api In-Reply-To: References: Message-ID: On Fri, 23 Oct 2020 02:09:05 GMT, Ningsheng Jian wrote: >> Checked the failed test task log and it seems to fail at aot test. I don't know how my trivial patch could be related. And the newly created PR https://github.com/openjdk/jdk/pull/779 failed as well with the same log... > > Hi @adinn @VladimirKempik , could you please help to take a look at this trivial fix? 
The changes look good to me (not a Reviewer, though). Try rerunning the tests; the failure doesn't look related to your changes. ------------- PR: https://git.openjdk.java.net/jdk/pull/778 From shade at openjdk.java.net Fri Oct 23 07:07:00 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Fri, 23 Oct 2020 07:07:00 GMT Subject: RFR: 8255265: IdealLoopTree::iteration_split_impl does not use should_align Message-ID: 8255265: IdealLoopTree::iteration_split_impl does not use should_align ------------- Commit messages: - 8255265: IdealLoopTree::iteration_split_impl does not use should_align Changes: https://git.openjdk.java.net/jdk/pull/815/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=815&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8255265 Stats: 35 lines in 2 files changed: 0 ins; 25 del; 10 mod Patch: https://git.openjdk.java.net/jdk/pull/815.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/815/head:pull/815 PR: https://git.openjdk.java.net/jdk/pull/815 From roland at openjdk.java.net Fri Oct 23 07:18:49 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Fri, 23 Oct 2020 07:18:49 GMT Subject: RFR: 8255224: x86_32 tests fail with "bad AD file" after JDK-8223051 In-Reply-To: References: Message-ID: On Thu, 22 Oct 2020 17:05:52 GMT, Vladimir Kozlov wrote: >> x86_32 is missing an ad file rule for (CMoveL (Bool (CmpUL > > Marked as reviewed by kvn (Reviewer). 
@vnkozlov @shipilev thanks for the review ------------- PR: https://git.openjdk.java.net/jdk/pull/811 From roland at openjdk.java.net Fri Oct 23 07:18:49 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Fri, 23 Oct 2020 07:18:49 GMT Subject: Integrated: 8255224: x86_32 tests fail with "bad AD file" after JDK-8223051 In-Reply-To: References: Message-ID: On Thu, 22 Oct 2020 15:05:23 GMT, Roland Westrelin wrote: > x86_32 is missing an ad file rule for (CMoveL (Bool (CmpUL This pull request has now been integrated. 
Changeset: fe74f3cd Author: Roland Westrelin URL: https://git.openjdk.java.net/jdk/commit/fe74f3cd Stats: 22 lines in 1 file changed: 22 ins; 0 del; 0 mod 8255224: x86_32 tests fail with "bad AD file" after JDK-8223051 Reviewed-by: shade, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/811 From vlivanov at openjdk.java.net Fri Oct 23 07:29:37 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Fri, 23 Oct 2020 07:29:37 GMT Subject: RFR: 8255210: [Vector API] jdk/incubator/vector/Int256VectorTests.java crashes on AVX512 machines In-Reply-To: References: Message-ID: On Thu, 22 Oct 2020 03:12:31 GMT, Jie Fu wrote: > Hi all, > > Please review the fix of an AVX512 crash for Vector API. > The reason is that reductionI in x86.ad didn't use legVec for code generation, which is required by Assembler::vphaddd. > > Testing: > - test/jdk/jdk/incubator/vector all passed on both AVX256 and AVX512 machines > > Thanks. > Best regards, > Jie src/hotspot/cpu/x86/x86.ad line 4429: > 4427: instruct reductionI(rRegI dst, rRegI src1, legVec src2, legVec vtmp1, legVec vtmp2) %{ > 4428: predicate(vector_element_basic_type(n->in(2)) == T_INT && > 4429: vector_length(n->in(2)) <= 16); // src2 FTR since `reduction16I` is gone, vector length check becomes redundant: `reductionI` successfully covers the whole range of vectors of int. ------------- PR: https://git.openjdk.java.net/jdk/pull/791 From jiefu at openjdk.java.net Fri Oct 23 07:38:36 2020 From: jiefu at openjdk.java.net (Jie Fu) Date: Fri, 23 Oct 2020 07:38:36 GMT Subject: RFR: 8255210: [Vector API] jdk/incubator/vector/Int256VectorTests.java crashes on AVX512 machines In-Reply-To: References: Message-ID: <8DMO3hNqJJ0KYRpd9XrP6tJdxpZ5rEvt0VKNlRfOkVA=.480c427f-1dbb-4b84-b745-0b5607faaead@github.com> On Fri, 23 Oct 2020 07:26:45 GMT, Vladimir Ivanov wrote: >> Hi all, >> >> Please review the fix of an AVX512 crash for Vector API. 
>> The reason is that reductionI in x86.ad didn't use legVec for code generation, which is required by Assembler::vphaddd. >> >> Testing: >> - test/jdk/jdk/incubator/vector all passed on both AVX256 and AVX512 machines >> >> Thanks. >> Best regards, >> Jie > > src/hotspot/cpu/x86/x86.ad line 4429: > >> 4427: instruct reductionI(rRegI dst, rRegI src1, legVec src2, legVec vtmp1, legVec vtmp2) %{ >> 4428: predicate(vector_element_basic_type(n->in(2)) == T_INT && >> 4429: vector_length(n->in(2)) <= 16); // src2 > > FTR since `reduction16I` is gone, vector length check becomes redundant: `reductionI` successfully covers the whole range of vectors of int. Nice catch! The same case with reductionS. I'll fix it next time. Thanks. ------------- PR: https://git.openjdk.java.net/jdk/pull/791 From redestad at openjdk.java.net Fri Oct 23 08:03:40 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Fri, 23 Oct 2020 08:03:40 GMT Subject: RFR: 8255049: Remove support for the hsdis decode_instructions entry point in hotspot In-Reply-To: References: Message-ID: On Thu, 22 Oct 2020 17:44:09 GMT, Vladimir Kozlov wrote: >> This patch drops support in hotspot for hsdis plugins which only include the old decode_instructions endpoint. >> >> The decode_instructions entry point in hsdis was replaced by decode_instructions_virtual[1], with support later added to allow old hsdis plugins to work, at least for the duration of JDK 8. Dropping the backwards compatibility means you'll need a hsdis built from JDK 8 sources or later, which seems a reasonable requirement at this point. >> >> It's unclear if a CSR request is needed. >> >> [1] https://github.com/openjdk/jdk/commit/22544e7a7c72e8779355df963e49e846f9458ce4 >> [2] https://github.com/openjdk/jdk/commit/a9c40e9df41ee06adcd7fff951dd36b6c093a24b > > Good. @neliasso @vnkozlov - thanks for reviewing! 
------------- PR: https://git.openjdk.java.net/jdk/pull/807 From redestad at openjdk.java.net Fri Oct 23 08:03:42 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Fri, 23 Oct 2020 08:03:42 GMT Subject: Integrated: 8255049: Remove support for the hsdis decode_instructions entry point in hotspot In-Reply-To: References: Message-ID: On Thu, 22 Oct 2020 12:33:44 GMT, Claes Redestad wrote: > This patch drops support in hotspot for hsdis plugins which only include the old decode_instructions endpoint. > > The decode_instructions entry point in hsdis was replaced by decode_instructions_virtual[1], with support later added to allow old hsdis plugins to work, at least for the duration of JDK 8. Dropping the backwards compatibility means you'll need a hsdis built from JDK 8 sources or later, which seems a reasonable requirement at this point. > > It's unclear if a CSR request is needed. > > [1] https://github.com/openjdk/jdk/commit/22544e7a7c72e8779355df963e49e846f9458ce4 > [2] https://github.com/openjdk/jdk/commit/a9c40e9df41ee06adcd7fff951dd36b6c093a24b This pull request has now been integrated. Changeset: 107fb9cc Author: Claes Redestad URL: https://git.openjdk.java.net/jdk/commit/107fb9cc Stats: 37 lines in 2 files changed: 0 ins; 31 del; 6 mod 8255049: Remove support for the hsdis decode_instructions entry point in hotspot Reviewed-by: neliasso, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/807 From shade at openjdk.java.net Fri Oct 23 08:33:42 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Fri, 23 Oct 2020 08:33:42 GMT Subject: RFR: 8255301: Common and strengthen the code in ciMemberName and ciMethodHandle Message-ID: There is a TODO item in their `get_vm_target`-s. We can clean those up. It looks to me the caller code does not handle `NULL` result well, which means we better `fatal` ourselves before exposing `NULL` to callers and `SEGV`-ing there. 
------------- Commit messages: - Rename methods a bit - 8255301: Common and strengthen the code in ciMemberName and ciMethodHandle Changes: https://git.openjdk.java.net/jdk/pull/825/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=825&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8255301 Stats: 21 lines in 3 files changed: 8 ins; 6 del; 7 mod Patch: https://git.openjdk.java.net/jdk/pull/825.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/825/head:pull/825 PR: https://git.openjdk.java.net/jdk/pull/825 From vlivanov at openjdk.java.net Fri Oct 23 08:37:41 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Fri, 23 Oct 2020 08:37:41 GMT Subject: Integrated: 8255067: Restore Copyright line in file modified by 8253191 In-Reply-To: <1TqKsX8MsupinIh8g6bkCZJXKnR56ziAMJVmNkocOeI=.1b1c94e5-c015-4233-8d20-4ff997c05f89@github.com> References: <1TqKsX8MsupinIh8g6bkCZJXKnR56ziAMJVmNkocOeI=.1b1c94e5-c015-4233-8d20-4ff997c05f89@github.com> Message-ID: On Tue, 20 Oct 2020 19:10:17 GMT, Vladimir Ivanov wrote: > Make it crystal clear that 8253191 test case doesn't have anything in common with the original version of the test case for 8204479: > * restore original test case > * add 8253191 test case as a separate file This pull request has now been integrated. Changeset: e52156d7 Author: Vladimir Ivanov URL: https://git.openjdk.java.net/jdk/commit/e52156d7 Stats: 71 lines in 2 files changed: 24 ins; 19 del; 28 mod 8255067: Restore Copyright line in file modified by 8253191 Reviewed-by: kvn, shade ------------- PR: https://git.openjdk.java.net/jdk/pull/771 From vlivanov at openjdk.java.net Fri Oct 23 08:37:50 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Fri, 23 Oct 2020 08:37:50 GMT Subject: RFR: 8253734: C2: Optimize Move nodes Message-ID: Introduce the following transformations for Move nodes: 1. `MoveI2F (MoveF2I x) => x` 2. `MoveI2F (LoadI mem) => LoadF mem` 3. 
`StoreI mem (MoveF2I x) => StoreF mem x` (The same applies to MoveL2D/MoveD2L.) #1 eliminates redundant operations and #2/#3 avoid reg-to-reg moves in generated code: 0x000000010d09964c: vmovss 0x20(%rsi),%xmm1 0x000000010d099651: vmovd %xmm1,%eax ;*invokestatic floatToRawIntBits vs 0x0000000110c5a6cc: mov 0x20(%rsi),%eax ;*invokestatic floatToRawIntBits (#2 and #3 are performed late, after loop opts are over, so that high-level optimization passes do not have to handle the newly introduced mismatched accesses.) Testing: tier1-5. ------------- Commit messages: - 8253734: Optimize Move nodes Changes: https://git.openjdk.java.net/jdk/pull/826/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=826&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8253734 Stats: 142 lines in 5 files changed: 127 ins; 0 del; 15 mod Patch: https://git.openjdk.java.net/jdk/pull/826.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/826/head:pull/826 PR: https://git.openjdk.java.net/jdk/pull/826 From vlivanov at openjdk.java.net Fri Oct 23 08:39:39 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Fri, 23 Oct 2020 08:39:39 GMT Subject: RFR: 8255301: Common and strengthen the code in ciMemberName and ciMethodHandle In-Reply-To: References: Message-ID: On Fri, 23 Oct 2020 08:27:36 GMT, Aleksey Shipilev wrote: > There is a TODO item in their `get_vm_target`-s. We can clean those up. It looks to me the caller code does not handle `NULL` result well, which means we better `fatal` ourselves before exposing `NULL` to callers and `SEGV`-ing there. src/hotspot/share/ci/ciMemberName.cpp line 46: > 44: return CURRENT_ENV->get_method((Method*) vmtarget); > 45: } else { > 46: fatal("vmtarget should be a method"); There's a slight difference in behavior here: `fatal()` is present in product binaries while `assert()` is not. Is the change deliberate? 
------------- PR: https://git.openjdk.java.net/jdk/pull/825 From shade at openjdk.java.net Fri Oct 23 08:43:38 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Fri, 23 Oct 2020 08:43:38 GMT Subject: RFR: 8255301: Common and strengthen the code in ciMemberName and ciMethodHandle In-Reply-To: References: Message-ID: <-hky3DTt0KMtCAnGxV-2eTTbtqexdW-pu69nBD3BO-k=.dfc33cbc-38f2-49ac-92c9-6c481ca56638@github.com> On Fri, 23 Oct 2020 08:36:52 GMT, Vladimir Ivanov wrote: >> There is a TODO item in their `get_vm_target`-s. We can clean those up. It looks to me the caller code does not handle `NULL` result well, which means we better `fatal` ourselves before exposing `NULL` to callers and `SEGV`-ing there. > > src/hotspot/share/ci/ciMemberName.cpp line 46: > >> 44: return CURRENT_ENV->get_method((Method*) vmtarget); >> 45: } else { >> 46: fatal("vmtarget should be a method"); > > There's a slight difference in behavior here: `fatal()` is present in product binaries while `assert()` is not. > Is the change deliberate? Yes. Because AFAICS, the `get_vmtarget` callers use the result without any null checks, calling member methods off that `ciMethod*`, and thus release bits would (hopefully) `SEGV` if this path is taken. In this case, it seems prudent to `fatal()` at sensible point before that happens. ------------- PR: https://git.openjdk.java.net/jdk/pull/825 From adinn at openjdk.java.net Fri Oct 23 08:47:35 2020 From: adinn at openjdk.java.net (Andrew Dinn) Date: Fri, 23 Oct 2020 08:47:35 GMT Subject: RFR: 8254670: SVE test uses linux-specific api In-Reply-To: References: Message-ID: On Wed, 21 Oct 2020 05:32:20 GMT, Ningsheng Jian wrote: > The SVE JNI test uses linux-specific api in native code, which is invalid with Windows/AArch64 and macOS/AArch64 ports in. Fixed by simply reducing the test to Linux only. Yes, this is good and trivial. ------------- Marked as reviewed by adinn (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/778 From vlivanov at openjdk.java.net Fri Oct 23 08:55:37 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Fri, 23 Oct 2020 08:55:37 GMT Subject: RFR: 8255301: Common and strengthen the code in ciMemberName and ciMethodHandle In-Reply-To: <-hky3DTt0KMtCAnGxV-2eTTbtqexdW-pu69nBD3BO-k=.dfc33cbc-38f2-49ac-92c9-6c481ca56638@github.com> References: <-hky3DTt0KMtCAnGxV-2eTTbtqexdW-pu69nBD3BO-k=.dfc33cbc-38f2-49ac-92c9-6c481ca56638@github.com> Message-ID: On Fri, 23 Oct 2020 08:40:55 GMT, Aleksey Shipilev wrote: >> src/hotspot/share/ci/ciMemberName.cpp line 46: >> >>> 44: return CURRENT_ENV->get_method((Method*) vmtarget); >>> 45: } else { >>> 46: fatal("vmtarget should be a method"); >> >> There's a slight difference in behavior here: `fatal()` is present in product binaries while `assert()` is not. >> Is the change deliberate? > > Yes. Because AFAICS, the `get_vmtarget` callers use the result without any null checks, calling member methods off that `ciMethod*`, and thus release bits would (hopefully) `SEGV` if this path is taken. In this case, it seems prudent to `fatal()` at sensible point before that happens. Ok. I took a closer look and noticed that `java_lang_invoke_MemberName::vmtarget` already returns `Method*`. So, the code can be rewritten as follows: Method* vmtarget = java_lang_invoke_MemberName::vmtarget(get_oop()); return CURRENT_ENV->get_method(vmtarget); ------------- PR: https://git.openjdk.java.net/jdk/pull/825 From njian at openjdk.java.net Fri Oct 23 08:58:36 2020 From: njian at openjdk.java.net (Ningsheng Jian) Date: Fri, 23 Oct 2020 08:58:36 GMT Subject: RFR: 8254670: SVE test uses linux-specific api In-Reply-To: References: Message-ID: On Fri, 23 Oct 2020 08:44:50 GMT, Andrew Dinn wrote: >> The SVE JNI test uses linux-specific api in native code, which is invalid with Windows/AArch64 and macOS/AArch64 ports in. Fixed by simply reducing the test to Linux only. > > Yes, this is good and trivial. 
Thanks for the review! @adinn @VladimirKempik ------------- PR: https://git.openjdk.java.net/jdk/pull/778 From shade at openjdk.java.net Fri Oct 23 09:03:53 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Fri, 23 Oct 2020 09:03:53 GMT Subject: RFR: 8255301: Common and strengthen the code in ciMemberName and ciMethodHandle [v2] In-Reply-To: References: Message-ID: > There is a TODO item in their `get_vm_target`-s. We can clean those up. It looks to me the caller code does not handle `NULL` result well, which means we better `fatal` ourselves before exposing `NULL` to callers and `SEGV`-ing there. Aleksey Shipilev has updated the pull request incrementally with three additional commits since the last revision: - Remove whitespace - Simplify the whole thing - Simplify, because MemberName::vmtarget already returns Method* ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/825/files - new: https://git.openjdk.java.net/jdk/pull/825/files/238bf02a..ec242d22 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=825&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=825&range=00-01 Stats: 17 lines in 3 files changed: 1 ins; 13 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/825.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/825/head:pull/825 PR: https://git.openjdk.java.net/jdk/pull/825 From shade at openjdk.java.net Fri Oct 23 09:03:53 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Fri, 23 Oct 2020 09:03:53 GMT Subject: RFR: 8255301: Common and strengthen the code in ciMemberName and ciMethodHandle [v2] In-Reply-To: References: <-hky3DTt0KMtCAnGxV-2eTTbtqexdW-pu69nBD3BO-k=.dfc33cbc-38f2-49ac-92c9-6c481ca56638@github.com> Message-ID: <4MkaUVWLUymBWnWHI7rRCzDWy5NeU9PfLYc89SiBcB4=.70bf5056-55fb-403e-bea4-d6c42e168f61@github.com> On Fri, 23 Oct 2020 08:52:52 GMT, Vladimir Ivanov wrote: >> Yes. 
Because AFAICS, the `get_vmtarget` callers use the result without any null checks, calling member methods off that `ciMethod*`, and thus release bits would (hopefully) `SEGV` if this path is taken. In this case, it seems prudent to `fatal()` at sensible point before that happens. > > Ok. I took a closer look and noticed that `java_lang_invoke_MemberName::vmtarget` already returns `Method*`. So, the code can be rewritten as follows: > Method* vmtarget = java_lang_invoke_MemberName::vmtarget(get_oop()); > return CURRENT_ENV->get_method(vmtarget); Oh, good stuff. I missed that! This means we can simplify the whole patch, and avoid the new method and the dependency between `ciMethodHandle` and `ciMemberName`. Please see new patch. ------------- PR: https://git.openjdk.java.net/jdk/pull/825 From thartmann at openjdk.java.net Fri Oct 23 09:04:41 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Fri, 23 Oct 2020 09:04:41 GMT Subject: RFR: 8255265: IdealLoopTree::iteration_split_impl does not use should_align In-Reply-To: References: Message-ID: <9vsXv5fbbOEiU6qitiMzhDs1y1N2rCwZkVnepvRU4Wg=.2ca37303-9f6f-4e9d-87d6-2f058daf1009@github.com> On Thu, 22 Oct 2020 18:00:28 GMT, Aleksey Shipilev wrote: > 8255265: IdealLoopTree::iteration_split_impl does not use should_align Looks good. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/815 From vlivanov at openjdk.java.net Fri Oct 23 09:08:37 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Fri, 23 Oct 2020 09:08:37 GMT Subject: RFR: 8255301: Common and strengthen the code in ciMemberName and ciMethodHandle [v2] In-Reply-To: References: Message-ID: <7N_vNV9wARHRRRJLsbcOm7hjhn3RDtGSGl9ooiH-CAo=.03921e87-80ed-4299-8c49-9e2027db5f49@github.com> On Fri, 23 Oct 2020 09:03:53 GMT, Aleksey Shipilev wrote: >> There is a TODO item in their `get_vm_target`-s. We can clean those up. 
It looks to me the caller code does not handle `NULL` result well, which means we better `fatal` ourselves before exposing `NULL` to callers and `SEGV`-ing there. > > Aleksey Shipilev has updated the pull request incrementally with three additional commits since the last revision: > > - Remove whitespace > - Simplify the whole thing > - Simplify, because MemberName::vmtarget already returns Method* Looks good. ------------- Marked as reviewed by vlivanov (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/825 From rrich at openjdk.java.net Fri Oct 23 09:10:36 2020 From: rrich at openjdk.java.net (Richard Reingruber) Date: Fri, 23 Oct 2020 09:10:36 GMT Subject: RFR: 8255274: [PPC64, s390] wrong StringLatin1.indexOf version matched In-Reply-To: <89va-BvnrNBPNWGjd0DBhfW1UKwGLlD9KY0ESokUw6w=.82c71d5d-1b61-42af-91ef-27dbc69f6756@github.com> References: <2lYn7oWZsxWQECjxx87FaTqyxpJknxSOeTV6qMqR7cc=.f8d911fd-a5ae-444b-90f8-4331d3b1418c@github.com> <89va-BvnrNBPNWGjd0DBhfW1UKwGLlD9KY0ESokUw6w=.82c71d5d-1b61-42af-91ef-27dbc69f6756@github.com> Message-ID: On Thu, 22 Oct 2020 22:03:36 GMT, Christoph Langer wrote: >> PPC64 and s390 currently match indexOfChar_U also for StrIntrinsicNode::L. This leads to incorrect results of StringLatin1.indexOf and already breaks builds: >> Optimizing the exploded image >> Error occurred during initialization of boot layer >> >> We need separate match rules for StrIntrinsicNode::U and StrIntrinsicNode::L. > > Thanks for fixing this! The fix looks correct. But you are adding another 20 lines to an already humongous file by duplicating a complex instruction form and the diff between the 2 variants is very small. This makes the file hardly readable... at least for humans. Wouldn't it be possible to have just one instruction form that matches both variants and feed the predicate expression to the `is_byte` parameter of `string_indexof_char`?
------------- PR: https://git.openjdk.java.net/jdk/pull/820 From thartmann at openjdk.java.net Fri Oct 23 09:22:38 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Fri, 23 Oct 2020 09:22:38 GMT Subject: RFR: 8253734: C2: Optimize Move nodes In-Reply-To: References: Message-ID: On Fri, 23 Oct 2020 08:27:50 GMT, Vladimir Ivanov wrote: > Introduce the following transformations for Move nodes: > 1. `MoveI2F (MoveF2I x) => x` > > 2. `MoveI2F (LoadI mem) => LoadF mem` > > 3. `StoreI mem (MoveF2I x) => StoreF mem x` > > (The same applies to MoveL2D/MoveD2L.) > > #1 eliminates redundant operations and #2/#3 avoid reg-to-reg moves in generated code: > 0x000000010d09964c: vmovss 0x20(%rsi),%xmm1 > 0x000000010d099651: vmovd %xmm1,%eax ;*invokestatic floatToRawIntBits > vs > 0x0000000110c5a6cc: mov 0x20(%rsi),%eax ;*invokestatic floatToRawIntBits > > > (#2 and #3 are performed late (after loop opts are over) to avoid high-level optimizations passes to handle newly introduced mismatched accesses.) > > Testing: tier1-5. Maybe add some comments with examples to the reinterpret methods. src/hotspot/share/opto/movenode.hpp line 126: > 124: public: > 125: MoveL2DNode(Node* value) : MoveNode(value) { > 126: init_class_id(Class_Move); Shouldn't `init_class_id` be moved to the `MoveNode` constructor? src/hotspot/share/opto/movenode.hpp line 108: > 106: virtual Node* Ideal(PhaseGVN* phase, bool can_reshape); > 107: virtual Node* Identity(PhaseGVN* phase); > 108: // virtual const Type* Value(PhaseGVN* phase) const; Can be removed. ------------- Changes requested by thartmann (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/826 From rrich at openjdk.java.net Fri Oct 23 09:33:36 2020 From: rrich at openjdk.java.net (Richard Reingruber) Date: Fri, 23 Oct 2020 09:33:36 GMT Subject: RFR: 8255072: [TESTBUG] com/sun/jdi/EATests.java should not fail if expected VMOutOfMemoryException is not thrown [v3] In-Reply-To: References: Message-ID: On Fri, 23 Oct 2020 08:15:34 GMT, Chris Plummer wrote: >>> >>> >>> Looks good. >> >> Thank you. I'll wait for a second review assuming it's required. > >> Thank you. I'll wait for a second review assuming it's required. > > You might want to add the compiler and/or gc teams to the review Following @plummercj's advice to add compiler/gc teams. ------------- PR: https://git.openjdk.java.net/jdk/pull/775 From mdoerr at openjdk.java.net Fri Oct 23 09:59:41 2020 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Fri, 23 Oct 2020 09:59:41 GMT Subject: RFR: 8255274: [PPC64, s390] wrong StringLatin1.indexOf version matched In-Reply-To: References: <2lYn7oWZsxWQECjxx87FaTqyxpJknxSOeTV6qMqR7cc=.f8d911fd-a5ae-444b-90f8-4331d3b1418c@github.com> <89va-BvnrNBPNWGjd0DBhfW1UKwGLlD9KY0ESokUw6w=.82c71d5d-1b61-42af-91ef-27dbc69f6756@github.com> Message-ID: <2g2pBVUXX_0MEixSAsRPktBkfqgNGpzKg1Ci996ZfrY=.dd2c7325-4e86-4506-9a5d-1b848eb1a082@github.com> On Fri, 23 Oct 2020 09:07:41 GMT, Richard Reingruber wrote: >> Thanks for fixing this! > > The fix looks correct. > But you are adding another 20 lines to an already humongous file by duplicating a complex instruction form and the diff between the 2 variants is very small. This makes the file hardly readable... at least for humans. > Wouldn't it be possible to have just one instruction form that matches both variants and feed the predicate expression to the `is_byte` parameter of `string_indexof_char`? Thanks for the reviews. @reinrich That would be nice, but I'm not aware of an easy way to do this.
I can't use "((StrIndexOfCharNode*)n)->encoding()" in the "ins_encode" because the Ideal node is only visible during "match". ------------- PR: https://git.openjdk.java.net/jdk/pull/820 From adinn at openjdk.java.net Fri Oct 23 10:08:37 2020 From: adinn at openjdk.java.net (Andrew Dinn) Date: Fri, 23 Oct 2020 10:08:37 GMT Subject: RFR: 8255287: aarch64: fix SVE patterns for vector shift count In-Reply-To: References: Message-ID: On Fri, 23 Oct 2020 09:35:11 GMT, Fei Yang wrote: > SVE patterns for vector shift count cannot be matched due to bad matching rules. > Also code gen is not correct in certain cases for vlslS_imm and vlsrS_imm. > Please refer to JDK-8255287 for details. > Patch passed tier1 tests using QEMU system emulator which supports SVE. These changes look ok. ------------- Marked as reviewed by adinn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/827 From njian at openjdk.java.net Fri Oct 23 10:11:39 2020 From: njian at openjdk.java.net (Ningsheng Jian) Date: Fri, 23 Oct 2020 10:11:39 GMT Subject: Integrated: 8254670: SVE test uses linux-specific api In-Reply-To: References: Message-ID: On Wed, 21 Oct 2020 05:32:20 GMT, Ningsheng Jian wrote: > The SVE JNI test uses linux-specific api in native code, which is invalid with Windows/AArch64 and macOS/AArch64 ports in. Fixed by simply reducing the test to Linux only. This pull request has now been integrated. 
Changeset: ac1748e7 Author: Ningsheng Jian URL: https://git.openjdk.java.net/jdk/commit/ac1748e7 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod 8254670: SVE test uses linux-specific api Reviewed-by: adinn, vkempik ------------- PR: https://git.openjdk.java.net/jdk/pull/778 From rrich at openjdk.java.net Fri Oct 23 10:29:39 2020 From: rrich at openjdk.java.net (Richard Reingruber) Date: Fri, 23 Oct 2020 10:29:39 GMT Subject: RFR: 8255274: [PPC64, s390] wrong StringLatin1.indexOf version matched In-Reply-To: <2lYn7oWZsxWQECjxx87FaTqyxpJknxSOeTV6qMqR7cc=.f8d911fd-a5ae-444b-90f8-4331d3b1418c@github.com> References: <2lYn7oWZsxWQECjxx87FaTqyxpJknxSOeTV6qMqR7cc=.f8d911fd-a5ae-444b-90f8-4331d3b1418c@github.com> Message-ID: <-xLmclNToXEZ6K3CRLiTdfi9pRdTmNsChuRB9-ryMvM=.e3a6ce78-c3e2-4659-aa52-422d37a1e0ef@github.com> On Thu, 22 Oct 2020 21:56:54 GMT, Martin Doerr wrote: > PPC64 and s390 currently match indexOfChar_U also for StrIntrinsicNode::L. This leads to incorrect results of StringLatin1.indexOf and already breaks builds: > Optimizing the exploded image > Error occurred during initialization of boot layer > > We need separate match rules for StrIntrinsicNode::U and StrIntrinsicNode::L. Ok, I see. Would be a worthwhile enhancement to preserve information that allows for having just one instruction form. You might want to change the format string to indicate which version is used. I leave it to your discretion. Thanks again for fixing. ------------- Marked as reviewed by rrich (Committer). PR: https://git.openjdk.java.net/jdk/pull/820 From xxinliu at amazon.com Fri Oct 23 10:42:39 2020 From: xxinliu at amazon.com (Liu, Xin) Date: Fri, 23 Oct 2020 10:42:39 +0000 Subject: Bad graph detected in build_loop_late but I have no clue Message-ID: Hi, hotspot developers, I am debugging a structural problem caused by https://github.com/openjdk/jdk/pull/704 So far, I know that 2 cases both crash at PhaseIdealLoop::build_loop_late_post() where MaxLoopUnrolling happens for EA.
That algorithm looks very similar to 2.3 Schedule Late and 2.4 Selecting a Block [1]. Doesn't that paper describe global code motion? Is build_loop_early/late() actually doing code motion before the real loop optimizations? In particular, I don't understand this statement: set_ctrl(n, least); Does set_ctrl(Y, X) say that Y is control dependent on X, with control dependence as defined in [2]? So far, I have found that PhaseIdealLoop::_nodes[IDX] can be any of 3 different values. 1. NULL, which means IDX is dead. 2. A CFG node with the lowest bit set, assigned by set_ctrl. 3. IdealLoopTree*, when this node is the head of a loop. Do I understand this data structure correctly? So PhaseIdealLoop doesn't have "BasicBlocks" and it uses _nodes to mark where a node belongs? I understand that (legal->is_Start() && !early->is_Root()) is a legitimate assertion; I believe I messed up the ideal graph somewhere and caused this fiasco. Could you give me a pointer to which node is broken? Or could you share some hints on how to debug this kind of problem? Thank you in advance! --lx [1] Click, Cliff. "Global code motion/global value numbering." Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation. 1995. [2] Ferrante, Jeanne, Karl J. Ottenstein, and Joe D. Warren. "The program dependence graph and its use in optimization." ACM Transactions on Programming Languages and Systems (TOPLAS) 9.3 (1987): 319-349.
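The "schedule late" step discussed in this message relies on two dominator-tree operations: taking the lowest common ancestor (LCA) of a node's uses, and walking immediate-dominator links from that LCA back up toward the earliest legal position. The following is a minimal sketch of those two operations under stated assumptions, not C2's actual code: `DomTree`, `dom_lca` and `reaches_on_idom_chain` are hypothetical names, and control nodes are reduced to plain indices.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical dominator tree: idom[i] is the immediate dominator of
// control node i, and the root is its own immediate dominator.
struct DomTree {
  std::vector<std::size_t> idom;
  std::vector<std::size_t> depth;  // depth[root] == 0
};

// Lowest common ancestor of a and b in the dominator tree:
// repeatedly lift whichever node is deeper until they meet.
inline std::size_t dom_lca(const DomTree& t, std::size_t a, std::size_t b) {
  while (a != b) {
    if (t.depth[a] >= t.depth[b]) {
      a = t.idom[a];
    } else {
      b = t.idom[b];
    }
  }
  return a;
}

// Walk idom links upward from `late`; true if `early` is met on the way.
// Reaching the root first means `early` does not dominate `late`.
inline bool reaches_on_idom_chain(const DomTree& t,
                                  std::size_t early, std::size_t late) {
  std::size_t n = late;
  while (n != early) {
    if (t.idom[n] == n) {
      return false;  // hit the root without meeting `early`
    }
    n = t.idom[n];
  }
  return true;
}
```

In terms of this sketch, a "bad graph" report corresponds to `reaches_on_idom_chain(early, lca)` returning false: the idom chain printed for the computed LCA never passes through the early control node, so comparing the two idom listings shows where the chains diverge.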
Bad graph detected in build_loop_late n: 797 Bool === _ 1080 [[ 520 521 ]] [ne] !jvms: Handler::checkNestedProtocol @ bci:7 Handler::parseURL @ bci:94 early(n): 535 CallStaticJava === 519 174 240 53 1 ( 249 214 247 1 1 1 1 248 1 1 1 1 1 1 1 1 249 484 1 1 ) [[ 898 395 715 265 ]] # Static java.net.URL::getFile java/lang/String:exact * ( java/net/URL:NotNull:exact * ) Handler::parseContextSpec @ bci:1 Handler::parseURL @ bci:137 !jvms: String::regionMatches @ bci:30 Handler::parseURL @ bci:75 n->in(1): 1080 CmpP === _ 265 208 [[ 797 ]] !jvms: String::lastIndexOf @ bci:8 Handler::parseContextSpec @ bci:72 Handler::parseURL @ bci:137 early(n->in(1)): 535 CallStaticJava === 519 174 240 53 1 ( 249 214 247 1 1 1 1 248 1 1 1 1 1 1 1 1 249 484 1 1 ) [[ 898 395 715 265 ]] # Static java.net.URL::getFile java/lang/String:exact * ( java/net/URL:NotNull:exact * ) Handler::parseContextSpec @ bci:1 Handler::parseURL @ bci:137 !jvms: String::regionMatches @ bci:30 Handler::parseURL @ bci:75 n->in(1)->in(1): 265 Proj === 535 [[ 960 150 758 1080 516 516 128 138 202 203 130 141 139 149 ]] #5 Oop:java/lang/String:exact * !jvms: Handler::parseURL @ bci:41 early(n->in(1)->in(1)): 535 CallStaticJava === 519 174 240 53 1 ( 249 214 247 1 1 1 1 248 1 1 1 1 1 1 1 1 249 484 1 1 ) [[ 898 395 715 265 ]] # Static java.net.URL::getFile java/lang/String:exact * ( java/net/URL:NotNull:exact * ) Handler::parseContextSpec @ bci:1 Handler::parseURL @ bci:137 !jvms: String::regionMatches @ bci:30 Handler::parseURL @ bci:75 n->in(1)->in(2): 208 ConP === 0 [[ 154 154 831 1657 1052 109 1053 111 112 112 384 384 709 709 1015 1015 2138 2239 518 1014 1014 113 1063 114 385 1013 1065 115 342 1066 116 1069 117 1013 1070 118 703 678 1077 120 1080 122 1012 123 1085 124 1087 126 1953 201 2153 1492 616 129 703 1826 1102 134 1012 1148 152 151 1153 1656 1114 140 2165 1968 704 703 1121 142 704 2176 1982 1126 143 1131 144 1132 145 704 148 127 ]] #NULL !jvms: StringLatin1::indexOf @ bci:32 String::indexOf @ bci:13 
Handler::parseURL @ bci:11 early(n->in(1)->in(2)): 0 Root === 0 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 [[ 0 1 110 208 209 207 1009 235 236 810 1092 339 772 283 213 1699 861 238 237 1664 2017 1726 938 242 241 268 943 255 2147 1424 971 2156 280 285 275 263 1889 2250 297 1696 1697 340 944 1248 2093 2246 2229 1706 479 1021 1556 1798 1383 ]] LCA(n): 1291 Region === 1291 1456 1457 1458 1459 [[ 1291 1583 1079 ]] !jvms: Handler::parseContextSpec @ bci:131 Handler::parseURL @ bci:137 n->out(0): 520 If === 796 797 [[ 959 250 ]] P=0.999999, C=-1.000000 !jvms: String::coder @ bci:-1 String::length @ bci:6 String::regionMatches @ bci:27 Handler::parseURL @ bci:75 n->out(0)->out(0): 959 IfTrue === 520 [[ 684 960 ]] #1 !jvms: Handler::parseContextSpec @ bci:23 Handler::parseURL @ bci:137 n->out(0)->out(1): 250 IfFalse === 520 [[ 122 ]] #0 !jvms: Handler::parseURL @ bci:32 n->out(1): 521 If === 798 797 [[ 801 252 ]] P=0.999999, C=-1.000000 !jvms: String::length @ bci:6 String::regionMatches @ bci:27 Handler::parseURL @ bci:75 n->out(1)->out(0): 801 IfTrue === 521 [[ 525 758 ]] #1 !jvms: Handler::checkNestedProtocol @ bci:7 Handler::parseURL @ bci:94 n->out(1)->out(1): 252 IfFalse === 521 [[ 123 ]] #0 !jvms: Handler::parseURL @ bci:32 idoms of early 535: idom[0] 535 CallStaticJava === 519 174 240 53 1 ( 249 214 247 1 1 1 1 248 1 1 1 1 1 1 1 1 249 484 1 1 ) [[ 898 395 715 265 ]] # Static java.net.URL::getFile java/lang/String:exact * ( java/net/URL:NotNull:exact * ) Handler::parseContextSpec @ bci:1 Handler::parseURL @ bci:137 !jvms: String::regionMatches @ bci:30 Handler::parseURL @ bci:75 idom[1] 519 IfTrue === 514 [[ 535 249 ]] #1 !jvms: String::length @ bci:6 String::regionMatches @ bci:27 Handler::parseURL @ bci:75 idom[2] 514 If === 791 792 [[ 519 243 ]] P=0.999999, C=-1.000000 !jvms: String::length @ bci:6 String::regionMatches @ bci:27 Handler::parseURL @ bci:75 idom[3] 791 
IfFalse === 1076 [[ 514 ]] #0 !jvms: Handler::checkNestedProtocol @ bci:7 Handler::parseURL @ bci:94 idom[4] 1076 If === 1289 1290 [[ 880 791 ]] P=0.000000, C=127.000000 !jvms: String::lastIndexOf @ bci:8 Handler::parseContextSpec @ bci:72 Handler::parseURL @ bci:137 idom[5] 1289 IfFalse === 957 [[ 1076 ]] #0 !jvms: Handler::parseContextSpec @ bci:131 Handler::parseURL @ bci:137 idom[6] 957 If === 1187 1188 [[ 681 1289 ]] P=0.000000, C=127.000000 !jvms: Handler::parseContextSpec @ bci:23 Handler::parseURL @ bci:137 idom[7] 1187 IfFalse === 513 [[ 957 ]] #0 !orig=[5623] !jvms: Handler::parseURL @ bci:137 idom[8] 513 If === 789 790 [[ 239 1187 ]] P=0.900000, C=-1.000000 !jvms: String::regionMatches @ bci:27 Handler::parseURL @ bci:75 idom[9] 789 CatchProj === 352 [[ 513 ]] #0 at bci -1 !jvms: Handler::checkNestedProtocol @ bci:7 Handler::parseURL @ bci:94 idom[10] 352 Catch === 627 174 [[ 789 160 ]] !jvms: Handler::parseURL @ bci:51 idom[11] 627 Proj === 393 [[ 352 ]] #0 !jvms: String::regionMatches @ bci:-1 Handler::parseURL @ bci:75 idom[12] 393 CallStaticJava === 676 233 234 53 1 ( 484 235 236 237 236 238 214 247 677 1 1 678 248 1 679 680 1 1 1 1 1 1 ) [[ 627 174 240 1288 ]] # Static java.lang.String::regionMatches bool ( java/lang/String:NotNull:exact *, int, int, java/lang/String:exact *, int, int ) Handler::checkNestedProtocol @ bci:7 Handler::parseURL @ bci:94 !jvms: Handler::parseURL @ bci:61 idom[13] 676 IfTrue === 508 [[ 393 484 ]] #1 !jvms: String::regionMatches @ bci:98 Handler::parseURL @ bci:75 idom[14] 508 If === 509 784 [[ 676 232 ]] P=0.999999, C=-1.000000 !jvms: String::length @ bci:4 String::regionMatches @ bci:27 Handler::parseURL @ bci:75 idom[15] 509 Region === 509 785 786 [[ 509 677 233 234 508 ]] !jvms: String::length @ bci:4 String::regionMatches @ bci:27 Handler::parseURL @ bci:75 idom[16] 1451 If === 1578 1579 [[ 1358 1283 ]] P=0.900000, C=-1.000000 !jvms: String::length @ bci:4 Handler::indexOfBangSlash @ bci:1 Handler::parseURL @ bci:144 
idom[17] 1578 CatchProj === 623 [[ 1451 ]] #0 at bci -1 !jvms: String::lastIndexOf @ bci:28 Handler::indexOfBangSlash @ bci:9 Handler::parseURL @ bci:144 idom[18] 623 Catch === 894 389 [[ 1578 348 ]] !jvms: String::coder @ bci:6 String::regionMatches @ bci:70 Handler::parseURL @ bci:75 idom[19] 894 Proj === 672 [[ 623 ]] #0 !jvms: Handler::parseContextSpec @ bci:1 Handler::parseURL @ bci:137 idom[20] 672 CallStaticJava === 940 253 254 53 1 ( 217 210 941 214 247 1 1 1 678 248 1 679 680 1 1 1 1 264 217 210 941 1 ) [[ 894 389 511 ]] # Static java.lang.String::checkBoundsBeginEnd void ( int, int, int ) String::substring @ bci:8 Handler::parseURL @ bci:88 !jvms: String::regionMatches @ bci:98 Handler::parseURL @ bci:75 idom[21] 940 IfTrue === 531 [[ 672 1507 ]] #1 !jvms: Handler::parseContextSpec @ bci:15 Handler::parseURL @ bci:137 idom[22] 531 If === 522 559 [[ 940 261 ]] P=0.999999, C=-1.000000 !jvms: String::coder @ bci:14 String::length @ bci:6 String::regionMatches @ bci:27 Handler::parseURL @ bci:75 idom[23] 522 Region === 522 799 800 [[ 522 531 253 254 680 760 ]] !jvms: String::coder @ bci:3 String::length @ bci:6 String::regionMatches @ bci:27 Handler::parseURL @ bci:75 idom[24] 1081 If === 1293 1294 [[ 799 1462 ]] P=0.000000, C=127.000000 !jvms: String::lastIndexOf @ bci:8 Handler::parseContextSpec @ bci:72 Handler::parseURL @ bci:137 idom[25] 1293 IfTrue === 507 [[ 1081 1784 ]] #1 !jvms: Handler::parseContextSpec @ bci:131 Handler::parseURL @ bci:137 idom[26] 507 If === 533 783 [[ 1293 231 ]] P=0.999999, C=-1.000000 !jvms: String::length @ bci:4 String::regionMatches @ bci:27 Handler::parseURL @ bci:75 idom[27] 533 IfTrue === 503 [[ 507 264 ]] #1 !jvms: String::length @ bci:-1 String::regionMatches @ bci:27 Handler::parseURL @ bci:75 idom[28] 503 If === 504 780 [[ 533 228 ]] P=0.999999, C=-1.000000 !jvms: String::regionMatches @ bci:27 Handler::parseURL @ bci:75 idom[29] 504 Region === 504 781 782 [[ 504 503 229 230 534 248 679 247 678 ]] !jvms: 
String::length @ bci:1 String::regionMatches @ bci:27 Handler::parseURL @ bci:75 idom[30] 1060 If === 935 1277 [[ 1067 773 ]] P=1.000000, C=127.000000 !jvms: String::length @ bci:6 String::lastIndexOf @ bci:3 Handler::parseContextSpec @ bci:72 Handler::parseURL @ bci:137 idom[31] 935 Region === 935 1170 1171 [[ 935 1060 667 ]] !jvms: Handler::parseContextSpec @ bci:11 Handler::parseURL @ bci:137 idom[32] 1350 If === 500 1282 [[ 1170 1171 ]] P=1.000000, C=127.000000 !jvms: Handler::parseContextSpec @ bci:143 Handler::parseURL @ bci:137 idom[33] 500 Region === 500 774 775 [[ 500 1350 222 223 932 ]] !jvms: String::regionMatches @ bci:27 Handler::parseURL @ bci:75 idom[34] 931 If === 493 1165 [[ 662 765 ]] P=0.000000, C=147.000000 !jvms: Handler::parseContextSpec @ bci:11 Handler::parseURL @ bci:137 idom[35] 493 IfTrue === 489 [[ 931 216 ]] #1 idom[36] 489 If === 763 764 [[ 493 204 ]] P=0.999999, C=-1.000000 !jvms: String::regionMatches @ bci:21 Handler::parseURL @ bci:75 idom[37] 763 Parm === 110 [[ 489 ]] Control !jvms: Handler::parseURL @ bci:88 idoms of (wrong) LCA 1291: idom[0] 1291 Region === 1291 1456 1457 1458 1459 [[ 1291 1583 1079 ]] !jvms: Handler::parseContextSpec @ bci:131 Handler::parseURL @ bci:137 idom[1] 1490 If === 1604 1605 [[ 1457 1333 ]] P=0.500000, C=-1.000000 !jvms: String::lastIndexOf @ bci:1 Handler::indexOfBangSlash @ bci:9 Handler::parseURL @ bci:144 idom[2] 1604 IfTrue === 760 [[ 1490 1991 ]] #1 !jvms: Handler::indexOfBangSlash @ bci:15 Handler::parseURL @ bci:144 idom[3] 760 If === 522 559 [[ 1604 486 ]] P=0.999999, C=-1.000000 !jvms: Handler::parseURL @ bci:88 idom[4] 522 Region === 522 799 800 [[ 522 531 253 254 680 760 ]] !jvms: String::coder @ bci:3 String::length @ bci:6 String::regionMatches @ bci:27 Handler::parseURL @ bci:75 idom[5] 1081 If === 1293 1294 [[ 799 1462 ]] P=0.000000, C=127.000000 !jvms: String::lastIndexOf @ bci:8 Handler::parseContextSpec @ bci:72 Handler::parseURL @ bci:137 idom[6] 1293 IfTrue === 507 [[ 1081 1784 ]] 
#1 !jvms: Handler::parseContextSpec @ bci:131 Handler::parseURL @ bci:137 idom[7] 507 If === 533 783 [[ 1293 231 ]] P=0.999999, C=-1.000000 !jvms: String::length @ bci:4 String::regionMatches @ bci:27 Handler::parseURL @ bci:75 idom[8] 533 IfTrue === 503 [[ 507 264 ]] #1 !jvms: String::length @ bci:-1 String::regionMatches @ bci:27 Handler::parseURL @ bci:75 idom[9] 503 If === 504 780 [[ 533 228 ]] P=0.999999, C=-1.000000 !jvms: String::regionMatches @ bci:27 Handler::parseURL @ bci:75 idom[10] 504 Region === 504 781 782 [[ 504 503 229 230 534 248 679 247 678 ]] !jvms: String::length @ bci:1 String::regionMatches @ bci:27 Handler::parseURL @ bci:75 idom[11] 1060 If === 935 1277 [[ 1067 773 ]] P=1.000000, C=127.000000 !jvms: String::length @ bci:6 String::lastIndexOf @ bci:3 Handler::parseContextSpec @ bci:72 Handler::parseURL @ bci:137 idom[12] 935 Region === 935 1170 1171 [[ 935 1060 667 ]] !jvms: Handler::parseContextSpec @ bci:11 Handler::parseURL @ bci:137 idom[13] 1350 If === 500 1282 [[ 1170 1171 ]] P=1.000000, C=127.000000 !jvms: Handler::parseContextSpec @ bci:143 Handler::parseURL @ bci:137 idom[14] 500 Region === 500 774 775 [[ 500 1350 222 223 932 ]] !jvms: String::regionMatches @ bci:27 Handler::parseURL @ bci:75 idom[15] 931 If === 493 1165 [[ 662 765 ]] P=0.000000, C=147.000000 !jvms: Handler::parseContextSpec @ bci:11 Handler::parseURL @ bci:137 idom[16] 493 IfTrue === 489 [[ 931 216 ]] #1 idom[17] 489 If === 763 764 [[ 493 204 ]] P=0.999999, C=-1.000000 !jvms: String::regionMatches @ bci:21 Handler::parseURL @ bci:75 idom[18] 763 Parm === 110 [[ 489 ]] Control !jvms: Handler::parseURL @ bci:88 From mdoerr at openjdk.java.net Fri Oct 23 10:42:49 2020 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Fri, 23 Oct 2020 10:42:49 GMT Subject: RFR: 8255274: [PPC64, s390] wrong StringLatin1.indexOf version matched [v2] In-Reply-To: <2lYn7oWZsxWQECjxx87FaTqyxpJknxSOeTV6qMqR7cc=.f8d911fd-a5ae-444b-90f8-4331d3b1418c@github.com> References: 
<2lYn7oWZsxWQECjxx87FaTqyxpJknxSOeTV6qMqR7cc=.f8d911fd-a5ae-444b-90f8-4331d3b1418c@github.com> Message-ID: > PPC64 and s390 currently match indexOfChar_U also for StrIntrinsicNode::L. This leads to incorrect results of StringLatin1.indexOf and already breaks builds: > Optimizing the exploded image > Error occurred during initialization of boot layer > > We need separate match rules for StrIntrinsicNode::U and StrIntrinsicNode::L. Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: adapt format strings ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/820/files - new: https://git.openjdk.java.net/jdk/pull/820/files/b051bd40..186144e5 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=820&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=820&range=00-01 Stats: 4 lines in 2 files changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/820.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/820/head:pull/820 PR: https://git.openjdk.java.net/jdk/pull/820 From mdoerr at openjdk.java.net Fri Oct 23 10:42:50 2020 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Fri, 23 Oct 2020 10:42:50 GMT Subject: RFR: 8255274: [PPC64, s390] wrong StringLatin1.indexOf version matched [v2] In-Reply-To: <-xLmclNToXEZ6K3CRLiTdfi9pRdTmNsChuRB9-ryMvM=.e3a6ce78-c3e2-4659-aa52-422d37a1e0ef@github.com> References: <2lYn7oWZsxWQECjxx87FaTqyxpJknxSOeTV6qMqR7cc=.f8d911fd-a5ae-444b-90f8-4331d3b1418c@github.com> <-xLmclNToXEZ6K3CRLiTdfi9pRdTmNsChuRB9-ryMvM=.e3a6ce78-c3e2-4659-aa52-422d37a1e0ef@github.com> Message-ID: On Fri, 23 Oct 2020 10:26:59 GMT, Richard Reingruber wrote: >> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: >> >> adapt format strings > > Ok, I see. Would be a worthwhile enhancement to preserve information that allows for having just one instruction form.
> You might want to change the format string to indicate which version is used. I leave it to your discretion. > Thanks again for fixing. Thanks. I've adapted the format strings. ------------- PR: https://git.openjdk.java.net/jdk/pull/820 From mdoerr at openjdk.java.net Fri Oct 23 10:54:37 2020 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Fri, 23 Oct 2020 10:54:37 GMT Subject: Integrated: 8255274: [PPC64, s390] wrong StringLatin1.indexOf version matched In-Reply-To: <2lYn7oWZsxWQECjxx87FaTqyxpJknxSOeTV6qMqR7cc=.f8d911fd-a5ae-444b-90f8-4331d3b1418c@github.com> References: <2lYn7oWZsxWQECjxx87FaTqyxpJknxSOeTV6qMqR7cc=.f8d911fd-a5ae-444b-90f8-4331d3b1418c@github.com> Message-ID: <6dTlK22SU5MD4kBOZ0lUrDg7Ih1K1uG6Tj4ChpNzfsg=.ccd1e7e7-dae1-41a8-9248-49f432541ded@github.com> On Thu, 22 Oct 2020 21:56:54 GMT, Martin Doerr wrote: > PPC64 and s390 currently match indexOfChar_U also for StrIntrinsicNode::L. This leads to incorrect results of StringLatin1.indexOf and already breaks builds: > Optimizing the exploded image > Error occurred during initialization of boot layer > > We need separate match rules for StrIntrinsicNode::U and StrIntrinsicNode::L. This pull request has now been integrated. Changeset: df792573 Author: Martin Doerr URL: https://git.openjdk.java.net/jdk/commit/df792573 Stats: 38 lines in 2 files changed: 36 ins; 0 del; 2 mod 8255274: [PPC64, s390] wrong StringLatin1.indexOf version matched Reviewed-by: clanger, rrich ------------- PR: https://git.openjdk.java.net/jdk/pull/820 From mdoerr at openjdk.java.net Fri Oct 23 10:57:36 2020 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Fri, 23 Oct 2020 10:57:36 GMT Subject: Integrated: 8255129: [PPC64, s390] Check vector_size_supported and add VectorReinterpret node In-Reply-To: References: Message-ID: On Wed, 21 Oct 2020 13:28:38 GMT, Martin Doerr wrote: > match_rule_supported_vector on PPC64 and s390 needs to check vector_size_supported.
> In addition, an implementation for VectorReinterpret is needed. It can get implemented empty when using src = dst register. Note that these 2 platforms support only one vector size at a time, so there's no need for moves between different sizes. > > I'd like to clean up match_rule_supported, too. Cases for which we return true don't need to get checked explicitly because true is default. And we don't need to check SpecialString... flags because they are handled by "disabled_by_jvm_flags". So they can still get disabled (e.g. by jdk/bin/java -XX:-TieredCompilation -XX:-SpecialStringIndexOf -XX:+PrintCompilation -XX:+PrintInlining TestString|grep StringLatin1::indexOf). This pull request has now been integrated. Changeset: 9007bc20 Author: Martin Doerr URL: https://git.openjdk.java.net/jdk/commit/9007bc20 Stats: 158 lines in 2 files changed: 29 ins; 56 del; 73 mod 8255129: [PPC64, s390] Check vector_size_supported and add VectorReinterpret node Reviewed-by: lucy ------------- PR: https://git.openjdk.java.net/jdk/pull/783 From vlivanov at openjdk.java.net Fri Oct 23 11:10:51 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Fri, 23 Oct 2020 11:10:51 GMT Subject: RFR: 8253734: C2: Optimize Move nodes [v2] In-Reply-To: References: Message-ID: > Introduce the following transformations for Move nodes: > 1. `MoveI2F (MoveF2I x) => x` > > 2. `MoveI2F (LoadI mem) => LoadF mem` > > 3. `StoreI mem (MoveF2I x) => StoreF mem x` > > (The same applies to MoveL2D/MoveD2L.) > > #1 eliminates redundant operations and #2/#3 avoid reg-to-reg moves in generated code: > 0x000000010d09964c: vmovss 0x20(%rsi),%xmm1 > 0x000000010d099651: vmovd %xmm1,%eax ;*invokestatic floatToRawIntBits > vs > 0x0000000110c5a6cc: mov 0x20(%rsi),%eax ;*invokestatic floatToRawIntBits > > > (#2 and #3 are performed late (after loop opts are over) to avoid high-level optimizations passes to handle newly introduced mismatched accesses.) > > Testing: tier1-5.
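The three Move-node transformations quoted above rest on the fact that a Move between float and int only reinterprets the same 32 bits. As a rough illustration outside the compiler, the sketch below models the nodes as small C++ helpers; the names `move_f2i`, `move_i2f`, `load_i_then_move`, `load_f`, `move_then_store_i` and `store_f` are hypothetical and only mirror the node names, and `std::memcpy` is used as the portable way to reinterpret bits.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::int32_t),
              "sketch assumes a 32-bit float");

// Analogue of MoveF2I: the same 32 bits, viewed as an integer.
inline std::int32_t move_f2i(float x) {
  std::int32_t i;
  std::memcpy(&i, &x, sizeof i);
  return i;
}

// Analogue of MoveI2F: the same 32 bits, viewed as a float.
inline float move_i2f(std::int32_t i) {
  float x;
  std::memcpy(&x, &i, sizeof x);
  return x;
}

// LoadI followed by MoveI2F ...
inline float load_i_then_move(const unsigned char* mem) {
  std::int32_t i;
  std::memcpy(&i, mem, sizeof i);
  return move_i2f(i);
}

// ... yields the same bits as a direct LoadF from the same address.
inline float load_f(const unsigned char* mem) {
  float x;
  std::memcpy(&x, mem, sizeof x);
  return x;
}

// MoveF2I followed by StoreI ...
inline void move_then_store_i(unsigned char* mem, float x) {
  const std::int32_t i = move_f2i(x);
  std::memcpy(mem, &i, sizeof i);
}

// ... writes the same bytes as a direct StoreF.
inline void store_f(unsigned char* mem, float x) {
  std::memcpy(mem, &x, sizeof x);
}
```

Each pair of helpers produces bit-identical results, which is why the compiler can drop the intermediate Move and, as the disassembly in the description shows, the register-to-register move that came with it.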
Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: Address review comments from Tobias. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/826/files - new: https://git.openjdk.java.net/jdk/pull/826/files/567c4fb7..878d44bf Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=826&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=826&range=00-01 Stats: 78 lines in 4 files changed: 16 ins; 26 del; 36 mod Patch: https://git.openjdk.java.net/jdk/pull/826.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/826/head:pull/826 PR: https://git.openjdk.java.net/jdk/pull/826 From vlivanov at openjdk.java.net Fri Oct 23 11:10:51 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Fri, 23 Oct 2020 11:10:51 GMT Subject: RFR: 8253734: C2: Optimize Move nodes [v2] In-Reply-To: References: Message-ID: On Fri, 23 Oct 2020 09:20:19 GMT, Tobias Hartmann wrote: >> Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: >> >> Address review comments from Tobias. > > Maybe add some comments with examples to the reinterpret methods. Thanks for the quick feedback, Tobias! Please, take a look at the updated version. ------------- PR: https://git.openjdk.java.net/jdk/pull/826 From redestad at openjdk.java.net Fri Oct 23 11:30:50 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Fri, 23 Oct 2020 11:30:50 GMT Subject: RFR: 8255338: CodeSections are never frozen Message-ID: CodeSections are never actually frozen, so code for freezing can be removed along with some assertions and guarantees. A few other minor cleanups, too. 
------------- Commit messages: - CodeSections are never frozen Changes: https://git.openjdk.java.net/jdk/pull/834/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=834&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8255338 Stats: 79 lines in 3 files changed: 0 ins; 77 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/834.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/834/head:pull/834 PR: https://git.openjdk.java.net/jdk/pull/834 From jbhateja at openjdk.java.net Fri Oct 23 12:05:49 2020 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Fri, 23 Oct 2020 12:05:49 GMT Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v10] In-Reply-To: References: Message-ID: <8Gz1zaDPTixfCBIfO8-_CxIXUNvweZkas3FZq0voghA=.ee27fdf6-49a2-4e88-8ecf-a07d86414a9f@github.com> > Summary: > > 1) Partial in-lining technique avoids call overhead penalty for small array copy operations with size less than 32 bytes. > 2) At runtime, a conditional check based on copy length either calls an array-copy stub or executes an optimized instruction sequence using AVX-512 masked instructions emitted at the call site. > 3) New runtime flag ArrayCopyPartialInlineSize=0/32(default)/64 bytes determines the maximum size for partial in-lining. > 4) Based on the perf results seen in benchmarks currently partial in-lining is performed only for arraycopy involving sub-word types (bool/byte/char/short). Once PR-61 gets integrated we can extend this patch to cover all the primitive types. 
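The dispatch described in points 1) to 3) above can be pictured, at the Java level, roughly as follows. This is only a conceptual sketch of the control flow: the real change emits AVX-512 masked loads and stores inside C2-generated code, which cannot be expressed in Java source. The class `PartialInlineSketch`, the constant `PARTIAL_INLINE_SIZE`, and the choice of `<=` for the comparison are assumptions made for illustration; the plain loop merely stands in for the masked vector sequence.

```java
import java.util.Objects;

// Hypothetical illustration of the length-based dispatch; not the HotSpot implementation.
final class PartialInlineSketch {
    // Assumed threshold in bytes, mirroring the default of 32 mentioned above.
    static final int PARTIAL_INLINE_SIZE = 32;

    static void copy(byte[] src, int srcPos, byte[] dst, int dstPos, int length) {
        if (src != dst && length >= 0 && length <= PARTIAL_INLINE_SIZE) {
            // Short path: stands in for the masked vector load/store emitted at the call site.
            Objects.checkFromIndexSize(srcPos, length, src.length);
            Objects.checkFromIndexSize(dstPos, length, dst.length);
            for (int i = 0; i < length; i++) {
                dst[dstPos + i] = src[srcPos + i];
            }
        } else {
            // Long path (and overlapping or invalid cases): stands in for the array-copy stub call.
            System.arraycopy(src, srcPos, dst, dstPos, length);
        }
    }
}
```

Copies within the same array are routed to `System.arraycopy` here so that the simple forward loop never has to deal with overlapping ranges.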
> > Performance Results: > System : CascadeLake Server, Intel(R) Xeon(R) Platinum 8280L CPU @ 2.70GHz > Micros : test/micro/org/openjdk/bench/java/lang/ArrayCopy*.java > ArrayCopyPartialInlineSize : 32 > > JMH | Block Size | Baseline (ns/op) | Partial Inling (ns/op) | Gain > -- | -- | -- | -- | -- > ArrayCopyAligned.testByte | 1 | 5.417 | 2.696 | 2.009272997 > ArrayCopyAligned.testByte | 3 | 5.494 | 2.702 | 2.03330866 > ArrayCopyAligned.testByte | 5 | 5.417 | 2.637 | 2.05422829 > ArrayCopyAligned.testByte | 10 | 5.343 | 2.703 | 1.976692564 > ArrayCopyAligned.testByte | 20 | 5.837 | 2.636 | 2.214339909 > ArrayCopyAligned.testByte | 70 | 5.86 | 6 | 0.976666667 > ArrayCopyAligned.testByte | 150 | 6.766 | 6.906 | 0.979727773 > ArrayCopyAligned.testByte | 300 | 7.605 | 7.952 | 0.956363179 > ArrayCopyAligned.testByte | 600 | 11.989 | 12.007 | 0.998500874 > ArrayCopyAligned.testByte | 1200 | 16.447 | 16.585 | 0.991679228 > ArrayCopyAligned.testChar | 1 | 5.02 | 2.828 | 1.775106082 > ArrayCopyAligned.testChar | 3 | 5.129 | 2.762 | 1.85698769 > ArrayCopyAligned.testChar | 5 | 5.041 | 2.762 | 1.82512672 > ArrayCopyAligned.testChar | 10 | 5.716 | 2.762 | 2.069514844 > ArrayCopyAligned.testChar | 20 | 5.111 | 5.399 | 0.946656788 > ArrayCopyAligned.testChar | 70 | 6.271 | 6.242 | 1.004645947 > ArrayCopyAligned.testChar | 150 | 7.45 | 7.599 | 0.980392157 > ArrayCopyAligned.testChar | 300 | 9.904 | 10.112 | 0.97943038 > ArrayCopyAligned.testChar | 600 | 17.131 | 17.167 | 0.997902953 > ArrayCopyAligned.testChar | 1200 | 29.556 | 29.851 | 0.990117584 > ArrayCopyUnalignedBoth.testByte | 1 | 5.419 | 2.702 | 2.005551443 > ArrayCopyUnalignedBoth.testByte | 3 | 5.558 | 2.636 | 2.108497724 > ArrayCopyUnalignedBoth.testByte | 5 | 5.43 | 2.636 | 2.059939302 > ArrayCopyUnalignedBoth.testByte | 10 | 5.378 | 2.637 | 2.039438756 > ArrayCopyUnalignedBoth.testByte | 20 | 5.914 | 2.636 | 2.243550836 > ArrayCopyUnalignedBoth.testByte | 70 | 5.882 | 5.954 | 0.987907289 > 
ArrayCopyUnalignedBoth.testByte | 150 | 6.784 | 6.88 | 0.986046512 > ArrayCopyUnalignedBoth.testByte | 300 | 7.635 | 7.968 | 0.958207831 > ArrayCopyUnalignedBoth.testByte | 600 | 12.226 | 12.129 | 1.007997362 > ArrayCopyUnalignedBoth.testByte | 1200 | 16.992 | 20.717 | 0.820195974 > ArrayCopyUnalignedBoth.testChar | 1 | 5.019 | 2.828 | 1.774752475 > ArrayCopyUnalignedBoth.testChar | 3 | 5.163 | 2.763 | 1.868621064 > ArrayCopyUnalignedBoth.testChar | 5 | 5.042 | 2.827 | 1.783516095 > ArrayCopyUnalignedBoth.testChar | 10 | 5.718 | 2.828 | 2.021923621 > ArrayCopyUnalignedBoth.testChar | 20 | 5.111 | 5.404 | 0.945780903 > ArrayCopyUnalignedBoth.testChar | 70 | 6.367 | 6.235 | 1.02117081 > ArrayCopyUnalignedBoth.testChar | 150 | 7.367 | 8.269 | 0.890917886 > ArrayCopyUnalignedBoth.testChar | 300 | 10.358 | 10.642 | 0.973313287 > ArrayCopyUnalignedBoth.testChar | 600 | 20.84 | 17.522 | 1.189361945 > ArrayCopyUnalignedBoth.testChar | 1200 | 31.895 | 31.892 | 1.000094067 > ArrayCopyUnalignedDst.testByte | 1 | 5.455 | 2.637 | 2.068638604 > ArrayCopyUnalignedDst.testByte | 3 | 5.562 | 2.702 | 2.058475204 > ArrayCopyUnalignedDst.testByte | 5 | 5.427 | 2.702 | 2.008512213 > ArrayCopyUnalignedDst.testByte | 10 | 5.367 | 2.696 | 1.990727003 > ArrayCopyUnalignedDst.testByte | 20 | 5.839 | 2.637 | 2.214258627 > ArrayCopyUnalignedDst.testByte | 70 | 5.888 | 5.968 | 0.986595174 > ArrayCopyUnalignedDst.testByte | 150 | 6.785 | 6.773 | 1.001771741 > ArrayCopyUnalignedDst.testByte | 300 | 7.606 | 7.972 | 0.954089313 > ArrayCopyUnalignedDst.testByte | 600 | 11.986 | 21.195 | 0.565510734 > ArrayCopyUnalignedDst.testByte | 1200 | 16.54 | 16.784 | 0.985462345 > ArrayCopyUnalignedDst.testChar | 1 | 5.02 | 2.827 | 1.775733994 > ArrayCopyUnalignedDst.testChar | 3 | 5.131 | 2.762 | 1.857711803 > ArrayCopyUnalignedDst.testChar | 5 | 5.038 | 2.762 | 1.82404055 > ArrayCopyUnalignedDst.testChar | 10 | 5.718 | 2.762 | 2.070238957 > ArrayCopyUnalignedDst.testChar | 20 | 5.113 | 5.401 | 0.946676541 > 
ArrayCopyUnalignedDst.testChar | 70 | 6.222 | 6.214 | 1.001287416 > ArrayCopyUnalignedDst.testChar | 150 | 7.367 | 8.125 | 0.906707692 > ArrayCopyUnalignedDst.testChar | 300 | 10.204 | 10.082 | 1.012100774 > ArrayCopyUnalignedDst.testChar | 600 | 16.978 | 17.135 | 0.990837467 > ArrayCopyUnalignedDst.testChar | 1200 | 32.351 | 31.996 | 1.011095137 > ArrayCopyUnalignedSrc.testByte | 1 | 5.414 | 2.696 | 2.008160237 > ArrayCopyUnalignedSrc.testByte | 3 | 5.494 | 2.637 | 2.083428138 > ArrayCopyUnalignedSrc.testByte | 5 | 5.431 | 2.637 | 2.059537353 > ArrayCopyUnalignedSrc.testByte | 10 | 5.344 | 2.703 | 1.977062523 > ArrayCopyUnalignedSrc.testByte | 20 | 5.834 | 2.696 | 2.163946588 > ArrayCopyUnalignedSrc.testByte | 70 | 5.883 | 6.009 | 0.979031453 > ArrayCopyUnalignedSrc.testByte | 150 | 6.729 | 6.87 | 0.979475983 > ArrayCopyUnalignedSrc.testByte | 300 | 7.603 | 7.97 | 0.953952321 > ArrayCopyUnalignedSrc.testByte | 600 | 12.004 | 12.16 | 0.987171053 > ArrayCopyUnalignedSrc.testByte | 1200 | 16.534 | 16.643 | 0.9934507 > ArrayCopyUnalignedSrc.testChar | 1 | 5.021 | 2.762 | 1.81788559 > ArrayCopyUnalignedSrc.testChar | 3 | 5.13 | 2.762 | 1.857349747 > ArrayCopyUnalignedSrc.testChar | 5 | 5.042 | 2.827 | 1.783516095 > ArrayCopyUnalignedSrc.testChar | 10 | 5.726 | 2.761 | 2.073886273 > ArrayCopyUnalignedSrc.testChar | 20 | 5.112 | 5.401 | 0.94649139 > ArrayCopyUnalignedSrc.testChar | 70 | 6.113 | 6.227 | 0.981692629 > ArrayCopyUnalignedSrc.testChar | 150 | 7.493 | 7.888 | 0.949923935 > ArrayCopyUnalignedSrc.testChar | 300 | 10.234 | 10.501 | 0.97457385 > ArrayCopyUnalignedSrc.testChar | 600 | 17.175 | 17.142 | 1.001925096 > ArrayCopyUnalignedSrc.testChar | 1200 | 31.926 | 31.987 | 0.998092975 > > Detailed Reports: > Baseline : [http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt) > WithOpt : 
[http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt) Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains ten commits: - JDK-8252848 : Replacing generic assembler routine evmovdqu with macro assembly routine calling type specific leaf level assembly functions. - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848 - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848 - JDK-8252848 : Review comments resolution. - Merge remote-tracking branch 'upstream' into JDK-8252848 - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848 - Replacing explicit type checks with existing type checking routines - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848 - 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions. 
------------- Changes: https://git.openjdk.java.net/jdk/pull/302/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=302&range=09 Stats: 549 lines in 27 files changed: 499 ins; 23 del; 27 mod Patch: https://git.openjdk.java.net/jdk/pull/302.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/302/head:pull/302 PR: https://git.openjdk.java.net/jdk/pull/302 From jbhateja at openjdk.java.net Fri Oct 23 12:05:51 2020 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Fri, 23 Oct 2020 12:05:51 GMT Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v9] In-Reply-To: <8Ryyxuf5P2D6WNyj4riYCTgN0U6WLrLpBmxhNbnmPpQ=.b2ed5660-99d0-49d1-83e0-8b2de518d7b8@github.com> References: <8Ryyxuf5P2D6WNyj4riYCTgN0U6WLrLpBmxhNbnmPpQ=.b2ed5660-99d0-49d1-83e0-8b2de518d7b8@github.com> Message-ID: On Thu, 22 Oct 2020 09:48:51 GMT, Ningsheng Jian wrote: >> Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. > > src/hotspot/share/opto/vectornode.hpp line 835: > >> 833: static VectorMaskGenNode* make(int opc, Node* src, const Type* ty, const Type* ety); >> 834: private: >> 835: const Type* _elemType; > > Will an additional field in the node valid after some optimizations, i.e. clone()? I think I know the ety, but I don't know the usage of ty. If so, do you need to have a new type like what TypeVect does for mask? As currently there is no support for mask registers in RA, for X86 long ideal type is sufficient for a mask producing node (def operand is a mask register) ; But for complete support returning Op_RegVMask as an ideal_reg() type for masked Ideal node should do the trick without creating an explicit new ideal Type for mask generating nodes. Spill sizes and number of slots may be different for X86 and ARM (SVE). 
Shallow copy during Node::clone should be sufficient here since the encapsulated element type will be preserved. > src/hotspot/share/opto/vectornode.cpp line 775: > >> 773: VectorMaskGenNode* make(int opc, Node* src, const Type* ty, const Type* ety) { >> 774: return new VectorMaskGenNode(src, ty, ety); >> 775: } > > These are not used? This is just a helper routine; it is not used currently, though. > src/hotspot/share/opto/vectornode.hpp line 826: > >> 824: class VectorMaskGenNode : public TypeNode { >> 825: public: >> 826: VectorMaskGenNode(Node* src, const Type* ty, const Type* ety): TypeNode(ty, 2), _elemType(ety) { > > Sorry, I don't quite understand the arguments here. What does 'src' mean to the mask? ty -> Node type, long in this case, since on X86 the mask register is 64 bits wide. ety -> Mask element type, currently used during LoadVectorMasked/StoreVectorMasked idealization to compute the block sizes for constant masks and to replace masked vector operations with non-masked ones if the block size is equal to the vector size. Src has been replaced by a better name, "length", used for mask computation. ------------- PR: https://git.openjdk.java.net/jdk/pull/302 From thartmann at openjdk.java.net Fri Oct 23 12:09:39 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Fri, 23 Oct 2020 12:09:39 GMT Subject: RFR: 8253734: C2: Optimize Move nodes [v2] In-Reply-To: References: Message-ID: On Fri, 23 Oct 2020 11:10:51 GMT, Vladimir Ivanov wrote: >> Introduce the following transformations for Move nodes: >> 1. `MoveI2F (MoveF2I x) => x` >> >> 1. `MoveI2F (LoadI mem) => LoadF mem` >> >> 1. `StoreI mem (MoveF2I x) => StoreF mem x` >> >> (The same applies to MoveL2D/MoveD2L.)
>> >> #1 eliminates redundant operations and #2/#3 avoid reg-to-reg moves in generated code: >> 0x000000010d09964c: vmovss 0x20(%rsi),%xmm1 >> 0x000000010d099651: vmovd %xmm1,%eax ;*invokestatic floatToRawIntBits >> vs >> 0x0000000110c5a6cc: mov 0x20(%rsi),%eax ;*invokestatic floatToRawIntBits >> >> >> (#2 and #3 are performed late (after loop opts are over) so that high-level optimization passes need not handle newly introduced mismatched accesses.) >> >> Testing: tier1-5. > > Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments from Tobias. Looks good to me. Thanks for making these changes! ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/826 From neliasso at openjdk.java.net Fri Oct 23 12:32:36 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Fri, 23 Oct 2020 12:32:36 GMT Subject: RFR: 8255338: CodeSections are never frozen In-Reply-To: References: Message-ID: On Fri, 23 Oct 2020 11:25:24 GMT, Claes Redestad wrote: > CodeSections are never actually frozen, so code for freezing can be removed along with some assertions and guarantees. > > A few other minor cleanups, too. Looks good. Remove the additional methods I sent, if you like. ------------- Marked as reviewed by neliasso (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/834 From redestad at openjdk.java.net Fri Oct 23 12:41:48 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Fri, 23 Oct 2020 12:41:48 GMT Subject: RFR: 8255338: CodeSections are never frozen [v2] In-Reply-To: References: Message-ID: > CodeSections are never actually frozen, so code for freezing can be removed along with some assertions and guarantees. > > A few other minor cleanups, too.
Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: Remove insts_limit() and clear_insts_mark(), too ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/834/files - new: https://git.openjdk.java.net/jdk/pull/834/files/37aca0c2..dfb19e95 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=834&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=834&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/834.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/834/head:pull/834 PR: https://git.openjdk.java.net/jdk/pull/834 From redestad at openjdk.java.net Fri Oct 23 12:41:49 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Fri, 23 Oct 2020 12:41:49 GMT Subject: RFR: 8255338: CodeSections are never frozen [v2] In-Reply-To: References: Message-ID: On Fri, 23 Oct 2020 12:29:59 GMT, Nils Eliasson wrote: > Looks good. Thanks! > Remove the additional methods I sent, if you like. Done. ------------- PR: https://git.openjdk.java.net/jdk/pull/834 From neliasso at openjdk.java.net Fri Oct 23 12:42:40 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Fri, 23 Oct 2020 12:42:40 GMT Subject: RFR: 8253734: C2: Optimize Move nodes [v2] In-Reply-To: References: Message-ID: On Fri, 23 Oct 2020 11:10:51 GMT, Vladimir Ivanov wrote: >> Introduce the following transformations for Move nodes: >> 1. `MoveI2F (MoveF2I x) => x` >> >> 1. `MoveI2F (LoadI mem) => LoadF mem` >> >> 1. `StoreI mem (MoveF2I x) => StoreF mem x` >> >> (The same applies to MoveL2D/MoveD2L.) 
>> >> #1 eliminates redundant operations and #2/#3 avoid reg-to-reg moves in generated code: >> 0x000000010d09964c: vmovss 0x20(%rsi),%xmm1 >> 0x000000010d099651: vmovd %xmm1,%eax ;*invokestatic floatToRawIntBits >> vs >> 0x0000000110c5a6cc: mov 0x20(%rsi),%eax ;*invokestatic floatToRawIntBits >> >> >> (#2 and #3 are performed late (after loop opts are over) so that high-level optimization passes need not handle newly introduced mismatched accesses.) >> >> Testing: tier1-5. > > Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments from Tobias. Looks good! ------------- Marked as reviewed by neliasso (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/826 From fyang at openjdk.java.net Fri Oct 23 13:16:35 2020 From: fyang at openjdk.java.net (Fei Yang) Date: Fri, 23 Oct 2020 13:16:35 GMT Subject: RFR: 8255287: aarch64: fix SVE patterns for vector shift count In-Reply-To: References: Message-ID: On Fri, 23 Oct 2020 10:05:31 GMT, Andrew Dinn wrote: >> SVE patterns for vector shift count cannot be matched due to bad matching rules. >> Also code gen is not correct in certain cases for vlslS_imm and vlsrS_imm. >> Please refer to JDK-8255287 for details. >> Patch passed tier1 tests using QEMU system emulator which supports SVE. > > These changes look ok. Thanks for the quick review :-) @adinn ------------- PR: https://git.openjdk.java.net/jdk/pull/827 From fyang at openjdk.java.net Fri Oct 23 13:21:41 2020 From: fyang at openjdk.java.net (Fei Yang) Date: Fri, 23 Oct 2020 13:21:41 GMT Subject: Integrated: 8255287: aarch64: fix SVE patterns for vector shift count In-Reply-To: References: Message-ID: <4JzgyHd1mK4S3onigtOC9s8nhwFaFTS0xpswG1rAyNM=.4192a337-a851-4f77-8d17-18f8bb3fbb56@github.com> On Fri, 23 Oct 2020 09:35:11 GMT, Fei Yang wrote: > SVE patterns for vector shift count cannot be matched due to bad matching rules.
> Also code gen is not correct in certain cases for vlslS_imm and vlsrS_imm. > Please refer to JDK-8255287 for details. > Patch passed tier1 tests using QEMU system emulator which supports SVE. This pull request has now been integrated. Changeset: 5ec1b80c Author: Fei Yang URL: https://git.openjdk.java.net/jdk/commit/5ec1b80c Stats: 149 lines in 7 files changed: 109 ins; 0 del; 40 mod 8255287: aarch64: fix SVE patterns for vector shift count Co-authored-by: Yanhong Zhu Reviewed-by: adinn ------------- PR: https://git.openjdk.java.net/jdk/pull/827 From chagedorn at openjdk.java.net Fri Oct 23 13:22:45 2020 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Fri, 23 Oct 2020 13:22:45 GMT Subject: RFR: 8255245: C1: Fix output of -XX:+PrintCFGToFile to open it with visualizer Message-ID: [JDK-8251093](https://bugs.openjdk.java.net/browse/JDK-8251093) introduced some improved logging for intervals. When specifying `-XX:+PrintCFGToFile` to dump the graph to a file to later open it with the C1 visualizer, it also uses the improved interval printing. However, this output can no longer be read by the C1 Visualizer. As the C1 Visualizer is not part of the JDK, we should include the old format again for the output produced by `-XX:+PrintCFGToFile` to be compatible with the visualizer again. The console output can still use the improved logging of JDK-8251093. 
------------- Commit messages: - 8255245: C1: Fix output of -XX:+PrintCFGToFile to open it with visualizer Changes: https://git.openjdk.java.net/jdk/pull/837/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=837&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8255245 Stats: 38 lines in 3 files changed: 26 ins; 2 del; 10 mod Patch: https://git.openjdk.java.net/jdk/pull/837.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/837/head:pull/837 PR: https://git.openjdk.java.net/jdk/pull/837 From eric.caspole at oracle.com Fri Oct 23 13:39:53 2020 From: eric.caspole at oracle.com (eric.caspole at oracle.com) Date: Fri, 23 Oct 2020 09:39:53 -0400 Subject: RFR: 8254913: Increase InlineSmallCode default from 2000 to 2500 for x64 In-Reply-To: References: <-eTclEV5zK_DIJgV8kYWFRBlUNtEqzq5H_fNwmbVJ7A=.ee694b8b-2a1e-43a0-9df6-52958e8f1d23@github.com> Message-ID: <40989e6f-1888-da69-e3f5-587588327f68@oracle.com> Andriy, OK thanks for the report. Eric On 10/22/20 4:05 AM, Andriy Plokhotnyuk wrote: > On Wed, 21 Oct 2020 20:49:06 GMT, Vladimir Kozlov wrote: > >>> We have seen some specific benefits to increasing InlineSmallCode to 2500 from 2000, and across the whole promo build perf test collection the change is neutral to slightly positive, where the tests are run on modern OCI systems. >>> >>> Passed tier1 testing, some ad-hoc perf testing and more compiler related parts of the weekly promo performance set. >>> >>> JBS: https://bugs.openjdk.java.net/browse/JDK-8254913 >>> Thanks, >>> Eric >> >> Good. > > I got a ~10% regression after adding `-XX:InlineSmallCode=2500` in some JSON parsing benchmarks where huge methods were generated for some data structures with a large number of fields. > > Steps to reproduce: > 1. Clone the jsoniter-scala repo: > git clone --depth 1 git@github.com:plokhotnyuk/jsoniter-scala.git > 2.
Run benchmarks with default options: > sbt -java-home /usr/lib/jvm/openjdk-16 'jsoniter-scala-benchmarkJVM/jmh:run -wi 10 -i 10 TwitterAPIReading.jsoniterScala' > 3. Run benchmarks with an additional `-XX:InlineSmallCode=2500` option: > sbt -java-home /usr/lib/jvm/openjdk-16 'jsoniter-scala-benchmarkJVM/jmh:run -wi 10 -i 10 -jvmArgsAppend "-XX:InlineSmallCode=2500" TwitterAPIReading.jsoniterScala' > > ------------- > > PR: https://git.openjdk.java.net/jdk/pull/705 > From martin.doerr at sap.com Fri Oct 23 15:48:06 2020 From: martin.doerr at sap.com (Doerr, Martin) Date: Fri, 23 Oct 2020 15:48:06 +0000 Subject: Vector API: stack overflow on Big Endian In-Reply-To: <6B33BFE4-351B-4CB4-A72B-94741CF135C6@oracle.com> References: <08399669-DCB9-4CB4-A23F-A6B64439D1DA@oracle.com> <6B33BFE4-351B-4CB4-A72B-94741CF135C6@oracle.com> Message-ID: Hi Paul, thanks for pointing me to the right place. I've filed https://bugs.openjdk.java.net/browse/JDK-8255349 and created https://github.com/openjdk/jdk/pull/840 . Feel free to modify/write comments. Thanks for your help and best regards, Martin > -----Original Message----- > From: Paul Sandoz > Sent: Freitag, 23. Oktober 2020 00:18 > To: Doerr, Martin > Cc: hotspot-compiler-dev at openjdk.java.net > Subject: Re: Vector API: stack overflow on Big Endian > > Thanks, that is helpful, so it's only re-bracket tests for lanewise conversion > that fail. > > This reinforces my suspicions that the actual output is fine, and the test > assumes LE encoding in castByteArrayData, which I think takes a short cut for > casting just for +ve integral types. Lanewise conversion skips floating point > types and so is only partially tested. > > The method castByteArrayData needs to be reimplemented loading into a > ByteBuffer, reading values, casting them, and storing them into a resulting > buffer. Not difficult to do, just a little laborious. > > Paul. 
> > > On Oct 22, 2020, at 11:36 AM, Doerr, Martin > wrote: > > > > Hi Paul, > > > > here's the snippet from the > JTwork/jdk/incubator/vector/VectorReshapeTests.jtr from a run on an s390 > machine. > > Let me know if you need anything else. > > > > Thanks and best regards, > > Martin From mdoerr at openjdk.java.net Fri Oct 23 15:52:47 2020 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Fri, 23 Oct 2020 15:52:47 GMT Subject: RFR: 8255349: Vector API issues on Big Endian Message-ID: Several jdk/incubator/vector tests are failing with stack overflow due to endless recursion on Big Endian platforms. E.g. Int64VectorLoadStoreTests (see bug for stack trace). Endianess in defaultReinterpret is currently hard coded and not checked. In addition, VectorReshapeTests.java is failing due to incorrect size conversion for Big Endian in the test code (castByteArrayData). ------------- Commit messages: - 8255349: Vector API issues on Big Endian Changes: https://git.openjdk.java.net/jdk/pull/840/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=840&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8255349 Stats: 30 lines in 2 files changed: 23 ins; 2 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/840.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/840/head:pull/840 PR: https://git.openjdk.java.net/jdk/pull/840 From kvn at openjdk.java.net Fri Oct 23 16:36:38 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 23 Oct 2020 16:36:38 GMT Subject: RFR: 8255265: IdealLoopTree::iteration_split_impl does not use should_align In-Reply-To: References: Message-ID: <1xtCqi_D0DIki66IjUyFktLsRfnq1Dn-BMS3q4bVVyA=.d47df8ed-7992-4391-9c07-a4a5e57821c7@github.com> On Thu, 22 Oct 2020 18:00:28 GMT, Aleksey Shipilev wrote: > There is a TODO item in IdealLoopTree::iteration_split_impl: > > // TODO: Remove align -- not used. > bool should_align = policy_align(phase); > > Indeed, it is not used, because policy_align always returns FALSE. 
Can clean up some of the code around too. Good. Yes, it was never used. We do try to align one of the array accesses during vectorization: https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/superword.cpp#L3417 ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/815 From kvn at openjdk.java.net Fri Oct 23 16:59:38 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 23 Oct 2020 16:59:38 GMT Subject: RFR: 8253734: C2: Optimize Move nodes [v2] In-Reply-To: References: Message-ID: On Fri, 23 Oct 2020 11:10:51 GMT, Vladimir Ivanov wrote: >> Introduce the following transformations for Move nodes: >> 1. `MoveI2F (MoveF2I x) => x` >> >> 1. `MoveI2F (LoadI mem) => LoadF mem` >> >> 1. `StoreI mem (MoveF2I x) => StoreF mem x` >> >> (The same applies to MoveL2D/MoveD2L.) >> >> #1 eliminates redundant operations and #2/#3 avoid reg-to-reg moves in generated code: >> 0x000000010d09964c: vmovss 0x20(%rsi),%xmm1 >> 0x000000010d099651: vmovd %xmm1,%eax ;*invokestatic floatToRawIntBits >> vs >> 0x0000000110c5a6cc: mov 0x20(%rsi),%eax ;*invokestatic floatToRawIntBits >> >> >> (#2 and #3 are performed late (after loop opts are over) so that high-level optimization passes need not handle newly introduced mismatched accesses.) >> >> Testing: tier1-5. > > Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments from Tobias. Proposed optimization looks fine. In what cases do you see the issues? ------------- Marked as reviewed by kvn (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/826 From kvn at openjdk.java.net Fri Oct 23 17:04:39 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 23 Oct 2020 17:04:39 GMT Subject: RFR: 8255301: Common and strengthen the code in ciMemberName and ciMethodHandle [v2] In-Reply-To: References: Message-ID: On Fri, 23 Oct 2020 09:03:53 GMT, Aleksey Shipilev wrote: >> There is a TODO item in their `get_vm_target`-s. We can clean those up. It looks to me the caller code does not handle `NULL` result well, which means we better `fatal` ourselves before exposing `NULL` to callers and `SEGV`-ing there. > > Aleksey Shipilev has updated the pull request incrementally with three additional commits since the last revision: > > - Remove whitespace > - Simplify the whole thing > - Simplify, because MemberName::vmtarget already returns Method* LGTM too. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/825 From kvn at openjdk.java.net Fri Oct 23 17:26:41 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 23 Oct 2020 17:26:41 GMT Subject: RFR: 8255338: CodeSections are never frozen [v2] In-Reply-To: References: Message-ID: On Fri, 23 Oct 2020 12:41:48 GMT, Claes Redestad wrote: >> CodeSections are never actually frozen, so code for freezing can be removed along with some assertions and guarantees. >> >> A few other minor cleanups, too. > > Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: > > Remove insts_limit() and clear_insts_mark(), too Good. It was part of original implementation of CodeBuffer. It was used for very short time in C1. ------------- Marked as reviewed by kvn (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/834 From kvn at openjdk.java.net Fri Oct 23 17:30:36 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 23 Oct 2020 17:30:36 GMT Subject: RFR: 8255245: C1: Fix output of -XX:+PrintCFGToFile to open it with visualizer In-Reply-To: References: Message-ID: On Fri, 23 Oct 2020 13:16:27 GMT, Christian Hagedorn wrote: > [JDK-8251093](https://bugs.openjdk.java.net/browse/JDK-8251093) introduced some improved logging for intervals. When specifying `-XX:+PrintCFGToFile` to dump the graph to a file to later open it with the C1 visualizer, it also uses the improved interval printing. However, this output can no longer be read by the C1 Visualizer. As the C1 Visualizer is not part of the JDK, we should include the old format again for the output produced by `-XX:+PrintCFGToFile` to be compatible with the visualizer again. The console output can still use the improved logging of JDK-8251093. Good. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/837 From paul.sandoz at oracle.com Fri Oct 23 17:48:36 2020 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Fri, 23 Oct 2020 10:48:36 -0700 Subject: Vector API: stack overflow on Big Endian In-Reply-To: References: <08399669-DCB9-4CB4-A23F-A6B64439D1DA@oracle.com> <6B33BFE4-351B-4CB4-A72B-94741CF135C6@oracle.com> Message-ID: Thanks for doing this, you beat me to it :-) Paul. > On Oct 23, 2020, at 8:48 AM, Doerr, Martin wrote: > > Hi Paul, > > thanks for pointing me to the right place. > > I've filed > https://bugs.openjdk.java.net/browse/JDK-8255349 > and created > https://github.com/openjdk/jdk/pull/840 > . > > Feel free to modify/write comments. > > Thanks for your help and best regards, > Martin > > >> -----Original Message----- >> From: Paul Sandoz >> Sent: Freitag, 23.
Oktober 2020 00:18 >> To: Doerr, Martin >> Cc: hotspot-compiler-dev at openjdk.java.net >> Subject: Re: Vector API: stack overflow on Big Endian >> >> Thanks, that is helpful, so it's only re-bracket tests for lanewise conversion >> that fail. >> >> This reinforces my suspicions that the actual output is fine, and the test >> assumes LE encoding in castByteArrayData, which I think takes a short cut for >> casting just for +ve integral types. Lanewise conversion skips floating point >> types and so is only partially tested. >> >> The method castByteArrayData needs to be reimplemented loading into a >> ByteBuffer, reading values, casting them, and storing them into a resulting >> buffer. Not difficult to do, just a little laborious. >> >> Paul. >> >>> On Oct 22, 2020, at 11:36 AM, Doerr, Martin >> wrote: >>> >>> Hi Paul, >>> >>> here's the snippet from the >> JTwork/jdk/incubator/vector/VectorReshapeTests.jtr from a run on an s390 >> machine. >>> Let me know if you need anything else. >>> >>> Thanks and best regards, >>> Martin > From psandoz at openjdk.java.net Fri Oct 23 17:49:38 2020 From: psandoz at openjdk.java.net (Paul Sandoz) Date: Fri, 23 Oct 2020 17:49:38 GMT Subject: RFR: 8255349: Vector API issues on Big Endian In-Reply-To: References: Message-ID: On Fri, 23 Oct 2020 15:46:07 GMT, Martin Doerr wrote: > Several jdk/incubator/vector tests are failing with stack overflow due to endless recursion on Big Endian platforms. E.g. Int64VectorLoadStoreTests (see bug for stack trace). > Endianess in defaultReinterpret is currently hard coded and not checked. > > In addition, VectorReshapeTests.java is failing due to incorrect size conversion for Big Endian in the test code (castByteArrayData). Marked as reviewed by psandoz (Reviewer). 
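The ByteBuffer-based approach that Paul outlines for castByteArrayData (load into a ByteBuffer, read values, cast them, store them into a resulting buffer) might look roughly like the sketch below. It is not the code from PR 840: the class `CastByteArraySketch`, the method `castIntLanesToLong`, and the restriction to an int-to-long conversion are assumptions chosen to keep the example short. The point it illustrates is that the byte order is passed in explicitly rather than assumed to be little endian.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration of an endianness-aware lanewise cast over byte arrays.
final class CastByteArraySketch {
    // Reads the input as int lanes in the given byte order, widens each lane to long,
    // and writes the long lanes back out in the same byte order.
    // Trailing bytes that do not form a full int lane are ignored.
    static byte[] castIntLanesToLong(byte[] input, ByteOrder order) {
        ByteBuffer in = ByteBuffer.wrap(input).order(order);
        int lanes = input.length / Integer.BYTES;
        ByteBuffer out = ByteBuffer.allocate(lanes * Long.BYTES).order(order);
        for (int i = 0; i < lanes; i++) {
            out.putLong((long) in.getInt()); // sign-extending cast, lane by lane
        }
        return out.array();
    }
}
```

A test would typically pass `ByteOrder.nativeOrder()` so that the expected data matches the layout the vector code produces on the machine under test.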
------------- PR: https://git.openjdk.java.net/jdk/pull/840 From redestad at openjdk.java.net Fri Oct 23 17:51:38 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Fri, 23 Oct 2020 17:51:38 GMT Subject: RFR: 8255338: CodeSections are never frozen [v2] In-Reply-To: References: Message-ID: On Fri, 23 Oct 2020 17:23:52 GMT, Vladimir Kozlov wrote: >> Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove insts_limit() and clear_insts_mark(), too > > Good. It was part of original implementation of CodeBuffer. It was used for very short time in C1. @neliasso @vnkozlov - thank you for reviewing! ------------- PR: https://git.openjdk.java.net/jdk/pull/834 From redestad at openjdk.java.net Fri Oct 23 17:51:39 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Fri, 23 Oct 2020 17:51:39 GMT Subject: Integrated: 8255338: CodeSections are never frozen In-Reply-To: References: Message-ID: On Fri, 23 Oct 2020 11:25:24 GMT, Claes Redestad wrote: > CodeSections are never actually frozen, so code for freezing can be removed along with some assertions and guarantees. > > A few other minor cleanups, too. This pull request has now been integrated. Changeset: 185c8bcf Author: Claes Redestad URL: https://git.openjdk.java.net/jdk/commit/185c8bcf Stats: 81 lines in 3 files changed: 0 ins; 79 del; 2 mod 8255338: CodeSections are never frozen Reviewed-by: neliasso, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/834 From vladimir.x.ivanov at oracle.com Fri Oct 23 18:17:29 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 23 Oct 2020 21:17:29 +0300 Subject: RFR: 8253734: C2: Optimize Move nodes [v2] In-Reply-To: References: Message-ID: Thanks for the reviews, Vladimir and Tobias. > Proposed optimization looks fine. In what cases you see the issues? 
It was motivated by some microbenchmarks targeting Memory Access API: https://mail.openjdk.java.net/pipermail/panama-dev/2020-September/010873.html Best regards, Vladimir Ivanov From vladimir.kozlov at oracle.com Fri Oct 23 18:21:33 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 23 Oct 2020 11:21:33 -0700 Subject: RFR: 8253734: C2: Optimize Move nodes [v2] In-Reply-To: References: Message-ID: <4e01fed7-e8e3-123e-108b-febabdc6969e@oracle.com> On 10/23/20 11:17 AM, Vladimir Ivanov wrote: > Thanks for the reviews, Vladimir and Tobias. > >> Proposed optimization looks fine. In what cases you see the issues? > > It was motivated by some microbenchmarks targeting Memory Access API: > > > https://mail.openjdk.java.net/pipermail/panama-dev/2020-September/010873.html Okay. Thanks, Vladimir K > > Best regards, > Vladimir Ivanov From neliasso at openjdk.java.net Fri Oct 23 19:38:36 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Fri, 23 Oct 2020 19:38:36 GMT Subject: RFR: 8255245: C1: Fix output of -XX:+PrintCFGToFile to open it with visualizer In-Reply-To: References: Message-ID: On Fri, 23 Oct 2020 13:16:27 GMT, Christian Hagedorn wrote: > [JDK-8251093](https://bugs.openjdk.java.net/browse/JDK-8251093) introduced some improved logging for intervals. When specifying `-XX:+PrintCFGToFile` to dump the graph to a file to later open it with the C1 visualizer, it also uses the improved interval printing. However, this output can no longer be read by the C1 Visualizer. As the C1 Visualizer is not part of the JDK, we should include the old format again for the output produced by `-XX:+PrintCFGToFile` to be compatible with the visualizer again. The console output can still use the improved logging of JDK-8251093. src/hotspot/share/c1/c1_LinearScan.cpp line 4607: > 4605: > 4606: #ifndef PRODUCT > 4607: void Interval::print_on(outputStream* out) const { I suggest moving the impl of print_on to the declaration in the hpp-file. 
That will make it easier to see at a glance how the different print methods delegate to each other. ------------- PR: https://git.openjdk.java.net/jdk/pull/837 From github.com+68959564+blaquez at openjdk.java.net Fri Oct 23 21:31:35 2020 From: github.com+68959564+blaquez at openjdk.java.net (blaquez) Date: Fri, 23 Oct 2020 21:31:35 GMT Subject: RFR: 8253734: C2: Optimize Move nodes [v2] In-Reply-To: References: Message-ID: <_cWKdx6iGdnfg0BzA7NvYlgUgx911E3K3VYB0mz9adk=.8b96e3be-b707-4cb0-919e-39a50829676a@github.com> On Fri, 23 Oct 2020 11:10:51 GMT, Vladimir Ivanov wrote: >> Introduce the following transformations for Move nodes: >> 1. `MoveI2F (MoveF2I x) => x` >> >> 2. `MoveI2F (LoadI mem) => LoadF mem` >> >> 3. `StoreI mem (MoveF2I x) => StoreF mem x` >> >> (The same applies to MoveL2D/MoveD2L.) >> >> #1 eliminates redundant operations and #2/#3 avoid reg-to-reg moves in generated code: >> 0x000000010d09964c: vmovss 0x20(%rsi),%xmm1 >> 0x000000010d099651: vmovd %xmm1,%eax ;*invokestatic floatToRawIntBits >> vs >> 0x0000000110c5a6cc: mov 0x20(%rsi),%eax ;*invokestatic floatToRawIntBits >> >> >> (#2 and #3 are performed late (after loop opts are over) so that high-level optimization passes do not have to handle the newly introduced mismatched accesses.) >> >> Testing: tier1-5. > > Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments from Tobias. Marked as reviewed by blaquez at github.com (no known OpenJDK username). 
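The three node patterns quoted above correspond roughly to Java source shapes like the ones in the sketch below. This is an illustration only, not code from the PR: the class name `MoveNodeShapes` and its method names are hypothetical, and the link between `Float.floatToRawIntBits` and a Move node is inferred from the `;*invokestatic floatToRawIntBits` annotation in the quoted assembly. Whether C2 actually applies a given rewrite depends on the JVM build and flags; the Java semantics are the same either way.

```java
// Hypothetical illustration; the names are not taken from the PR.
final class MoveNodeShapes {

    // Shape 1: int bits -> float -> int bits. The two conversions cancel,
    // which is what `MoveI2F (MoveF2I x) => x` describes.
    static float roundTrip(float x) {
        return Float.intBitsToFloat(Float.floatToRawIntBits(x));
    }

    // Shape 2: an int load whose value is immediately reinterpreted as a
    // float, as in `MoveI2F (LoadI mem) => LoadF mem`.
    static float loadAsFloat(int[] bits, int i) {
        return Float.intBitsToFloat(bits[i]);
    }

    // Shape 3: a float reinterpreted as int and immediately stored, as in
    // `StoreI mem (MoveF2I x) => StoreF mem x`.
    static void storeBits(int[] bits, int i, float x) {
        bits[i] = Float.floatToRawIntBits(x);
    }

    public static void main(String[] args) {
        int[] bits = new int[1];
        storeBits(bits, 0, 1.5f);
        System.out.println(Integer.toHexString(bits[0]));
        System.out.println(loadAsFloat(bits, 0));
        System.out.println(roundTrip(1.5f));
    }
}
```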
------------- PR: https://git.openjdk.java.net/jdk/pull/826 From github.com+68959564+blaquez at openjdk.java.net Sat Oct 24 04:45:36 2020 From: github.com+68959564+blaquez at openjdk.java.net (blaquez) Date: Sat, 24 Oct 2020 04:45:36 GMT Subject: RFR: 8253734: C2: Optimize Move nodes [v2] In-Reply-To: References: Message-ID: On Fri, 23 Oct 2020 11:10:51 GMT, Vladimir Ivanov wrote: >> Introduce the following transformations for Move nodes: >> 1. `MoveI2F (MoveF2I x) => x` >> >> 2. `MoveI2F (LoadI mem) => LoadF mem` >> >> 3. `StoreI mem (MoveF2I x) => StoreF mem x` >> >> (The same applies to MoveL2D/MoveD2L.) >> >> #1 eliminates redundant operations and #2/#3 avoid reg-to-reg moves in generated code: >> 0x000000010d09964c: vmovss 0x20(%rsi),%xmm1 >> 0x000000010d099651: vmovd %xmm1,%eax ;*invokestatic floatToRawIntBits >> vs >> 0x0000000110c5a6cc: mov 0x20(%rsi),%eax ;*invokestatic floatToRawIntBits >> >> >> (#2 and #3 are performed late (after loop opts are over) so that high-level optimization passes do not have to handle the newly introduced mismatched accesses.) >> >> Testing: tier1-5. > > Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments from Tobias. Marked as reviewed by blaquez at github.com (no known OpenJDK username). ------------- PR: https://git.openjdk.java.net/jdk/pull/826 From aph at redhat.com Sat Oct 24 10:49:16 2020 From: aph at redhat.com (Andrew Haley) Date: Sat, 24 Oct 2020 11:49:16 +0100 Subject: RFR: 8254913: Increase InlineSmallCode default from 2000 to 2500 for x64 In-Reply-To: References: <-eTclEV5zK_DIJgV8kYWFRBlUNtEqzq5H_fNwmbVJ7A=.ee694b8b-2a1e-43a0-9df6-52958e8f1d23@github.com> Message-ID: On 21/10/2020 20:45, Azeem Jiva wrote: > Should this be backported to 11 then? Probably not. It's not the sort of thing that should be backported to a release without convincing reproducible evidence. 
-- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From dongbohe at openjdk.java.net Sun Oct 25 06:49:43 2020 From: dongbohe at openjdk.java.net (Dongbo He) Date: Sun, 25 Oct 2020 06:49:43 GMT Subject: RFR: 8253623: Fastdebug JVM crashes with Vector API when PrintAssembly is turned on Message-ID: 8253623: Fastdebug JVM crashes with Vector API when PrintAssembly is turned on ------------- Commit messages: - 8253623: Fastdebug JVM crashes with Vector API when PrintAssembly is turned on Changes: https://git.openjdk.java.net/jdk/pull/853/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=853&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8253623 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/853.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/853/head:pull/853 PR: https://git.openjdk.java.net/jdk/pull/853 From jiefu at openjdk.java.net Sun Oct 25 08:39:41 2020 From: jiefu at openjdk.java.net (Jie Fu) Date: Sun, 25 Oct 2020 08:39:41 GMT Subject: RFR: 8255378: [Vector API] Remove redundant vector length check after JDK-8254814 and JDK-8255210 Message-ID: Hi all, As @iwanowww pointed out [1] that there are redundant vector length checks for reductionI [2] and reductionS [3]. It would be better to remove them. Testing: - jdk/incubator/vector on both AVX512 and AVX256 machines Thanks. 
Best regards, Jie [1] https://github.com/openjdk/jdk/pull/791#discussion_r510687005 [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L4429 [3] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L4625 ------------- Commit messages: - 8255378: [Vector API] Remove redundant vector length check after JDK-8254814 and JDK-8255210 Changes: https://git.openjdk.java.net/jdk/pull/854/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=854&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8255378 Stats: 4 lines in 1 file changed: 0 ins; 2 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/854.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/854/head:pull/854 PR: https://git.openjdk.java.net/jdk/pull/854 From dcubed at openjdk.java.net Sun Oct 25 14:48:41 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Sun, 25 Oct 2020 14:48:41 GMT Subject: RFR: 8255379: ProblemList compiler/loopstripmining/BackedgeNodeWithOutOfLoopControl.java Message-ID: <5iov9or-EITwvUpIV-nq3Jc_BI1zbj5RpQEB0RGCg2A=.48061cf4-9a52-4cd8-836c-7d464ea4d4b5@github.com> A trivial fix to ProblemList compiler/loopstripmining/BackedgeNodeWithOutOfLoopControl.java in order to reduce the noise in the JDK16 CI. 
------------- Commit messages: - 8255379: ProblemList compiler/loopstripmining/BackedgeNodeWithOutOfLoopControl.java Changes: https://git.openjdk.java.net/jdk/pull/858/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=858&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8255379 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/858.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/858/head:pull/858 PR: https://git.openjdk.java.net/jdk/pull/858 From alanb at openjdk.java.net Sun Oct 25 14:48:42 2020 From: alanb at openjdk.java.net (Alan Bateman) Date: Sun, 25 Oct 2020 14:48:42 GMT Subject: RFR: 8255379: ProblemList compiler/loopstripmining/BackedgeNodeWithOutOfLoopControl.java In-Reply-To: <5iov9or-EITwvUpIV-nq3Jc_BI1zbj5RpQEB0RGCg2A=.48061cf4-9a52-4cd8-836c-7d464ea4d4b5@github.com> References: <5iov9or-EITwvUpIV-nq3Jc_BI1zbj5RpQEB0RGCg2A=.48061cf4-9a52-4cd8-836c-7d464ea4d4b5@github.com> Message-ID: On Sun, 25 Oct 2020 14:41:10 GMT, Daniel D. Daugherty wrote: > A trivial fix to ProblemList compiler/loopstripmining/BackedgeNodeWithOutOfLoopControl.java > in order to reduce the noise in the JDK16 CI. Trivial exclude to reduce noise in tests. ------------- Marked as reviewed by alanb (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/858 From dcubed at openjdk.java.net Sun Oct 25 14:57:35 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Sun, 25 Oct 2020 14:57:35 GMT Subject: RFR: 8255379: ProblemList compiler/loopstripmining/BackedgeNodeWithOutOfLoopControl.java In-Reply-To: References: <5iov9or-EITwvUpIV-nq3Jc_BI1zbj5RpQEB0RGCg2A=.48061cf4-9a52-4cd8-836c-7d464ea4d4b5@github.com> Message-ID: On Sun, 25 Oct 2020 14:44:07 GMT, Alan Bateman wrote: >> A trivial fix to ProblemList compiler/loopstripmining/BackedgeNodeWithOutOfLoopControl.java >> in order to reduce the noise in the JDK16 CI. > > Trivial exclude to reduce noise in tests. Thanks for the fast review! 
------------- PR: https://git.openjdk.java.net/jdk/pull/858 From dcubed at openjdk.java.net Sun Oct 25 14:57:36 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Sun, 25 Oct 2020 14:57:36 GMT Subject: Integrated: 8255379: ProblemList compiler/loopstripmining/BackedgeNodeWithOutOfLoopControl.java In-Reply-To: <5iov9or-EITwvUpIV-nq3Jc_BI1zbj5RpQEB0RGCg2A=.48061cf4-9a52-4cd8-836c-7d464ea4d4b5@github.com> References: <5iov9or-EITwvUpIV-nq3Jc_BI1zbj5RpQEB0RGCg2A=.48061cf4-9a52-4cd8-836c-7d464ea4d4b5@github.com> Message-ID: On Sun, 25 Oct 2020 14:41:10 GMT, Daniel D. Daugherty wrote: > A trivial fix to ProblemList compiler/loopstripmining/BackedgeNodeWithOutOfLoopControl.java > in order to reduce the noise in the JDK16 CI. This pull request has now been integrated. Changeset: 60d01424 Author: Daniel D. Daugherty URL: https://git.openjdk.java.net/jdk/commit/60d01424 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod 8255379: ProblemList compiler/loopstripmining/BackedgeNodeWithOutOfLoopControl.java Reviewed-by: alanb ------------- PR: https://git.openjdk.java.net/jdk/pull/858 From kvn at openjdk.java.net Mon Oct 26 04:17:49 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 26 Oct 2020 04:17:49 GMT Subject: RFR: 8251994: VM crushed running TestComplexAddrExpr.java test with -XX:UseAVX=X Message-ID: <6voOeRu_AO13mIMLca9ZYspPXMIEWTmx1rvzbCwZmqI=.bd528e8e-0aee-4d90-a921-0e437f2ef612@github.com> To improve a chance to vectorize a loop, a special code in superword tries to hoist loads to the beginning of loop by replacing their memory input with corresponding (same memory slice) loop's memory Phi : https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/superword.cpp#L474 Originally loads are ordered by corresponding stores on the same memory slice. But when they are hoisted they loose that ordering - nothing enforce the order. In TestComplexAddrExpr.test6 case the ordering is preserved (luckily?) 
after hoisting only when vector size is 32 bytes (avx2) but they become unordered with 16 (avx=0 or avx1) or 64 (avx512) byte vectors. The mystery of why the test did not fail in our testing environment is also solved! We have old Skylake machines (even my local machine) for which AVX is switched to avx2: https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/vm_version_x86.cpp#L721 I have a simple fix (use original loads ordering indexes) but looking at the code which is causing the issue I see that it is bogus/incomplete - it does not help the cases listed for the JDK-8076284 changes: https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-April/017645.html Using unrolling and cloning information to vectorize is an interesting idea but, as I see it, it is not complete. Even if the pack_parallel() method is able to create packs, they are currently all removed by the filter_packs() method. And additionally the above cases from JDK-8076284 are vectorized without hoisting loads and pack_parallel - I verified it. The code added by JDK-8076284 is useless now and I am excluding it. It needs more work to be useful. I am reluctant to remove the code because maybe in the future we will have time to invest in it. There were 2 ways to exclude it: add a new field in the superword class: bool _do_vector_loop; // whether to do vectorization/simd style + bool _do_vector_loop_experimental; // experimental optimization bool _do_reserve_copy; // do reserve copy of the graph(loop) before final modification in output or use `#if VECTOR_LOOP_SIMD` as in current changes. I prefer this one to avoid wasting compilation time. I used the first one for testing by setting `_do_vector_loop_experimental(UseNewCode)`. Testing: tier1-7 with avx2 and avx512. Performance testing - no regression. I also compared jtreg tests output with -XX:+TraceNewVectors to verify that the number of created vector nodes did not change. 
------------- Commit messages: - 8251994: VM crushed running TestComplexAddrExpr.java test with -XX:UseAVX=X Changes: https://git.openjdk.java.net/jdk/pull/859/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=859&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8251994 Stats: 238 lines in 3 files changed: 233 ins; 0 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/859.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/859/head:pull/859 PR: https://git.openjdk.java.net/jdk/pull/859 From ngasson at openjdk.java.net Mon Oct 26 05:59:53 2020 From: ngasson at openjdk.java.net (Nick Gasson) Date: Mon, 26 Oct 2020 05:59:53 GMT Subject: RFR: 8254723: add diagnostic command to write Linux perf map file [v3] In-Reply-To: <7T_M6C-3WpLwXYH3RuRCuDQUW0qMyKIWAs8RaPW7D0s=.d659e5a0-e8a2-4816-8f60-1dd7653f4c7b@github.com> References: <7T_M6C-3WpLwXYH3RuRCuDQUW0qMyKIWAs8RaPW7D0s=.d659e5a0-e8a2-4816-8f60-1dd7653f4c7b@github.com> Message-ID: > When using the Linux "perf" tool to do system profiling, symbol names of > running Java methods cannot be decoded, resulting in unhelpful output > such as: > > 10.52% [JIT] tid 236748 [.] 0x00007f6fdb75d223 > > Perf can read a simple text file format describing the mapping between > address ranges and symbol names for a particular process [1]. > > It's possible to generate this already for Java processes using a JVMTI > plugin such as perf-map-agent [2]. However this requires compiling > third-party code and then loading the agent into your Java process. It > would be more convenient if Hotspot could write this file directly using > a diagnostic command. The information required is almost identical to > that of the existing Compiler.codelist command. > > This patch adds a Compiler.perfmap diagnostic command on Linux only. To > use, first run "jcmd Compiler.perfmap" and then "perf top" or > "perf record" and the report should show decoded Java symbol names for > that process. 
> > As this just writes a snapshot of the code cache when the command is > run, it will become stale if methods are compiled later or unloaded. > However this shouldn't be a big problem in practice if the map file is > generated after the application has warmed up. > > [1] https://github.com/torvalds/linux/blob/master/tools/perf/Documentation/jit-interface.txt > [2] https://github.com/jvm-profiling-tools/perf-map-agent Nick Gasson has updated the pull request incrementally with one additional commit since the last revision: Add -XX:+DumpPerfMapAtExit option ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/760/files - new: https://git.openjdk.java.net/jdk/pull/760/files/a3cb0ed4..bd35433e Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=760&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=760&range=01-02 Stats: 43 lines in 8 files changed: 29 ins; 7 del; 7 mod Patch: https://git.openjdk.java.net/jdk/pull/760.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/760/head:pull/760 PR: https://git.openjdk.java.net/jdk/pull/760 From ngasson at openjdk.java.net Mon Oct 26 06:06:47 2020 From: ngasson at openjdk.java.net (Nick Gasson) Date: Mon, 26 Oct 2020 06:06:47 GMT Subject: RFR: 8254723: add diagnostic command to write Linux perf map file [v4] In-Reply-To: <7T_M6C-3WpLwXYH3RuRCuDQUW0qMyKIWAs8RaPW7D0s=.d659e5a0-e8a2-4816-8f60-1dd7653f4c7b@github.com> References: <7T_M6C-3WpLwXYH3RuRCuDQUW0qMyKIWAs8RaPW7D0s=.d659e5a0-e8a2-4816-8f60-1dd7653f4c7b@github.com> Message-ID: > When using the Linux "perf" tool to do system profiling, symbol names of > running Java methods cannot be decoded, resulting in unhelpful output > such as: > > 10.52% [JIT] tid 236748 [.] 0x00007f6fdb75d223 > > Perf can read a simple text file format describing the mapping between > address ranges and symbol names for a particular process [1]. 
> > It's possible to generate this already for Java processes using a JVMTI > plugin such as perf-map-agent [2]. However this requires compiling > third-party code and then loading the agent into your Java process. It > would be more convenient if Hotspot could write this file directly using > a diagnostic command. The information required is almost identical to > that of the existing Compiler.codelist command. > > This patch adds a Compiler.perfmap diagnostic command on Linux only. To > use, first run "jcmd Compiler.perfmap" and then "perf top" or > "perf record" and the report should show decoded Java symbol names for > that process. > > As this just writes a snapshot of the code cache when the command is > run, it will become stale if methods are compiled later or unloaded. > However this shouldn't be a big problem in practice if the map file is > generated after the application has warmed up. > > [1] https://github.com/torvalds/linux/blob/master/tools/perf/Documentation/jit-interface.txt > [2] https://github.com/jvm-profiling-tools/perf-map-agent Nick Gasson has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: - Merge master - Add -XX:+DumpPerfMapAtExit option - Update for review comments - 8254723: add diagnostic command to write Linux perf map file When using the Linux "perf" tool to do system profiling, symbol names of running Java methods cannot be decoded, resulting in unhelpful output such as: 10.52% [JIT] tid 236748 [.] 0x00007f6fdb75d223 Perf can read a simple text file format describing the mapping between address ranges and symbol names for a particular process [1]. It's possible to generate this already for Java processes using a JVMTI plugin such as perf-map-agent [2]. However this requires compiling third-party code and then loading the agent into your Java process. It would be more convenient if Hotspot could write this file directly using a diagnostic command. 
The information required is almost identical to that of the existing Compiler.codelist command. This patch adds a Compiler.perfmap diagnostic command on Linux only. To use, first run "jcmd Compiler.perfmap" and then "perf top" or "perf record" and the report should show decoded Java symbol names for that process. As this just writes a snapshot of the code cache when the command is run, it will become stale if methods are compiled later or unloaded. However this shouldn't be a big problem in practice if the map file is generated after the application has warmed up. [1] https://github.com/torvalds/linux/blob/master/tools/perf/Documentation/jit-interface.txt [2] https://github.com/jvm-profiling-tools/perf-map-agent ------------- Changes: https://git.openjdk.java.net/jdk/pull/760/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=760&range=03 Stats: 171 lines in 8 files changed: 169 ins; 1 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/760.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/760/head:pull/760 PR: https://git.openjdk.java.net/jdk/pull/760 From ngasson at openjdk.java.net Mon Oct 26 06:13:37 2020 From: ngasson at openjdk.java.net (Nick Gasson) Date: Mon, 26 Oct 2020 06:13:37 GMT Subject: RFR: 8254723: add diagnostic command to write Linux perf map file [v2] In-Reply-To: References: <7T_M6C-3WpLwXYH3RuRCuDQUW0qMyKIWAs8RaPW7D0s=.d659e5a0-e8a2-4816-8f60-1dd7653f4c7b@github.com> Message-ID: <9jhaf18gTpZmpz8wfp7x0lhvRl4DUOQdfbox2DzcExs=.df72b0d4-b121-482e-a6eb-214f216ac359@github.com> On Thu, 22 Oct 2020 07:09:47 GMT, Nick Gasson wrote: >> Hi Nick, >> >> this is a very useful idea! I missed this in the past. >> >> Some remarks. I'll try to keep bikeshedding to a minimum since you already have enough input. Mostly ergonomics. >> >> 1) Like Alexey, I would really wish for an print-at-exit switch. The common naming seems to be xxxAtExit (so not, OnExit). 
"PrintXxx" seems to be printing stuff out to tty, "DumpXxxx" for writing separate files (e.g. CDS map). So I would name it DumpPerfMapAtExit. >> >> 2) Dumping to /tmp is unexpected for me, I would prefer if the default were dumping to the current directory. That seems to be the default for other files too (cds map, hs-err file etc). >> >> 3) Not necessary but nice would be a an option to specify location of the dump file. >> >> 4) I think it would be nice to have these switches always available, so real product switches. Which would require you to write up a small CSR but I still think it would make sense. >> >> Cheers, Thomas > >> >> 1. Like Alexey, I would really wish for an print-at-exit switch. The common naming seems to be xxxAtExit (so not, OnExit). "PrintXxx" seems to be printing stuff out to tty, "DumpXxxx" for writing separate files (e.g. CDS map). So I would name it DumpPerfMapAtExit. >> > > OK, makes sense. > >> 2. Dumping to /tmp is unexpected for me, I would prefer if the default were dumping to the current directory. That seems to be the default for other files too (cds map, hs-err file etc). >> >> 3. Not necessary but nice would be a an option to specify location of the dump file. >> > > The `/tmp/perf-.map` is hardcoded into perf though ([see here](https://github.com/torvalds/linux/blob/master/tools/perf/util/map.c#L155)), so I don't think it's useful for the user to be able to change the location. > > I think we should use this option carefully because nmethod might be unloaded. So we should use this with `-XX:-UseCodeCacheFlushing`. > Thanks for the information. `-XX:+DumpPerfMapAtExit` will turn on `UseCodeCacheFlushing` if it's set to default. 
------------- PR: https://git.openjdk.java.net/jdk/pull/760 From ngasson at openjdk.java.net Mon Oct 26 06:13:40 2020 From: ngasson at openjdk.java.net (Nick Gasson) Date: Mon, 26 Oct 2020 06:13:40 GMT Subject: RFR: 8254723: add diagnostic command to write Linux perf map file [v2] In-Reply-To: References: <7T_M6C-3WpLwXYH3RuRCuDQUW0qMyKIWAs8RaPW7D0s=.d659e5a0-e8a2-4816-8f60-1dd7653f4c7b@github.com> Message-ID: On Wed, 21 Oct 2020 18:03:11 GMT, Chris Plummer wrote: >> Nick Gasson has updated the pull request incrementally with one additional commit since the last revision: >> >> Update for review comments > > src/hotspot/share/code/codeCache.cpp line 1582: > >> 1580: cb->is_compiled() ? cb->as_compiled_method()->method()->external_name() >> 1581: : cb->name(); >> 1582: fs.print_cr(INTPTR_FORMAT " " INTPTR_FORMAT " %s", > > Indentation isn't right. Do you mean how the arguments are lined up on the continuation line below? ------------- PR: https://git.openjdk.java.net/jdk/pull/760 From ngasson at openjdk.java.net Mon Oct 26 06:13:40 2020 From: ngasson at openjdk.java.net (Nick Gasson) Date: Mon, 26 Oct 2020 06:13:40 GMT Subject: RFR: 8254723: add diagnostic command to write Linux perf map file [v2] In-Reply-To: References: <7T_M6C-3WpLwXYH3RuRCuDQUW0qMyKIWAs8RaPW7D0s=.d659e5a0-e8a2-4816-8f60-1dd7653f4c7b@github.com> Message-ID: <6N1OsuF-33tLIIy8NWmngR9cLmqAV0ID8V7tiNMdT3c=.402f474c-060a-43f0-8fa6-783ead7de234@github.com> On Thu, 22 Oct 2020 02:06:25 GMT, Yasumasa Suenaga wrote: >> Nick Gasson has updated the pull request incrementally with one additional commit since the last revision: >> >> Update for review comments > > src/hotspot/share/code/codeCache.cpp line 1571: > >> 1569: if (!fs.is_open()) { >> 1570: log_warning(codecache)("Failed to create %s for perf map", fname); >> 1571: st->print_cr("Failed to create %s", fname); > > `st->print_cr()` is still needed? and also I've intended to replace all `print_cr()` to UL (L1587) It's gone now. 
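For readers unfamiliar with the map file discussed in this thread, the sketch below parses one line of the form `START SIZE symbolname` and checks whether a sampled address falls inside that range. It is an illustration under assumptions, not HotSpot or perf code: the class name `PerfMapLine` is hypothetical, the field layout follows the perf documentation cited as [1] in the PR description, and accepting an optional `0x` prefix is an assumption made here because the quoted `print_cr` call uses `INTPTR_FORMAT`. The sample symbol name is made up.

```java
// Hypothetical reader for a single perf map entry; for illustration only.
final class PerfMapLine {
    final long start;     // first address of the code range
    final long size;      // length of the code range in bytes
    final String symbol;  // rest of the line; may itself contain spaces

    private PerfMapLine(long start, long size, String symbol) {
        this.start = start;
        this.size = size;
        this.symbol = symbol;
    }

    // Splits on the first two spaces only, so the symbol keeps its spaces.
    static PerfMapLine parse(String line) {
        String[] parts = line.split(" ", 3);
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected 'START SIZE symbol': " + line);
        }
        return new PerfMapLine(parseHex(parts[0]), parseHex(parts[1]), parts[2]);
    }

    // Assumption: values are hexadecimal, with or without a leading 0x.
    private static long parseHex(String s) {
        if (s.startsWith("0x") || s.startsWith("0X")) {
            s = s.substring(2);
        }
        return Long.parseUnsignedLong(s, 16);
    }

    // True if address lies in [start, start + size), compared as unsigned.
    boolean contains(long address) {
        return Long.compareUnsigned(address - start, size) < 0;
    }

    public static void main(String[] args) {
        PerfMapLine entry =
            parse("0x00007f6fdb75d000 0x0000000000000400 void Example.run(int)");
        // 0x00007f6fdb75d223 is the undecoded address shown in the PR description.
        System.out.println(entry.symbol + " " + entry.contains(0x00007f6fdb75d223L));
    }
}
```

Because the file is a snapshot, a lookup like `contains` can only resolve addresses of methods that were already compiled, and not yet unloaded, when the file was written.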
------------- PR: https://git.openjdk.java.net/jdk/pull/760 From dholmes at openjdk.java.net Mon Oct 26 06:56:34 2020 From: dholmes at openjdk.java.net (David Holmes) Date: Mon, 26 Oct 2020 06:56:34 GMT Subject: RFR: 8254723: add diagnostic command to write Linux perf map file [v2] In-Reply-To: <9jhaf18gTpZmpz8wfp7x0lhvRl4DUOQdfbox2DzcExs=.df72b0d4-b121-482e-a6eb-214f216ac359@github.com> References: <7T_M6C-3WpLwXYH3RuRCuDQUW0qMyKIWAs8RaPW7D0s=.d659e5a0-e8a2-4816-8f60-1dd7653f4c7b@github.com> <9jhaf18gTpZmpz8wfp7x0lhvRl4DUOQdfbox2DzcExs=.df72b0d4-b121-482e-a6eb-214f216ac359@github.com> Message-ID: <59x5Z52HdbynvbP3_FMwcYb08tGhLghAB0P3W6ALgKI=.1f1f7d99-c54f-4788-9f0e-391fb995eebb@github.com> On Mon, 26 Oct 2020 06:10:52 GMT, Nick Gasson wrote: >>> >>> 1. Like Alexey, I would really wish for an print-at-exit switch. The common naming seems to be xxxAtExit (so not, OnExit). "PrintXxx" seems to be printing stuff out to tty, "DumpXxxx" for writing separate files (e.g. CDS map). So I would name it DumpPerfMapAtExit. >>> >> >> OK, makes sense. >> >>> 2. Dumping to /tmp is unexpected for me, I would prefer if the default were dumping to the current directory. That seems to be the default for other files too (cds map, hs-err file etc). >>> >>> 3. Not necessary but nice would be a an option to specify location of the dump file. >>> >> >> The `/tmp/perf-.map` is hardcoded into perf though ([see here](https://github.com/torvalds/linux/blob/master/tools/perf/util/map.c#L155)), so I don't think it's useful for the user to be able to change the location. > >> >> I think we should use this option carefully because nmethod might be unloaded. So we should use this with `-XX:-UseCodeCacheFlushing`. >> > > Thanks for the information. `-XX:+DumpPerfMapAtExit` will turn on `UseCodeCacheFlushing` if it's set to default. I don't see any reason for this to be a product flag, rather than diagnostic. 
------------- PR: https://git.openjdk.java.net/jdk/pull/760 From shade at openjdk.java.net Mon Oct 26 07:20:48 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 26 Oct 2020 07:20:48 GMT Subject: RFR: 8255301: Common and strengthen the code in ciMemberName and ciMethodHandle [v3] In-Reply-To: References: Message-ID: > There is a TODO item in their `get_vm_target`-s. We can clean those up. It looks to me the caller code does not handle `NULL` result well, which means we better `fatal` ourselves before exposing `NULL` to callers and `SEGV`-ing there. Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Minor stylistic nit: star leans to the left ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/825/files - new: https://git.openjdk.java.net/jdk/pull/825/files/ec242d22..74ceab1f Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=825&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=825&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/825.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/825/head:pull/825 PR: https://git.openjdk.java.net/jdk/pull/825 From shade at openjdk.java.net Mon Oct 26 07:20:49 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 26 Oct 2020 07:20:49 GMT Subject: RFR: 8255301: Common and strengthen the code in ciMemberName and ciMethodHandle [v2] In-Reply-To: References: Message-ID: On Fri, 23 Oct 2020 17:02:09 GMT, Vladimir Kozlov wrote: >> Aleksey Shipilev has updated the pull request incrementally with three additional commits since the last revision: >> >> - Remove whitespace >> - Simplify the whole thing >> - Simplify, because MemberName::vmtarget already returns Method* > > LGTM too. I did a minor stylistic change to match `Method*` in both cases, please ack. 
------------- PR: https://git.openjdk.java.net/jdk/pull/825 From shade at openjdk.java.net Mon Oct 26 07:22:36 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 26 Oct 2020 07:22:36 GMT Subject: Integrated: 8255265: IdealLoopTree::iteration_split_impl does not use should_align In-Reply-To: References: Message-ID: On Thu, 22 Oct 2020 18:00:28 GMT, Aleksey Shipilev wrote: > There is a TODO item in IdealLoopTree::iteration_split_impl: > > // TODO: Remove align -- not used. > bool should_align = policy_align(phase); > > Indeed, it is not used, because policy_align always returns FALSE. Can clean up some of the code around too. This pull request has now been integrated. Changeset: 69188188 Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/69188188 Stats: 35 lines in 2 files changed: 0 ins; 25 del; 10 mod 8255265: IdealLoopTree::iteration_split_impl does not use should_align Reviewed-by: thartmann, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/815 From shade at openjdk.java.net Mon Oct 26 08:19:37 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 26 Oct 2020 08:19:37 GMT Subject: RFR: 8251994: VM crushed running TestComplexAddrExpr.java test with -XX:UseAVX=X In-Reply-To: <6voOeRu_AO13mIMLca9ZYspPXMIEWTmx1rvzbCwZmqI=.bd528e8e-0aee-4d90-a921-0e437f2ef612@github.com> References: <6voOeRu_AO13mIMLca9ZYspPXMIEWTmx1rvzbCwZmqI=.bd528e8e-0aee-4d90-a921-0e437f2ef612@github.com> Message-ID: On Mon, 26 Oct 2020 04:12:10 GMT, Vladimir Kozlov wrote: > To improve the chance of vectorizing a loop, special code in superword tries to hoist loads to the beginning of the loop by replacing their memory input with the corresponding (same memory slice) loop memory Phi: > https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/superword.cpp#L474 > > Originally loads are ordered by the corresponding stores on the same memory slice. But when they are hoisted they lose that ordering - nothing enforces the order.
> In the TestComplexAddrExpr.test6 case the ordering is preserved (luckily?) after hoisting only when the vector size is 32 bytes (avx2), but the loads become unordered with 16-byte (avx=0 or avx1) or 64-byte (avx512) vectors. > > The mystery of why the test did not fail in our testing environment is also solved! We have old Skylake machines (even my local machine) for which AVX is switched to avx2: > https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/vm_version_x86.cpp#L721 > > I have a simple fix (use the original load ordering indexes), but looking at the code which causes the issue I see that it is bogus/incomplete - it does not help the cases listed for the JDK-8076284 changes: > > https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-April/017645.html > > Using unrolling and cloning information to vectorize is an interesting idea, but as far as I can see it is not complete. Even if the pack_parallel() method is able to create packs, they are currently all removed by the filter_packs() method. > And additionally the above cases from JDK-8076284 are vectorized without hoisting loads and pack_parallel - I verified it. > > The code added by JDK-8076284 is useless now and I am excluding it. It needs more work to be useful. I am reluctant to remove the code because maybe in the future we will have time to invest in it. > > There were 2 ways to exclude it: add a new field in the superword class: > > bool _do_vector_loop; // whether to do vectorization/simd style > + bool _do_vector_loop_experimental; // experimental optimization > bool _do_reserve_copy; // do reserve copy of the graph(loop) before final modification in output > or use `#if VECTOR_LOOP_SIMD` as in the current changes. I prefer the latter to avoid wasting compilation time. I used the first one for testing by setting `_do_vector_loop_experimental(UseNewCode)`. > > Testing: tier1-7 with avx2 and avx512. > Performance testing - no regression.
> I also compared jtreg tests output with -XX:+TraceNewVectors to verify that the number of created vector nodes did not change. Drive-by comment: - synopsis should be "crashed", not "crushed"? - I personally find `VECTOR_LOOP_SIMD` more fragile than the `_do_vector_loop_experimental` field. At least the macro should also say "experimental"? src/hotspot/share/opto/superword.cpp line 1739: > 1737: tty->print_cr("packs[%d]:", i); > 1738: print_pack(p); > 1739: assert(false, "only in one pack"); This is just `fatal`, right? ------------- Changes requested by shade (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/859 From dongbo at openjdk.java.net Mon Oct 26 09:46:15 2020 From: dongbo at openjdk.java.net (Dong Bo) Date: Mon, 26 Oct 2020 09:46:15 GMT Subject: RFR: 8255246: AArch64: Implement BigInteger shiftRight and shiftLeft accelerator/intrinsic Message-ID: BigInteger.shiftRightImplWorker and BigInteger.shiftLeftImplWorker are not intrinsified on aarch64, although they are on x86_64. We can implement them with the NEON USHL instruction (register form), which handles up to four integers at a time, whereas the C2-generated code processes a single integer at a time. The usage of USHL can be found at: https://developer.arm.com/documentation/dui0801/g/A64-SIMD-Vector-Instructions/USHL--vector-?lang=en The patch passed the jtreg tier1-3 tests on our aarch64 server. The tests in test/jdk/java/math/BigInteger/* were also run specifically to check the correctness of the implementation, and they passed. We ran test/micro/org/openjdk/bench/java/math/BigIntegers.java to measure the performance gain on Kunpeng916 and Kunpeng920. The following performance improvements were seen with this implementation: - Intrinsification of BigInteger.shiftLeft: 25.52% (Kunpeng916), 37.56% (Kunpeng920) - Intrinsification of BigInteger.shiftRight: 46.45% (Kunpeng916), 43.32% (Kunpeng920) The BigIntegers.java JMH micro-benchmark results:
Benchmark                          Mode  Cnt     Score    Error  Units
# Kunpeng 916, default
BigIntegers.testAdd                avgt   25    33.554 ±  0.224  ns/op
BigIntegers.testHugeToString       avgt   25   575.554 ± 40.656  ns/op
BigIntegers.testLargeToString      avgt   25   190.098 ±  0.825  ns/op
**BigIntegers.testLeftShift        avgt   25  1495.779 ± 12.365  ns/op**
BigIntegers.testMultiply           avgt   25  7551.707 ± 39.309  ns/op
**BigIntegers.testRightShift       avgt   25   605.302 ±  6.710  ns/op**
BigIntegers.testSmallToString      avgt   25   179.034 ±  0.873  ns/op
# Kunpeng 916, intrinsic:
BigIntegers.testAdd                avgt   25    33.531 ±  0.222  ns/op
BigIntegers.testHugeToString       avgt   25   578.038 ± 40.675  ns/op
BigIntegers.testLargeToString      avgt   25   188.566 ±  0.855  ns/op
**BigIntegers.testLeftShift        avgt   25  1191.651 ± 20.136  ns/op**
BigIntegers.testMultiply           avgt   25  7492.711 ±  3.702  ns/op
**BigIntegers.testRightShift       avgt   25   326.891 ±  6.033  ns/op**
BigIntegers.testSmallToString      avgt   25   178.267 ±  1.501  ns/op
# Kunpeng 920, default
BigIntegers.testAdd                avgt   25    22.790 ±  0.167  ns/op
BigIntegers.testHugeToString       avgt   25   432.428 ± 10.736  ns/op
BigIntegers.testLargeToString      avgt   25   121.899 ±  3.356  ns/op
**BigIntegers.testLeftShift        avgt   25   883.530 ± 53.714  ns/op**
BigIntegers.testMultiply           avgt   25  5918.845 ± 94.937  ns/op
**BigIntegers.testRightShift       avgt   25   329.762 ± 15.850  ns/op**
BigIntegers.testSmallToString      avgt   25   117.460 ±  3.040  ns/op
# Kunpeng 920, intrinsic
BigIntegers.testAdd                avgt   25    21.791 ±  0.085  ns/op
BigIntegers.testHugeToString       avgt   25   415.209 ± 32.170  ns/op
BigIntegers.testLargeToString      avgt   25   124.635 ±  2.157  ns/op
**BigIntegers.testLeftShift        avgt   25   551.710 ±  7.836  ns/op**
BigIntegers.testMultiply           avgt   25  5869.401 ± 54.803  ns/op
**BigIntegers.testRightShift       avgt   25   186.896 ±  6.378  ns/op**
BigIntegers.testSmallToString      avgt   25   117.543 ±  3.036  ns/op
------------- Commit messages: - fix trailing whitespace - self-review: code style - fix register usage - Merge branch 'master' into aarch64_biginteger_shift - modify register usage - self-review - more comments - unify code style - roll back for short magLen - aarch64: intrisify BigInteger.shiftRightImplWorker and BigInteger.shiftLeftImplWorker with NEON instructions Changes: https://git.openjdk.java.net/jdk/pull/861/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=861&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8255246 Stats: 212 lines in 2 files changed: 212 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/861.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/861/head:pull/861 PR: https://git.openjdk.java.net/jdk/pull/861 From aph at redhat.com Mon Oct 26 10:49:40 2020 From: aph at redhat.com (Andrew Haley) Date: Mon, 26 Oct 2020 10:49:40 +0000 Subject: RFR: 8255246: AArch64: Implement BigInteger shiftRight and shiftLeft accelerator/intrinsic In-Reply-To: References: Message-ID: On 26/10/2020 09:46, Dong Bo wrote: > The BigIntegers.java JMH micro-benchmark results: > Benchmark Mode Cnt Score Error Units I don't see any performance testing here for shifts of small BigIntegers. Can you have a look to make sure these don't regress? Thanks, -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd.
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From ihse at openjdk.java.net Mon Oct 26 11:19:10 2020 From: ihse at openjdk.java.net (Magnus Ihse Bursie) Date: Mon, 26 Oct 2020 11:19:10 GMT Subject: RFR: 8253757: Add LLVM-based backend for hsdis In-Reply-To: <2S00ucaPGiAQLeLOejt1kfXeYEc7ctEPeRCIcq1N0N8=.dbf1ea7a-8de4-48a5-8759-03495e3e3c08@github.com> References: <91erxiMDb4ftvSomuJYHPi9SX-v8Z2VLD2qEwCbz5tk=.b9ed01b5-f0e0-4ed7-9c1a-b06bc0e64640@github.com> <8Eqswd7tsVaGEXHdKDncXqKpW2tBsSeuY0PV6aTB9_c=.a6cf4957-9d31-4e89-bf44-e7b7852205d5@github.com> <2S00ucaPGiAQLeLOejt1kfXeYEc7ctEPeRCIcq1N0N8=.dbf1ea7a-8de4-48a5-8759-03495e3e3c08@github.com> Message-ID: On Thu, 8 Oct 2020 20:40:50 GMT, Xin Liu wrote: >> @navyxliu >> >>> @luhenry I tried to build it with LLVM10.0.1 >>> on my x86_64, ubuntu, I ran into a small problem. here is how I build. >>> $make ARCH=amd64 CC=/opt/llvm/bin/clang CXX=/opt/llvm/bin/clang++ LLVM=/opt/llvm/ >>> >>> I can't meet this condition because Makefile defines LIBOS_linux. >>> >>> #elif defined(LIBOS_Linux) && defined(LIBARCH_amd64) >>> return "x86_64-pc-linux-gnu"; >>> >>> Actually, Makefile assigns OS to windows/linux/aix/macosx (all lower case)and then >>> CPPFLAGS += -DLIBOS_$(OS) -DLIBOS="$(OS)" -DLIBARCH_$(LIBARCH) -DLIBARCH="$(LIBARCH)" -DLIB_EXT="$(LIB_EXT)" >> >> Interestingly, I did it this way because on my machine `LIBOS_Linux` would get defined instead of `LIBOS_linux`. I tried on WSL which might explain the difference. Could you please share more details on what environment you are using? >> >>> In hsdis.cpp, native_target_triple needs to match whatever Makefile defined. With that fix, I generate llvm version hsdis-amd64.so and it works flawlessly >> >> I'm not sure I understand what you mean. Are you saying we should define the native target triple based on the variables in the Makefile? 
>> >> A difficulty I ran into is that there is not always a 1-to-1 mapping between the autoconf/gcc target triple and the LLVM one. For example. you pass `x86_64-gnu-linux` to the OpenJDK's `configure` script, but the equivalent target triple for LLVM is `x86_64-pc-linux-gnu`. >> >> Since my plan isn't to use LLVM as the default for all platforms, and because there aren't that many combinations of target OS/ARCH, I am taking the approach of hardcoding the combinations we care about in `hsdis.cpp`. > >> @navyxliu >> >> > @luhenry I tried to build it with LLVM10.0.1 >> > on my x86_64, ubuntu, I ran into a small problem. here is how I build. >> > $make ARCH=amd64 CC=/opt/llvm/bin/clang CXX=/opt/llvm/bin/clang++ LLVM=/opt/llvm/ >> > I can't meet this condition because Makefile defines LIBOS_linux. >> > #elif defined(LIBOS_Linux) && defined(LIBARCH_amd64) >> > return "x86_64-pc-linux-gnu"; >> > Actually, Makefile assigns OS to windows/linux/aix/macosx (all lower case)and then >> > CPPFLAGS += -DLIBOS_$(OS) -DLIBOS="$(OS)" -DLIBARCH_$(LIBARCH) -DLIBARCH="$(LIBARCH)" -DLIB_EXT="$(LIB_EXT)" >> >> Interestingly, I did it this way because on my machine `LIBOS_Linux` would get defined instead of `LIBOS_linux`. I tried on WSL which might explain the difference. Could you please share more details on what environment you are using? >> > I am using ubuntu 18.04. > > `OS = $(shell uname)` does initialize OS=Linux in the first place, but later OS is set to "linux" at line 88 of https://openjdk.github.io/cr/?repo=jdk&pr=392&range=05#new-0 > > At line 186, -DLIBOS_linux -DLIBOS="linux" ... It doesn't match line 564 of https://openjdk.github.io/cr/?repo=jdk&pr=392&range=05#new-2 > > in my understanding, C/C++ macros are all case sensitive. I got #error "unknown platform" because of Linux/linux discrepancy. > >> > In hsdis.cpp, native_target_triple needs to match whatever Makefile defined. 
With that fix, I generate llvm version hsdis-amd64.so and it works flawlessly >> >> I'm not sure I understand what you mean. Are you saying we should define the native target triple based on the variables in the Makefile? >> >> A difficulty I ran into is that there is not always a 1-to-1 mapping between the autoconf/gcc target triple and the LLVM one. For example. you pass `x86_64-gnu-linux` to the OpenJDK's `configure` script, but the equivalent target triple for LLVM is `x86_64-pc-linux-gnu`. >> >> Since my plan isn't to use LLVM as the default for all platforms, and because there aren't that many combinations of target OS/ARCH, I am taking the approach of hardcoding the combinations we care about in `hsdis.cpp`. Since I found it close to impossible to review the changes when I could not get a diff with the changes done to hsdis.c/cpp, I created a webrev which shows these changes. I made this by renaming hsdis.cpp back to hsdis.c, and then webrev could match it up. It is available here: http://cr.openjdk.java.net/~ihse/hsdis-llvm-backend-diff-webrev/ ------------- PR: https://git.openjdk.java.net/jdk/pull/392 From ihse at openjdk.java.net Mon Oct 26 11:40:11 2020 From: ihse at openjdk.java.net (Magnus Ihse Bursie) Date: Mon, 26 Oct 2020 11:40:11 GMT Subject: RFR: 8253757: Add LLVM-based backend for hsdis In-Reply-To: References: <91erxiMDb4ftvSomuJYHPi9SX-v8Z2VLD2qEwCbz5tk=.b9ed01b5-f0e0-4ed7-9c1a-b06bc0e64640@github.com> <8Eqswd7tsVaGEXHdKDncXqKpW2tBsSeuY0PV6aTB9_c=.a6cf4957-9d31-4e89-bf44-e7b7852205d5@github.com> <2S00ucaPGiAQLeLOejt1kfXeYEc7ctEPeRCIcq1N0N8=.dbf1ea7a-8de4-48a5-8759-03495e3e3c08@github.com> Message-ID: On Mon, 26 Oct 2020 11:16:28 GMT, Magnus Ihse Bursie wrote: >>> @navyxliu >>> >>> > @luhenry I tried to build it with LLVM10.0.1 >>> > on my x86_64, ubuntu, I ran into a small problem. here is how I build. 
>>> > $make ARCH=amd64 CC=/opt/llvm/bin/clang CXX=/opt/llvm/bin/clang++ LLVM=/opt/llvm/ >>> > I can't meet this condition because Makefile defines LIBOS_linux. >>> > #elif defined(LIBOS_Linux) && defined(LIBARCH_amd64) >>> > return "x86_64-pc-linux-gnu"; >>> > Actually, Makefile assigns OS to windows/linux/aix/macosx (all lower case)and then >>> > CPPFLAGS += -DLIBOS_$(OS) -DLIBOS="$(OS)" -DLIBARCH_$(LIBARCH) -DLIBARCH="$(LIBARCH)" -DLIB_EXT="$(LIB_EXT)" >>> >>> Interestingly, I did it this way because on my machine `LIBOS_Linux` would get defined instead of `LIBOS_linux`. I tried on WSL which might explain the difference. Could you please share more details on what environment you are using? >>> >> I am using ubuntu 18.04. >> >> `OS = $(shell uname)` does initialize OS=Linux in the first place, but later OS is set to "linux" at line 88 of https://openjdk.github.io/cr/?repo=jdk&pr=392&range=05#new-0 >> >> At line 186, -DLIBOS_linux -DLIBOS="linux" ... It doesn't match line 564 of https://openjdk.github.io/cr/?repo=jdk&pr=392&range=05#new-2 >> >> in my understanding, C/C++ macros are all case sensitive. I got #error "unknown platform" because of Linux/linux discrepancy. >> >>> > In hsdis.cpp, native_target_triple needs to match whatever Makefile defined. With that fix, I generate llvm version hsdis-amd64.so and it works flawlessly >>> >>> I'm not sure I understand what you mean. Are you saying we should define the native target triple based on the variables in the Makefile? >>> >>> A difficulty I ran into is that there is not always a 1-to-1 mapping between the autoconf/gcc target triple and the LLVM one. For example. you pass `x86_64-gnu-linux` to the OpenJDK's `configure` script, but the equivalent target triple for LLVM is `x86_64-pc-linux-gnu`. 
>>> >>> Since my plan isn't to use LLVM as the default for all platforms, and because there aren't that many combinations of target OS/ARCH, I am taking the approach of hardcoding the combinations we care about in `hsdis.cpp`. > > Since I found it close to impossible to review the changes when I could not get a diff with the changes done to hsdis.c/cpp, I created a webrev which shows these changes. I made this by renaming hsdis.cpp back to hsdis.c, and then webrev could match it up. It is available here: > > http://cr.openjdk.java.net/~ihse/hsdis-llvm-backend-diff-webrev/ Some notes (perhaps most to myself) about how this ties into the existing hsdis implementation, and with JDK-8188073 (Capstone porting). When printing disassembly from hotspot, the current solution tries to locate and load the hsdis library, which prints disassembly using bfd. The reason for using the separate library approach is, as far as I can understand, perhaps a mix of both incompatible licensing for bfd, and a wish to not burden the jvm library with additional bloat which is needed only for debugging. The Capstone approach, in the prototype patch presented by Jorn in the issue, is to create a new capstonedis library, and dispatch to it instead of hsdis. The approach used in this patch is to refactor the existing hsdis library into an abstract base class for hsdis backends, with two concrete implementations, one for bfd and one for llvm. Unfortunately, I think the resulting code in hsdis.cpp in this patch is hard to read and understand. 
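To make the "abstract base class with one implementation per backend" structure described above easier to picture, here is a minimal sketch. It is an assumption-laden illustration, not the code in the patch: `DisassemblerBackend`, `BfdBackendSketch`, `LlvmBackendSketch` and `make_backend` are hypothetical names, and the `decode` bodies are stubs that only report how many bytes they were given instead of calling bfd or LLVM.

```cpp
#include <memory>
#include <string>

// Hypothetical backend interface; names are illustrative, not from the patch.
class DisassemblerBackend {
 public:
  virtual ~DisassemblerBackend() = default;
  virtual std::string name() const = 0;
  // Returns a textual rendering of the bytes in [start, end).
  virtual std::string decode(const unsigned char* start,
                             const unsigned char* end) const = 0;

 protected:
  // Shared helper for the stubs below: "<name>: <n> bytes".
  std::string describe(const unsigned char* start,
                       const unsigned char* end) const {
    return name() + ": " +
           std::to_string(static_cast<long long>(end - start)) + " bytes";
  }
};

// Stub standing in for a bfd-based backend (no real bfd calls here).
class BfdBackendSketch : public DisassemblerBackend {
 public:
  std::string name() const override { return "bfd"; }
  std::string decode(const unsigned char* start,
                     const unsigned char* end) const override {
    return describe(start, end);
  }
};

// Stub standing in for an LLVM-based backend (no real LLVM calls here).
class LlvmBackendSketch : public DisassemblerBackend {
 public:
  std::string name() const override { return "llvm"; }
  std::string decode(const unsigned char* start,
                     const unsigned char* end) const override {
    return describe(start, end);
  }
};

// Selects a backend by name; returns nullptr for an unknown name.
inline std::unique_ptr<DisassemblerBackend> make_backend(const std::string& name) {
  if (name == "bfd") return std::make_unique<BfdBackendSketch>();
  if (name == "llvm") return std::make_unique<LlvmBackendSketch>();
  return nullptr;
}
```

In a layout like this, adding a further backend would mean one more subclass and one more branch in the selection function, while callers keep talking to the interface only.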
------------- PR: https://git.openjdk.java.net/jdk/pull/392 From ihse at openjdk.java.net Mon Oct 26 11:44:13 2020 From: ihse at openjdk.java.net (Magnus Ihse Bursie) Date: Mon, 26 Oct 2020 11:44:13 GMT Subject: RFR: 8253757: Add LLVM-based backend for hsdis In-Reply-To: References: <91erxiMDb4ftvSomuJYHPi9SX-v8Z2VLD2qEwCbz5tk=.b9ed01b5-f0e0-4ed7-9c1a-b06bc0e64640@github.com> <8Eqswd7tsVaGEXHdKDncXqKpW2tBsSeuY0PV6aTB9_c=.a6cf4957-9d31-4e89-bf44-e7b7852205d5@github.com> <2S00ucaPGiAQLeLOejt1kfXeYEc7ctEPeRCIcq1N0N8=.dbf1ea7a-8de4-48a5-8759-03495e3e3c08@github.com> Message-ID: <9oXnHULCd76_J69CKMVVZl3FfDte1pnt38y06LVV4Sg=.26a4ab2c-5ff7-4e2f-9428-0d8cd931d243@github.com> On Mon, 26 Oct 2020 11:37:52 GMT, Magnus Ihse Bursie wrote: >> Since I found it close to impossible to review the changes when I could not get a diff with the changes done to hsdis.c/cpp, I created a webrev which shows these changes. I made this by renaming hsdis.cpp back to hsdis.c, and then webrev could match it up. It is available here: >> >> http://cr.openjdk.java.net/~ihse/hsdis-llvm-backend-diff-webrev/ > > Some notes (perhaps most to myself) about how this ties into the existing hsdis implementation, and with JDK-8188073 (Capstone porting). > > When printing disassembly from hotspot, the current solution tries to locate and load the hsdis library, which prints disassembly using bfd. The reason for using the separate library approach is, as far as I can understand, perhaps a mix of both incompatible licensing for bfd, and a wish to not burden the jvm library with additional bloat which is needed only for debugging. > > The Capstone approach, in the prototype patch presented by Jorn in the issue, is to create a new capstonedis library, and dispatch to it instead of hsdis. > > The approach used in this patch is to refactor the existing hsdis library into an abstract base class for hsdis backends, with two concrete implementations, one for bfd and one for llvm. 
> > Unfortunately, I think the resulting code in hsdis.cpp in this patch is hard to read and understand. I think a proper solution to both this and the Capstone implementation is to provide a proper framework for selecting the hsdis backend as a first step, and refactor the existing bfd implementation as the first such backend. After that, we can add llvm and capstone as alternative hsdis backend implementations. ------------- PR: https://git.openjdk.java.net/jdk/pull/392 From chagedorn at openjdk.java.net Mon Oct 26 12:46:22 2020 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Mon, 26 Oct 2020 12:46:22 GMT Subject: RFR: 8255245: C1: Fix output of -XX:+PrintCFGToFile to open it with visualizer [v2] In-Reply-To: References: Message-ID: <3C7jK0CaVaSKWmEKUQoSH9xGrm9JcQP-0efnki4nM6Y=.66141a70-1c23-4b2c-ac44-b0e7aac2ba0c@github.com> > [JDK-8251093](https://bugs.openjdk.java.net/browse/JDK-8251093) introduced some improved logging for intervals. When specifying `-XX:+PrintCFGToFile` to dump the graph to a file to later open it with the C1 visualizer, it also uses the improved interval printing. However, this output can no longer be read by the C1 Visualizer. As the C1 Visualizer is not part of the JDK, we should include the old format again for the output produced by `-XX:+PrintCFGToFile` to be compatible with the visualizer again. The console output can still use the improved logging of JDK-8251093. 
Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: Move print_on definition to header file ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/837/files - new: https://git.openjdk.java.net/jdk/pull/837/files/695be546..d4d0e5ef Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=837&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=837&range=00-01 Stats: 7 lines in 2 files changed: 2 ins; 4 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/837.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/837/head:pull/837 PR: https://git.openjdk.java.net/jdk/pull/837 From chagedorn at openjdk.java.net Mon Oct 26 12:46:23 2020 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Mon, 26 Oct 2020 12:46:23 GMT Subject: RFR: 8255245: C1: Fix output of -XX:+PrintCFGToFile to open it with visualizer [v2] In-Reply-To: References: Message-ID: On Fri, 23 Oct 2020 19:35:36 GMT, Nils Eliasson wrote: >> Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: >> >> Move print_on definition to header file > > src/hotspot/share/c1/c1_LinearScan.cpp line 4607: > >> 4605: >> 4606: #ifndef PRODUCT >> 4607: void Interval::print_on(outputStream* out) const { > > I suggest moving the impl of print_on to the declaration in the hpp-file. That will make it easier to see at a glance how the different print methods delegate to each other. That's a good idea - fixed it! 
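The review point above, keeping the `print_on` definition next to its declaration so the delegation between print methods is visible at a glance, can be sketched roughly as follows. This is a simplified, hypothetical stand-in for C1's `Interval`: it uses `std::ostream` instead of HotSpot's `outputStream`, the `cfg_format` flag is an assumed way to switch between the two output styles, and both output formats are invented for illustration.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical, simplified stand-in for C1's Interval class.
class IntervalSketch {
 public:
  IntervalSketch(int reg_num, int from, int to)
      : _reg_num(reg_num), _from(from), _to(to) {}

  // Core routine: every other print method ends up here.
  // cfg_format selects a terse layout (imagined to be what a visualizer
  // parses); otherwise a more descriptive layout is used.
  void print_on(std::ostream& out, bool cfg_format) const {
    if (cfg_format) {
      out << _reg_num << " [" << _from << ", " << _to << "[";
    } else {
      out << "interval reg=" << _reg_num << " from=" << _from << " to=" << _to;
    }
  }

  // Convenience overloads, defined inline so the delegation chain is obvious.
  void print_on(std::ostream& out) const { print_on(out, false); }
  void print() const {
    print_on(std::cout);
    std::cout << '\n';
  }

 private:
  int _reg_num;
  int _from;
  int _to;
};

// Small helper that captures the printed form as a string.
inline std::string render(const IntervalSketch& interval, bool cfg_format) {
  std::ostringstream out;
  interval.print_on(out, cfg_format);
  return out.str();
}
```

Because all variants funnel into one routine, a format change for the file output does not have to touch the console path, and vice versa.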
------------- PR: https://git.openjdk.java.net/jdk/pull/837 From rrich at openjdk.java.net Mon Oct 26 14:03:09 2020 From: rrich at openjdk.java.net (Richard Reingruber) Date: Mon, 26 Oct 2020 14:03:09 GMT Subject: RFR: 8255349: Vector API issues on Big Endian In-Reply-To: References: Message-ID: <9A5AYVLWvxiQRZD_0uEb7YhcaZNHOuzcdRtBhxnxYCc=.71617ae3-0478-4db3-acee-7378b35f7f74@github.com> On Fri, 23 Oct 2020 15:46:07 GMT, Martin Doerr wrote: > Several jdk/incubator/vector tests are failing with stack overflow due to endless recursion on Big Endian platforms. E.g. Int64VectorLoadStoreTests (see bug for stack trace). > Endianess in defaultReinterpret is currently hard coded and not checked. > > In addition, VectorReshapeTests.java is failing due to incorrect size conversion for Big Endian in the test code (castByteArrayData). Looks good to me. Thanks! ------------- Marked as reviewed by rrich (Committer). PR: https://git.openjdk.java.net/jdk/pull/840 From thartmann at openjdk.java.net Mon Oct 26 15:16:10 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 26 Oct 2020 15:16:10 GMT Subject: RFR: 8253623: Fastdebug JVM crashes with Vector API when PrintAssembly is turned on In-Reply-To: References: Message-ID: On Sun, 25 Oct 2020 06:43:16 GMT, Dongbo He wrote: > Backport 8253623 https://github.com/openjdk/panama-vector/pull/8 We shouldn't do "backports" from project specific issues to mainline. Please file a separate issue for this. Thanks! 
------------- PR: https://git.openjdk.java.net/jdk/pull/853 From mdoerr at openjdk.java.net Mon Oct 26 15:35:13 2020 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Mon, 26 Oct 2020 15:35:13 GMT Subject: RFR: 8255349: Vector API issues on Big Endian In-Reply-To: <9A5AYVLWvxiQRZD_0uEb7YhcaZNHOuzcdRtBhxnxYCc=.71617ae3-0478-4db3-acee-7378b35f7f74@github.com> References: <9A5AYVLWvxiQRZD_0uEb7YhcaZNHOuzcdRtBhxnxYCc=.71617ae3-0478-4db3-acee-7378b35f7f74@github.com> Message-ID: On Mon, 26 Oct 2020 14:00:05 GMT, Richard Reingruber wrote: >> Several jdk/incubator/vector tests are failing with stack overflow due to endless recursion on Big Endian platforms. E.g. Int64VectorLoadStoreTests (see bug for stack trace). >> Endianess in defaultReinterpret is currently hard coded and not checked. >> >> In addition, VectorReshapeTests.java is failing due to incorrect size conversion for Big Endian in the test code (castByteArrayData). > > Looks good to me. Thanks! Thanks for the reviews! ------------- PR: https://git.openjdk.java.net/jdk/pull/840 From mdoerr at openjdk.java.net Mon Oct 26 15:35:15 2020 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Mon, 26 Oct 2020 15:35:15 GMT Subject: Integrated: 8255349: Vector API issues on Big Endian In-Reply-To: References: Message-ID: <3Ph18rG44W3CH-6YFE0gSKI2bItpRzbve0vLkn8zXIo=.be6b2272-4623-4965-aea8-dde882eafb42@github.com> On Fri, 23 Oct 2020 15:46:07 GMT, Martin Doerr wrote: > Several jdk/incubator/vector tests are failing with stack overflow due to endless recursion on Big Endian platforms. E.g. Int64VectorLoadStoreTests (see bug for stack trace). > Endianess in defaultReinterpret is currently hard coded and not checked. > > In addition, VectorReshapeTests.java is failing due to incorrect size conversion for Big Endian in the test code (castByteArrayData). This pull request has now been integrated. 
Changeset: 9b5a2a6b Author: Martin Doerr URL: https://git.openjdk.java.net/jdk/commit/9b5a2a6b Stats: 30 lines in 2 files changed: 23 ins; 2 del; 5 mod 8255349: Vector API issues on Big Endian Reviewed-by: psandoz, rrich ------------- PR: https://git.openjdk.java.net/jdk/pull/840 From kvn at openjdk.java.net Mon Oct 26 15:48:29 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 26 Oct 2020 15:48:29 GMT Subject: RFR: 8255301: Common and strengthen the code in ciMemberName and ciMethodHandle [v3] In-Reply-To: References: Message-ID: <5syNswkqgDl-azo9gUZrYPqjNnV9RfP13PzuCuxcN_I=.02c4a962-f9a8-43c1-921f-fb98769629fe@github.com> On Mon, 26 Oct 2020 07:20:48 GMT, Aleksey Shipilev wrote: >> There is a TODO item in their `get_vm_target`-s. We can clean those up. It looks to me the caller code does not handle `NULL` result well, which means we better `fatal` ourselves before exposing `NULL` to callers and `SEGV`-ing there. > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Minor stylistic nit: star leans to the left okay ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/825 From shade at openjdk.java.net Mon Oct 26 15:48:48 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 26 Oct 2020 15:48:48 GMT Subject: RFR: 8255301: Common and strengthen the code in ciMemberName and ciMethodHandle [v3] In-Reply-To: <5syNswkqgDl-azo9gUZrYPqjNnV9RfP13PzuCuxcN_I=.02c4a962-f9a8-43c1-921f-fb98769629fe@github.com> References: <5syNswkqgDl-azo9gUZrYPqjNnV9RfP13PzuCuxcN_I=.02c4a962-f9a8-43c1-921f-fb98769629fe@github.com> Message-ID: On Mon, 26 Oct 2020 15:39:07 GMT, Vladimir Kozlov wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Minor stylistic nit: star leans to the left > > okay Thanks, @kvn! 
------------- PR: https://git.openjdk.java.net/jdk/pull/825 From shade at openjdk.java.net Mon Oct 26 15:49:07 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 26 Oct 2020 15:49:07 GMT Subject: Integrated: 8255301: Common and strengthen the code in ciMemberName and ciMethodHandle In-Reply-To: References: Message-ID: On Fri, 23 Oct 2020 08:27:36 GMT, Aleksey Shipilev wrote: > There is a TODO item in their `get_vm_target`-s. We can clean those up. It looks to me the caller code does not handle `NULL` result well, which means we better `fatal` ourselves before exposing `NULL` to callers and `SEGV`-ing there. This pull request has now been integrated. Changeset: fa64477c Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/fa64477c Stats: 14 lines in 2 files changed: 0 ins; 10 del; 4 mod 8255301: Common and strengthen the code in ciMemberName and ciMethodHandle Reviewed-by: vlivanov, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/825 From kvn at openjdk.java.net Mon Oct 26 15:51:11 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 26 Oct 2020 15:51:11 GMT Subject: RFR: 8251994: VM crashed running TestComplexAddrExpr.java test with -XX:UseAVX=X In-Reply-To: References: <6voOeRu_AO13mIMLca9ZYspPXMIEWTmx1rvzbCwZmqI=.bd528e8e-0aee-4d90-a921-0e437f2ef612@github.com> Message-ID: <90f0BRg7s3OB9LRnzZt7qsKOu80Vu9ZFYkPeORTqJr0=.bc7b8355-f627-44e5-9ee5-fda08cf9ed4f@github.com> On Mon, 26 Oct 2020 08:13:41 GMT, Aleksey Shipilev wrote: >> To improve the chance of vectorizing a loop, special code in superword tries to hoist loads to the beginning of the loop by replacing their memory input with the corresponding (same memory slice) loop memory Phi: >> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/superword.cpp#L474 >> >> Originally loads are ordered by the corresponding stores on the same memory slice. But when they are hoisted they lose that ordering - nothing enforces the order.
>> In the TestComplexAddrExpr.test6 case the ordering is preserved (luckily?) after hoisting only when the vector size is 32 bytes (avx2), but the loads become unordered with 16-byte (avx=0 or avx1) or 64-byte (avx512) vectors. >> >> The mystery of why the test did not fail in our testing environment is also solved! We have old Skylake machines (even my local machine) for which AVX is switched to avx2: >> https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/vm_version_x86.cpp#L721 >> >> I have a simple fix (use the original load ordering indexes), but looking at the code which causes the issue I see that it is bogus/incomplete - it does not help the cases listed for the JDK-8076284 changes: >> >> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-April/017645.html >> >> Using unrolling and cloning information to vectorize is an interesting idea, but as far as I can see it is not complete. Even if the pack_parallel() method is able to create packs, they are currently all removed by the filter_packs() method. >> And additionally the above cases from JDK-8076284 are vectorized without hoisting loads and pack_parallel - I verified it. >> >> The code added by JDK-8076284 is useless now and I am excluding it. It needs more work to be useful. I am reluctant to remove the code because maybe in the future we will have time to invest in it. >> >> There were 2 ways to exclude it: add a new field in the superword class: >> >> bool _do_vector_loop; // whether to do vectorization/simd style >> + bool _do_vector_loop_experimental; // experimental optimization >> bool _do_reserve_copy; // do reserve copy of the graph(loop) before final modification in output >> or use `#if VECTOR_LOOP_SIMD` as in the current changes. I prefer the latter to avoid wasting compilation time. I used the first one for testing by setting `_do_vector_loop_experimental(UseNewCode)`. >> >> Testing: tier1-7 with avx2 and avx512. >> Performance testing - no regression.
>> I also compared jtreg tests output with -XX:+TraceNewVectors to verify that number of created vector nodes did not change. > > src/hotspot/share/opto/superword.cpp line 1739: > >> 1737: tty->print_cr("packs[%d]:", i); >> 1738: print_pack(p); >> 1739: assert(false, "only in one pack"); > > This is just `fatal`, right? it is assert because this block of code is under `#ifdef ASSERT` ------------- PR: https://git.openjdk.java.net/jdk/pull/859 From kvn at openjdk.java.net Mon Oct 26 15:54:22 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 26 Oct 2020 15:54:22 GMT Subject: RFR: 8251994: VM crashed running TestComplexAddrExpr.java test with -XX:UseAVX=X In-Reply-To: References: <6voOeRu_AO13mIMLca9ZYspPXMIEWTmx1rvzbCwZmqI=.bd528e8e-0aee-4d90-a921-0e437f2ef612@github.com> Message-ID: On Mon, 26 Oct 2020 08:16:26 GMT, Aleksey Shipilev wrote: > Drive-by comment: > > * synopsis should be "crashed", not "crushed"? Fixed. > * I personally find `VECTOR_LOOP_SIMD` more fragile than the `_do_vector_loop_experimental` field. At least the macro should also say "experimental"? 'fragile' is in naming sense to use `DO_VECTOR_LOOP_EXPERIMENTAL`, for example? Or you prefer to have the code be guarded by `if(_do_vector_loop_experimental)` runtime check instead of `#if DO_VECTOR_LOOP_EXPERIMENTAL` macro check? ------------- PR: https://git.openjdk.java.net/jdk/pull/859 From shade at openjdk.java.net Mon Oct 26 16:00:17 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 26 Oct 2020 16:00:17 GMT Subject: RFR: 8251994: VM crashed running TestComplexAddrExpr.java test with -XX:UseAVX=X In-Reply-To: References: <6voOeRu_AO13mIMLca9ZYspPXMIEWTmx1rvzbCwZmqI=.bd528e8e-0aee-4d90-a921-0e437f2ef612@github.com> Message-ID: On Mon, 26 Oct 2020 15:51:36 GMT, Vladimir Kozlov wrote: > 'fragile' is in naming sense to use `DO_VECTOR_LOOP_EXPERIMENTAL`, for example? This makes most sense, I believe. 
> Or you prefer to have the code be guarded by `if(_do_vector_loop_experimental)` runtime check instead of `#if DO_VECTOR_LOOP_EXPERIMENTAL` macro check? Yeah, I am not a big fan of doing macros when runtime checks carry the same weight. I understand you want to micro-optimize these paths with macros, but I think the runtime checks work without much penalty here? I would defer to compiler reviewers to say which one is better. ------------- PR: https://git.openjdk.java.net/jdk/pull/859 From redestad at openjdk.java.net Mon Oct 26 16:10:19 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Mon, 26 Oct 2020 16:10:19 GMT Subject: RFR: 8251994: VM crashed running TestComplexAddrExpr.java test with -XX:UseAVX=X In-Reply-To: References: <6voOeRu_AO13mIMLca9ZYspPXMIEWTmx1rvzbCwZmqI=.bd528e8e-0aee-4d90-a921-0e437f2ef612@github.com> Message-ID: On Mon, 26 Oct 2020 15:57:43 GMT, Aleksey Shipilev wrote: >>> Drive-by comment: >>> >>> * synopsis should be "crashed", not "crushed"? >> >> Fixed. >> >>> * I personally find `VECTOR_LOOP_SIMD` more fragile than the `_do_vector_loop_experimental` field. At least the macro should also say "experimental"? >> >> 'fragile' is in naming sense to use `DO_VECTOR_LOOP_EXPERIMENTAL`, for example? >> Or you prefer to have the code be guarded by `if(_do_vector_loop_experimental)` runtime check instead of `#if DO_VECTOR_LOOP_EXPERIMENTAL` macro check? > >> 'fragile' is in naming sense to use `DO_VECTOR_LOOP_EXPERIMENTAL`, for example? > > This makes most sense, I believe. > >> Or you prefer to have the code be guarded by `if(_do_vector_loop_experimental)` runtime check instead of `#if DO_VECTOR_LOOP_EXPERIMENTAL` macro check? > > Yeah, I am not a big fan of doing macros when runtime checks carry the same weight. I understand you want to micro-optimize these paths with macros, but I think the runtime checks work without much penalty here? I would defer to compiler reviewers to say which one is better. 
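For readers following the macro-versus-field question quoted above, the two guarding styles can be sketched side by side. This is a hypothetical miniature, not superword.cpp: `SuperWordSketch`, `packs_with_macro`, `packs_with_field` and `experimental_packs` are invented names, and the macro name simply reuses the `DO_VECTOR_LOOP_EXPERIMENTAL` spelling floated in the thread.

```cpp
// Hypothetical compile-time switch, spelled as suggested in the thread.
#ifndef DO_VECTOR_LOOP_EXPERIMENTAL
#define DO_VECTOR_LOOP_EXPERIMENTAL 0
#endif

// Hypothetical miniature of a class carrying an experimental code path.
struct SuperWordSketch {
  bool _do_vector_loop_experimental = false;  // runtime switch

  // Stand-in for the experimental work (a made-up constant contribution).
  int experimental_packs() const { return 2; }

  // Style 1: preprocessor guard. When the macro is 0 the guarded statement
  // is removed before compilation, so it costs nothing but is also never
  // checked by the compiler.
  int packs_with_macro(int base) const {
    int packs = base;
#if DO_VECTOR_LOOP_EXPERIMENTAL
    packs += experimental_packs();
#endif
    return packs;
  }

  // Style 2: runtime guard. The guarded statement is always compiled and is
  // skipped at run time while the field is false.
  int packs_with_field(int base) const {
    int packs = base;
    if (_do_vector_loop_experimental) {
      packs += experimental_packs();
    }
    return packs;
  }
};
```

With the switch off both functions return the same value; they differ in whether the disabled path still has to compile and whether it can be turned on without rebuilding.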
Wouldn't something like a `const bool _do_vector_loop_experimental = false;` eliminate the disabled code anyway? ------------- PR: https://git.openjdk.java.net/jdk/pull/859 From kvn at openjdk.java.net Mon Oct 26 16:24:19 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 26 Oct 2020 16:24:19 GMT Subject: RFR: 8251994: VM crashed running TestComplexAddrExpr.java test with -XX:UseAVX=X In-Reply-To: References: <6voOeRu_AO13mIMLca9ZYspPXMIEWTmx1rvzbCwZmqI=.bd528e8e-0aee-4d90-a921-0e437f2ef612@github.com> Message-ID: On Mon, 26 Oct 2020 16:07:09 GMT, Claes Redestad wrote: > Wouldn't something like a `const bool _do_vector_loop_experimental = false;` eliminate the disabled code anyway? Good idea. I assume you mean `static const bool`. I will update changes. ------------- PR: https://git.openjdk.java.net/jdk/pull/859 From vladimir.x.ivanov at oracle.com Mon Oct 26 16:56:53 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Mon, 26 Oct 2020 19:56:53 +0300 Subject: RFR: 8255379: ProblemList compiler/loopstripmining/BackedgeNodeWithOutOfLoopControl.java In-Reply-To: <5iov9or-EITwvUpIV-nq3Jc_BI1zbj5RpQEB0RGCg2A=.48061cf4-9a52-4cd8-836c-7d464ea4d4b5@github.com> References: <5iov9or-EITwvUpIV-nq3Jc_BI1zbj5RpQEB0RGCg2A=.48061cf4-9a52-4cd8-836c-7d464ea4d4b5@github.com> Message-ID: Thanks for taking care of it, Dan! Best regards, Vladimir Ivanov On 25.10.2020 17:48, Daniel D.Daugherty wrote: > A trivial fix to ProblemList compiler/loopstripmining/BackedgeNodeWithOutOfLoopControl.java > in order to reduce the noise in the JDK16 CI. 
> > ------------- > > Commit messages: > - 8255379: ProblemList compiler/loopstripmining/BackedgeNodeWithOutOfLoopControl.java > > Changes: https://git.openjdk.java.net/jdk/pull/858/files > Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=858&range=00 > Issue: https://bugs.openjdk.java.net/browse/JDK-8255379 > Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod > Patch: https://git.openjdk.java.net/jdk/pull/858.diff > Fetch: git fetch https://git.openjdk.java.net/jdk pull/858/head:pull/858 > > PR: https://git.openjdk.java.net/jdk/pull/858 > From vlivanov at openjdk.java.net Mon Oct 26 17:00:18 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Mon, 26 Oct 2020 17:00:18 GMT Subject: RFR: 8255378: [Vector API] Remove redundant vector length check after JDK-8254814 and JDK-8255210 In-Reply-To: References: Message-ID: <4rggcT9dE4sNRGPJmnwR-vOmiPYjWK6_LqMyqGtJ2x0=.5b9b65cf-4fb2-492a-90a8-628761053b5e@github.com> On Sun, 25 Oct 2020 08:33:26 GMT, Jie Fu wrote: > Hi all, > > As @iwanowww pointed out [1] that there are redundant vector length checks for reductionI [2] and reductionS [3]. > It would be better to remove them. > > Testing: > - jdk/incubator/vector on both AVX512 and AVX256 machines > > Thanks. > Best regards, > Jie > > [1] https://github.com/openjdk/jdk/pull/791#discussion_r510687005 > [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L4429 > [3] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L4625 Looks good. ------------- Marked as reviewed by vlivanov (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/854 From vlivanov at openjdk.java.net Mon Oct 26 17:16:21 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Mon, 26 Oct 2020 17:16:21 GMT Subject: RFR: 8253623: Fastdebug JVM crashes with Vector API when PrintAssembly is turned on In-Reply-To: References: Message-ID: On Sun, 25 Oct 2020 06:43:16 GMT, Dongbo He wrote: > Backport 8253623 https://github.com/openjdk/panama-vector/pull/8 src/hotspot/share/opto/callnode.cpp line 492: > 490: if (iklass != NULL) { > 491: st->print(" ["); > 492: iklass->nof_nonstatic_fields(); // FIXME: iklass->_nonstatic_fields == NULL It was intended as a quick hack and `JVMState::format()` is not the right place to solve the problem. I suggest to look at `PhaseVector::scalarize_vbox_node`. Here are some pointers on how it is done in other places: https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/macro.cpp#L740 https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/memnode.cpp#L1320 ------------- PR: https://git.openjdk.java.net/jdk/pull/853 From vlivanov at openjdk.java.net Mon Oct 26 17:30:23 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Mon, 26 Oct 2020 17:30:23 GMT Subject: RFR: 8253734: C2: Optimize Move nodes [v2] In-Reply-To: References: Message-ID: On Sat, 24 Oct 2020 04:43:07 GMT, blaquez wrote: >> Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: >> >> Address review comments from Tobias. > > Marked as reviewed by blaquez at github.com (no known OpenJDK username). Thanks for the reviews, Vladimir, Tobias, and Nils. 
------------- PR: https://git.openjdk.java.net/jdk/pull/826 From vlivanov at openjdk.java.net Mon Oct 26 17:30:25 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Mon, 26 Oct 2020 17:30:25 GMT Subject: Integrated: 8253734: C2: Optimize Move nodes In-Reply-To: References: Message-ID: <8ni1iYHuz17-_QMdL6X6KdS6quZBUxWwMQHD8YskOLc=.abbd49cb-9f0d-44fb-9248-031afbad5f28@github.com> On Fri, 23 Oct 2020 08:27:50 GMT, Vladimir Ivanov wrote: > Introduce the following transformations for Move nodes: > 1. `MoveI2F (MoveF2I x) => x` > > 2. `MoveI2F (LoadI mem) => LoadF mem` > > 3. `StoreI mem (MoveF2I x) => StoreF mem x` > > (The same applies to MoveL2D/MoveD2L.) > > #1 eliminates redundant operations and #2/#3 avoid reg-to-reg moves in generated code: > 0x000000010d09964c: vmovss 0x20(%rsi),%xmm1 > 0x000000010d099651: vmovd %xmm1,%eax ;*invokestatic floatToRawIntBits > vs > 0x0000000110c5a6cc: mov 0x20(%rsi),%eax ;*invokestatic floatToRawIntBits > > > (#2 and #3 are performed late (after loop opts are over) so that high-level optimization passes do not have to handle newly introduced mismatched accesses.) > > Testing: tier1-5. This pull request has now been integrated.
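The first transformation above, `MoveI2F (MoveF2I x) => x`, relies on the two moves being pure bit reinterpretations, so applying one after the other gives back the original value. A minimal C++ sketch of that identity follows, assuming a 32-bit IEEE-754 `float`. The helper names `move_f2i`, `move_i2f` and `check_round_trip` are hypothetical and only mirror the node names; this is not HotSpot code.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::int32_t), "sketch assumes a 32-bit float");

// Hypothetical counterpart of MoveF2I: reinterpret the float's bits as an int.
static std::int32_t move_f2i(float x) {
  std::int32_t bits;
  std::memcpy(&bits, &x, sizeof bits);
  return bits;
}

// Hypothetical counterpart of MoveI2F: reinterpret the int's bits as a float.
static float move_i2f(std::int32_t bits) {
  float x;
  std::memcpy(&x, &bits, sizeof x);
  return x;
}

// MoveI2F (MoveF2I x): no bits change, so the pair can be replaced by x.
static float round_trip(float x) {
  return move_i2f(move_f2i(x));
}

// Small self-check of the identity on a few ordinary values.
static void check_round_trip() {
  assert(round_trip(1.5f) == 1.5f);
  assert(round_trip(-0.25f) == -0.25f);
  assert(move_f2i(1.0f) == 0x3f800000);
}
```

Because neither helper changes any bit, a compiler that sees both moves back to back can drop them, which is the redundancy the first rule removes.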
Changeset: 83a91bfa Author: Vladimir Ivanov URL: https://git.openjdk.java.net/jdk/commit/83a91bfa Stats: 132 lines in 5 files changed: 117 ins; 0 del; 15 mod 8253734: C2: Optimize Move nodes Reviewed-by: thartmann, neliasso, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/826 From kvn at openjdk.java.net Mon Oct 26 17:40:40 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 26 Oct 2020 17:40:40 GMT Subject: RFR: 8251994: VM crashed running TestComplexAddrExpr.java test with -XX:UseAVX=X [v2] In-Reply-To: <6voOeRu_AO13mIMLca9ZYspPXMIEWTmx1rvzbCwZmqI=.bd528e8e-0aee-4d90-a921-0e437f2ef612@github.com> References: <6voOeRu_AO13mIMLca9ZYspPXMIEWTmx1rvzbCwZmqI=.bd528e8e-0aee-4d90-a921-0e437f2ef612@github.com> Message-ID: > To improve the chance of vectorizing a loop, special code in superword tries to hoist loads to the beginning of the loop by replacing their memory input with the corresponding (same memory slice) loop memory Phi: > https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/superword.cpp#L474 > > Originally loads are ordered by the corresponding stores on the same memory slice. But when they are hoisted they lose that ordering - nothing enforces the order. > In the TestComplexAddrExpr.test6 case the ordering is preserved (luckily?) after hoisting only when the vector size is 32 bytes (avx2), but the loads become unordered with 16-byte (avx=0 or avx1) or 64-byte (avx512) vectors. > > The mystery of why the test did not fail in our testing environment is also solved!
We have old Skylake machines (even my local machine) for which AVX is switched to avx2: > https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/vm_version_x86.cpp#L721 > > I have a simple fix (use the original load ordering indexes), but looking at the code that causes the issue I see that it is bogus/incomplete - it does not help the cases listed for the JDK-8076284 changes: > > https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-April/017645.html > > Using unrolling and cloning information to vectorize is an interesting idea, but as far as I can see it is not complete. Even if the pack_parallel() method is able to create packs, they are currently all removed by the filter_packs() method. > And additionally, the above cases from JDK-8076284 are vectorized without hoisting loads and pack_parallel - I verified it. > > The code added by JDK-8076284 is useless now and I am excluding it. It needs more work to be useful. I am reluctant to remove the code because maybe in the future we will have time to invest in it. > > There were 2 ways to exclude it: add a new field in the superword class: > > bool _do_vector_loop; // whether to do vectorization/simd style > + bool _do_vector_loop_experimental; // experimental optimization > bool _do_reserve_copy; // do reserve copy of the graph(loop) before final modification in output > or use `#if VECTOR_LOOP_SIMD` as in the current changes. I prefer this one to avoid wasting compilation time. I used the first one for testing by setting `_do_vector_loop_experimental(UseNewCode)`. > > Testing: tier1-7 with avx2 and avx512. > Performance testing - no regression. > I also compared jtreg tests output with -XX:+TraceNewVectors to verify that the number of created vector nodes did not change.
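The ways of excluding the experimental code discussed in this thread (a preprocessor guard versus a constant flag that the compiler can fold) can be sketched in C++ as follows. This is a minimal illustration under stated assumptions, not the code in superword.cpp: the functions `run_with_macro`, `run_with_constant` and `check_equivalent`, the lower-case flag name, and the phase labels are hypothetical. Only `VECTOR_LOOP_SIMD`, `pack_parallel` and `filter_packs` are names taken from the discussion.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Option A: preprocessor guard. The disabled code is never even compiled,
// so it can silently rot if the surrounding code changes.
#define VECTOR_LOOP_SIMD 0

// Option B: compile-time constant. The disabled branch is still parsed and
// type-checked, and an optimizing compiler is free to drop it as dead code.
static const bool do_vector_loop_experimental = false;  // hypothetical name

// Returns the labels of the phases that would run with the macro guard.
std::vector<std::string> run_with_macro() {
  std::vector<std::string> phases;
  phases.push_back("build packs");      // illustrative label
#if VECTOR_LOOP_SIMD
  phases.push_back("hoist loads");      // experimental part, compiled out
  phases.push_back("pack_parallel");
#endif
  phases.push_back("filter_packs");
  return phases;
}

// Returns the labels of the phases that would run with the constant flag.
std::vector<std::string> run_with_constant() {
  std::vector<std::string> phases;
  phases.push_back("build packs");      // illustrative label
  if (do_vector_loop_experimental) {    // constant false: branch is dead
    phases.push_back("hoist loads");
    phases.push_back("pack_parallel");
  }
  phases.push_back("filter_packs");
  return phases;
}

// Both variants skip the experimental phases while the switch is off.
void check_equivalent() {
  assert(run_with_macro() == run_with_constant());
}
```

With the switch off, both functions run the same two phases. The practical difference is that the constant-flag version keeps the disabled code visible to the compiler, at the likely cost of nothing at run time once the branch is folded away.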
Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: Use static const bool local variable ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/859/files - new: https://git.openjdk.java.net/jdk/pull/859/files/150f91c9..325f857b Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=859&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=859&range=00-01 Stats: 10 lines in 1 file changed: 0 ins; 6 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/859.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/859/head:pull/859 PR: https://git.openjdk.java.net/jdk/pull/859 From shade at openjdk.java.net Mon Oct 26 17:44:21 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 26 Oct 2020 17:44:21 GMT Subject: RFR: 8251994: VM crashed running TestComplexAddrExpr.java test with -XX:UseAVX=X [v2] In-Reply-To: References: <6voOeRu_AO13mIMLca9ZYspPXMIEWTmx1rvzbCwZmqI=.bd528e8e-0aee-4d90-a921-0e437f2ef612@github.com> Message-ID: On Mon, 26 Oct 2020 17:40:40 GMT, Vladimir Kozlov wrote: >> To improve a chance to vectorize a loop, a special code in superword tries to hoist loads to the beginning of loop by replacing their memory input with corresponding (same memory slice) loop's memory Phi : >> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/superword.cpp#L474 >> >> Originally loads are ordered by corresponding stores on the same memory slice. But when they are hoisted they loose that ordering - nothing enforce the order. >> In TestComplexAddrExpr.test6 case the ordering is preserved (luckily?) after hoisting only when vector size is 32 bytes (avx2) but they become unordered with 16 (avx=0 or avx1) or 64 (avx512) bytes vectors. >> >> The mystery of why the test did not fail in our teting environment is also solved! 
We have old Skylake machines (even my local machine) for which AVX is switched to avx2: >> https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/vm_version_x86.cpp#L721 >> >> I have simple fix (use original loads ordering indexes) but looking on the code which causing the issue I see that it is bogus/incomplete - it does not help cases listed for JDK-8076284 changes: >> >> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-April/017645.html >> >> Using unrolling and cloning information to vectorize is interesting idea but as I see it is not complete. Even if pack_parallel() method is able created packs they are currently all removed by filter_packs() method. >> And additionally the above cases from JDK-8076284 are vectorized without hoisting loads and pack_parallel - I verified it. >> >> The code added by JDK-8076284 is useless now and I am excluding ti. It needs more work to be useful. I reluctant to remove the code because may be in a future we will have time to invest into it. >> >> There were 2 way to exclude it: add new field in superword class: >> >> bool _do_vector_loop; // whether to do vectorization/simd style >> + bool _do_vector_loop_experimental; // experimental optimization >> bool _do_reserve_copy; // do reserve copy of the graph(loop) before final modification in output >> or use `#if VECTOR_LOOP_SIMD` as in current changes. I prefer this one to avoid wasting compilation time. I used first one for testing by setting `_do_vector_loop_experimental(UseNewCode)`. >> >> Testing: tier1-7 with avx2 and avx512. >> Performance testing - no regrression. >> I also compared jtreg tests output with -XX:+TraceNewVectors to verify that number of created vector nodes did not change. > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > Use static const bool local variable That is much cleaner, thanks. Looks good to me. ------------- Marked as reviewed by shade (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/859 From redestad at openjdk.java.net Mon Oct 26 18:33:23 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Mon, 26 Oct 2020 18:33:23 GMT Subject: RFR: 8251994: VM crashed running TestComplexAddrExpr.java test with -XX:UseAVX=X [v2] In-Reply-To: References: <6voOeRu_AO13mIMLca9ZYspPXMIEWTmx1rvzbCwZmqI=.bd528e8e-0aee-4d90-a921-0e437f2ef612@github.com> Message-ID: <4k5yPXxTawRebxGcJ4K5aUF9X2ogSwNDT6NDB7eEvZk=.c302469f-f92d-4c37-95d7-8d2c132dd8d0@github.com> On Mon, 26 Oct 2020 17:40:40 GMT, Vladimir Kozlov wrote: >> To improve a chance to vectorize a loop, a special code in superword tries to hoist loads to the beginning of loop by replacing their memory input with corresponding (same memory slice) loop's memory Phi : >> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/superword.cpp#L474 >> >> Originally loads are ordered by corresponding stores on the same memory slice. But when they are hoisted they loose that ordering - nothing enforce the order. >> In TestComplexAddrExpr.test6 case the ordering is preserved (luckily?) after hoisting only when vector size is 32 bytes (avx2) but they become unordered with 16 (avx=0 or avx1) or 64 (avx512) bytes vectors. >> >> The mystery of why the test did not fail in our teting environment is also solved! We have old Skylake machines (even my local machine) for which AVX is switched to avx2: >> https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/vm_version_x86.cpp#L721 >> >> I have simple fix (use original loads ordering indexes) but looking on the code which causing the issue I see that it is bogus/incomplete - it does not help cases listed for JDK-8076284 changes: >> >> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-April/017645.html >> >> Using unrolling and cloning information to vectorize is interesting idea but as I see it is not complete. 
Even if pack_parallel() method is able created packs they are currently all removed by filter_packs() method. >> And additionally the above cases from JDK-8076284 are vectorized without hoisting loads and pack_parallel - I verified it. >> >> The code added by JDK-8076284 is useless now and I am excluding ti. It needs more work to be useful. I reluctant to remove the code because may be in a future we will have time to invest into it. >> >> There were 2 way to exclude it: add new field in superword class: >> >> bool _do_vector_loop; // whether to do vectorization/simd style >> + bool _do_vector_loop_experimental; // experimental optimization >> bool _do_reserve_copy; // do reserve copy of the graph(loop) before final modification in output >> or use `#if VECTOR_LOOP_SIMD` as in current changes. I prefer this one to avoid wasting compilation time. I used first one for testing by setting `_do_vector_loop_experimental(UseNewCode)`. >> >> Testing: tier1-7 with avx2 and avx512. >> Performance testing - no regrression. >> I also compared jtreg tests output with -XX:+TraceNewVectors to verify that number of created vector nodes did not change. > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > Use static const bool local variable Marked as reviewed by redestad (Reviewer). 
------------- PR: https://git.openjdk.java.net/jdk/pull/859 From kvn at openjdk.java.net Mon Oct 26 19:45:24 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 26 Oct 2020 19:45:24 GMT Subject: RFR: 8251994: VM crashed running TestComplexAddrExpr.java test with -XX:UseAVX=X [v2] In-Reply-To: <4k5yPXxTawRebxGcJ4K5aUF9X2ogSwNDT6NDB7eEvZk=.c302469f-f92d-4c37-95d7-8d2c132dd8d0@github.com> References: <6voOeRu_AO13mIMLca9ZYspPXMIEWTmx1rvzbCwZmqI=.bd528e8e-0aee-4d90-a921-0e437f2ef612@github.com> <4k5yPXxTawRebxGcJ4K5aUF9X2ogSwNDT6NDB7eEvZk=.c302469f-f92d-4c37-95d7-8d2c132dd8d0@github.com> Message-ID: On Mon, 26 Oct 2020 18:30:44 GMT, Claes Redestad wrote: >> Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: >> >> Use static const bool local variable > > Marked as reviewed by redestad (Reviewer). Thank you Aleksey and Claes for reviews. ------------- PR: https://git.openjdk.java.net/jdk/pull/859 From kvn at openjdk.java.net Mon Oct 26 19:45:28 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 26 Oct 2020 19:45:28 GMT Subject: Integrated: 8251994: VM crashed running TestComplexAddrExpr.java test with -XX:UseAVX=X In-Reply-To: <6voOeRu_AO13mIMLca9ZYspPXMIEWTmx1rvzbCwZmqI=.bd528e8e-0aee-4d90-a921-0e437f2ef612@github.com> References: <6voOeRu_AO13mIMLca9ZYspPXMIEWTmx1rvzbCwZmqI=.bd528e8e-0aee-4d90-a921-0e437f2ef612@github.com> Message-ID: <6x5CdwoM0F3mRs5naBkgYxD8CbuQOwll13AREnBj4VQ=.c6e11b29-7679-4f4f-86dd-050f17738f66@github.com> On Mon, 26 Oct 2020 04:12:10 GMT, Vladimir Kozlov wrote: > To improve a chance to vectorize a loop, a special code in superword tries to hoist loads to the beginning of loop by replacing their memory input with corresponding (same memory slice) loop's memory Phi : > https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/superword.cpp#L474 > > Originally loads are ordered by corresponding stores on the same memory slice. 
But when they are hoisted they loose that ordering - nothing enforce the order. > In TestComplexAddrExpr.test6 case the ordering is preserved (luckily?) after hoisting only when vector size is 32 bytes (avx2) but they become unordered with 16 (avx=0 or avx1) or 64 (avx512) bytes vectors. > > The mystery of why the test did not fail in our teting environment is also solved! We have old Skylake machines (even my local machine) for which AVX is switched to avx2: > https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/vm_version_x86.cpp#L721 > > I have simple fix (use original loads ordering indexes) but looking on the code which causing the issue I see that it is bogus/incomplete - it does not help cases listed for JDK-8076284 changes: > > https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-April/017645.html > > Using unrolling and cloning information to vectorize is interesting idea but as I see it is not complete. Even if pack_parallel() method is able created packs they are currently all removed by filter_packs() method. > And additionally the above cases from JDK-8076284 are vectorized without hoisting loads and pack_parallel - I verified it. > > The code added by JDK-8076284 is useless now and I am excluding ti. It needs more work to be useful. I reluctant to remove the code because may be in a future we will have time to invest into it. > > There were 2 way to exclude it: add new field in superword class: > > bool _do_vector_loop; // whether to do vectorization/simd style > + bool _do_vector_loop_experimental; // experimental optimization > bool _do_reserve_copy; // do reserve copy of the graph(loop) before final modification in output > or use `#if VECTOR_LOOP_SIMD` as in current changes. I prefer this one to avoid wasting compilation time. I used first one for testing by setting `_do_vector_loop_experimental(UseNewCode)`. > > Testing: tier1-7 with avx2 and avx512. > Performance testing - no regrression. 
> I also compared jtreg tests output with -XX:+TraceNewVectors to verify that number of created vector nodes did not change. This pull request has now been integrated. Changeset: a7fa1b70 Author: Vladimir Kozlov URL: https://git.openjdk.java.net/jdk/commit/a7fa1b70 Stats: 234 lines in 3 files changed: 227 ins; 0 del; 7 mod 8251994: VM crashed running TestComplexAddrExpr.java test with -XX:UseAVX=X Reviewed-by: shade, redestad ------------- PR: https://git.openjdk.java.net/jdk/pull/859 From xliu at openjdk.java.net Mon Oct 26 22:33:19 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Mon, 26 Oct 2020 22:33:19 GMT Subject: RFR: 8255245: C1: Fix output of -XX:+PrintCFGToFile to open it with visualizer [v2] In-Reply-To: <3C7jK0CaVaSKWmEKUQoSH9xGrm9JcQP-0efnki4nM6Y=.66141a70-1c23-4b2c-ac44-b0e7aac2ba0c@github.com> References: <3C7jK0CaVaSKWmEKUQoSH9xGrm9JcQP-0efnki4nM6Y=.66141a70-1c23-4b2c-ac44-b0e7aac2ba0c@github.com> Message-ID: On Mon, 26 Oct 2020 12:46:22 GMT, Christian Hagedorn wrote: >> [JDK-8251093](https://bugs.openjdk.java.net/browse/JDK-8251093) introduced some improved logging for intervals. When specifying `-XX:+PrintCFGToFile` to dump the graph to a file to later open it with the C1 visualizer, it also uses the improved interval printing. However, this output can no longer be read by the C1 Visualizer. As the C1 Visualizer is not part of the JDK, we should include the old format again for the output produced by `-XX:+PrintCFGToFile` to be compatible with the visualizer again. The console output can still use the improved logging of JDK-8251093. > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Move print_on definition to header file Marked as reviewed by xliu (no project role). src/hotspot/share/c1/c1_LinearScan.hpp line 641: > 639: } > 640: // Special version for compatibility with C1 Visualizer. 
> 641: void print_on(outputStream* out, bool is_cfg_printer) const; Then why not just declare print_on(outputStream* out, bool is_cfg_printer=false) const directly? That would save those 3 lines. :) ------------- PR: https://git.openjdk.java.net/jdk/pull/837 From xliu at openjdk.java.net Mon Oct 26 22:41:24 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Mon, 26 Oct 2020 22:41:24 GMT Subject: RFR: 8241495: Make more compiler related flags available on a per method level [v2] In-Reply-To: References: Message-ID: > 8241495: Make more compiler related flags available on a per method level Xin Liu has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit: 8241495: Make more compiler related flags available on a per method level ------------- Changes: https://git.openjdk.java.net/jdk/pull/796/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=796&range=01 Stats: 5 lines in 1 file changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/796.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/796/head:pull/796 PR: https://git.openjdk.java.net/jdk/pull/796 From xliu at openjdk.java.net Mon Oct 26 23:02:19 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Mon, 26 Oct 2020 23:02:19 GMT Subject: RFR: 8241495: Make more compiler related flags available on a per method level [v2] In-Reply-To: <_CXcBL6MckmEHVOCz5R6laZGzyknrIo9Y6jQ8y_LP5o=.9abf2f24-4c93-4e38-8bdd-f0ec86f153d4@github.com> References: <_CXcBL6MckmEHVOCz5R6laZGzyknrIo9Y6jQ8y_LP5o=.9abf2f24-4c93-4e38-8bdd-f0ec86f153d4@github.com> Message-ID: On Thu, 22 Oct 2020 09:06:47 GMT, Nils Eliasson wrote: >> Xin Liu has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains one commit: >> >> 8241495: Make more compiler related flags available on a per method level > > src/hotspot/share/compiler/compilerDirectives.hpp line 39: > >> 37: cflags(Enable, bool, false, Enable) \ >> 38: cflags(Exclude, bool, false, Exclude) \ >> 39: cflags(BreakAtExecute, bool, false, BreakAtExecute) \ > > The BreakAtFlags are missing defaults since CompileCommand uses the "break" option. If we are going to add them to the directives - we should go through all uses and make sure that this is the only flag is uses. `break` can't distinguish between breakAtCompile and breakAtExecution. I think it's a good idea to separate them. Debugging the parser or optimizations needs the first one. The developers who debug codegen need the second one. Currently, directive->breakAtCompile and directive->breakAtExecution are in use. Even BreakCommand is translated into them. There's a cleanup, JDK-8255216; that's another task I will follow up on. ------------- PR: https://git.openjdk.java.net/jdk/pull/796 From sspitsyn at openjdk.java.net Tue Oct 27 00:55:21 2020 From: sspitsyn at openjdk.java.net (Serguei Spitsyn) Date: Tue, 27 Oct 2020 00:55:21 GMT Subject: RFR: 8255072: [TESTBUG] com/sun/jdi/EATests.java should not fail if expected VMOutOfMemoryException is not thrown [v3] In-Reply-To: References: Message-ID: On Thu, 22 Oct 2020 20:35:29 GMT, Richard Reingruber wrote: >> The following test cases try to provoke VMOutOfMemoryException during object reallocation because of JVMTI PopFrame / ForceEarlyReturn: >> >> EAPopFrameNotInlinedReallocFailure >> EAPopInlinedMethodWithScalarReplacedObjectsReallocFailure >> EAForceEarlyReturnOfInlinedMethodWithScalarReplacedObjectsReallocFailure >> >> For ZGC (so far) this is not 100% reliable. >> >> Just ignoring the runs where the expected OOME was not raised was not accepted. >> >> Summary of the now accepted solution: >> >> - The 3 problematic test cases are skipped if ZGC is selected.
>> >> - They are also skipped if no OOME during object reallocation can be expected because allocations are not eliminated. >> >> - In consumeAllMemory, as a last step, empty LinkedList nodes are created without long array to fill up small blocks of free memory. >> >> - EATests.java is removed from the problem list for ZGC. > > Richard Reingruber has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Skip test cases expecting OOMEs if running with ZGC. > - Merge branch 'master' into JDK-8255072-eatests-oom-not-thrown > - Make OOME more reliable and skip test cases if it is not expected because scalar replacement is disabled > - 8255072: [TESTBUG] com/sun/jdi/EATests.java should not fail if expected VMOutOfMemoryException is not thrown Hi Richard, It looks good to me. One nit: public static final boolean DoEscapeAnalysis = unbox(WB.getBooleanVMFlag("DoEscapeAnalysis"), UseJVMCICompiler); public static final boolean EliminateAllocations = unbox(WB.getBooleanVMFlag("EliminateAllocations"), UseJVMCICompiler); // read by debugger public static final boolean DeoptimizeObjectsALot = WB.getBooleanVMFlag("DeoptimizeObjectsALot"); // read by debugger public static final long BiasedLockingBulkRebiasThreshold = WB.getIntxVMFlag("BiasedLockingBulkRebiasThreshold"); public static final long BiasedLockingBulkRevokeThreshold = WB.getIntxVMFlag("BiasedLockingBulkRevokeThreshold"); + public static final boolean ZGCIsSelected = GC.Z.isSelected(); There are unneeded spaces before '=' sign. Thanks, Serguei ------------- Marked as reviewed by sspitsyn (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/775 From sspitsyn at openjdk.java.net Tue Oct 27 01:09:23 2020 From: sspitsyn at openjdk.java.net (Serguei Spitsyn) Date: Tue, 27 Oct 2020 01:09:23 GMT Subject: RFR: 8254723: add diagnostic command to write Linux perf map file [v4] In-Reply-To: References: <7T_M6C-3WpLwXYH3RuRCuDQUW0qMyKIWAs8RaPW7D0s=.d659e5a0-e8a2-4816-8f60-1dd7653f4c7b@github.com> Message-ID: On Mon, 26 Oct 2020 06:06:47 GMT, Nick Gasson wrote: >> When using the Linux "perf" tool to do system profiling, symbol names of >> running Java methods cannot be decoded, resulting in unhelpful output >> such as: >> >> 10.52% [JIT] tid 236748 [.] 0x00007f6fdb75d223 >> >> Perf can read a simple text file format describing the mapping between >> address ranges and symbol names for a particular process [1]. >> >> It's possible to generate this already for Java processes using a JVMTI >> plugin such as perf-map-agent [2]. However this requires compiling >> third-party code and then loading the agent into your Java process. It >> would be more convenient if Hotspot could write this file directly using >> a diagnostic command. The information required is almost identical to >> that of the existing Compiler.codelist command. >> >> This patch adds a Compiler.perfmap diagnostic command on Linux only. To >> use, first run "jcmd Compiler.perfmap" and then "perf top" or >> "perf record" and the report should show decoded Java symbol names for >> that process. >> >> As this just writes a snapshot of the code cache when the command is >> run, it will become stale if methods are compiled later or unloaded. >> However this shouldn't be a big problem in practice if the map file is >> generated after the application has warmed up. >> >> [1] https://github.com/torvalds/linux/blob/master/tools/perf/Documentation/jit-interface.txt >> [2] https://github.com/jvm-profiling-tools/perf-map-agent > > Nick Gasson has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains four commits: > > - Merge master > - Add -XX:+DumpPerfMapAtExit option > - Update for review comments > - 8254723: add diagnostic command to write Linux perf map file > > When using the Linux "perf" tool to do system profiling, symbol names of > running Java methods cannot be decoded, resulting in unhelpful output > such as: > > 10.52% [JIT] tid 236748 [.] 0x00007f6fdb75d223 > > Perf can read a simple text file format describing the mapping between > address ranges and symbol names for a particular process [1]. > > It's possible to generate this already for Java processes using a JVMTI > plugin such as perf-map-agent [2]. However this requires compiling > third-party code and then loading the agent into your Java process. It > would be more convenient if Hotspot could write this file directly using > a diagnostic command. The information required is almost identical to > that of the existing Compiler.codelist command. > > This patch adds a Compiler.perfmap diagnostic command on Linux only. To > use, first run "jcmd Compiler.perfmap" and then "perf top" or > "perf record" and the report should show decoded Java symbol names for > that process. > > As this just writes a snapshot of the code cache when the command is > run, it will become stale if methods are compiled later or unloaded. > However this shouldn't be a big problem in practice if the map file is > generated after the application has warmed up. > > [1] https://github.com/torvalds/linux/blob/master/tools/perf/Documentation/jit-interface.txt > [2] https://github.com/jvm-profiling-tools/perf-map-agent Hi Nick, This looks good. Thank you for the update. Thanks, Serguei ------------- Marked as reviewed by sspitsyn (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/760 From xliu at openjdk.java.net Tue Oct 27 03:04:22 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Tue, 27 Oct 2020 03:04:22 GMT Subject: RFR: 8241495: Make more compiler related flags available on a per method level [v2] In-Reply-To: <_CXcBL6MckmEHVOCz5R6laZGzyknrIo9Y6jQ8y_LP5o=.9abf2f24-4c93-4e38-8bdd-f0ec86f153d4@github.com> References: <_CXcBL6MckmEHVOCz5R6laZGzyknrIo9Y6jQ8y_LP5o=.9abf2f24-4c93-4e38-8bdd-f0ec86f153d4@github.com> Message-ID: <7aTnh5HM-6OhoueuggXiCdcRnmSlRL_qNmc2BB6JO2U=.472e79b4-c72c-4cae-8678-d2fb8194540f@github.com> On Thu, 22 Oct 2020 09:04:43 GMT, Nils Eliasson wrote: >> Xin Liu has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit: >> >> 8241495: Make more compiler related flags available on a per method level > > src/hotspot/share/compiler/compilerDirectives.hpp line 37: > >> 35: // Directives flag name, type, default value, compile command name >> 36: #define compilerdirectives_common_flags(cflags) \ >> 37: cflags(Enable, bool, false, Enable) \ > > This flag only has a meaning when working with directives files. Doesn't it just add confusion to have a default from CompileCommand? agree. take it off. ------------- PR: https://git.openjdk.java.net/jdk/pull/796 From xliu at openjdk.java.net Tue Oct 27 03:08:19 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Tue, 27 Oct 2020 03:08:19 GMT Subject: RFR: 8241495: Make more compiler related flags available on a per method level [v2] In-Reply-To: <_CXcBL6MckmEHVOCz5R6laZGzyknrIo9Y6jQ8y_LP5o=.9abf2f24-4c93-4e38-8bdd-f0ec86f153d4@github.com> References: <_CXcBL6MckmEHVOCz5R6laZGzyknrIo9Y6jQ8y_LP5o=.9abf2f24-4c93-4e38-8bdd-f0ec86f153d4@github.com> Message-ID: On Thu, 22 Oct 2020 09:16:34 GMT, Nils Eliasson wrote: >> Xin Liu has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains one commit: >> >> 8241495: Make more compiler related flags available on a per method level > > src/hotspot/share/compiler/compilerDirectives.hpp line 38: > >> 36: #define compilerdirectives_common_flags(cflags) \ >> 37: cflags(Enable, bool, false, Enable) \ >> 38: cflags(Exclude, bool, false, Exclude) \ > > CompileCommand uses a different semantic than CompilerDirectives to exclude methods from compilation. Are you sure you will be preserving backwards compatibility? > > See compilerDirectives.cpp lines 333-360 to see how the backwards compatibility is preserved. @neliasso, I didn't realize until today that users can already get the same effect by setting -XX:CompileCommand=exclude/log/compileonly etc. Now I see why you placed X here: `DirectiveSet::compilecommand_compatibility_init` ignores those cflags whose cc_flags are X. I have withdrawn Enable/Exclude/Log but left BreakAtCompile and BreakAtExecution, because it's pointless to support -XX:CompileCommand=option,xxx,log. I prefer to have one single way to do it. ------------- PR: https://git.openjdk.java.net/jdk/pull/796 From xliu at openjdk.java.net Tue Oct 27 03:22:29 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Tue, 27 Oct 2020 03:22:29 GMT Subject: RFR: 8241495: Make more compiler related flags available on a per method level [v3] In-Reply-To: References: Message-ID: > 8241495: Make more compiler related flags available on a per method level Xin Liu has updated the pull request incrementally with one additional commit since the last revision: 8241495: Make more compiler related flags available on a per method level rollback others but only leave breakAtCompile and breakAtExecution.
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/796/files - new: https://git.openjdk.java.net/jdk/pull/796/files/29590492..acc72cc3 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=796&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=796&range=01-02 Stats: 5 lines in 2 files changed: 1 ins; 1 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/796.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/796/head:pull/796 PR: https://git.openjdk.java.net/jdk/pull/796 From ngasson at openjdk.java.net Tue Oct 27 04:21:33 2020 From: ngasson at openjdk.java.net (Nick Gasson) Date: Tue, 27 Oct 2020 04:21:33 GMT Subject: RFR: 8254723: add diagnostic command to write Linux perf map file [v5] In-Reply-To: <7T_M6C-3WpLwXYH3RuRCuDQUW0qMyKIWAs8RaPW7D0s=.d659e5a0-e8a2-4816-8f60-1dd7653f4c7b@github.com> References: <7T_M6C-3WpLwXYH3RuRCuDQUW0qMyKIWAs8RaPW7D0s=.d659e5a0-e8a2-4816-8f60-1dd7653f4c7b@github.com> Message-ID: <1IQqVGMwEJYYpHWOQXdtJvIp3hs6mjEJkaMfTb3lwWo=.5cb02e47-d773-41ba-b206-b2657417124c@github.com> > When using the Linux "perf" tool to do system profiling, symbol names of > running Java methods cannot be decoded, resulting in unhelpful output > such as: > > 10.52% [JIT] tid 236748 [.] 0x00007f6fdb75d223 > > Perf can read a simple text file format describing the mapping between > address ranges and symbol names for a particular process [1]. > > It's possible to generate this already for Java processes using a JVMTI > plugin such as perf-map-agent [2]. However this requires compiling > third-party code and then loading the agent into your Java process. It > would be more convenient if Hotspot could write this file directly using > a diagnostic command. The information required is almost identical to > that of the existing Compiler.codelist command. > > This patch adds a Compiler.perfmap diagnostic command on Linux only. 
To > use, first run "jcmd Compiler.perfmap" and then "perf top" or > "perf record" and the report should show decoded Java symbol names for > that process. > > As this just writes a snapshot of the code cache when the command is > run, it will become stale if methods are compiled later or unloaded. > However this shouldn't be a big problem in practice if the map file is > generated after the application has warmed up. > > [1] https://github.com/torvalds/linux/blob/master/tools/perf/Documentation/jit-interface.txt > [2] https://github.com/jvm-profiling-tools/perf-map-agent Nick Gasson has updated the pull request incrementally with one additional commit since the last revision: Make DumpPerfMapAtExit a diagnostic option ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/760/files - new: https://git.openjdk.java.net/jdk/pull/760/files/959adca5..d8a399a1 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=760&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=760&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/760.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/760/head:pull/760 PR: https://git.openjdk.java.net/jdk/pull/760 From ngasson at openjdk.java.net Tue Oct 27 04:21:36 2020 From: ngasson at openjdk.java.net (Nick Gasson) Date: Tue, 27 Oct 2020 04:21:36 GMT Subject: RFR: 8254723: add diagnostic command to write Linux perf map file [v4] In-Reply-To: References: <7T_M6C-3WpLwXYH3RuRCuDQUW0qMyKIWAs8RaPW7D0s=.d659e5a0-e8a2-4816-8f60-1dd7653f4c7b@github.com> Message-ID: On Tue, 27 Oct 2020 01:07:04 GMT, Serguei Spitsyn wrote: >> Nick Gasson has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains four commits: >> >> - Merge master >> - Add -XX:+DumpPerfMapAtExit option >> - Update for review comments >> - 8254723: add diagnostic command to write Linux perf map file >> >> When using the Linux "perf" tool to do system profiling, symbol names of >> running Java methods cannot be decoded, resulting in unhelpful output >> such as: >> >> 10.52% [JIT] tid 236748 [.] 0x00007f6fdb75d223 >> >> Perf can read a simple text file format describing the mapping between >> address ranges and symbol names for a particular process [1]. >> >> It's possible to generate this already for Java processes using a JVMTI >> plugin such as perf-map-agent [2]. However this requires compiling >> third-party code and then loading the agent into your Java process. It >> would be more convenient if Hotspot could write this file directly using >> a diagnostic command. The information required is almost identical to >> that of the existing Compiler.codelist command. >> >> This patch adds a Compiler.perfmap diagnostic command on Linux only. To >> use, first run "jcmd Compiler.perfmap" and then "perf top" or >> "perf record" and the report should show decoded Java symbol names for >> that process. >> >> As this just writes a snapshot of the code cache when the command is >> run, it will become stale if methods are compiled later or unloaded. >> However this shouldn't be a big problem in practice if the map file is >> generated after the application has warmed up. >> >> [1] https://github.com/torvalds/linux/blob/master/tools/perf/Documentation/jit-interface.txt >> [2] https://github.com/jvm-profiling-tools/perf-map-agent > > Hi Nick, > > This looks good. > Thank you for the update. > > Thanks, > Serguei > I don't see any reason for this to be a product flag, rather than diagnostic. OK sure, I've made it a diagnostic flag. 
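For readers unfamiliar with the map file discussed in this thread: per the perf JIT interface document linked as [1] in the description, perf looks for /tmp/perf-<pid>.map and expects one line per code region, giving the start address and size in hexadecimal without a 0x prefix, followed by the symbol name. A minimal Java sketch of that line format follows. The class and method names (`PerfMapSketch`, `mapFileName`, `formatEntry`) are hypothetical and chosen for illustration; this is not the HotSpot implementation, which is written in C++.

```java
final class PerfMapSketch {
    private PerfMapSketch() {}

    /** File name perf looks up for a given process id (hypothetical helper). */
    static String mapFileName(long pid) {
        return "/tmp/perf-" + pid + ".map";
    }

    /**
     * Formats one map line: START and SIZE as hex numbers without "0x",
     * then the symbol name, separated by single spaces.
     */
    static String formatEntry(long start, long size, String symbol) {
        return Long.toHexString(start) + " " + Long.toHexString(size) + " " + symbol;
    }
}
```

With made-up values, `formatEntry(0x7f6fdb75d200L, 0x100L, "java.lang.String::hashCode")` produces `7f6fdb75d200 100 java.lang.String::hashCode`. Since the file is a snapshot, entries for methods compiled or unloaded afterwards will be missing or stale, as the description already notes.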
------------- PR: https://git.openjdk.java.net/jdk/pull/760 From dongbo at openjdk.java.net Tue Oct 27 06:32:29 2020 From: dongbo at openjdk.java.net (Dong Bo) Date: Tue, 27 Oct 2020 06:32:29 GMT Subject: RFR: 8255246: AArch64: Implement BigInteger shiftRight and shiftLeft accelerator/intrinsic [v2] In-Reply-To: References: Message-ID:
> BigInteger.shiftRightImplWorker and BigInteger.shiftLeftImplWorker are not intrinsified on aarch64, although this has been done on x86_64.
> We can implement them via the USHL NEON instruction (register), which handles at most four integers at a time, compared with the integer-only assembly that C2 currently generates.
> The usage of USHL can be found at: https://developer.arm.com/documentation/dui0801/g/A64-SIMD-Vector-Instructions/USHL--vector-?lang=en
>
> Patch passed jtreg tier1-3 tests on our aarch64 server.
> Tests in test/jdk/java/math/BigInteger/* were also run specifically to check the correctness of the implementation, and passed.
>
> We tested test/micro/org/openjdk/bench/java/math/BigIntegers.java for performance gain on Kunpeng916 and Kunpeng920.
> The following performance improvements were seen with this implementation:
> - Intrinsification of BigInteger.shiftLeft: 25.52% (Kunpeng916), 37.56% (Kunpeng920)
> - Intrinsification of BigInteger.shiftRight: 46.45% (Kunpeng916), 43.32% (Kunpeng920)
>
> The BigIntegers.java JMH micro-benchmark results:
> Benchmark                      Mode  Cnt     Score    Error  Units
>
> # Kunpeng 916, default
> BigIntegers.testAdd            avgt   25    33.554 ±  0.224  ns/op
> BigIntegers.testHugeToString   avgt   25   575.554 ± 40.656  ns/op
> BigIntegers.testLargeToString  avgt   25   190.098 ±  0.825  ns/op
> **BigIntegers.testLeftShift    avgt   25  1495.779 ± 12.365  ns/op**
> BigIntegers.testMultiply       avgt   25  7551.707 ± 39.309  ns/op
> **BigIntegers.testRightShift   avgt   25   605.302 ±  6.710  ns/op**
> BigIntegers.testSmallToString  avgt   25   179.034 ±  0.873  ns/op
>
> # Kunpeng 916, intrinsic:
> BigIntegers.testAdd            avgt   25    33.531 ±  0.222  ns/op
> BigIntegers.testHugeToString   avgt   25   578.038 ± 40.675  ns/op
> BigIntegers.testLargeToString  avgt   25   188.566 ±  0.855  ns/op
> **BigIntegers.testLeftShift    avgt   25  1191.651 ± 20.136  ns/op**
> BigIntegers.testMultiply       avgt   25  7492.711 ±  3.702  ns/op
> **BigIntegers.testRightShift   avgt   25   326.891 ±  6.033  ns/op**
> BigIntegers.testSmallToString  avgt   25   178.267 ±  1.501  ns/op
>
> # Kunpeng 920, default
> BigIntegers.testAdd            avgt   25    22.790 ±  0.167  ns/op
> BigIntegers.testHugeToString   avgt   25   432.428 ± 10.736  ns/op
> BigIntegers.testLargeToString  avgt   25   121.899 ±  3.356  ns/op
> **BigIntegers.testLeftShift    avgt   25   883.530 ± 53.714  ns/op**
> BigIntegers.testMultiply       avgt   25  5918.845 ± 94.937  ns/op
> **BigIntegers.testRightShift   avgt   25   329.762 ± 15.850  ns/op**
> BigIntegers.testSmallToString  avgt   25   117.460 ±  3.040  ns/op
>
> # Kunpeng 920, intrinsic
> BigIntegers.testAdd            avgt   25    21.791 ±  0.085  ns/op
> BigIntegers.testHugeToString   avgt   25   415.209 ± 32.170  ns/op
> BigIntegers.testLargeToString  avgt   25   124.635 ±  2.157  ns/op
> **BigIntegers.testLeftShift    avgt   25   551.710 ±  7.836  ns/op**
> BigIntegers.testMultiply       avgt   25  5869.401 ± 54.803  ns/op
> **BigIntegers.testRightShift   avgt   25   186.896 ±  6.378  ns/op**
> BigIntegers.testSmallToString  avgt   25   117.543 ±  3.036  ns/op

Dong Bo has updated the pull request incrementally with one additional commit since the last revision: minor improvements for small BigIntegers ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/861/files - new: https://git.openjdk.java.net/jdk/pull/861/files/e8df2a98..7a5d76f5 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=861&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=861&range=00-01 Stats: 85 lines in 2 files changed: 63 ins; 2 del; 20 mod Patch: https://git.openjdk.java.net/jdk/pull/861.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/861/head:pull/861 PR: https://git.openjdk.java.net/jdk/pull/861 From dongbo at openjdk.java.net Tue Oct 27 06:47:17 2020 From: dongbo at openjdk.java.net (Dong Bo) Date: Tue, 27 Oct 2020 06:47:17 GMT Subject: RFR: 8255246: AArch64: Implement BigInteger shiftRight and shiftLeft accelerator/intrinsic In-Reply-To: References: Message-ID: <9DFfj6e0hiQbfz9gnZp52nVHASD2iZCgudqh5AUBJgA=.51645a98-6a49-4288-8ec3-69f8f3727352@github.com> On Mon, 26 Oct 2020 09:19:45 GMT, Dong Bo wrote:
> BigInteger.shiftRightImplWorker and BigInteger.shiftLeftImplWorker are not intrinsified on aarch64, although this has been done on x86_64.
> We can implement them via the USHL NEON instruction (register), which handles at most four integers at a time, compared with the integer-only assembly that C2 currently generates.
> The usage of USHL can be found at: https://developer.arm.com/documentation/dui0801/g/A64-SIMD-Vector-Instructions/USHL--vector-?lang=en
>
> Patch passed jtreg tier1-3 tests on our aarch64 server.
> Tests in test/jdk/java/math/BigInteger/* were also run specifically to check the correctness of the implementation, and passed.
>
> We tested test/micro/org/openjdk/bench/java/math/BigIntegers.java for performance gain on Kunpeng916 and Kunpeng920.
> The following performance improvements were seen with this implementation:
> - Intrinsification of BigInteger.shiftLeft: 25.52% (Kunpeng916), 37.56% (Kunpeng920)
> - Intrinsification of BigInteger.shiftRight: 46.45% (Kunpeng916), 43.32% (Kunpeng920)
>
> The BigIntegers.java JMH micro-benchmark results:
> Benchmark                      Mode  Cnt     Score    Error  Units
>
> # Kunpeng 916, default
> BigIntegers.testAdd            avgt   25    33.554 ±  0.224  ns/op
> BigIntegers.testHugeToString   avgt   25   575.554 ± 40.656  ns/op
> BigIntegers.testLargeToString  avgt   25   190.098 ±  0.825  ns/op
> **BigIntegers.testLeftShift    avgt   25  1495.779 ± 12.365  ns/op**
> BigIntegers.testMultiply       avgt   25  7551.707 ± 39.309  ns/op
> **BigIntegers.testRightShift   avgt   25   605.302 ±  6.710  ns/op**
> BigIntegers.testSmallToString  avgt   25   179.034 ±  0.873  ns/op
>
> # Kunpeng 916, intrinsic:
> BigIntegers.testAdd            avgt   25    33.531 ±  0.222  ns/op
> BigIntegers.testHugeToString   avgt   25   578.038 ± 40.675  ns/op
> BigIntegers.testLargeToString  avgt   25   188.566 ±  0.855  ns/op
> **BigIntegers.testLeftShift    avgt   25  1191.651 ± 20.136  ns/op**
> BigIntegers.testMultiply       avgt   25  7492.711 ±  3.702  ns/op
> **BigIntegers.testRightShift   avgt   25   326.891 ±  6.033  ns/op**
> BigIntegers.testSmallToString  avgt   25   178.267 ±  1.501  ns/op
>
> # Kunpeng 920, default
> BigIntegers.testAdd            avgt   25    22.790 ±  0.167  ns/op
> BigIntegers.testHugeToString   avgt   25   432.428 ± 10.736  ns/op
> BigIntegers.testLargeToString  avgt   25   121.899 ±  3.356  ns/op
> **BigIntegers.testLeftShift    avgt   25   883.530 ± 53.714  ns/op**
> BigIntegers.testMultiply       avgt   25  5918.845 ± 94.937  ns/op
> **BigIntegers.testRightShift   avgt   25   329.762 ± 15.850  ns/op**
> BigIntegers.testSmallToString  avgt   25   117.460 ±  3.040  ns/op
>
> # Kunpeng 920, intrinsic
> BigIntegers.testAdd            avgt   25    21.791 ±  0.085  ns/op
> BigIntegers.testHugeToString   avgt   25   415.209 ± 32.170  ns/op
> BigIntegers.testLargeToString  avgt   25   124.635 ±  2.157  ns/op
> **BigIntegers.testLeftShift    avgt   25   551.710 ±
7.836 ns/op**
> BigIntegers.testMultiply       avgt   25  5869.401 ± 54.803  ns/op
> **BigIntegers.testRightShift   avgt   25   186.896 ±  6.378  ns/op**
> BigIntegers.testSmallToString  avgt   25   117.543 ±  3.036  ns/op

@theRealAph Thanks for the quick review.

I have updated the patch to handle small BigIntegers. The loop for fewer than four words is unrolled for a minor performance improvement.
I also modified ./test/micro/org/openjdk/bench/java/math/BigIntegers.java to add performance tests for small BigIntegers.
The new parameter `maxNumbits` in the test means that the bit count of each BigInteger lies in the range `[maxNumbits - 31, maxNumbits]`.
Incremental modification: https://github.com/openjdk/jdk/pull/861/commits/7a5d76f51e693d441dee30b3d109d1b67b525378

According to the new tests, performance regresses by 3%~6% only if `maxNumbits == 32`.
The regression seems to be unavoidable and caused by the shared intrinsic code: performance regresses even if we return immediately from the stub, like:

    /* marked as cbz_ret below */
    address generate_bigIntegerLeftShift() {
      __ align(CodeEntryAlignment);
      StubCodeMark mark(this, "StubRoutines", "bigIntegerLeftShiftWorker");
      address start = __ pc();
      Register numIter = c_rarg4;
      __ cbz(numIter, Exit);
      __ ret(lr);
    }

The performance of `cbz_ret` is almost the same as that of the intrinsified tests with `maxNumbits == 32`.
A similar test, which returns immediately with `__ ret(0)`, regresses on an x86_64 platform too.

The BigIntegers.java JMH micro-benchmark results of small BigIntegers (~256bits):
Benchmark                        (maxNumbits)  Mode  Cnt   Score   Error  Units

# kunpeng 916, intrinsic
BigIntegers.testSmallLeftShift             32  avgt   25  51.444 ± 0.256  ns/op (cbz_ret)
BigIntegers.testSmallLeftShift             32  avgt   25  51.168 ± 0.235  ns/op
BigIntegers.testSmallLeftShift             64  avgt   25  53.566 ± 0.694  ns/op
BigIntegers.testSmallLeftShift             96  avgt   25  53.398 ± 0.651  ns/op
BigIntegers.testSmallLeftShift            128  avgt   25  55.949 ± 0.977  ns/op
BigIntegers.testSmallLeftShift            160  avgt   25  55.617 ± 0.568  ns/op
BigIntegers.testSmallLeftShift            192  avgt   25  56.285 ± 0.959  ns/op
BigIntegers.testSmallLeftShift            224  avgt   25  58.201 ± 0.965  ns/op
BigIntegers.testSmallLeftShift            256  avgt   25  58.655 ± 0.953  ns/op
BigIntegers.testSmallRightShift            32  avgt   25  56.210 ± 0.708  ns/op (cbz_ret)
BigIntegers.testSmallRightShift            32  avgt   25  56.072 ± 0.712  ns/op
BigIntegers.testSmallRightShift            64  avgt   25  56.891 ± 0.458  ns/op
BigIntegers.testSmallRightShift            96  avgt   25  56.257 ± 0.185  ns/op
BigIntegers.testSmallRightShift           128  avgt   25  56.970 ± 0.458  ns/op
BigIntegers.testSmallRightShift           160  avgt   25  58.041 ± 0.344  ns/op
BigIntegers.testSmallRightShift           192  avgt   25  58.740 ± 0.405  ns/op
BigIntegers.testSmallRightShift           224  avgt   25  60.550 ± 0.382  ns/op
BigIntegers.testSmallRightShift           256  avgt   25  65.617 ± 0.266  ns/op

# kunpeng 916, default
BigIntegers.testSmallLeftShift             32  avgt   25  49.350 ± 0.944  ns/op
BigIntegers.testSmallLeftShift             64  avgt   25  56.810 ± 0.930  ns/op
BigIntegers.testSmallLeftShift             96  avgt   25  59.472 ± 0.270  ns/op
BigIntegers.testSmallLeftShift            128  avgt   25  61.208 ± 0.252  ns/op
BigIntegers.testSmallLeftShift            160  avgt   25  63.339 ± 0.328  ns/op
BigIntegers.testSmallLeftShift            192  avgt   25  66.456 ± 0.418  ns/op
BigIntegers.testSmallLeftShift            224  avgt   25  68.437 ± 0.294  ns/op
BigIntegers.testSmallLeftShift            256  avgt   25  70.301 ± 0.306  ns/op
BigIntegers.testSmallRightShift            32  avgt   25  53.289 ± 0.272  ns/op
BigIntegers.testSmallRightShift            64  avgt   25  65.618 ± 4.097  ns/op
BigIntegers.testSmallRightShift            96  avgt   25  70.805 ± 3.695  ns/op
BigIntegers.testSmallRightShift           128  avgt   25  70.862 ± 4.205  ns/op
BigIntegers.testSmallRightShift           160  avgt   25  79.921 ± 3.272  ns/op
BigIntegers.testSmallRightShift           192  avgt   25  75.168 ± 0.224  ns/op
BigIntegers.testSmallRightShift           224  avgt   25  79.779 ± 0.609  ns/op
BigIntegers.testSmallRightShift           256  avgt   25  84.364 ± 0.540  ns/op

# kunpeng 920, intrinsic
BigIntegers.testSmallLeftShift             32  avgt   25  31.404 ± 0.984  ns/op (cbz_ret)
BigIntegers.testSmallLeftShift             32  avgt   25  31.272 ± 0.558  ns/op
BigIntegers.testSmallLeftShift             64  avgt   25  33.558 ± 1.354  ns/op
BigIntegers.testSmallLeftShift             96  avgt   25  34.731 ± 1.238  ns/op
BigIntegers.testSmallLeftShift            128  avgt   25  36.082 ± 1.196  ns/op
BigIntegers.testSmallLeftShift            160  avgt   25  36.155 ± 0.932  ns/op
BigIntegers.testSmallLeftShift            192  avgt   25  38.442 ± 0.743  ns/op
BigIntegers.testSmallLeftShift            224  avgt   25  38.404 ± 1.108  ns/op
BigIntegers.testSmallLeftShift            256  avgt   25  39.381 ± 1.140  ns/op
BigIntegers.testSmallRightShift            32  avgt   25  30.821 ± 0.533  ns/op (cbz_ret)
BigIntegers.testSmallRightShift            32  avgt   25  30.662 ± 1.625  ns/op
BigIntegers.testSmallRightShift            64  avgt   25  32.686 ± 1.000  ns/op
BigIntegers.testSmallRightShift            96  avgt   25  33.922 ± 1.068  ns/op
BigIntegers.testSmallRightShift           128  avgt   25  34.997 ± 1.155  ns/op
BigIntegers.testSmallRightShift           160  avgt   25  35.763 ± 1.159  ns/op
BigIntegers.testSmallRightShift           192  avgt   25  38.180 ± 0.735  ns/op
BigIntegers.testSmallRightShift           224  avgt   25  37.985 ± 1.619  ns/op
BigIntegers.testSmallRightShift           256  avgt   25  39.957 ± 0.820  ns/op

# kunpeng 920, default
BigIntegers.testSmallLeftShift             32  avgt   25  29.524 ± 0.861  ns/op
BigIntegers.testSmallLeftShift             64  avgt   25  35.917 ± 0.467  ns/op
BigIntegers.testSmallLeftShift             96  avgt   25  36.915 ± 0.317  ns/op
BigIntegers.testSmallLeftShift            128  avgt   25  39.709 ± 0.858  ns/op
BigIntegers.testSmallLeftShift            160  avgt   25  42.796 ± 0.824  ns/op
BigIntegers.testSmallLeftShift            192  avgt   25  43.612 ± 0.319  ns/op
BigIntegers.testSmallLeftShift            224  avgt   25  45.971 ± 0.336  ns/op
BigIntegers.testSmallLeftShift            256  avgt   25  48.399 ± 0.405  ns/op
BigIntegers.testSmallRightShift            32  avgt   25  29.122 ± 0.870  ns/op
BigIntegers.testSmallRightShift            64  avgt   25  35.404 ± 1.236  ns/op
BigIntegers.testSmallRightShift            96  avgt   25  37.899 ± 1.478  ns/op
BigIntegers.testSmallRightShift           128  avgt   25  39.570 ± 0.564  ns/op
BigIntegers.testSmallRightShift           160  avgt   25  44.768 ± 1.423  ns/op
BigIntegers.testSmallRightShift           192  avgt   25  44.777 ± 1.433  ns/op
BigIntegers.testSmallRightShift           224  avgt   25  49.085 ± 0.465  ns/op
BigIntegers.testSmallRightShift           256  avgt   25  48.871 ± 1.086  ns/op

------------- PR: https://git.openjdk.java.net/jdk/pull/861 From jiefu at openjdk.java.net Tue Oct 27 07:40:23 2020 From: jiefu at openjdk.java.net (Jie Fu) Date: Tue, 27 Oct 2020 07:40:23 GMT Subject: RFR: 8255438: [Vector API] More instructs in x86.ad should use legacy mode for code-gen Message-ID:
Hi all,

Just as @jatin-bhateja pointed out [1], there are more instructs in x86.ad which should use legacy mode.
It would be better to fix the following cases:
------------------------
1. instruct mul2L_reg
   The code-gen logic uses phaddd [2], which requires legacy mode here [3].
   This bug might be reproduced on AVX512 machines without avx512dq.

2. instruct vmul4L_reg_avx
   The code-gen logic uses vphaddd [4], which requires legacy mode here [5].
   This bug might be reproduced on AVX512 machines without avx512dq.

3. instruct reductionL
   For MulReductionVL, the code-gen chain can be:
   reduceL --> reduce4L --> reduce_operation_128 --> vpmullq [6]
   vpmullq requires legacy mode [7] if avx512dq isn't supported.
   This bug might be reproduced on AVX512 machines without avx512dq.

4. instruct reductionB
   For MinReductionV, the code-gen chain can be:
   reduceB --> reduce32B --> reduce_operation_128 --> pminsb [8]
   pminsb requires legacy mode [9] if avx512bw isn't supported.
   This bug might be reproduced on AVX512 machines without avx512bw.
------------------------

The bugs in mul2L_reg/vmul4L_reg_avx/reductionL can only be reproduced on AVX512 machines without avx512dq, and the bug in reductionB can only be reproduced on AVX512 machines without avx512bw.
Unfortunately, it's impossible for us to create reproducers since our AVX512 platforms support both avx512dq and avx512bw.
However, it does make sense to fix these unexposed bugs since vector api code will be sure to run on various platforms (e.g., AVX512 machines without avx512dq/bw) in the future.

The fix just changes vec to legVec, which is quite safe in theory.
As for the reduction patterns of Float and Double, I don't see any reason that they should use legacy mode (maybe I've missed something). Testing: - jdk/incubator/vector on both AVX512 and AVX256 machines Any comments? Thanks a lot. Best regards, Jie [1] https://github.com/openjdk/jdk/pull/791#commitcomment-43473733 [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L5472 [3] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/assembler_x86.cpp#L6217 [4] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L5497 [5] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/assembler_x86.cpp#L6165 [6] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp#L1521 [7] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/assembler_x86.cpp#L6428 [8] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp#L1482 [9] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/assembler_x86.cpp#L6475 ------------- Commit messages: - 8255438: [Vector API] More instructs in x86.ad should use legacy mode for code-gen Changes: https://git.openjdk.java.net/jdk/pull/874/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=874&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8255438 Stats: 47 lines in 1 file changed: 0 ins; 41 del; 6 mod Patch: https://git.openjdk.java.net/jdk/pull/874.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/874/head:pull/874 PR: https://git.openjdk.java.net/jdk/pull/874 From shade at openjdk.java.net Tue Oct 27 08:26:28 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 27 Oct 2020 08:26:28 GMT Subject: RFR: 8255441: Cleanup ciEnv/jvmciEnv::lookup_method-s Message-ID: Static analysis complains there is a potentially uninitialized `dest_method` after the switch, oblivious of `ShouldNotReachHere()`. This can be cleaned up along with related code. 
------------- Commit messages: - 8255441: Cleanup ciEnv/jvmciEnv::lookup_method-s Changes: https://git.openjdk.java.net/jdk/pull/875/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=875&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8255441 Stats: 55 lines in 2 files changed: 8 ins; 19 del; 28 mod Patch: https://git.openjdk.java.net/jdk/pull/875.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/875/head:pull/875 PR: https://git.openjdk.java.net/jdk/pull/875 From neliasso at openjdk.java.net Tue Oct 27 09:28:17 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Tue, 27 Oct 2020 09:28:17 GMT Subject: RFR: 8241495: Make more compiler related flags available on a per method level [v3] In-Reply-To: References: Message-ID: On Tue, 27 Oct 2020 03:22:29 GMT, Xin Liu wrote: >> 8241495: Make more compiler related flags available on a per method level > > Xin Liu has updated the pull request incrementally with one additional commit since the last revision: > > 8241495: Make more compiler related flags available on a per method level > > rollback others but only leave breakAtCompile and breakAtExecution. Looks good! ------------- Marked as reviewed by neliasso (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/796 From rrich at openjdk.java.net Tue Oct 27 10:04:19 2020 From: rrich at openjdk.java.net (Richard Reingruber) Date: Tue, 27 Oct 2020 10:04:19 GMT Subject: RFR: 8255072: [TESTBUG] com/sun/jdi/EATests.java should not fail if expected VMOutOfMemoryException is not thrown [v3] In-Reply-To: References: Message-ID: On Tue, 27 Oct 2020 00:52:20 GMT, Serguei Spitsyn wrote: >> Richard Reingruber has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: >> >> - Skip test cases expecting OOMEs if running with ZGC. 
>> - Merge branch 'master' into JDK-8255072-eatests-oom-not-thrown >> - Make OOME more reliable and skip test cases if it is not expected because scalar replacement is disabled >> - 8255072: [TESTBUG] com/sun/jdi/EATests.java should not fail if expected VMOutOfMemoryException is not thrown > > Hi Richard, > It looks good to me. > One nit: > public static final boolean DoEscapeAnalysis = unbox(WB.getBooleanVMFlag("DoEscapeAnalysis"), UseJVMCICompiler); > public static final boolean EliminateAllocations = unbox(WB.getBooleanVMFlag("EliminateAllocations"), UseJVMCICompiler); // read by debugger > public static final boolean DeoptimizeObjectsALot = WB.getBooleanVMFlag("DeoptimizeObjectsALot"); // read by debugger > public static final long BiasedLockingBulkRebiasThreshold = WB.getIntxVMFlag("BiasedLockingBulkRebiasThreshold"); > public static final long BiasedLockingBulkRevokeThreshold = WB.getIntxVMFlag("BiasedLockingBulkRevokeThreshold"); > + public static final boolean ZGCIsSelected = GC.Z.isSelected(); > There are unneeded spaces before '=' sign. > > Thanks, > Serguei Hi Serguei, thanks for reviewing. I'll remove the whitespace. I'm able now to reproduce the issue but only with ZGC. So far my attempts(*) to reliably get the OOME during ForceEarlyReturn/PopFrame because of object reallocation failed though. So I'm still in favour of the current solution which is: skip the 3 problematic testcases if ZGC is selected in the target vm. I'm still open for suggestions also though. I'll wait a few more days and then I'll integrate. Thanks, Richard. (*) I tried: - disable TLAB - call WhiteBox.fullGC() in consumeAllMemory() before the last round of allocations. - Check if the memory can be allocated by the thread doing the PopFrame/ForceEarlyReturn. com.sun.jdi.ThreadReference::invokeMethod() cannot be used. A target thread has to be specified and the jdwp threads are not visible through jdi. 
Only a dedicated native test JVMTI agent can consume all memory and then do the PopFrame/ForceEarlyReturn. ------------- PR: https://git.openjdk.java.net/jdk/pull/775 From rrich at openjdk.java.net Tue Oct 27 10:16:29 2020 From: rrich at openjdk.java.net (Richard Reingruber) Date: Tue, 27 Oct 2020 10:16:29 GMT Subject: RFR: 8255072: [TESTBUG] com/sun/jdi/EATests.java should not fail if expected VMOutOfMemoryException is not thrown [v4] In-Reply-To: References: Message-ID: > The following test cases try to provoke VMOutOfMemoryException during object reallocation because of JVMTI PopFrame / ForceEarlyReturn: > > EAPopFrameNotInlinedReallocFailure > EAPopInlinedMethodWithScalarReplacedObjectsReallocFailure > EAForceEarlyReturnOfInlinedMethodWithScalarReplacedObjectsReallocFailure > > For ZGC (so far) this is not 100% reliable. > > Just ignoring the runs where the expected OOME was not raised was not accepted. > > Summary of the now accepted solution: > > - The 3 problematic test cases are skipped if ZGC is selected. > > - They are also skipped if no OOME during object reallocation can be expected because allocations are not eliminated. > > - In consumeAllMemory, as a last step, empty LinkedList nodes are created without long array to fill up small blocks of free memory. > > - EATests.java is removed from the problem list for ZGC. Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: Whitespace/indentation clean-up. 
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/775/files - new: https://git.openjdk.java.net/jdk/pull/775/files/33ceb741..4676f1da Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=775&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=775&range=02-03 Stats: 6 lines in 1 file changed: 1 ins; 0 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/775.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/775/head:pull/775 PR: https://git.openjdk.java.net/jdk/pull/775 From martin.doerr at sap.com Tue Oct 27 11:07:21 2020 From: martin.doerr at sap.com (Doerr, Martin) Date: Tue, 27 Oct 2020 11:07:21 +0000 Subject: C2 crash in PhaseIdealLoop::build_loop_late_post_work Message-ID: Hi, we observe C2 crashes on all x86_64 platforms when running Renaissance Benchmark "log-regression" since 2020-08-10. (hg was at "8250848: [aarch64] nativeGotJump_at() missing call to verify()." when we've seen it first.) # EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x00007ffd594e9322, pid=1439804, tid=1516348 # # JRE version: OpenJDK Runtime Environment (16.0.0.1) (build 16.0.0.1-internal+0-adhoc.openjdk.jdk-dev) # Java VM: OpenJDK 64-Bit Server VM (16.0.0.1-internal+0-adhoc.openjdk.jdk-dev, mixed mode, sharing, tiered, compressed oops, g1 gc, windows-amd64) # Problematic frame: # V [jvm.dll+0x559322] PhaseIdealLoop::build_loop_late_post_work+0x212 Host: Intel(R) Xeon(R) CPU E7-8880 v4 @ 2.20GHz, 8 cores, 15G, Windows Server 2016 , 64 bit Build 14393 (10.0.14393.3630) Time: Sat Oct 24 19:40:57 2020 W. 
Europe Daylight Time elapsed time: 16.016253 seconds (0d 0h 0m 16s) V [jvm.dll+0x559322] PhaseIdealLoop::build_loop_late_post_work+0x212 (loopnode.cpp:5312) V [jvm.dll+0x5590b4] PhaseIdealLoop::build_loop_late+0x214 (loopnode.cpp:5159) V [jvm.dll+0x558418] PhaseIdealLoop::build_and_optimize+0x7b8 (loopnode.cpp:3874) V [jvm.dll+0x21410a] Compile::optimize_loops+0x15a (compile.cpp:1960) V [jvm.dll+0x20d900] Compile::Optimize+0xf90 (compile.cpp:2184) V [jvm.dll+0x20b105] Compile::Compile+0xba5 (compile.cpp:735) V [jvm.dll+0x19de6b] C2Compiler::compile_method+0xab (c2compiler.cpp:104) Is this a known problem? Best regards, Martin From christian.hagedorn at oracle.com Tue Oct 27 12:02:43 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Tue, 27 Oct 2020 13:02:43 +0100 Subject: C2 crash in PhaseIdealLoop::build_loop_late_post_work In-Reply-To: References: Message-ID: Hi Martin Yes, this is a known problem (see JDK-8251925). We also see it failing with only this benchmark, often in log-regression. I played around with different settings by just running log-regression, for example, which did not seem to trigger it, though. Do you have a fast reproducer? It started to fail after JDK-8249749 which, however, just seem to have revealed the problem. Best regards, Christian On 27.10.20 12:07, Doerr, Martin wrote: > Hi, > > we observe C2 crashes on all x86_64 platforms when running Renaissance Benchmark "log-regression" since 2020-08-10. > (hg was at "8250848: [aarch64] nativeGotJump_at() missing call to verify()." when we've seen it first.) 
> > > # EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x00007ffd594e9322, pid=1439804, tid=1516348 > > # > > # JRE version: OpenJDK Runtime Environment (16.0.0.1) (build 16.0.0.1-internal+0-adhoc.openjdk.jdk-dev) > > # Java VM: OpenJDK 64-Bit Server VM (16.0.0.1-internal+0-adhoc.openjdk.jdk-dev, mixed mode, sharing, tiered, compressed oops, g1 gc, windows-amd64) > > # Problematic frame: > > # V [jvm.dll+0x559322] PhaseIdealLoop::build_loop_late_post_work+0x212 > > > > Host: Intel(R) Xeon(R) CPU E7-8880 v4 @ 2.20GHz, 8 cores, 15G, Windows Server 2016 , 64 bit Build 14393 (10.0.14393.3630) > > Time: Sat Oct 24 19:40:57 2020 W. Europe Daylight Time elapsed time: 16.016253 seconds (0d 0h 0m 16s) > > > V [jvm.dll+0x559322] PhaseIdealLoop::build_loop_late_post_work+0x212 (loopnode.cpp:5312) > > V [jvm.dll+0x5590b4] PhaseIdealLoop::build_loop_late+0x214 (loopnode.cpp:5159) > > V [jvm.dll+0x558418] PhaseIdealLoop::build_and_optimize+0x7b8 (loopnode.cpp:3874) > > V [jvm.dll+0x21410a] Compile::optimize_loops+0x15a (compile.cpp:1960) > > V [jvm.dll+0x20d900] Compile::Optimize+0xf90 (compile.cpp:2184) > > V [jvm.dll+0x20b105] Compile::Compile+0xba5 (compile.cpp:735) > > V [jvm.dll+0x19de6b] C2Compiler::compile_method+0xab (c2compiler.cpp:104) > > Is this a known problem? > > Best regards, > Martin > From martin.doerr at sap.com Tue Oct 27 12:05:35 2020 From: martin.doerr at sap.com (Doerr, Martin) Date: Tue, 27 Oct 2020 12:05:35 +0000 Subject: C2 crash in PhaseIdealLoop::build_loop_late_post_work In-Reply-To: References: Message-ID: Hi again, I just found it: https://bugs.openjdk.java.net/browse/JDK-8251925 Crash happens when compiling the following method: Current CompileTask: C2: 12413 13451 4 org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection::apply (928 bytes) Best regards, Martin From: Doerr, Martin Sent: Dienstag, 27. 
Oktober 2020 12:07 To: 'hotspot-compiler-dev at openjdk.java.net' Subject: C2 crash in PhaseIdealLoop::build_loop_late_post_work Hi, we observe C2 crashes on all x86_64 platforms when running Renaissance Benchmark "log-regression" since 2020-08-10. (hg was at "8250848: [aarch64] nativeGotJump_at() missing call to verify()." when we've seen it first.) # EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x00007ffd594e9322, pid=1439804, tid=1516348 # # JRE version: OpenJDK Runtime Environment (16.0.0.1) (build 16.0.0.1-internal+0-adhoc.openjdk.jdk-dev) # Java VM: OpenJDK 64-Bit Server VM (16.0.0.1-internal+0-adhoc.openjdk.jdk-dev, mixed mode, sharing, tiered, compressed oops, g1 gc, windows-amd64) # Problematic frame: # V [jvm.dll+0x559322] PhaseIdealLoop::build_loop_late_post_work+0x212 Host: Intel(R) Xeon(R) CPU E7-8880 v4 @ 2.20GHz, 8 cores, 15G, Windows Server 2016 , 64 bit Build 14393 (10.0.14393.3630) Time: Sat Oct 24 19:40:57 2020 W. Europe Daylight Time elapsed time: 16.016253 seconds (0d 0h 0m 16s) V [jvm.dll+0x559322] PhaseIdealLoop::build_loop_late_post_work+0x212 (loopnode.cpp:5312) V [jvm.dll+0x5590b4] PhaseIdealLoop::build_loop_late+0x214 (loopnode.cpp:5159) V [jvm.dll+0x558418] PhaseIdealLoop::build_and_optimize+0x7b8 (loopnode.cpp:3874) V [jvm.dll+0x21410a] Compile::optimize_loops+0x15a (compile.cpp:1960) V [jvm.dll+0x20d900] Compile::Optimize+0xf90 (compile.cpp:2184) V [jvm.dll+0x20b105] Compile::Compile+0xba5 (compile.cpp:735) V [jvm.dll+0x19de6b] C2Compiler::compile_method+0xab (c2compiler.cpp:104) Is this a known problem? 
Best regards,
Martin

From xliu at openjdk.java.net  Tue Oct 27 15:13:21 2020
From: xliu at openjdk.java.net (Xin Liu)
Date: Tue, 27 Oct 2020 15:13:21 GMT
Subject: RFR: 8241495: Make more compiler related flags available on a per method level [v3]
In-Reply-To:
References:
Message-ID:

On Tue, 27 Oct 2020 09:25:38 GMT, Nils Eliasson wrote:

>> Xin Liu has updated the pull request incrementally with one additional commit since the last revision:
>>
>> 8241495: Make more compiler related flags available on a per method level
>>
>> rollback others but only leave breakAtCompile and breakAtExecution.
>
> Looks good!

@neliasso thank you for reviewing it.

-------------

PR: https://git.openjdk.java.net/jdk/pull/796

From martin.doerr at sap.com  Tue Oct 27 15:28:41 2020
From: martin.doerr at sap.com (Doerr, Martin)
Date: Tue, 27 Oct 2020 15:28:41 +0000
Subject: C2 crash in PhaseIdealLoop::build_loop_late_post_work
In-Reply-To:
References:
Message-ID:

Hi Christian,

thanks for your reply.

Interesting. Another issue related to JDK-8249749 was recently fixed:
8251994: VM crashed running TestComplexAddrExpr.java test with -XX:UseAVX=X
https://bugs.openjdk.java.net/browse/JDK-8251994
Maybe it helps for this, too.

I don't have a fast reproducer atm.

Best regards,
Martin

> -----Original Message-----
> From: Christian Hagedorn
> Sent: Tuesday, 27 October 2020 13:03
> To: Doerr, Martin ; 'hotspot-compiler-dev at openjdk.java.net'
> Subject: Re: C2 crash in PhaseIdealLoop::build_loop_late_post_work
>
> Hi Martin
>
> Yes, this is a known problem (see JDK-8251925). We also see it failing
> with only this benchmark, often in log-regression. I played around with
> different settings by just running log-regression, for example, which
> did not seem to trigger it, though. Do you have a fast reproducer?
>
> It started to fail after JDK-8249749 which, however, just seem to have
> revealed the problem.
> >
> > Best regards,
> > Martin
> >

From christian.hagedorn at oracle.com  Tue Oct 27 15:30:49 2020
From: christian.hagedorn at oracle.com (Christian Hagedorn)
Date: Tue, 27 Oct 2020 16:30:49 +0100
Subject: C2 crash in PhaseIdealLoop::build_loop_late_post_work
In-Reply-To:
References:
Message-ID: <09aa2715-0777-e69b-298f-17700846d348@oracle.com>

Hi Martin

Thanks for the pointer! I will have a look at that.

Best regards,
Christian

On 27.10.20 16:28, Doerr, Martin wrote:
> Hi Christian,
>
> thanks for your reply.
>
> Interesting. Another issue related to JDK-8249749 was recently fixed:
> 8251994: VM crashed running TestComplexAddrExpr.java test with -XX:UseAVX=X
> https://bugs.openjdk.java.net/browse/JDK-8251994
> Maybe it helps for this, too.
>
> I don't have a fast reproducer atm.
>
> Best regards,
> Martin
>
>
>> -----Original Message-----
>> From: Christian Hagedorn
>> Sent: Tuesday, 27 October 2020 13:03
>> To: Doerr, Martin ; 'hotspot-compiler-dev at openjdk.java.net'
>> Subject: Re: C2 crash in PhaseIdealLoop::build_loop_late_post_work
>>
>> Hi Martin
>>
>> Yes, this is a known problem (see JDK-8251925). We also see it failing
>> with only this benchmark, often in log-regression. I played around with
>> different settings by just running log-regression, for example, which
>> did not seem to trigger it, though. Do you have a fast reproducer?
>>
>> It started to fail after JDK-8249749 which, however, just seem to have
>> revealed the problem.
>>
>> Best regards,
>> Christian
>>
>> On 27.10.20 12:07, Doerr, Martin wrote:
>>> Hi,
>>>
>>> we observe C2 crashes on all x86_64 platforms when running Renaissance Benchmark "log-regression" since 2020-08-10.
>>> (hg was at "8250848: [aarch64] nativeGotJump_at() missing call to verify()." when we've seen it first.)
>>> >>> Best regards, >>> Martin >>> From chagedorn at openjdk.java.net Tue Oct 27 16:00:19 2020 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Tue, 27 Oct 2020 16:00:19 GMT Subject: RFR: 8255245: C1: Fix output of -XX:+PrintCFGToFile to open it with visualizer [v2] In-Reply-To: References: <3C7jK0CaVaSKWmEKUQoSH9xGrm9JcQP-0efnki4nM6Y=.66141a70-1c23-4b2c-ac44-b0e7aac2ba0c@github.com> Message-ID: <13VY7DAgkZUqEmTfA9cCaYLLRzTm5wW1Z4Se52Uw5Sk=.20d1ab0f-9faf-4c89-8ad2-2af9fb2c8238@github.com> On Mon, 26 Oct 2020 22:29:06 GMT, Xin Liu wrote: >> Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: >> >> Move print_on definition to header file > > src/hotspot/share/c1/c1_LinearScan.hpp line 641: > >> 639: } >> 640: // Special version for compatibility with C1 Visualizer. >> 641: void print_on(outputStream* out, bool is_cfg_printer) const; > > then why not just print_on(outputStream* out, bool is_cfg_printer=false) const directly? those 3 lines can be further saved. :) This would be a good idea but unfortunately that does not compile as it complains that `virtual void AllocatedObj::print_on(outputStream*) const` was hidden. ------------- PR: https://git.openjdk.java.net/jdk/pull/837 From aph at redhat.com Tue Oct 27 16:13:40 2020 From: aph at redhat.com (Andrew Haley) Date: Tue, 27 Oct 2020 16:13:40 +0000 Subject: RFR: 8255246: AArch64: Implement BigInteger shiftRight and shiftLeft accelerator/intrinsic In-Reply-To: <9DFfj6e0hiQbfz9gnZp52nVHASD2iZCgudqh5AUBJgA=.51645a98-6a49-4288-8ec3-69f8f3727352@github.com> References: <9DFfj6e0hiQbfz9gnZp52nVHASD2iZCgudqh5AUBJgA=.51645a98-6a49-4288-8ec3-69f8f3727352@github.com> Message-ID: <5b4bddbf-88a8-e24b-737f-3fd58d1a3e37@redhat.com> On 27/10/2020 06:47, Dong Bo wrote: > Updated a version for small BigIntegers. > The less-than-four-words loop is unpacked for minor performance improvements. 
> Also modified the code in ./test/micro/org/openjdk/bench/java/math/BigIntegers.java for small BigIntegers performance tests.
> The new parameter `maxNumbits` in the test indicates that the bit count of a BigInteger lies in the range `[maxNumbits - 31, maxNumbits]`.
> Incremental modification: https://github.com/openjdk/jdk/pull/861/commits/7a5d76f51e693d441dee30b3d109d1b67b525378
>
> According to the new tests, performance regresses by 3%~6% only if `maxNumbits == 32`.
> It seems the regression is inevitably caused by the intrinsic shared code;
> performance regresses even if we return immediately from the stub, like:
> /* marked as cbz_ret below */
> address generate_bigIntegerLeftShift() {
>   __ align(CodeEntryAlignment);
>   StubCodeMark mark(this, "StubRoutines", "bigIntegerLeftShiftWorker");
>   address start = __ pc();
>   Register numIter = c_rarg4;
>   __ cbz(numIter, Exit);
>   __ ret(lr);
> }
> The performance of `cbz_ret` is almost the same as that of the intrinsified tests with `maxNumbits == 32`.
> Similar tests that return immediately with `__ ret(0)` regress on an x86_64 platform too.
>
> The BigIntegers.java JMH micro-benchmark results of small BigIntegers (~256bits):

OK.

I think there's no point pushing the small BigIntegers case any further because the runtime is so dominated by things other than the work of the actual shifting. This is the profile with maxNumbits = 256, and even then the cost of doing the shifting is only 4.9% of the total runtime. The rest is the cost of the control logic and of allocating and zeroing an array for the result.

I think we're done.
StubRoutines::bigIntegerLeftShiftWorker [0x0000ffff80421d00, 0x0000ffff80421dd0] (208 bytes)
--------------------------------------------------------------------------------
  0.36%    0x0000ffff80421d00: cbz x4, Stub::bigIntegerLeftShiftWorker+204 0x0000ffff80421dcc
  0.03%    0x0000ffff80421d04: add xscratch2, x1, #0x4
           0x0000ffff80421d08: add x0, x0, x2, lsl #2
  0.52%    0x0000ffff80421d0c: orr wscratch1, wzr, #0x20
           0x0000ffff80421d10: sub wscratch1, wscratch1, w3
           0x0000ffff80421d14: cmp x4, #0x4
           0x0000ffff80421d18: b.lt Stub::bigIntegerLeftShiftWorker+124 0x0000ffff80421d7c
           0x0000ffff80421d1c: dup v3.4s, w3
  0.68%    0x0000ffff80421d20: dup v4.4s, wscratch1
  0.03%    0x0000ffff80421d24: neg v4.4s, v4.4s
           0x0000ffff80421d28: ld1 {v0.4s}, [x1], #16
  0.42%    0x0000ffff80421d2c: ld1 {v1.4s}, [xscratch2], #16
  0.03%    0x0000ffff80421d30: ushl v0.4s, v0.4s, v3.4s
  0.03%    0x0000ffff80421d34: ushl v1.4s, v1.4s, v4.4s
  0.42%    0x0000ffff80421d38: orr v2.16b, v0.16b, v1.16b
           0x0000ffff80421d3c: st1 {v2.4s}, [x0], #16
           0x0000ffff80421d40: sub x4, x4, #0x4
  0.23%    0x0000ffff80421d44: cmp x4, #0x4
           0x0000ffff80421d48: b.lt Stub::bigIntegerLeftShiftWorker+80 0x0000ffff80421d50
           0x0000ffff80421d4c: b Stub::bigIntegerLeftShiftWorker+40 0x0000ffff80421d28
           0x0000ffff80421d50: cbz x4, Stub::bigIntegerLeftShiftWorker+204 0x0000ffff80421dcc
           0x0000ffff80421d54: cmp x4, #0x1
           0x0000ffff80421d58: b.eq Stub::bigIntegerLeftShiftWorker+180 0x0000ffff80421db4
           0x0000ffff80421d5c: ld1 {v0.2s}, [x1], #8
  0.71%    0x0000ffff80421d60: ld1 {v1.2s}, [xscratch2], #8
           0x0000ffff80421d64: ushl v0.2s, v0.2s, v3.2s
  0.94%    0x0000ffff80421d68: ushl v1.2s, v1.2s, v4.2s
           0x0000ffff80421d6c: orr v2.8b, v0.8b, v1.8b
           0x0000ffff80421d70: st1 {v2.2s}, [x0], #8
  0.49%    0x0000ffff80421d74: sub x4, x4, #0x2
           0x0000ffff80421d78: b Stub::bigIntegerLeftShiftWorker+80 0x0000ffff80421d50
           0x0000ffff80421d7c: ldr w10, [x1],#4
           0x0000ffff80421d80: ldr w11, [xscratch2],#4
           0x0000ffff80421d84: lsl w10, w10, w3
           0x0000ffff80421d88: lsr w11, w11, wscratch1
           0x0000ffff80421d8c: orr w12, w10, w11
           0x0000ffff80421d90: str w12, [x0],#4
           0x0000ffff80421d94: tbz w4, #1, Stub::bigIntegerLeftShiftWorker+204 0x0000ffff80421dcc
           0x0000ffff80421d98: tbz w4, #0, Stub::bigIntegerLeftShiftWorker+180 0x0000ffff80421db4
           0x0000ffff80421d9c: ldr w10, [x1],#4
...................................................................................................
  4.90%

-- 
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd.
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From xliu at openjdk.java.net  Tue Oct 27 16:23:30 2020
From: xliu at openjdk.java.net (Xin Liu)
Date: Tue, 27 Oct 2020 16:23:30 GMT
Subject: RFR: 8255245: C1: Fix output of -XX:+PrintCFGToFile to open it with visualizer [v2]
In-Reply-To: <13VY7DAgkZUqEmTfA9cCaYLLRzTm5wW1Z4Se52Uw5Sk=.20d1ab0f-9faf-4c89-8ad2-2af9fb2c8238@github.com>
References: <3C7jK0CaVaSKWmEKUQoSH9xGrm9JcQP-0efnki4nM6Y=.66141a70-1c23-4b2c-ac44-b0e7aac2ba0c@github.com> <13VY7DAgkZUqEmTfA9cCaYLLRzTm5wW1Z4Se52Uw5Sk=.20d1ab0f-9faf-4c89-8ad2-2af9fb2c8238@github.com>
Message-ID:

On Tue, 27 Oct 2020 15:57:10 GMT, Christian Hagedorn wrote:

>> src/hotspot/share/c1/c1_LinearScan.hpp line 641:
>>
>>> 639: }
>>> 640: // Special version for compatibility with C1 Visualizer.
>>> 641: void print_on(outputStream* out, bool is_cfg_printer) const;
>>
>> then why not just print_on(outputStream* out, bool is_cfg_printer=false) const directly? those 3 lines can be further saved. :)
>
> This would be a good idea but unfortunately that does not compile as it complains that `virtual void AllocatedObj::print_on(outputStream*) const` was hidden.

You are right. I just figured out that `print_on(outputStream*) const` is virtual, so we can't use that trick. Anyway, your patch looks good to me.
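The name-hiding problem discussed in this exchange can be reproduced outside HotSpot. The following is only a minimal sketch: `Base`, `Hiding` and `Forwarding` are hypothetical stand-ins (not `AllocatedObj` or any C1 class), and `std::ostream` replaces `outputStream`. It shows that a single overload with a default argument does not override the virtual one-argument function but hides it, whereas an explicit override that forwards keeps virtual dispatch intact.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for a base class with a virtual print function.
struct Base {
  virtual ~Base() = default;
  virtual void print_on(std::ostream* out) const { *out << "Base"; }
};

// One overload with a default argument. Its signature differs from
// Base::print_on(std::ostream*), so it does not override it; it hides it.
// Compilers can report this (e.g. via -Woverloaded-virtual in GCC/Clang).
struct Hiding : Base {
  void print_on(std::ostream* out, bool is_cfg_printer = false) const {
    *out << (is_cfg_printer ? "cfg" : "plain");
  }
};

// Explicit one-argument override that forwards to the two-argument version.
struct Forwarding : Base {
  void print_on(std::ostream* out) const override { print_on(out, false); }
  void print_on(std::ostream* out, bool is_cfg_printer) const {
    *out << (is_cfg_printer ? "cfg" : "plain");
  }
};

// Calls print_on through a Base reference, i.e. via virtual dispatch.
std::string via_base(const Base& b) {
  std::ostringstream os;
  b.print_on(&os);
  return os.str();
}
```

Called through a `Base` reference, a `Hiding` object still runs `Base::print_on`, while a `Forwarding` object runs its own code. With warnings treated as errors, as the quoted compile failure suggests, the hiding variant would be rejected outright.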
-------------

PR: https://git.openjdk.java.net/jdk/pull/837

From aph at openjdk.java.net  Tue Oct 27 16:48:23 2020
From: aph at openjdk.java.net (Andrew Haley)
Date: Tue, 27 Oct 2020 16:48:23 GMT
Subject: RFR: 8255246: AArch64: Implement BigInteger shiftRight and shiftLeft accelerator/intrinsic [v2]
In-Reply-To:
References:
Message-ID:

On Tue, 27 Oct 2020 06:32:29 GMT, Dong Bo wrote:

>> BigInteger.shiftRightImplWorker and BigInteger.shiftLeftImplWorker are not intrinsified on aarch64, although this has been done on x86_64.
>> We can implement them via the USHL NEON instruction (register), which handles up to four integers at a time, whereas the C2-generated code processes a single integer at a time.
>> The usage of USHL can be found at: https://developer.arm.com/documentation/dui0801/g/A64-SIMD-Vector-Instructions/USHL--vector-?lang=en
>>
>> Patch passed jtreg tier1-3 tests on our aarch64 server.
>> Tests in test/jdk/java/math/BigInteger/* were also run specifically to check the correctness of the implementation, and passed.
>>
>> We tested test/micro/org/openjdk/bench/java/math/BigIntegers.java for performance gain on Kunpeng916 and Kunpeng920.
>> The following performance improvements were seen with this implementation:
>> - Intrinsification of BigInteger.shiftLeft: 25.52% (Kunpeng916), 37.56% (Kunpeng920)
>> - Intrinsification of BigInteger.shiftRight: 46.45% (Kunpeng916), 43.32% (Kunpeng920)
>>
>> The BigIntegers.java JMH micro-benchmark results:
>> Benchmark                        Mode  Cnt     Score     Error  Units
>>
>> # Kunpeng 916, default
>> BigIntegers.testAdd              avgt   25    33.554 ±   0.224  ns/op
>> BigIntegers.testHugeToString     avgt   25   575.554 ±  40.656  ns/op
>> BigIntegers.testLargeToString    avgt   25   190.098 ±   0.825  ns/op
>> **BigIntegers.testLeftShift      avgt   25  1495.779 ±  12.365  ns/op**
>> BigIntegers.testMultiply         avgt   25  7551.707 ±  39.309  ns/op
>> **BigIntegers.testRightShift     avgt   25   605.302 ±   6.710  ns/op**
>> BigIntegers.testSmallToString    avgt   25   179.034 ±   0.873  ns/op
>>
>> # Kunpeng 916, intrinsic:
>> BigIntegers.testAdd              avgt   25    33.531 ±   0.222  ns/op
>> BigIntegers.testHugeToString     avgt   25   578.038 ±  40.675  ns/op
>> BigIntegers.testLargeToString    avgt   25   188.566 ±   0.855  ns/op
>> **BigIntegers.testLeftShift      avgt   25  1191.651 ±  20.136  ns/op**
>> BigIntegers.testMultiply         avgt   25  7492.711 ±   3.702  ns/op
>> **BigIntegers.testRightShift     avgt   25   326.891 ±   6.033  ns/op**
>> BigIntegers.testSmallToString    avgt   25   178.267 ±   1.501  ns/op
>>
>> # Kunpeng 920, default
>> BigIntegers.testAdd              avgt   25    22.790 ±   0.167  ns/op
>> BigIntegers.testHugeToString     avgt   25   432.428 ±  10.736  ns/op
>> BigIntegers.testLargeToString    avgt   25   121.899 ±   3.356  ns/op
>> **BigIntegers.testLeftShift      avgt   25   883.530 ±  53.714  ns/op**
>> BigIntegers.testMultiply         avgt   25  5918.845 ±  94.937  ns/op
>> **BigIntegers.testRightShift     avgt   25   329.762 ±  15.850  ns/op**
>> BigIntegers.testSmallToString    avgt   25   117.460 ±   3.040  ns/op
>>
>> # Kunpeng 920, intrinsic
>> BigIntegers.testAdd              avgt   25    21.791 ±   0.085  ns/op
>> BigIntegers.testHugeToString     avgt   25   415.209 ±  32.170  ns/op
>> BigIntegers.testLargeToString    avgt   25   124.635 ±   2.157  ns/op
>> **BigIntegers.testLeftShift      avgt   25   551.710 ±   7.836  ns/op**
>> BigIntegers.testMultiply         avgt   25  5869.401 ±  54.803  ns/op
>> **BigIntegers.testRightShift     avgt   25   186.896 ±   6.378  ns/op**
>> BigIntegers.testSmallToString    avgt   25   117.543 ±   3.036  ns/op
>
> Dong Bo has updated the pull request incrementally with one additional commit since the last revision:
>
> minor improvements for small BigIntegers

Marked as reviewed by aph (Reviewer).

src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4167:

> 4165: __ strw(r12, __ post(newArr, 4));
> 4166: __ sub(numIter, numIter, 1);
> 4167: __ cbz(numIter, Exit);

This is odd code. Why not `cbnz(numIter, ShiftOneLoop)` ?
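To make the loop-shape question concrete, here is a hedged C++ sketch of the scalar work the stub performs per word, modeled on the `ldr`/`lsl`/`lsr`/`orr`/`str` tail visible in the profile earlier in this thread. The function names are invented for illustration, and the sketch assumes `0 < shift_count < 32` and that `old_arr` holds at least `num_iter + 1` words. The second variant tests the counter at the bottom of the loop, which is the shape a single `cbnz` back to the loop head would give; the quoted excerpt ends at line 4167, so what follows the `cbz` in the actual patch is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Top-tested loop: each output word combines the current input word shifted
// left with the high bits of the next input word.
// Assumes 0 < shift_count < 32 and old_arr.size() >= num_iter + 1.
void shift_left_worker(std::vector<std::uint32_t>& new_arr,
                       const std::vector<std::uint32_t>& old_arr,
                       std::size_t new_idx, unsigned shift_count,
                       std::size_t num_iter) {
  const unsigned shift_right = 32 - shift_count;
  for (std::size_t i = 0; i < num_iter; ++i) {
    new_arr[new_idx + i] =
        (old_arr[i] << shift_count) | (old_arr[i + 1] >> shift_right);
  }
}

// Bottom-tested loop: one early exit for zero iterations, then a single
// "decrement and branch back if non-zero" per iteration.
void shift_left_worker_bottom_tested(std::vector<std::uint32_t>& new_arr,
                                     const std::vector<std::uint32_t>& old_arr,
                                     std::size_t new_idx, unsigned shift_count,
                                     std::size_t num_iter) {
  if (num_iter == 0) return;  // plays the role of the initial zero check
  const unsigned shift_right = 32 - shift_count;
  std::size_t i = 0;
  do {
    new_arr[new_idx + i] =
        (old_arr[i] << shift_count) | (old_arr[i + 1] >> shift_right);
    ++i;
    --num_iter;
  } while (num_iter != 0);    // plays the role of a branch-if-non-zero
}
```

Both variants produce identical results; the difference is only in how many branch instructions the loop body needs.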
------------- PR: https://git.openjdk.java.net/jdk/pull/861 From xliu at openjdk.java.net Tue Oct 27 17:11:27 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Tue, 27 Oct 2020 17:11:27 GMT Subject: RFR: 8254723: add diagnostic command to write Linux perf map file [v2] In-Reply-To: References: <7T_M6C-3WpLwXYH3RuRCuDQUW0qMyKIWAs8RaPW7D0s=.d659e5a0-e8a2-4816-8f60-1dd7653f4c7b@github.com> Message-ID: <-uWNzjVSMK3A2pbJURqukL7uAVMzr-oIs9PPMPP0Sw8=.7150d944-62ee-41c9-b234-b2a52ae331c9@github.com> On Wed, 21 Oct 2020 09:13:08 GMT, Nick Gasson wrote: >> When using the Linux "perf" tool to do system profiling, symbol names of >> running Java methods cannot be decoded, resulting in unhelpful output >> such as: >> >> 10.52% [JIT] tid 236748 [.] 0x00007f6fdb75d223 >> >> Perf can read a simple text file format describing the mapping between >> address ranges and symbol names for a particular process [1]. >> >> It's possible to generate this already for Java processes using a JVMTI >> plugin such as perf-map-agent [2]. However this requires compiling >> third-party code and then loading the agent into your Java process. It >> would be more convenient if Hotspot could write this file directly using >> a diagnostic command. The information required is almost identical to >> that of the existing Compiler.codelist command. >> >> This patch adds a Compiler.perfmap diagnostic command on Linux only. To >> use, first run "jcmd Compiler.perfmap" and then "perf top" or >> "perf record" and the report should show decoded Java symbol names for >> that process. >> >> As this just writes a snapshot of the code cache when the command is >> run, it will become stale if methods are compiled later or unloaded. >> However this shouldn't be a big problem in practice if the map file is >> generated after the application has warmed up. 
>>
>> [1] https://github.com/torvalds/linux/blob/master/tools/perf/Documentation/jit-interface.txt
>> [2] https://github.com/jvm-profiling-tools/perf-map-agent
>
> Nick Gasson has updated the pull request incrementally with one additional commit since the last revision:
>
> Update for review comments

src/hotspot/share/code/codeCache.cpp line 1566:

> 1564:
> 1565: char fname[32];
> 1566: jio_snprintf(fname, sizeof(fname), "/tmp/perf-%d.map", os::current_process_id());

If you don't use `#ifdef LINUX`, this line needs a cross-platform temp directory.

-------------

PR: https://git.openjdk.java.net/jdk/pull/760

From xliu at openjdk.java.net  Tue Oct 27 17:26:21 2020
From: xliu at openjdk.java.net (Xin Liu)
Date: Tue, 27 Oct 2020 17:26:21 GMT
Subject: RFR: 8254723: add diagnostic command to write Linux perf map file [v2]
In-Reply-To: <-uWNzjVSMK3A2pbJURqukL7uAVMzr-oIs9PPMPP0Sw8=.7150d944-62ee-41c9-b234-b2a52ae331c9@github.com>
References: <7T_M6C-3WpLwXYH3RuRCuDQUW0qMyKIWAs8RaPW7D0s=.d659e5a0-e8a2-4816-8f60-1dd7653f4c7b@github.com> <-uWNzjVSMK3A2pbJURqukL7uAVMzr-oIs9PPMPP0Sw8=.7150d944-62ee-41c9-b234-b2a52ae331c9@github.com>
Message-ID:

On Tue, 27 Oct 2020 17:08:18 GMT, Xin Liu wrote:

>> Nick Gasson has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Update for review comments
>
> src/hotspot/share/code/codeCache.cpp line 1566:
>
>> 1564:
>> 1565: char fname[32];
>> 1566: jio_snprintf(fname, sizeof(fname), "/tmp/perf-%d.map", os::current_process_id());
>
> If you don't use `#ifdef LINUX`, this line needs a cross-platform temp directory.

I would like to take this back. I see that `CodeCache::write_perf_map` is wrapped in `#ifdef LINUX`; I was reading an older revision of the change.
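For readers unfamiliar with the map file under discussion: the perf JIT interface document linked as [1] describes one symbol per line in the form `START SIZE symbolname`, with START and SIZE written as hex numbers without a `0x` prefix. The sketch below only illustrates that format and the `/tmp/perf-<pid>.map` naming quoted above. The helper names are hypothetical, and it uses plain `std::snprintf` rather than HotSpot's `jio_snprintf`.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Builds the file name perf looks for, following the pattern quoted above.
std::string perf_map_file_name(int pid) {
  char buf[32];
  std::snprintf(buf, sizeof(buf), "/tmp/perf-%d.map", pid);
  return buf;
}

// Formats one map entry: hex start address, hex size, then the symbol name
// (which may contain spaces, since it is the rest of the line).
std::string perf_map_line(std::uint64_t start, std::uint64_t size,
                          const std::string& symbol) {
  char buf[64];
  std::snprintf(buf, sizeof(buf), "%" PRIx64 " %" PRIx64 " ", start, size);
  return std::string(buf) + symbol + "\n";
}
```

For example, `perf_map_line(0x7f6fdb75d000, 0x240, "void Foo.bar()")` yields a line that perf could use to resolve an address such as the `0x00007f6fdb75d223` shown in the unhelpful output quoted in the patch description. The address, size and method name here are made up for illustration.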
------------- PR: https://git.openjdk.java.net/jdk/pull/760 From chagedorn at openjdk.java.net Tue Oct 27 17:40:21 2020 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Tue, 27 Oct 2020 17:40:21 GMT Subject: RFR: 8255245: C1: Fix output of -XX:+PrintCFGToFile to open it with visualizer [v2] In-Reply-To: References: Message-ID: On Fri, 23 Oct 2020 17:28:19 GMT, Vladimir Kozlov wrote: >> Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: >> >> Move print_on definition to header file > > Good. @vnkozlov @neliasso @navyxliu Thank you for your reviews! ------------- PR: https://git.openjdk.java.net/jdk/pull/837 From kvn at openjdk.java.net Tue Oct 27 23:15:17 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 27 Oct 2020 23:15:17 GMT Subject: RFR: 8255438: [Vector API] More instructs in x86.ad should use legacy mode for code-gen In-Reply-To: References: Message-ID: On Tue, 27 Oct 2020 07:34:57 GMT, Jie Fu wrote: > Hi all, > > Just as @jatin-bhateja pointed out [1], there are more instructs in x86.ad which should use legacy mode. > > It would be better to fix the following cases: > ------------------------ > 1. instruct mul2L_reg > The code-gen logic uses phaddd [2], which requires legacy mode here [3]. > This bug might be reproduced on AVX512 machines without avx512dq. > > 2. instruct vmul4L_reg_avx > The code-gen logic uses vphaddd [4], which requires legacy mode here [5]. > This bug might be reproduced on AVX512 machines without avx512dq. > > 3. instruct reductionL > For MulReductionVL, the code-gen chain can be: reduceL --> reduce4L --> reduce_operation_128 --> vpmullq [6] > vpmullq require legacy mode [7] if avx512dq isn't supported. > This bug might be reproduced on AVX512 machines without avx512dq. > > 4. 
instruct reductionB
> For MinReductionV, the code-gen chain can be: reduceB --> reduce32B --> reduce_operation_128 --> pminsb [8]
> pminsb requires legacy mode [9] if avx512bw isn't supported.
> This bug might be reproduced on AVX512 machines without avx512bw.
> ------------------------
> Bugs in mul2L_reg/vmul4L_reg_avx/reductionL can only be reproduced on AVX512 machines without avx512dq.
> And the bug in reductionB can only be reproduced on AVX512 machines without avx512bw.
>
> Unfortunately, it's impossible for us to create reproducers since our AVX512 platforms support both avx512dq and avx512bw.
> However, it does make sense to fix these unexposed bugs since Vector API code will be sure to run on various platforms (e.g., AVX512 machines without avx512dq/bw) in the future.
>
> The fix just changes vec to legVec, which is quite safe in theory.
>
> As for the reduction patterns of Float and Double, I don't see any reason that they should use legacy mode (maybe I've missed something).
>
> Testing:
> - jdk/incubator/vector on both AVX512 and AVX256 machines
>
> Any comments?
>
> Thanks a lot.
> Best regards,
> Jie
>
> [1] https://github.com/openjdk/jdk/pull/791#commitcomment-43473733
> [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L5472
> [3] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/assembler_x86.cpp#L6217
> [4] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L5497
> [5] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/assembler_x86.cpp#L6165
> [6] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp#L1521
> [7] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/assembler_x86.cpp#L6428
> [8] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp#L1482
> [9] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/assembler_x86.cpp#L6475

Good. Thank you for cleaning this up.
Could someone at Oracle please run Mach5 testing with UseAVX=3?

-------------

Marked as reviewed by kvn (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/874

From kvn at openjdk.java.net  Tue Oct 27 23:21:18 2020
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Tue, 27 Oct 2020 23:21:18 GMT
Subject: RFR: 8255441: Cleanup ciEnv/jvmciEnv::lookup_method-s
In-Reply-To:
References:
Message-ID:

On Tue, 27 Oct 2020 08:17:48 GMT, Aleksey Shipilev wrote:

> Static analysis complains there is a potentially uninitialized `dest_method` after the switch, oblivious of `ShouldNotReachHere()`. This can be cleaned up along with related code.

Can you use fatal() and print unexpected `bc` instead of SNRH()?

-------------

Changes requested by kvn (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/875

From iveresov at openjdk.java.net  Tue Oct 27 23:21:23 2020
From: iveresov at openjdk.java.net (Igor Veresov)
Date: Tue, 27 Oct 2020 23:21:23 GMT
Subject: RFR: 8255429: Remove C2-based profiling
Message-ID:

If there are no objections I'd like to remove some obsolete code that was an experiment to implement profiling in C2. It was added during the first tiered compilation experiments and is now unused.

-------------

Commit messages:
- Remove obsolete C2-based profiling.
Changes: https://git.openjdk.java.net/jdk/pull/888/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=888&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8255429 Stats: 469 lines in 9 files changed: 0 ins; 429 del; 40 mod Patch: https://git.openjdk.java.net/jdk/pull/888.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/888/head:pull/888 PR: https://git.openjdk.java.net/jdk/pull/888 From jiefu at openjdk.java.net Tue Oct 27 23:29:16 2020 From: jiefu at openjdk.java.net (Jie Fu) Date: Tue, 27 Oct 2020 23:29:16 GMT Subject: RFR: 8255438: [Vector API] More instructs in x86.ad should use legacy mode for code-gen In-Reply-To: References: Message-ID: <4QyFcn7tS3M8sj3ag3byA040Dki1f0lYEExCxch10eI=.91473c27-dd53-4e64-9c68-5775357e5dc0@github.com> On Tue, 27 Oct 2020 23:12:42 GMT, Vladimir Kozlov wrote: > Good. Thank you for cleaning this up. > Please, someone in Oracle runs Mach5 testing with UseAVX=3. Thanks @vnkozlov for your review. Hope experts from Intel (@sviswa7 , @jatin-bhateja , etc.) can also take a look at this. Thanks. ------------- PR: https://git.openjdk.java.net/jdk/pull/874 From kvn at openjdk.java.net Wed Oct 28 00:38:22 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 28 Oct 2020 00:38:22 GMT Subject: RFR: 8255429: Remove C2-based profiling In-Reply-To: References: Message-ID: On Tue, 27 Oct 2020 23:14:15 GMT, Igor Veresov wrote: > If there are no objections I'd like to remove some obsolete code that was an experiment to implement profiling in C2. It was added during the first tiered compilation experiments and is not unused. Looks good. ------------- Marked as reviewed by kvn (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/888 From ngasson at openjdk.java.net Wed Oct 28 02:13:17 2020 From: ngasson at openjdk.java.net (Nick Gasson) Date: Wed, 28 Oct 2020 02:13:17 GMT Subject: RFR: 8254723: add diagnostic command to write Linux perf map file [v4] In-Reply-To: References: <7T_M6C-3WpLwXYH3RuRCuDQUW0qMyKIWAs8RaPW7D0s=.d659e5a0-e8a2-4816-8f60-1dd7653f4c7b@github.com> Message-ID: On Tue, 27 Oct 2020 04:17:34 GMT, Nick Gasson wrote: >> Hi Nick, >> >> This looks good. >> Thank you for the update. >> >> Thanks, >> Serguei > >> I don't see any reason for this to be a product flag, rather than diagnostic. > > OK sure, I've made it a diagnostic flag. > @nick-arm only [Reviewers](https://openjdk.java.net/bylaws#reviewer) can determine that a CSR is not needed. @dholmes-ora would you mind helping to remove the csr label? I don't have permission to do it. ------------- PR: https://git.openjdk.java.net/jdk/pull/760 From github.com+70893615+jasontatton-aws at openjdk.java.net Wed Oct 28 02:24:24 2020 From: github.com+70893615+jasontatton-aws at openjdk.java.net (Jason Tatton) Date: Wed, 28 Oct 2020 02:24:24 GMT Subject: RFR: 8253101: Clean up CallStaticJavaNode EA flags Message-ID: <8RzONh6c2LiNek7pXUTsKwf_rubmlVMf6H8S6eAsyHA=.73264c8b-6d0d-41a5-9e96-b54d3a6358b9@github.com> Please review this small change to cleanup fields: _is_scalar_replaceable and _is_non_escaping from CallStaticJavaNode as well as code which assigns to those fields. The change was tested with run-test-tier[1-3] on Linux arm64 and x86-64. 
Thanks, Jason ------------- Commit messages: - 8253101 Changes: https://git.openjdk.java.net/jdk/pull/889/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=889&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8253101 Stats: 21 lines in 2 files changed: 0 ins; 21 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/889.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/889/head:pull/889 PR: https://git.openjdk.java.net/jdk/pull/889 From kvn at openjdk.java.net Wed Oct 28 02:25:25 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 28 Oct 2020 02:25:25 GMT Subject: RFR: 8255466: C2 crashes at ciObject::get_oop() const+0x0 Message-ID: Graal testing hit this issue with product VM. Tom R. suggested that it could be the case of reflective unsafe static field access that would eventually be optimized away because the Class is null: `if (staticFieldBase != null) { return Unsafe.getInt(staticFieldBase, Unsafe.staticFieldOffset(field)); }` I suggest to replace assert with runtime check. Note, `o` value is assigned to `_const_oop` so semantically new code is the same except additional runtime check. I also noticed that const_oop is accessed without check for NULL in new Vector API code. I added check there too. Passed tier1-3 testing. 
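For readers unfamiliar with the pattern Tom R. describes, the following is a minimal Java sketch of a null-guarded reflective static field read. It is an illustration only, not the Graal code that hit the crash: the class `GuardedStaticRead`, its `counter` field, and the `read` helper are hypothetical names, and the sketch assumes `sun.misc.Unsafe` is reachable through its private `theUnsafe` field (its field-offset and memory-access methods are deprecated in recent JDK releases).

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical example class; not taken from the JDK or from Graal.
class GuardedStaticRead {
    static int counter = 42; // hypothetical static field read through Unsafe

    // Obtain the Unsafe singleton reflectively (assumes access is permitted).
    static Unsafe unsafe() throws Exception {
        Field f = Unsafe.class.getDeclaredField("theUnsafe");
        f.setAccessible(true);
        return (Unsafe) f.get(null);
    }

    // Same shape as the snippet quoted above: the Unsafe access only
    // happens when the static field base is non-null.
    static int read(Unsafe u, Object staticFieldBase, Field field) {
        if (staticFieldBase != null) {
            return u.getInt(staticFieldBase, u.staticFieldOffset(field));
        }
        return -1; // hypothetical fallback for the null-base case
    }

    public static void main(String[] args) throws Exception {
        Unsafe u = unsafe();
        Field field = GuardedStaticRead.class.getDeclaredField("counter");
        System.out.println(read(u, u.staticFieldBase(field), field));
        System.out.println(read(u, null, field));
    }
}
```

The sketch only shows the Java-level shape of such code. As described above, the guarded access is expected to be optimized away when the base is null, which is why a runtime check rather than an assert is proposed on the compiler side.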
------------- Commit messages: - 8255466: C2 crashes at ciObject::get_oop() const+0x0 Changes: https://git.openjdk.java.net/jdk/pull/890/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=890&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8255466 Stats: 9 lines in 2 files changed: 5 ins; 0 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/890.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/890/head:pull/890 PR: https://git.openjdk.java.net/jdk/pull/890 From github.com+10482586+erik1iu at openjdk.java.net Wed Oct 28 02:49:21 2020 From: github.com+10482586+erik1iu at openjdk.java.net (Eric Liu) Date: Wed, 28 Oct 2020 02:49:21 GMT Subject: RFR: 8255429: Remove C2-based profiling In-Reply-To: References: Message-ID: On Wed, 28 Oct 2020 00:35:45 GMT, Vladimir Kozlov wrote: >> If there are no objections I'd like to remove some obsolete code that was an experiment to implement profiling in C2. It was added during the first tiered compilation experiments and is not unused. > > Looks good. Do you mean that those profiling data never used before ? I noticed that those data was collected during abstract interpretation, is that `the first tiered compilation` you mean ? ------------- PR: https://git.openjdk.java.net/jdk/pull/888 From dongbo at openjdk.java.net Wed Oct 28 03:30:18 2020 From: dongbo at openjdk.java.net (Dong Bo) Date: Wed, 28 Oct 2020 03:30:18 GMT Subject: RFR: 8255246: AArch64: Implement BigInteger shiftRight and shiftLeft accelerator/intrinsic [v2] In-Reply-To: References: Message-ID: On Mon, 26 Oct 2020 10:40:27 GMT, Andrew Haley wrote: >> Dong Bo has updated the pull request incrementally with one additional commit since the last revision: >> >> minor improvements for small BigIntegers > > src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4167: > >> 4165: __ strw(r12, __ post(newArr, 4)); >> 4166: __ sub(numIter, numIter, 1); >> 4167: __ cbz(numIter, Exit); > > This is odd code. Why not `cbnz(numIter, ShiftOneLoop)` ? 
My bad, it should be cbnz(numIter, ShiftOneLoop). But it's gone now, the ShiftOneLoop is unrolled in the newest version. Do you think we need further modifications? ------------- PR: https://git.openjdk.java.net/jdk/pull/861 From xliu at openjdk.java.net Wed Oct 28 04:05:20 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Wed, 28 Oct 2020 04:05:20 GMT Subject: RFR: 8253101: Clean up CallStaticJavaNode EA flags In-Reply-To: <8RzONh6c2LiNek7pXUTsKwf_rubmlVMf6H8S6eAsyHA=.73264c8b-6d0d-41a5-9e96-b54d3a6358b9@github.com> References: <8RzONh6c2LiNek7pXUTsKwf_rubmlVMf6H8S6eAsyHA=.73264c8b-6d0d-41a5-9e96-b54d3a6358b9@github.com> Message-ID: <9X833c_Km1OTi2AbOP5dzEAezqg1KNh9QSzvae1FKDM=.80619e0b-e18a-41d0-a4ae-e4e5c65956e6@github.com> On Wed, 28 Oct 2020 02:16:43 GMT, Jason Tatton wrote: > Please review this small change to cleanup fields: _is_scalar_replaceable and _is_non_escaping from CallStaticJavaNode as well as code which assigns to those fields. > > The change was tested with run-test-tier[1-3] on Linux arm64 and x86-64. > > Thanks, > Jason the patch looks good to me, but we need a reviewer to approve it. ------------- PR: https://git.openjdk.java.net/jdk/pull/889 From kvn at openjdk.java.net Wed Oct 28 05:33:16 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 28 Oct 2020 05:33:16 GMT Subject: RFR: 8253101: Clean up CallStaticJavaNode EA flags In-Reply-To: <8RzONh6c2LiNek7pXUTsKwf_rubmlVMf6H8S6eAsyHA=.73264c8b-6d0d-41a5-9e96-b54d3a6358b9@github.com> References: <8RzONh6c2LiNek7pXUTsKwf_rubmlVMf6H8S6eAsyHA=.73264c8b-6d0d-41a5-9e96-b54d3a6358b9@github.com> Message-ID: On Wed, 28 Oct 2020 02:16:43 GMT, Jason Tatton wrote: > Please review this small change to cleanup fields: _is_scalar_replaceable and _is_non_escaping from CallStaticJavaNode as well as code which assigns to those fields. > > The change was tested with run-test-tier[1-3] on Linux arm64 and x86-64. 
> > Thanks, > Jason That was preparation for https://bugs.openjdk.java.net/browse/JDK-8012974 ------------- PR: https://git.openjdk.java.net/jdk/pull/889 From kvn at openjdk.java.net Wed Oct 28 05:37:16 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 28 Oct 2020 05:37:16 GMT Subject: RFR: 8253101: Clean up CallStaticJavaNode EA flags In-Reply-To: <8RzONh6c2LiNek7pXUTsKwf_rubmlVMf6H8S6eAsyHA=.73264c8b-6d0d-41a5-9e96-b54d3a6358b9@github.com> References: <8RzONh6c2LiNek7pXUTsKwf_rubmlVMf6H8S6eAsyHA=.73264c8b-6d0d-41a5-9e96-b54d3a6358b9@github.com> Message-ID: On Wed, 28 Oct 2020 02:16:43 GMT, Jason Tatton wrote: > Please review this small change to cleanup fields: _is_scalar_replaceable and _is_non_escaping from CallStaticJavaNode as well as code which assigns to those fields. > > The change was tested with run-test-tier[1-3] on Linux arm64 and x86-64. > > Thanks, > Jason The idea was to delay inlining of valueOf() after EA: https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2013-May/010422.html ------------- PR: https://git.openjdk.java.net/jdk/pull/889 From jiefu at openjdk.java.net Wed Oct 28 06:11:17 2020 From: jiefu at openjdk.java.net (Jie Fu) Date: Wed, 28 Oct 2020 06:11:17 GMT Subject: RFR: 8255378: [Vector API] Remove redundant vector length check after JDK-8254814 and JDK-8255210 In-Reply-To: <4rggcT9dE4sNRGPJmnwR-vOmiPYjWK6_LqMyqGtJ2x0=.5b9b65cf-4fb2-492a-90a8-628761053b5e@github.com> References: <4rggcT9dE4sNRGPJmnwR-vOmiPYjWK6_LqMyqGtJ2x0=.5b9b65cf-4fb2-492a-90a8-628761053b5e@github.com> Message-ID: On Mon, 26 Oct 2020 16:57:19 GMT, Vladimir Ivanov wrote: > Looks good. Thanks @iwanowww for your review. May I get a second review for this change? Thanks. 
------------- PR: https://git.openjdk.java.net/jdk/pull/854 From xxinliu at amazon.com Wed Oct 28 06:29:06 2020 From: xxinliu at amazon.com (Liu, Xin) Date: Wed, 28 Oct 2020 06:29:06 +0000 Subject: RFR: 8253101: Clean up CallStaticJavaNode EA flags In-Reply-To: References: <8RzONh6c2LiNek7pXUTsKwf_rubmlVMf6H8S6eAsyHA=.73264c8b-6d0d-41a5-9e96-b54d3a6358b9@github.com> Message-ID: Vladimir, Actually, what you observed in JDK-8012974 isn't limited to boxing objects. A non-escaping substring has a very similar property: C2 can replace its field value (the byte[]) with its parent's value + offset. Maybe we can avoid the allocation and the arraycopy this way. More generally, all JavaCall nodes which return an oop deserve to be parsed in EA (and inlined after EA). If an oop is non-escaping, any of its fields can potentially be replaced by the equivalent values found in the current CU, can't it? What do you think about this direction? We have some experiments based on substrings. We would like to make this optimization as general as possible. Back to this PR, I think we can take the flags out for now because we can't finish the feature before JDK 16. Another reason is that CallStaticJava doesn't seem to be the ideal Call node to store such information. If we need to add it back, I think we can create a dedicated subclass of JavaCall for the general case. What do you think? Thanks, --lx On 10/27/20, 10:38 PM, "hotspot-compiler-dev on behalf of Vladimir Kozlov" wrote: On Wed, 28 Oct 2020 02:16:43 GMT, Jason Tatton wrote: > Please review this small change to cleanup fields: _is_scalar_replaceable and _is_non_escaping from CallStaticJavaNode as well as code which assigns to those fields. > > The change was tested with run-test-tier[1-3] on Linux arm64 and x86-64.
> > Thanks, > Jason The idea was to delay inlining of valueOf() after EA: https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2013-May/010422.html ------------- PR: https://git.openjdk.java.net/jdk/pull/889 From vlivanov at openjdk.java.net Wed Oct 28 07:20:18 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Wed, 28 Oct 2020 07:20:18 GMT Subject: RFR: 8255438: [Vector API] More instructs in x86.ad should use legacy mode for code-gen In-Reply-To: <4QyFcn7tS3M8sj3ag3byA040Dki1f0lYEExCxch10eI=.91473c27-dd53-4e64-9c68-5775357e5dc0@github.com> References: <4QyFcn7tS3M8sj3ag3byA040Dki1f0lYEExCxch10eI=.91473c27-dd53-4e64-9c68-5775357e5dc0@github.com> Message-ID: <3V2gwYUhczi10t6jQy90MZ4DewHv9wHCwQHH50_EjvU=.f74d682d-6f44-4718-ae6e-183bba961566@github.com> On Tue, 27 Oct 2020 23:26:58 GMT, Jie Fu wrote: >> Good. Thank you for cleaning this up. >> Please, someone in Oracle runs Mach5 testing with UseAVX=3. > >> Good. Thank you for cleaning this up. >> Please, someone in Oracle runs Mach5 testing with UseAVX=3. > > Thanks @vnkozlov for your review. > Hope experts from Intel (@sviswa7 , @jatin-bhateja , etc.) can also take a look at this. > Thanks. From correctness perspective, the fix looks good. Xeon Phi CPU family doesn't support BW/DQ extensions. The only concern I have is that the fix completely disables the usage of the upper bank (16-31) registers for those operands irrespective of whether BW/DQ are present or not. It may lead to performance problems when vector register pressure is high.
------------- PR: https://git.openjdk.java.net/jdk/pull/874 From neliasso at openjdk.java.net Wed Oct 28 08:21:25 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Wed, 28 Oct 2020 08:21:25 GMT Subject: RFR: 8255429: Remove C2-based profiling In-Reply-To: References: Message-ID: On Tue, 27 Oct 2020 23:14:15 GMT, Igor Veresov wrote: > If there are no objections I'd like to remove some obsolete code that was an experiment to implement profiling in C2. It was added during the first tiered compilation experiments and is now unused. Nice clean up! ------------- Marked as reviewed by neliasso (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/888 From vlivanov at openjdk.java.net Wed Oct 28 08:31:21 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Wed, 28 Oct 2020 08:31:21 GMT Subject: RFR: 8255429: Remove C2-based profiling In-Reply-To: References: Message-ID: On Tue, 27 Oct 2020 23:14:15 GMT, Igor Veresov wrote: > If there are no objections I'd like to remove some obsolete code that was an experiment to implement profiling in C2. It was added during the first tiered compilation experiments and is now unused. Looks good. ------------- Marked as reviewed by vlivanov (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/888 From shade at openjdk.java.net Wed Oct 28 08:40:34 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 28 Oct 2020 08:40:34 GMT Subject: RFR: 8255441: Cleanup ciEnv/jvmciEnv::lookup_method-s [v2] In-Reply-To: References: Message-ID: > Static analysis complains there is a potentially uninitialized `dest_method` after the switch, oblivious of `ShouldNotReachHere()`. This can be cleaned up along with related code. Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains three additional commits since the last revision: - Merge branch 'master' into JDK-8255441-cleanup-linkresolver - Use `fatal` with proper message - 8255441: Cleanup ciEnv/jvmciEnv::lookup_method-s ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/875/files - new: https://git.openjdk.java.net/jdk/pull/875/files/b9fe313c..fb800b60 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=875&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=875&range=00-01 Stats: 1878 lines in 74 files changed: 1371 ins; 310 del; 197 mod Patch: https://git.openjdk.java.net/jdk/pull/875.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/875/head:pull/875 PR: https://git.openjdk.java.net/jdk/pull/875 From shade at openjdk.java.net Wed Oct 28 08:40:34 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 28 Oct 2020 08:40:34 GMT Subject: RFR: 8255441: Cleanup ciEnv/jvmciEnv::lookup_method-s [v2] In-Reply-To: References: Message-ID: On Tue, 27 Oct 2020 23:18:42 GMT, Vladimir Kozlov wrote: > Can you use fatal() and print unexpected `bc` instead of SNRH()? Righto. See new version. ------------- PR: https://git.openjdk.java.net/jdk/pull/875 From vlivanov at openjdk.java.net Wed Oct 28 08:43:20 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Wed, 28 Oct 2020 08:43:20 GMT Subject: RFR: 8255466: C2 crashes at ciObject::get_oop() const+0x0 In-Reply-To: References: Message-ID: <7Zhni-ZjtEcJ4KJG3AawaJc-4I7q583Fy4d9-i97VC0=.e25f08b9-a42f-48a4-85e9-dace848e502b@github.com> On Wed, 28 Oct 2020 02:19:42 GMT, Vladimir Kozlov wrote: > Graal testing hit this issue with product VM. Tom R. suggested that it could be the case of reflective unsafe static field access that would eventually be optimized away because the Class is null: > `if (staticFieldBase != null) { > return Unsafe.getInt(staticFieldBase, Unsafe.staticFieldOffset(field)); > }` > > I suggest to replace assert with runtime check. 
Note, `o` value is assigned to `_const_oop` so semantically new code is the same except additional runtime check. > > I also noticed that const_oop is accessed without check for NULL in new Vector API code. I added check there too. > > Passed tier1-3 testing. It would be nice to have a regression test for it. Otherwise, looks good. src/hotspot/share/opto/type.cpp line 3047: > 3045: _is_ptr_to_narrowoop = false; > 3046: } else if (klass() == ciEnv::current()->Class_klass() && > 3047: _offset >= InstanceMirrorKlass::offset_of_static_fields()) { You could turn the assert into the check in the enclosing `if`. IMO it makes the code clearer. ------------- Marked as reviewed by vlivanov (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/890 From vlivanov at openjdk.java.net Wed Oct 28 08:52:22 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Wed, 28 Oct 2020 08:52:22 GMT Subject: RFR: 8251994: VM crashed running TestComplexAddrExpr.java test with -XX:UseAVX=X [v2] In-Reply-To: References: <6voOeRu_AO13mIMLca9ZYspPXMIEWTmx1rvzbCwZmqI=.bd528e8e-0aee-4d90-a921-0e437f2ef612@github.com> Message-ID: On Mon, 26 Oct 2020 17:40:40 GMT, Vladimir Kozlov wrote: >> To improve the chance of vectorizing a loop, special code in superword tries to hoist loads to the beginning of the loop by replacing their memory input with the corresponding (same memory slice) loop memory Phi: >> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/superword.cpp#L474 >> >> Originally loads are ordered by corresponding stores on the same memory slice. But when they are hoisted they lose that ordering - nothing enforces the order. >> In TestComplexAddrExpr.test6 case the ordering is preserved (luckily?) after hoisting only when vector size is 32 bytes (avx2) but they become unordered with 16 (avx=0 or avx1) or 64 (avx512) bytes vectors. >> >> The mystery of why the test did not fail in our testing environment is also solved!
We have old Skylake machines (even my local machine) for which AVX is switched to avx2: >> https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/vm_version_x86.cpp#L721 >> >> I have a simple fix (use the original load ordering indexes), but looking at the code which is causing the issue I see that it is bogus/incomplete - it does not help the cases listed for the JDK-8076284 changes: >> >> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-April/017645.html >> >> Using unrolling and cloning information to vectorize is an interesting idea but, as I see it, it is not complete. Even if the pack_parallel() method is able to create packs, they are currently all removed by the filter_packs() method. >> And additionally the above cases from JDK-8076284 are vectorized without hoisting loads and pack_parallel - I verified it. >> >> The code added by JDK-8076284 is useless now and I am excluding it. It needs more work to be useful. I am reluctant to remove the code because maybe in the future we will have time to invest in it. >> >> There were 2 ways to exclude it: add new field in superword class: >> >> bool _do_vector_loop; // whether to do vectorization/simd style >> + bool _do_vector_loop_experimental; // experimental optimization >> bool _do_reserve_copy; // do reserve copy of the graph(loop) before final modification in output >> or use `#if VECTOR_LOOP_SIMD` as in current changes. I prefer this one to avoid wasting compilation time. I used the first one for testing by setting `_do_vector_loop_experimental(UseNewCode)`. >> >> Testing: tier1-7 with avx2 and avx512. >> Performance testing - no regression. >> I also compared jtreg tests output with -XX:+TraceNewVectors to verify that number of created vector nodes did not change.
> > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > Use static const bool local variable src/hotspot/share/opto/superword.cpp line 94: > 92: } > 93: > 94: static const bool _do_vector_loop_experimental = false; // Experimental vectorization which uses data from loop unrolling. For such purposes, I find debug VM flags a better option: they are constant in product binaries, easy to experiment/change from command-line with debug binaries, and enumerated in a single place. Any reason to prefer `static const` declarations? ------------- PR: https://git.openjdk.java.net/jdk/pull/859 From chagedorn at openjdk.java.net Wed Oct 28 08:54:18 2020 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Wed, 28 Oct 2020 08:54:18 GMT Subject: Integrated: 8255245: C1: Fix output of -XX:+PrintCFGToFile to open it with visualizer In-Reply-To: References: Message-ID: On Fri, 23 Oct 2020 13:16:27 GMT, Christian Hagedorn wrote: > [JDK-8251093](https://bugs.openjdk.java.net/browse/JDK-8251093) introduced some improved logging for intervals. When specifying `-XX:+PrintCFGToFile` to dump the graph to a file to later open it with the C1 visualizer, it also uses the improved interval printing. However, this output can no longer be read by the C1 Visualizer. As the C1 Visualizer is not part of the JDK, we should include the old format again for the output produced by `-XX:+PrintCFGToFile` to be compatible with the visualizer again. The console output can still use the improved logging of JDK-8251093. This pull request has now been integrated. 
Changeset: b7d483c7 Author: Christian Hagedorn URL: https://git.openjdk.java.net/jdk/commit/b7d483c7 Stats: 38 lines in 3 files changed: 24 ins; 2 del; 12 mod 8255245: C1: Fix output of -XX:+PrintCFGToFile to open it with visualizer Reviewed-by: kvn, xliu ------------- PR: https://git.openjdk.java.net/jdk/pull/837 From iveresov at openjdk.java.net Wed Oct 28 09:13:21 2020 From: iveresov at openjdk.java.net (Igor Veresov) Date: Wed, 28 Oct 2020 09:13:21 GMT Subject: RFR: 8255429: Remove C2-based profiling In-Reply-To: References: Message-ID: <82uOEfM-UO_6x35J5HKWswod9mZp4KjB1a9yZiJnYT4=.049476ab-9d66-4621-9a13-2c314f715b7a@github.com> On Wed, 28 Oct 2020 08:28:50 GMT, Vladimir Ivanov wrote: >> If there are no objections I'd like to remove some obsolete code that was an experiment to implement profiling in C2. It was added during the first tiered compilation experiments and is not unused. > > Looks good. > Do you mean that those profiling data never used before ? I noticed that those data was collected during abstract interpretation, is that `the first tiered compilation` you mean ? No, it is code from >10 years ago that aimed to implement tiered compilation using c2 only. It generates code semantically equivalent to level 2 and 3. It is currently done by c1, so we don't need the c2 version. ------------- PR: https://git.openjdk.java.net/jdk/pull/888 From jiefu at openjdk.java.net Wed Oct 28 09:24:30 2020 From: jiefu at openjdk.java.net (Jie Fu) Date: Wed, 28 Oct 2020 09:24:30 GMT Subject: RFR: 8255438: [Vector API] More instructs in x86.ad should use legacy mode for code-gen [v2] In-Reply-To: References: Message-ID: > Hi all, > > Just as @jatin-bhateja pointed out [1], there are more instructs in x86.ad which should use legacy mode. > > It would be better to fix the following cases: > ------------------------ > 1. instruct mul2L_reg > The code-gen logic uses phaddd [2], which requires legacy mode here [3]. 
> This bug might be reproduced on AVX512 machines without avx512dq. > > 2. instruct vmul4L_reg_avx > The code-gen logic uses vphaddd [4], which requires legacy mode here [5]. > This bug might be reproduced on AVX512 machines without avx512dq. > > 3. instruct reductionL > For MulReductionVL, the code-gen chain can be: reduceL --> reduce4L --> reduce_operation_128 --> vpmullq [6] > vpmullq requires legacy mode [7] if avx512dq isn't supported. > This bug might be reproduced on AVX512 machines without avx512dq. > > 4. instruct reductionB > For MinReductionV, the code-gen chain can be: reduceB --> reduce32B --> reduce_operation_128 --> pminsb [8] > pminsb requires legacy mode [9] if avx512bw isn't supported. > This bug might be reproduced on AVX512 machines without avx512bw. > ------------------------ > Bugs in mul2L_reg/vmul4L_reg_avx/reductionL can only be reproduced on AVX512 machines without avx512dq. > And the bug in reductionB can only be reproduced on AVX512 machines without avx512bw. > > Unfortunately, it's impossible for us to create reproducers since our AVX512 platforms support both avx512dq and avx512bw. > However, it does make sense to fix these unexposed bugs since Vector API code will be sure to run on various platforms (e.g., AVX512 machines without avx512dq/bw) in the future. > > The fix just changes vec to legVec, which is quite safe in theory. > > As for the reduction patterns of Float and Double, I don't see any reason that they should use legacy mode (maybe I've missed something). > > Testing: > - jdk/incubator/vector on both AVX512 and AVX256 machines > > Any comments? > > Thanks a lot.
> Best regards, > Jie > > [1] https://github.com/openjdk/jdk/pull/791#commitcomment-43473733 > [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L5472 > [3] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/assembler_x86.cpp#L6217 > [4] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L5497 > [5] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/assembler_x86.cpp#L6165 > [6] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp#L1521 > [7] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/assembler_x86.cpp#L6428 > [8] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp#L1482 > [9] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/assembler_x86.cpp#L6475 Jie Fu has updated the pull request incrementally with one additional commit since the last revision: Add reductionL_avx512dq and reductionB_avx512bw ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/874/files - new: https://git.openjdk.java.net/jdk/pull/874/files/59ece711..a16bddf2 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=874&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=874&range=00-01 Stats: 39 lines in 1 file changed: 37 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/874.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/874/head:pull/874 PR: https://git.openjdk.java.net/jdk/pull/874 From jiefu at openjdk.java.net Wed Oct 28 09:24:31 2020 From: jiefu at openjdk.java.net (Jie Fu) Date: Wed, 28 Oct 2020 09:24:31 GMT Subject: RFR: 8255438: [Vector API] More instructs in x86.ad should use legacy mode for code-gen [v2] In-Reply-To: <3V2gwYUhczi10t6jQy90MZ4DewHv9wHCwQHH50_EjvU=.f74d682d-6f44-4718-ae6e-183bba961566@github.com> References: <4QyFcn7tS3M8sj3ag3byA040Dki1f0lYEExCxch10eI=.91473c27-dd53-4e64-9c68-5775357e5dc0@github.com> 
<3V2gwYUhczi10t6jQy90MZ4DewHv9wHCwQHH50_EjvU=.f74d682d-6f44-4718-ae6e-183bba961566@github.com> Message-ID: On Wed, 28 Oct 2020 07:17:23 GMT, Vladimir Ivanov wrote: > From correctness perspective, the fix looks good. > Xeon Phi CPU family doesn't support BW/DQ extensions. > > The only concern I have is that the fix completely disables the usage of the upper bank (16-31) registers for those operands irrespective of whether BW/DQ are present or not. It may lead to performance problems when vector register pressure is high. Thanks @iwanowww for your review. reductionL_avx512dq and reductionB_avx512bw have been added for your concerns. Any comments? Thanks. ------------- PR: https://git.openjdk.java.net/jdk/pull/874 From vlivanov at openjdk.java.net Wed Oct 28 10:01:20 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Wed, 28 Oct 2020 10:01:20 GMT Subject: RFR: 8255438: [Vector API] More instructs in x86.ad should use legacy mode for code-gen [v2] In-Reply-To: References: Message-ID: <2nKXVLEcOmivYE-7Cg4f3tHW2zICNMb0gLM8jXbSjsA=.f12e2ae1-2598-4d7d-b0c2-50e1c91afa4f@github.com> On Wed, 28 Oct 2020 09:24:30 GMT, Jie Fu wrote: >> Hi all, >> >> Just as @jatin-bhateja pointed out [1], there are more instructs in x86.ad which should use legacy mode. >> >> It would be better to fix the following cases: >> ------------------------ >> 1. instruct mul2L_reg >> The code-gen logic uses phaddd [2], which requires legacy mode here [3]. >> This bug might be reproduced on AVX512 machines without avx512dq. >> >> 2. instruct vmul4L_reg_avx >> The code-gen logic uses vphaddd [4], which requires legacy mode here [5]. >> This bug might be reproduced on AVX512 machines without avx512dq. >> >> 3. instruct reductionL >> For MulReductionVL, the code-gen chain can be: reduceL --> reduce4L --> reduce_operation_128 --> vpmullq [6] >> vpmullq require legacy mode [7] if avx512dq isn't supported. >> This bug might be reproduced on AVX512 machines without avx512dq. >> >> 4. 
instruct reductionB >> For MinReductionV, the code-gen chain can be: reduceB --> reduce32B --> reduce_operation_128 --> pminsb [8] >> pminsb requires legacy mode [9] if avx512bw isn't supported. >> This bug might be reproduced on AVX512 machines without avx512bw. >> ------------------------ >> Bugs in mul2L_reg/vmul4L_reg_avx/reductionL can only be reproduced on AVX512 machines without avx512dq. >> And the bug in reductionB can only be reproduced on AVX512 machines without avx512bw. >> >> Unfortunately, it's impossible for us to create reproducers since our AVX512 platforms support both avx512dq and avx512bw. >> However, it does make sense to fix these unexposed bugs since Vector API code will be sure to run on various platforms (e.g., AVX512 machines without avx512dq/bw) in the future. >> >> The fix just changes vec to legVec, which is quite safe in theory. >> >> As for the reduction patterns of Float and Double, I don't see any reason that they should use legacy mode (maybe I've missed something). >> >> Testing: >> - jdk/incubator/vector on both AVX512 and AVX256 machines >> >> Any comments? >> >> Thanks a lot.
>> Best regards, >> Jie >> >> [1] https://github.com/openjdk/jdk/pull/791#commitcomment-43473733 >> [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L5472 >> [3] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/assembler_x86.cpp#L6217 >> [4] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L5497 >> [5] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/assembler_x86.cpp#L6165 >> [6] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp#L1521 >> [7] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/assembler_x86.cpp#L6428 >> [8] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp#L1482 >> [9] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/assembler_x86.cpp#L6475 > > Jie Fu has updated the pull request incrementally with one additional commit since the last revision: > > Add reductionL_avx512dq and reductionB_avx512bw Looks good. At some point, I thought about introducing new flavors of generic vector operands which would capture the dependency between legacy vectors and BW/DQ (by relying on dynamic register classes and dispatch between legacy and full-range register maks depending on the presence of BW and DQ respectively), but haven't had a chance to experiment with it. The main motivation was (and still is) to reduce redundant AD instructions which are kept solely to reify legacy register constraint. ------------- Marked as reviewed by vlivanov (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/874 From github.com+10482586+erik1iu at openjdk.java.net Wed Oct 28 10:04:19 2020 From: github.com+10482586+erik1iu at openjdk.java.net (Eric Liu) Date: Wed, 28 Oct 2020 10:04:19 GMT Subject: RFR: 8255429: Remove C2-based profiling In-Reply-To: <82uOEfM-UO_6x35J5HKWswod9mZp4KjB1a9yZiJnYT4=.049476ab-9d66-4621-9a13-2c314f715b7a@github.com> References: <82uOEfM-UO_6x35J5HKWswod9mZp4KjB1a9yZiJnYT4=.049476ab-9d66-4621-9a13-2c314f715b7a@github.com> Message-ID: <8WYlY73vQmRRhYyUAfca0mcK5Em-f02d0fBIGHDsCqE=.e1fb1c27-1501-4919-b939-6a3f4fbd5a3e@github.com> On Wed, 28 Oct 2020 09:11:00 GMT, Igor Veresov wrote: >> Looks good. > >> Do you mean that those profiling data never used before ? I noticed that those data was collected during abstract interpretation, is that `the first tiered compilation` you mean ? > > No, it is code from >10 years ago that aimed to implement tiered compilation using c2 only. It generates code semantically equivalent to level 2 and 3. It is currently done by c1, so we don't need the c2 version. > > And that's not abstract interpretation, it is a runtime profile collection analogous to what the interpreter does. @veresov Thanks, this makes more sense to me now. I also wonder the profiling in interpreter, will it be deprecated ? ------------- PR: https://git.openjdk.java.net/jdk/pull/888 From adinn at openjdk.java.net Wed Oct 28 10:10:17 2020 From: adinn at openjdk.java.net (Andrew Dinn) Date: Wed, 28 Oct 2020 10:10:17 GMT Subject: RFR: 8255378: [Vector API] Remove redundant vector length check after JDK-8254814 and JDK-8255210 In-Reply-To: References: Message-ID: On Sun, 25 Oct 2020 08:33:26 GMT, Jie Fu wrote: > Hi all, > > As @iwanowww pointed out [1] that there are redundant vector length checks for reductionI [2] and reductionS [3]. > It would be better to remove them. > > Testing: > - jdk/incubator/vector on both AVX512 and AVX256 machines > > Thanks. 
> Best regards, > Jie > > [1] https://github.com/openjdk/jdk/pull/791#discussion_r510687005 > [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L4429 > [3] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L4625 This is arguably trivial and needs no review but here is one anyway. ------------- Marked as reviewed by adinn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/854 From jiefu at openjdk.java.net Wed Oct 28 11:36:42 2020 From: jiefu at openjdk.java.net (Jie Fu) Date: Wed, 28 Oct 2020 11:36:42 GMT Subject: RFR: 8255438: [Vector API] More instructs in x86.ad should use legacy mode for code-gen [v2] In-Reply-To: <2nKXVLEcOmivYE-7Cg4f3tHW2zICNMb0gLM8jXbSjsA=.f12e2ae1-2598-4d7d-b0c2-50e1c91afa4f@github.com> References: <2nKXVLEcOmivYE-7Cg4f3tHW2zICNMb0gLM8jXbSjsA=.f12e2ae1-2598-4d7d-b0c2-50e1c91afa4f@github.com> Message-ID: <8bIqqphYLI1b0wqmwLtS7JxGwMy9AwG82y2mDVSRAlk=.d83910d3-2d5a-4471-9842-144b5399b13e@github.com> On Wed, 28 Oct 2020 09:58:34 GMT, Vladimir Ivanov wrote: > Looks good. > > At some point, I thought about introducing new flavors of generic vector operands which would capture the dependency between legacy vectors and BW/DQ (by relying on dynamic register classes and dispatch between legacy and full-range register maks depending on the presence of BW and DQ respectively), but haven't had a chance to experiment with it. The main motivation was (and still is) to reduce redundant AD instructions which are kept solely to reify legacy register constraint. Sounds great! Thanks @iwanowww . 
------------- PR: https://git.openjdk.java.net/jdk/pull/874 From dongbo at openjdk.java.net Wed Oct 28 11:54:46 2020 From: dongbo at openjdk.java.net (Dong Bo) Date: Wed, 28 Oct 2020 11:54:46 GMT Subject: RFR: 8255246: AArch64: Implement BigInteger shiftRight and shiftLeft accelerator/intrinsic [v2] In-Reply-To: References: Message-ID: On Tue, 27 Oct 2020 16:45:42 GMT, Andrew Haley wrote: >> Dong Bo has updated the pull request incrementally with one additional commit since the last revision: >> >> minor improvements for small BigIntegers > > Marked as reviewed by aph (Reviewer). @theRealAph Thanks for the review. @RealFYang Could you please sponsor this? Thanks. ------------- PR: https://git.openjdk.java.net/jdk/pull/861 From dongbo at openjdk.java.net Wed Oct 28 11:54:47 2020 From: dongbo at openjdk.java.net (Dong Bo) Date: Wed, 28 Oct 2020 11:54:47 GMT Subject: Integrated: 8255246: AArch64: Implement BigInteger shiftRight and shiftLeft accelerator/intrinsic In-Reply-To: References: Message-ID: On Mon, 26 Oct 2020 09:19:45 GMT, Dong Bo wrote: > BigInteger.shiftRightImplWorker and BigInteger.shiftLeftImplWorker are not intrinsified on aarch64, which have been done on x86_64. > We can implement them via USHL NEON instruction (register), which handles four integers one time at most, against just integer C2 asm-code processed. > The usage of USHL can be found at: https://developer.arm.com/documentation/dui0801/g/A64-SIMD-Vector-Instructions/USHL--vector-?lang=en > > Patch passed jtreg tier1-3 tests on our aarch64 server. > Tests in test/jdk/java/math/BigInteger/* runned specially for the correctness of the implementation and passed. > > We tested test/micro/org/openjdk/bench/java/math/BigIntegers.java for performance gain on Kunpeng916 and Kunpeng920. 
> The following performance improvements were seen with this implementation:
> - Intrinsification of BigInteger.shiftLeft: 25.52% (Kunpeng916), 37.56% (Kunpeng920)
> - Intrinsification of BigInteger.shiftRight: 46.45% (Kunpeng916), 43.32% (Kunpeng920)
>
> The BigIntegers.java JMH micro-benchmark results:
> Benchmark Mode Cnt Score Error Units
>
> # Kunpeng 916, default
> BigIntegers.testAdd avgt 25 33.554 ± 0.224 ns/op
> BigIntegers.testHugeToString avgt 25 575.554 ± 40.656 ns/op
> BigIntegers.testLargeToString avgt 25 190.098 ± 0.825 ns/op
> **BigIntegers.testLeftShift avgt 25 1495.779 ± 12.365 ns/op**
> BigIntegers.testMultiply avgt 25 7551.707 ± 39.309 ns/op
> **BigIntegers.testRightShift avgt 25 605.302 ± 6.710 ns/op**
> BigIntegers.testSmallToString avgt 25 179.034 ± 0.873 ns/op
>
> # Kunpeng 916, intrinsic:
> BigIntegers.testAdd avgt 25 33.531 ± 0.222 ns/op
> BigIntegers.testHugeToString avgt 25 578.038 ± 40.675 ns/op
> BigIntegers.testLargeToString avgt 25 188.566 ± 0.855 ns/op
> **BigIntegers.testLeftShift avgt 25 1191.651 ± 20.136 ns/op**
> BigIntegers.testMultiply avgt 25 7492.711 ± 3.702 ns/op
> **BigIntegers.testRightShift avgt 25 326.891 ± 6.033 ns/op**
> BigIntegers.testSmallToString avgt 25 178.267 ± 1.501 ns/op
>
> # Kunpeng 920, default
> BigIntegers.testAdd avgt 25 22.790 ± 0.167 ns/op
> BigIntegers.testHugeToString avgt 25 432.428 ± 10.736 ns/op
> BigIntegers.testLargeToString avgt 25 121.899 ± 3.356 ns/op
> **BigIntegers.testLeftShift avgt 25 883.530 ± 53.714 ns/op**
> BigIntegers.testMultiply avgt 25 5918.845 ± 94.937 ns/op
> **BigIntegers.testRightShift avgt 25 329.762 ± 15.850 ns/op**
> BigIntegers.testSmallToString avgt 25 117.460 ± 3.040 ns/op
>
> # Kunpeng 920, intrinsic
> BigIntegers.testAdd avgt 25 21.791 ± 0.085 ns/op
> BigIntegers.testHugeToString avgt 25 415.209 ± 32.170 ns/op
> BigIntegers.testLargeToString avgt 25 124.635 ± 2.157 ns/op
> **BigIntegers.testLeftShift avgt 25 551.710 ± 7.836 ns/op**
> BigIntegers.testMultiply avgt 25 5869.401 ± 54.803 ns/op
> **BigIntegers.testRightShift avgt 25 186.896 ± 6.378 ns/op**
> BigIntegers.testSmallToString avgt 25 117.543 ± 3.036 ns/op

This pull request has now been integrated. Changeset: 6b2d11ba Author: Dong Bo Committer: Fei Yang URL: https://git.openjdk.java.net/jdk/commit/6b2d11ba Stats: 275 lines in 3 files changed: 273 ins; 0 del; 2 mod 8255246: AArch64: Implement BigInteger shiftRight and shiftLeft accelerator/intrinsic Reviewed-by: aph ------------- PR: https://git.openjdk.java.net/jdk/pull/861 From jiefu at openjdk.java.net Wed Oct 28 14:02:45 2020 From: jiefu at openjdk.java.net (Jie Fu) Date: Wed, 28 Oct 2020 14:02:45 GMT Subject: Integrated: 8255378: [Vector API] Remove redundant vector length check after JDK-8254814 and JDK-8255210 In-Reply-To: References: Message-ID: On Sun, 25 Oct 2020 08:33:26 GMT, Jie Fu wrote: > Hi all, > > As @iwanowww pointed out [1] that there are redundant vector length checks for reductionI [2] and reductionS [3]. > It would be better to remove them. > > Testing: > - jdk/incubator/vector on both AVX512 and AVX256 machines > > Thanks. > Best regards, > Jie > > [1] https://github.com/openjdk/jdk/pull/791#discussion_r510687005 > [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L4429 > [3] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L4625 This pull request has now been integrated.
Changeset: 591e7e2c Author: Jie Fu URL: https://git.openjdk.java.net/jdk/commit/591e7e2c Stats: 4 lines in 1 file changed: 0 ins; 2 del; 2 mod 8255378: [Vector API] Remove redundant vector length check after JDK-8254814 and JDK-8255210 Reviewed-by: vlivanov, adinn ------------- PR: https://git.openjdk.java.net/jdk/pull/854 From jiefu at openjdk.java.net Wed Oct 28 14:02:44 2020 From: jiefu at openjdk.java.net (Jie Fu) Date: Wed, 28 Oct 2020 14:02:44 GMT Subject: RFR: 8255378: [Vector API] Remove redundant vector length check after JDK-8254814 and JDK-8255210 In-Reply-To: References: Message-ID: <0W7K7cGeaA1l-H8VWMxm_ZmcaWmaOm1jyecEoId-YH0=.78832a42-c4fe-4a28-83a9-0278b7fadc73@github.com> On Wed, 28 Oct 2020 10:07:50 GMT, Andrew Dinn wrote: > This is arguably trivial and needs no review but here is one anyway. Thanks @adinn for your review. Pushed. ------------- PR: https://git.openjdk.java.net/jdk/pull/854 From martin.doerr at sap.com Wed Oct 28 14:05:50 2020 From: martin.doerr at sap.com (Doerr, Martin) Date: Wed, 28 Oct 2020 14:05:50 +0000 Subject: C2 crash in PhaseIdealLoop::build_loop_late_post_work In-Reply-To: <09aa2715-0777-e69b-298f-17700846d348@oracle.com> References: <09aa2715-0777-e69b-298f-17700846d348@oracle.com> Message-ID: Hi Christian, JDK-8251994 doesn't resolve this issue. The crash has occurred after it was fixed. Seems like replay doesn't help, because we'd need a generated Scala class: java.lang.NoClassDefFoundError: org/apache/spark/sql/catalyst/expressions/GeneratedClass$SpecificUnsafeProjection Error while parsing line 6957: Can't find holder klass Debugging with gdb and core file is not easy, either. Best regards, Martin > -----Original Message----- > From: Christian Hagedorn > Sent: Dienstag, 27. Oktober 2020 16:31 > To: Doerr, Martin ; 'hotspot-compiler- > dev at openjdk.java.net' > Subject: Re: C2 crash in PhaseIdealLoop::build_loop_late_post_work > > Hi Martin > > Thanks for the pointer! I will have a look at that. 
> > Best regards, > Christian > > On 27.10.20 16:28, Doerr, Martin wrote: > > Hi Christian, > > > > thanks for your reply. > > > > Interesting. Another issue related to JDK-8249749 was recently fixed: > > 8251994: VM crashed running TestComplexAddrExpr.java test with -XX:UseAVX=X > > https://bugs.openjdk.java.net/browse/JDK-8251994 > > Maybe it helps for this, too. > > > > I don't have a fast reproducer atm. > > > > Best regards, > > Martin > > > > > >> -----Original Message----- > >> From: Christian Hagedorn > >> Sent: Dienstag, 27. Oktober 2020 13:03 > >> To: Doerr, Martin ; 'hotspot-compiler- > >> dev at openjdk.java.net' > >> Subject: Re: C2 crash in PhaseIdealLoop::build_loop_late_post_work > >> > >> Hi Martin > >> > >> Yes, this is a known problem (see JDK-8251925). We also see it failing > >> with only this benchmark, often in log-regression. I played around with > >> different settings by just running log-regression, for example, which > >> did not seem to trigger it, though. Do you have a fast reproducer? > >> > >> It started to fail after JDK-8249749 which, however, just seem to have > >> revealed the problem. > >> > >> Best regards, > >> Christian > >> > >> On 27.10.20 12:07, Doerr, Martin wrote: > >>> Hi, > >>> > >>> we observe C2 crashes on all x86_64 platforms when running > Renaissance > >> Benchmark "log-regression" since 2020-08-10. > >>> (hg was at "8250848: [aarch64] nativeGotJump_at() missing call to > verify()." > >> when we've seen it first.)
> >>> > >>> > >>> # EXCEPTION_ACCESS_VIOLATION (0xc0000005) at > >> pc=0x00007ffd594e9322, pid=1439804, tid=1516348 > >>> > >>> # > >>> > >>> # JRE version: OpenJDK Runtime Environment (16.0.0.1) (build 16.0.0.1- > >> internal+0-adhoc.openjdk.jdk-dev) > >>> > >>> # Java VM: OpenJDK 64-Bit Server VM (16.0.0.1-internal+0- > >> adhoc.openjdk.jdk-dev, mixed mode, sharing, tiered, compressed oops, > g1 > >> gc, windows-amd64) > >>> > >>> # Problematic frame: > >>> > >>> # V [jvm.dll+0x559322] > >> PhaseIdealLoop::build_loop_late_post_work+0x212 > >>> > >>> > >>> > >>> Host: Intel(R) Xeon(R) CPU E7-8880 v4 @ 2.20GHz, 8 cores, 15G, > Windows > >> Server 2016 , 64 bit Build 14393 (10.0.14393.3630) > >>> > >>> Time: Sat Oct 24 19:40:57 2020 W. Europe Daylight Time elapsed time: > >> 16.016253 seconds (0d 0h 0m 16s) > >>> > >>> > >>> V [jvm.dll+0x559322] > PhaseIdealLoop::build_loop_late_post_work+0x212 > >> (loopnode.cpp:5312) > >>> > >>> V [jvm.dll+0x5590b4] PhaseIdealLoop::build_loop_late+0x214 > >> (loopnode.cpp:5159) > >>> > >>> V [jvm.dll+0x558418] PhaseIdealLoop::build_and_optimize+0x7b8 > >> (loopnode.cpp:3874) > >>> > >>> V [jvm.dll+0x21410a] Compile::optimize_loops+0x15a > (compile.cpp:1960) > >>> > >>> V [jvm.dll+0x20d900] Compile::Optimize+0xf90 (compile.cpp:2184) > >>> > >>> V [jvm.dll+0x20b105] Compile::Compile+0xba5 (compile.cpp:735) > >>> > >>> V [jvm.dll+0x19de6b] C2Compiler::compile_method+0xab > >> (c2compiler.cpp:104) > >>> > >>> Is this a known problem? > >>> > >>> Best regards, > >>> Martin > >>> From christian.hagedorn at oracle.com Wed Oct 28 14:59:03 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Wed, 28 Oct 2020 15:59:03 +0100 Subject: C2 crash in PhaseIdealLoop::build_loop_late_post_work In-Reply-To: References: <09aa2715-0777-e69b-298f-17700846d348@oracle.com> Message-ID: <727ac092-2345-804f-82ce-23ac8db52ca1@oracle.com> Hi Martin Yes that's unfortunate. 
I was able to reproduce it now by only running the benchmark dec-tree which also leads to the same assertion failure. Best regards, Christian On 28.10.20 15:05, Doerr, Martin wrote: > Hi Christian, > > JDK-8251994 doesn't resolve this issue. The crash has occurred after it was fixed. > > Seems like replay doesn't help, because we'd need a generated Scala class: > java.lang.NoClassDefFoundError: org/apache/spark/sql/catalyst/expressions/GeneratedClass$SpecificUnsafeProjection > Error while parsing line 6957: Can't find holder klass > > Debugging with gdb and core file is not easy, either. > > Best regards, > Martin > > >> -----Original Message----- >> From: Christian Hagedorn >> Sent: Dienstag, 27. Oktober 2020 16:31 >> To: Doerr, Martin ; 'hotspot-compiler- >> dev at openjdk.java.net' >> Subject: Re: C2 crash in PhaseIdealLoop::build_loop_late_post_work >> >> Hi Martin >> >> Thanks for the pointer! I will have a look at that. >> >> Best regards, >> Christian >> >> On 27.10.20 16:28, Doerr, Martin wrote: >>> Hi Christian, >>> >>> thanks for your reply. >>> >>> Interesting. Another issue related to JDK-8249749 was recently fixed: >>> 8251994: VM crashed running TestComplexAddrExpr.java test with -XX:UseAVX=X >>> https://bugs.openjdk.java.net/browse/JDK-8251994 >>> Maybe it helps for this, too. >>> >>> I don't have a fast reproducer atm. >>> >>> Best regards, >>> Martin >>> >>> >>>> -----Original Message----- >>>> From: Christian Hagedorn >>>> Sent: Dienstag, 27. Oktober 2020 13:03 >>>> To: Doerr, Martin ; 'hotspot-compiler- >>>> dev at openjdk.java.net' >>>> Subject: Re: C2 crash in PhaseIdealLoop::build_loop_late_post_work >>>> >>>> Hi Martin >>>> >>>> Yes, this is a known problem (see JDK-8251925). We also see it failing >>>> with only this benchmark, often in log-regression. I played around with >>>> different settings by just running log-regression, for example, which >>>> did not seem to trigger it, though. Do you have a fast reproducer?
>>>> >>>> It started to fail after JDK-8249749 which, however, just seem to have >>>> revealed the problem. >>>> >>>> Best regards, >>>> Christian >>>> >>>> On 27.10.20 12:07, Doerr, Martin wrote: >>>>> Hi, >>>>> >>>>> we observe C2 crashes on all x86_64 platforms when running >> Renaissance >>>> Benchmark "log-regression" since 2020-08-10. >>>>> (hg was at "8250848: [aarch64] nativeGotJump_at() missing call to >> verify()." >>>> when we've seen it first.) >>>>> >>>>> >>>>> # EXCEPTION_ACCESS_VIOLATION (0xc0000005) at >>>> pc=0x00007ffd594e9322, pid=1439804, tid=1516348 >>>>> >>>>> # >>>>> >>>>> # JRE version: OpenJDK Runtime Environment (16.0.0.1) (build 16.0.0.1- >>>> internal+0-adhoc.openjdk.jdk-dev) >>>>> >>>>> # Java VM: OpenJDK 64-Bit Server VM (16.0.0.1-internal+0- >>>> adhoc.openjdk.jdk-dev, mixed mode, sharing, tiered, compressed oops, >> g1 >>>> gc, windows-amd64) >>>>> >>>>> # Problematic frame: >>>>> >>>>> # V [jvm.dll+0x559322] >>>> PhaseIdealLoop::build_loop_late_post_work+0x212 >>>>> >>>>> >>>>> >>>>> Host: Intel(R) Xeon(R) CPU E7-8880 v4 @ 2.20GHz, 8 cores, 15G, >> Windows >>>> Server 2016 , 64 bit Build 14393 (10.0.14393.3630) >>>>> >>>>> Time: Sat Oct 24 19:40:57 2020 W. Europe Daylight Time elapsed time: >>>> 16.016253 seconds (0d 0h 0m 16s) >>>>> >>>>> >>>>> V [jvm.dll+0x559322] >> PhaseIdealLoop::build_loop_late_post_work+0x212 >>>> (loopnode.cpp:5312) >>>>> >>>>> V [jvm.dll+0x5590b4] PhaseIdealLoop::build_loop_late+0x214 >>>> (loopnode.cpp:5159) >>>>> >>>>> V [jvm.dll+0x558418] PhaseIdealLoop::build_and_optimize+0x7b8 >>>> (loopnode.cpp:3874) >>>>> >>>>> V [jvm.dll+0x21410a] Compile::optimize_loops+0x15a >> (compile.cpp:1960) >>>>> >>>>> V [jvm.dll+0x20d900] Compile::Optimize+0xf90 (compile.cpp:2184) >>>>> >>>>> V [jvm.dll+0x20b105] Compile::Compile+0xba5 (compile.cpp:735) >>>>> >>>>> V [jvm.dll+0x19de6b] C2Compiler::compile_method+0xab >>>> (c2compiler.cpp:104) >>>>> >>>>> Is this a known problem? 
>>>>> >>>>> Best regards, >>>>> Martin >>>>> From jiefu at openjdk.java.net Wed Oct 28 15:55:44 2020 From: jiefu at openjdk.java.net (Jie Fu) Date: Wed, 28 Oct 2020 15:55:44 GMT Subject: RFR: 8255438: [Vector API] More instructs in x86.ad should use legacy mode for code-gen [v2] In-Reply-To: References: Message-ID: On Tue, 27 Oct 2020 23:12:42 GMT, Vladimir Kozlov wrote: >> Jie Fu has updated the pull request incrementally with one additional commit since the last revision: >> >> Add reductionL_avx512dq and reductionB_avx512bw > > Good. Thank you for cleaning this up. > Please, someone in Oracle runs Mach5 testing with UseAVX=3. Hi @vnkozlov , Are you still OK with the updated fix? Thanks. ------------- PR: https://git.openjdk.java.net/jdk/pull/874 From azeemj at openjdk.java.net Wed Oct 28 16:07:55 2020 From: azeemj at openjdk.java.net (Azeem Jiva) Date: Wed, 28 Oct 2020 16:07:55 GMT Subject: RFR: 8241495: Make more compiler related flags available on a per method level [v3] In-Reply-To: References: Message-ID: On Tue, 27 Oct 2020 03:22:29 GMT, Xin Liu wrote: >> 8241495: Make more compiler related flags available on a per method level > > Xin Liu has updated the pull request incrementally with one additional commit since the last revision: > > 8241495: Make more compiler related flags available on a per method level > > rollback others but only leave breakAtCompile and breakAtExecution. Marked as reviewed by azeemj (Author). 
------------- PR: https://git.openjdk.java.net/jdk/pull/796 From azeemj at openjdk.java.net Wed Oct 28 16:09:46 2020 From: azeemj at openjdk.java.net (Azeem Jiva) Date: Wed, 28 Oct 2020 16:09:46 GMT Subject: RFR: 8255438: [Vector API] More instructs in x86.ad should use legacy mode for code-gen [v2] In-Reply-To: References: Message-ID: <101ZJDSYRdAkYE7f7J42Pulappq4z5SMYmDS0QIoGeQ=.5690608e-4279-487b-ac62-f74d1d3dfc64@github.com> On Wed, 28 Oct 2020 09:24:30 GMT, Jie Fu wrote: >> Hi all, >> >> Just as @jatin-bhateja pointed out [1], there are more instructs in x86.ad which should use legacy mode. >> >> It would be better to fix the following cases: >> ------------------------ >> 1. instruct mul2L_reg >> The code-gen logic uses phaddd [2], which requires legacy mode here [3]. >> This bug might be reproduced on AVX512 machines without avx512dq. >> >> 2. instruct vmul4L_reg_avx >> The code-gen logic uses vphaddd [4], which requires legacy mode here [5]. >> This bug might be reproduced on AVX512 machines without avx512dq. >> >> 3. instruct reductionL >> For MulReductionVL, the code-gen chain can be: reduceL --> reduce4L --> reduce_operation_128 --> vpmullq [6] >> vpmullq require legacy mode [7] if avx512dq isn't supported. >> This bug might be reproduced on AVX512 machines without avx512dq. >> >> 4. instruct reductionB >> For MinReductionV, the code-gen chain can be: reduceB --> reduce32B --> reduce_operation_128 --> pminsb [8] >> pminsb require legacy mode [9] if avx512bw isn't supported. >> This bug might be reproduced on AVX512 machines without avx512bw. >> ------------------------ >> Bugs in mul2L_reg/vmul4L_reg_avx/reductionL can be only reproduced on AVX512 machines without avx512dq. >> And bug in reductionB can be only reproduced on AVX512 machines without avx512bw. >> >> Unfortunately, it's impossible for us to create reproducers since our AVX512 platforms support both avx512dq and avx512bw. 
>> However, it do make sense to fix these unexposed bugs since vector api code will be sure to run on various paltforms (e.g., AVX512 machines without avx512dq/bw) in the future. >> >> The fix just changes vec to legVec, which is quite safe in theory. >> >> As for the reduction patterns of Float and Double, I don't see any reason that they should use legacy mode (maybe I've missed something). >> >> Testing: >> - jdk/incubator/vector on both AVX512 and AVX256 machines >> >> Any comments? >> >> Thanks a lot. >> Best regards, >> Jie >> >> [1] https://github.com/openjdk/jdk/pull/791#commitcomment-43473733 >> [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L5472 >> [3] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/assembler_x86.cpp#L6217 >> [4] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L5497 >> [5] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/assembler_x86.cpp#L6165 >> [6] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp#L1521 >> [7] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/assembler_x86.cpp#L6428 >> [8] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp#L1482 >> [9] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/assembler_x86.cpp#L6475 > > Jie Fu has updated the pull request incrementally with one additional commit since the last revision: > > Add reductionL_avx512dq and reductionB_avx512bw Marked as reviewed by azeemj (Author). 
------------- PR: https://git.openjdk.java.net/jdk/pull/874 From iveresov at openjdk.java.net Wed Oct 28 16:27:45 2020 From: iveresov at openjdk.java.net (Igor Veresov) Date: Wed, 28 Oct 2020 16:27:45 GMT Subject: RFR: 8255429: Remove C2-based profiling In-Reply-To: <82uOEfM-UO_6x35J5HKWswod9mZp4KjB1a9yZiJnYT4=.049476ab-9d66-4621-9a13-2c314f715b7a@github.com> References: <82uOEfM-UO_6x35J5HKWswod9mZp4KjB1a9yZiJnYT4=.049476ab-9d66-4621-9a13-2c314f715b7a@github.com> Message-ID: <2UNFpRLhN_8LGBDvXS2Yrt_imyedfkz4MQ7PVd26BJI=.30c32a4a-0186-4910-9ec8-6e0d60d16979@github.com> On Wed, 28 Oct 2020 09:11:00 GMT, Igor Veresov wrote: >> Looks good. > >> Do you mean that those profiling data never used before ? I noticed that those data was collected during abstract interpretation, is that `the first tiered compilation` you mean ? > > No, it is code from >10 years ago that aimed to implement tiered compilation using c2 only. It generates code semantically equivalent to level 2 and 3. It is currently done by c1, so we don't need the c2 version. > > And that's not abstract interpretation, it is a runtime profile collection analogous to what the interpreter does. > @veresov Thanks, this makes more sense to me now. I also wonder the profiling in interpreter, will it be deprecated ? No, definitely not. ------------- PR: https://git.openjdk.java.net/jdk/pull/888 From iveresov at openjdk.java.net Wed Oct 28 16:27:47 2020 From: iveresov at openjdk.java.net (Igor Veresov) Date: Wed, 28 Oct 2020 16:27:47 GMT Subject: Integrated: 8255429: Remove C2-based profiling In-Reply-To: References: Message-ID: On Tue, 27 Oct 2020 23:14:15 GMT, Igor Veresov wrote: > If there are no objections I'd like to remove some obsolete code that was an experiment to implement profiling in C2. It was added during the first tiered compilation experiments and is now unused. This pull request has now been integrated.
Changeset: 04258898 Author: Igor Veresov URL: https://git.openjdk.java.net/jdk/commit/04258898 Stats: 469 lines in 9 files changed: 0 ins; 429 del; 40 mod 8255429: Remove C2-based profiling Reviewed-by: kvn, neliasso, vlivanov ------------- PR: https://git.openjdk.java.net/jdk/pull/888 From kvn at openjdk.java.net Wed Oct 28 16:38:51 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 28 Oct 2020 16:38:51 GMT Subject: RFR: 8255441: Cleanup ciEnv/jvmciEnv::lookup_method-s [v2] In-Reply-To: References: Message-ID: On Wed, 28 Oct 2020 08:40:34 GMT, Aleksey Shipilev wrote: >> Static analysis complains there is a potentially uninitialized `dest_method` after the switch, oblivious of `ShouldNotReachHere()`. This can be cleaned up along with related code. > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge branch 'master' into JDK-8255441-cleanup-linkresolver > - Use `fatal` with proper message > - 8255441: Cleanup ciEnv/jvmciEnv::lookup_method-s Good. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/875 From psandoz at openjdk.java.net Wed Oct 28 16:40:49 2020 From: psandoz at openjdk.java.net (Paul Sandoz) Date: Wed, 28 Oct 2020 16:40:49 GMT Subject: RFR: 8255438: [Vector API] More instructs in x86.ad should use legacy mode for code-gen [v2] In-Reply-To: References: Message-ID: On Tue, 27 Oct 2020 23:12:42 GMT, Vladimir Kozlov wrote: >> Jie Fu has updated the pull request incrementally with one additional commit since the last revision: >> >> Add reductionL_avx512dq and reductionB_avx512bw > > Good. Thank you for cleaning this up. > Please, someone in Oracle runs Mach5 testing with UseAVX=3. @vnkozlov @DamonFool i am running some tests, and will report results when done. 
------------- PR: https://git.openjdk.java.net/jdk/pull/874 From kvn at openjdk.java.net Wed Oct 28 16:42:46 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 28 Oct 2020 16:42:46 GMT Subject: RFR: 8255466: C2 crashes at ciObject::get_oop() const+0x0 In-Reply-To: <7Zhni-ZjtEcJ4KJG3AawaJc-4I7q583Fy4d9-i97VC0=.e25f08b9-a42f-48a4-85e9-dace848e502b@github.com> References: <7Zhni-ZjtEcJ4KJG3AawaJc-4I7q583Fy4d9-i97VC0=.e25f08b9-a42f-48a4-85e9-dace848e502b@github.com> Message-ID: On Wed, 28 Oct 2020 08:40:31 GMT, Vladimir Ivanov wrote: >> Graal testing hit this issue with product VM. Tom R. suggested that it could be the case of reflective unsafe static field access that would eventually be optimized away because the Class is null: >> `if (staticFieldBase != null) { >> return Unsafe.getInt(staticFieldBase, Unsafe.staticFieldOffset(field)); >> }` >> >> I suggest to replace assert with runtime check. Note, `o` value is assigned to `_const_oop` so semantically new code is the same except additional runtime check. >> >> I also noticed that const_oop is accessed without check for NULL in new Vector API code. I added check there too. >> >> Passed tier1-3 testing. > > src/hotspot/share/opto/type.cpp line 3047: > >> 3045: _is_ptr_to_narrowoop = false; >> 3046: } else if (klass() == ciEnv::current()->Class_klass() && >> 3047: _offset >= InstanceMirrorKlass::offset_of_static_fields()) { > > You could turn the assert into the check in the enclosing `if`. IMO it makes the code clearer. In that case it would fall into `Instance fields` code below under `} else {` which I don't want. 
------------- PR: https://git.openjdk.java.net/jdk/pull/890 From kvn at openjdk.java.net Wed Oct 28 16:51:50 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 28 Oct 2020 16:51:50 GMT Subject: RFR: 8251994: VM crashed running TestComplexAddrExpr.java test with -XX:UseAVX=X [v2] In-Reply-To: References: <6voOeRu_AO13mIMLca9ZYspPXMIEWTmx1rvzbCwZmqI=.bd528e8e-0aee-4d90-a921-0e437f2ef612@github.com> Message-ID: On Wed, 28 Oct 2020 08:49:20 GMT, Vladimir Ivanov wrote: >> Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: >> >> Use static const bool local variable > > src/hotspot/share/opto/superword.cpp line 94: > >> 92: } >> 93: >> 94: static const bool _do_vector_loop_experimental = false; // Experimental vectorization which uses data from loop unrolling. > > For such purposes, I find debug VM flags a better option: they are constant in product binaries, easy to experiment/change from command-line with debug binaries, and enumerated in a single place. > > Any reason to prefer `static const` declarations? Adding any flag is exception - we should avoid it if it is not needed immediately. We have a lot of flags already. As I said currently this code is broken - I want to remove it from debug builds too. And it is easy to modify this value later - make it a field with the same name and initialize with diagnostic UseNewCode flag value as I did for testing. 
------------- PR: https://git.openjdk.java.net/jdk/pull/859 From kvn at openjdk.java.net Wed Oct 28 16:52:55 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 28 Oct 2020 16:52:55 GMT Subject: RFR: 8253101: Clean up CallStaticJavaNode EA flags In-Reply-To: <8RzONh6c2LiNek7pXUTsKwf_rubmlVMf6H8S6eAsyHA=.73264c8b-6d0d-41a5-9e96-b54d3a6358b9@github.com> References: <8RzONh6c2LiNek7pXUTsKwf_rubmlVMf6H8S6eAsyHA=.73264c8b-6d0d-41a5-9e96-b54d3a6358b9@github.com> Message-ID: On Wed, 28 Oct 2020 02:16:43 GMT, Jason Tatton wrote: > Please review this small change to cleanup fields: _is_scalar_replaceable and _is_non_escaping from CallStaticJavaNode as well as code which assigns to those fields. > > The change was tested with run-test-tier[1-3] on Linux arm64 and x86-64. > > Thanks, > Jason Yes, I agree with removing this code now. I just pointed reasons we have it. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/889 From kvn at openjdk.java.net Wed Oct 28 16:56:50 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 28 Oct 2020 16:56:50 GMT Subject: RFR: 8255438: [Vector API] More instructs in x86.ad should use legacy mode for code-gen [v2] In-Reply-To: References: Message-ID: On Wed, 28 Oct 2020 09:24:30 GMT, Jie Fu wrote: >> Hi all, >> >> Just as @jatin-bhateja pointed out [1], there are more instructs in x86.ad which should use legacy mode. >> >> It would be better to fix the following cases: >> ------------------------ >> 1. instruct mul2L_reg >> The code-gen logic uses phaddd [2], which requires legacy mode here [3]. >> This bug might be reproduced on AVX512 machines without avx512dq. >> >> 2. instruct vmul4L_reg_avx >> The code-gen logic uses vphaddd [4], which requires legacy mode here [5]. >> This bug might be reproduced on AVX512 machines without avx512dq. >> >> 3. 
instruct reductionL >> For MulReductionVL, the code-gen chain can be: reduceL --> reduce4L --> reduce_operation_128 --> vpmullq [6] >> vpmullq require legacy mode [7] if avx512dq isn't supported. >> This bug might be reproduced on AVX512 machines without avx512dq. >> >> 4. instruct reductionB >> For MinReductionV, the code-gen chain can be: reduceB --> reduce32B --> reduce_operation_128 --> pminsb [8] >> pminsb require legacy mode [9] if avx512bw isn't supported. >> This bug might be reproduced on AVX512 machines without avx512bw. >> ------------------------ >> Bugs in mul2L_reg/vmul4L_reg_avx/reductionL can be only reproduced on AVX512 machines without avx512dq. >> And bug in reductionB can be only reproduced on AVX512 machines without avx512bw. >> >> Unfortunately, it's impossible for us to create reproducers since our AVX512 platforms support both avx512dq and avx512bw. >> However, it do make sense to fix these unexposed bugs since vector api code will be sure to run on various paltforms (e.g., AVX512 machines without avx512dq/bw) in the future. >> >> The fix just changes vec to legVec, which is quite safe in theory. >> >> As for the reduction patterns of Float and Double, I don't see any reason that they should use legacy mode (maybe I've missed something). >> >> Testing: >> - jdk/incubator/vector on both AVX512 and AVX256 machines >> >> Any comments? >> >> Thanks a lot. 
>> Best regards, >> Jie >> >> [1] https://github.com/openjdk/jdk/pull/791#commitcomment-43473733 >> [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L5472 >> [3] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/assembler_x86.cpp#L6217 >> [4] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L5497 >> [5] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/assembler_x86.cpp#L6165 >> [6] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp#L1521 >> [7] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/assembler_x86.cpp#L6428 >> [8] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp#L1482 >> [9] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/assembler_x86.cpp#L6475 > > Jie Fu has updated the pull request incrementally with one additional commit since the last revision: > > Add reductionL_avx512dq and reductionB_avx512bw Okay. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/874 From shade at openjdk.java.net Wed Oct 28 17:49:48 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 28 Oct 2020 17:49:48 GMT Subject: Integrated: 8255441: Cleanup ciEnv/jvmciEnv::lookup_method-s In-Reply-To: References: Message-ID: On Tue, 27 Oct 2020 08:17:48 GMT, Aleksey Shipilev wrote: > Static analysis complains there is a potentially uninitialized `dest_method` after the switch, oblivious of `ShouldNotReachHere()`. This can be cleaned up along with related code. This pull request has now been integrated. 
Changeset: af33e162 Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/af33e162 Stats: 55 lines in 2 files changed: 8 ins; 19 del; 28 mod 8255441: Cleanup ciEnv/jvmciEnv::lookup_method-s Reviewed-by: kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/875 From psandoz at openjdk.java.net Wed Oct 28 17:59:47 2020 From: psandoz at openjdk.java.net (Paul Sandoz) Date: Wed, 28 Oct 2020 17:59:47 GMT Subject: RFR: 8255438: [Vector API] More instructs in x86.ad should use legacy mode for code-gen [v2] In-Reply-To: References: Message-ID: On Wed, 28 Oct 2020 16:54:27 GMT, Vladimir Kozlov wrote: >> Jie Fu has updated the pull request incrementally with one additional commit since the last revision: >> >> Add reductionL_avx512dq and reductionB_avx512bw > > Okay. @vnkozlov @DamonFool tests passed. ------------- PR: https://git.openjdk.java.net/jdk/pull/874 From github.com+670087+jrziviani at openjdk.java.net Wed Oct 28 17:35:53 2020 From: github.com+670087+jrziviani at openjdk.java.net (Ziviani) Date: Wed, 28 Oct 2020 17:35:53 GMT Subject: RFR: 8255553: [PPC64] Introduce and use setbc and setnbc P10 instructions Message-ID: - setbc RT,BI: sets RT to 1 if CR(BI) is 1, otherwise 0. - setnbc RT,BI: sets RT to -1 if CR(BI) is 1, otherwise 0. Ref: PowerISA 3.1, page 129. These instructions are particularly useful for improving the following pattern `(src1<src2)? -1: ((src1>src2)? 1: 0)`, which can be found in `instruct cmpL3_reg_reg_ExEx()@ppc.ad`, by removing its branches. Long.toString, which generates such a pattern in getChars, has shown a good performance gain by using these new instructions.
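As a rough illustration of why the two instructions can remove the branches, the Java sketch below models `setbc` and `setnbc` as plain functions and combines their results with a bitwise OR to produce the -1/0/1 result of a three-way long compare. The class and method names (`SetbcSketch`, `cmp3Branchy`, `cmp3ViaSetbc`) are hypothetical, and the OR combination is an assumption made for illustration; the actual ppc.ad change may sequence the instructions differently.

```java
public class SetbcSketch {
    // Models "setbc RT,BI": RT = 1 if the CR bit is set, otherwise 0.
    static long setbc(boolean crBit) {
        return crBit ? 1L : 0L;
    }

    // Models "setnbc RT,BI": RT = -1 if the CR bit is set, otherwise 0.
    static long setnbc(boolean crBit) {
        return crBit ? -1L : 0L;
    }

    // The branchy three-way compare: -1, 0 or 1.
    static int cmp3Branchy(long src1, long src2) {
        return (src1 < src2) ? -1 : ((src1 > src2) ? 1 : 0);
    }

    // Assumed branch-free form: one compare sets the CR bits, then the
    // two set-instructions are OR-ed together (illustrative only).
    static int cmp3ViaSetbc(long src1, long src2) {
        boolean lt = src1 < src2; // CR "less than" bit after the compare
        boolean gt = src1 > src2; // CR "greater than" bit after the compare
        return (int) (setnbc(lt) | setbc(gt));
    }

    public static void main(String[] args) {
        long[] samples = {Long.MIN_VALUE, -1L, 0L, 1L, Long.MAX_VALUE};
        for (long a : samples) {
            for (long b : samples) {
                if (cmp3Branchy(a, b) != cmp3ViaSetbc(a, b)) {
                    throw new AssertionError(a + " vs " + b);
                }
            }
        }
        System.out.println("all sample pairs agree");
    }
}
```

At most one of the two CR bits is set after a compare, so the OR yields -1, 0 or 1 without any conditional jump in the generated code. The ternaries in the helper methods only model what each instruction computes.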
Example: for (int i = 0; i < 200_000; i++) res = Long.toString((long)i); java -Xcomp -XX:CompileThreshold=1 -XX:-TieredCompilation TestToString Without setbc (average): 0.1178 seconds With setbc (average): 0.0396 seconds ------------- Commit messages: - RFR: 8255553: [PPC64] Introduce and use setbc and setnbc P10 instructions Changes: https://git.openjdk.java.net/jdk/pull/907/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=907&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8255553 Stats: 44 lines in 3 files changed: 44 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/907.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/907/head:pull/907 PR: https://git.openjdk.java.net/jdk/pull/907 From kvn at openjdk.java.net Wed Oct 28 20:15:46 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 28 Oct 2020 20:15:46 GMT Subject: RFR: 8255072: [TESTBUG] com/sun/jdi/EATests.java should not fail if expected VMOutOfMemoryException is not thrown [v4] In-Reply-To: References: Message-ID: On Tue, 27 Oct 2020 10:16:29 GMT, Richard Reingruber wrote: >> The following test cases try to provoke VMOutOfMemoryException during object reallocation because of JVMTI PopFrame / ForceEarlyReturn: >> >> EAPopFrameNotInlinedReallocFailure >> EAPopInlinedMethodWithScalarReplacedObjectsReallocFailure >> EAForceEarlyReturnOfInlinedMethodWithScalarReplacedObjectsReallocFailure >> >> For ZGC (so far) this is not 100% reliable. >> >> Just ignoring the runs where the expected OOME was not raised was not accepted. >> >> Summary of the now accepted solution: >> >> - The 3 problematic test cases are skipped if ZGC is selected. >> >> - They are also skipped if no OOME during object reallocation can be expected because allocations are not eliminated. >> >> - In consumeAllMemory, as a last step, empty LinkedList nodes are created without long array to fill up small blocks of free memory. >> >> - EATests.java is removed from the problem list for ZGC. 
> > Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: > > Whitespace/indentation clean-up. Please change @requires for testing with Graal to `vm.graal.enabled` instead of `vm.jvmci` ------------- Changes requested by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/775 From github.com+70893615+jasontatton-aws at openjdk.java.net Wed Oct 28 22:47:47 2020 From: github.com+70893615+jasontatton-aws at openjdk.java.net (Jason Tatton) Date: Wed, 28 Oct 2020 22:47:47 GMT Subject: Integrated: 8253101: Clean up CallStaticJavaNode EA flags In-Reply-To: <8RzONh6c2LiNek7pXUTsKwf_rubmlVMf6H8S6eAsyHA=.73264c8b-6d0d-41a5-9e96-b54d3a6358b9@github.com> References: <8RzONh6c2LiNek7pXUTsKwf_rubmlVMf6H8S6eAsyHA=.73264c8b-6d0d-41a5-9e96-b54d3a6358b9@github.com> Message-ID: <7r9Cp8w2NUfDw8QKE0tbX8oHfKzmaJlzLMLWauTQReo=.2941813c-8ad1-413e-adae-e337fc4586d9@github.com> On Wed, 28 Oct 2020 02:16:43 GMT, Jason Tatton wrote: > Please review this small change to cleanup fields: _is_scalar_replaceable and _is_non_escaping from CallStaticJavaNode as well as code which assigns to those fields. > > The change was tested with run-test-tier[1-3] on Linux arm64 and x86-64. > > Thanks, > Jason This pull request has now been integrated. 
Changeset: 1a5e6c98 Author: Jason Tatton (AWS) Committer: Paul Hohensee URL: https://git.openjdk.java.net/jdk/commit/1a5e6c98 Stats: 21 lines in 2 files changed: 0 ins; 21 del; 0 mod 8253101: Clean up CallStaticJavaNode EA flags Reviewed-by: kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/889 From jiefu at openjdk.java.net Wed Oct 28 23:06:47 2020 From: jiefu at openjdk.java.net (Jie Fu) Date: Wed, 28 Oct 2020 23:06:47 GMT Subject: RFR: 8255438: [Vector API] More instructs in x86.ad should use legacy mode for code-gen [v2] In-Reply-To: References: Message-ID: On Wed, 28 Oct 2020 16:54:27 GMT, Vladimir Kozlov wrote: >> Jie Fu has updated the pull request incrementally with one additional commit since the last revision: >> >> Add reductionL_avx512dq and reductionB_avx512bw > > Okay. Thanks @vnkozlov , @PaulSandoz and @AzeemJiva . ------------- PR: https://git.openjdk.java.net/jdk/pull/874 From jiefu at openjdk.java.net Wed Oct 28 23:06:49 2020 From: jiefu at openjdk.java.net (Jie Fu) Date: Wed, 28 Oct 2020 23:06:49 GMT Subject: Integrated: 8255438: [Vector API] More instructs in x86.ad should use legacy mode for code-gen In-Reply-To: References: Message-ID: On Tue, 27 Oct 2020 07:34:57 GMT, Jie Fu wrote: > Hi all, > > Just as @jatin-bhateja pointed out [1], there are more instructs in x86.ad which should use legacy mode. > > It would be better to fix the following cases: > ------------------------ > 1. instruct mul2L_reg > The code-gen logic uses phaddd [2], which requires legacy mode here [3]. > This bug might be reproduced on AVX512 machines without avx512dq. > > 2. instruct vmul4L_reg_avx > The code-gen logic uses vphaddd [4], which requires legacy mode here [5]. > This bug might be reproduced on AVX512 machines without avx512dq. > > 3. instruct reductionL > For MulReductionVL, the code-gen chain can be: reduceL --> reduce4L --> reduce_operation_128 --> vpmullq [6] > vpmullq require legacy mode [7] if avx512dq isn't supported. 
> This bug might be reproduced on AVX512 machines without avx512dq.
>
> 4. instruct reductionB
> For MinReductionV, the code-gen chain can be: reduceB --> reduce32B --> reduce_operation_128 --> pminsb [8]
> pminsb requires legacy mode [9] if avx512bw isn't supported.
> This bug might be reproduced on AVX512 machines without avx512bw.
> ------------------------
> Bugs in mul2L_reg/vmul4L_reg_avx/reductionL can only be reproduced on AVX512 machines without avx512dq.
> And the bug in reductionB can only be reproduced on AVX512 machines without avx512bw.
>
> Unfortunately, it's impossible for us to create reproducers since our AVX512 platforms support both avx512dq and avx512bw.
> However, it does make sense to fix these unexposed bugs since Vector API code is sure to run on various platforms (e.g., AVX512 machines without avx512dq/bw) in the future.
>
> The fix just changes vec to legVec, which is quite safe in theory.
>
> As for the reduction patterns of Float and Double, I don't see any reason that they should use legacy mode (maybe I've missed something).
>
> Testing:
> - jdk/incubator/vector on both AVX512 and AVX256 machines
>
> Any comments?
>
> Thanks a lot.
> Best regards, > Jie > > [1] https://github.com/openjdk/jdk/pull/791#commitcomment-43473733 > [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L5472 > [3] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/assembler_x86.cpp#L6217 > [4] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L5497 > [5] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/assembler_x86.cpp#L6165 > [6] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp#L1521 > [7] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/assembler_x86.cpp#L6428 > [8] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp#L1482 > [9] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/assembler_x86.cpp#L6475 This pull request has now been integrated. Changeset: d82a6dcf Author: Jie Fu URL: https://git.openjdk.java.net/jdk/commit/d82a6dcf Stats: 14 lines in 1 file changed: 0 ins; 4 del; 10 mod 8255438: [Vector API] More instructs in x86.ad should use legacy mode for code-gen Reviewed-by: kvn, vlivanov, azeemj ------------- PR: https://git.openjdk.java.net/jdk/pull/874 From zgu at openjdk.java.net Wed Oct 28 23:31:48 2020 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Wed, 28 Oct 2020 23:31:48 GMT Subject: RFR: 8255564: InterpreterMacroAssembler::remove_activation() needs to restore thread right after VM call on x86_32 Message-ID: Currently, it restores thread register a bit too late for reset_last_Java_frame(). It is probably not a problem right now, cause there is no 32-bit GC that supports concurrent stack processing. It crashes Shenandoah GC with concurrent stack processing on x86_32, which I am working on. 
------------- Commit messages: - 8255564: InterpreterMacroAssembler::remove_activation() needs to restore thread right after VM call on x86_32 Changes: https://git.openjdk.java.net/jdk/pull/919/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=919&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8255564 Stats: 2 lines in 1 file changed: 1 ins; 1 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/919.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/919/head:pull/919 PR: https://git.openjdk.java.net/jdk/pull/919 From jiefu at openjdk.java.net Thu Oct 29 02:12:50 2020 From: jiefu at openjdk.java.net (Jie Fu) Date: Thu, 29 Oct 2020 02:12:50 GMT Subject: RFR: 8255565: [Vector API] Add missing format strings for extract instructs in x86.ad Message-ID: Hi all, Extract instructs in x86.ad like extractI/vextractI/extractL missed the format strings (format %{ ... %}). When analyzing or debugging with PrintOptoAssembly, it's hard to map the generated assembly code to the MachNode instructs without the format strings. So it would be better to fix it. Thanks a lot. Best regards, Jie ------------- Commit messages: - 8255565: [Vector API] Add missing format strings for extract instructs in x86.ad Changes: https://git.openjdk.java.net/jdk/pull/920/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=920&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8255565 Stats: 8 lines in 1 file changed: 8 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/920.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/920/head:pull/920 PR: https://git.openjdk.java.net/jdk/pull/920 From kvn at openjdk.java.net Thu Oct 29 03:23:08 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 29 Oct 2020 03:23:08 GMT Subject: RFR: 8255466: C2 crashes at ciObject::get_oop() const+0x0 [v2] In-Reply-To: References: Message-ID: > Graal testing hit this issue with product VM. Tom R. 
suggested that it could be the case of reflective unsafe static field access that would eventually be optimized away because the Class is null: > `if (staticFieldBase != null) { > return Unsafe.getInt(staticFieldBase, Unsafe.staticFieldOffset(field)); > }` > > I suggest to replace assert with runtime check. Note, `o` value is assigned to `_const_oop` so semantically new code is the same except additional runtime check. > > I also noticed that const_oop is accessed without check for NULL in new Vector API code. I added check there too. > > Passed tier1-3 testing. Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: Added regression test provided by Tom ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/890/files - new: https://git.openjdk.java.net/jdk/pull/890/files/da8be529..0de93893 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=890&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=890&range=00-01 Stats: 62 lines in 1 file changed: 62 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/890.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/890/head:pull/890 PR: https://git.openjdk.java.net/jdk/pull/890 From dholmes at openjdk.java.net Thu Oct 29 04:24:43 2020 From: dholmes at openjdk.java.net (David Holmes) Date: Thu, 29 Oct 2020 04:24:43 GMT Subject: RFR: 8255564: InterpreterMacroAssembler::remove_activation() needs to restore thread right after VM call on x86_32 In-Reply-To: References: Message-ID: On Wed, 28 Oct 2020 23:25:48 GMT, Zhengyu Gu wrote: > Currently, it restores thread register a bit too late for reset_last_Java_frame(). > > It is probably not a problem right now, cause there is no 32-bit GC that supports concurrent stack processing. It crashes Shenandoah GC with concurrent stack processing on x86_32, which I am working on. Looks good. Thanks, David ------------- Marked as reviewed by dholmes (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/919 From shade at openjdk.java.net Thu Oct 29 06:28:42 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 29 Oct 2020 06:28:42 GMT Subject: RFR: 8255564: InterpreterMacroAssembler::remove_activation() needs to restore thread right after VM call on x86_32 In-Reply-To: References: Message-ID: On Wed, 28 Oct 2020 23:25:48 GMT, Zhengyu Gu wrote: > Currently, it restores thread register a bit too late for reset_last_Java_frame(). > > It is probably not a problem right now, cause there is no 32-bit GC that supports concurrent stack processing. It crashes Shenandoah GC with concurrent stack processing on x86_32, which I am working on. Thanks! This is a regression since JDK-8255233, I linked the bugs. ------------- Marked as reviewed by shade (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/919 From jbhateja at openjdk.java.net Thu Oct 29 06:33:50 2020 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Thu, 29 Oct 2020 06:33:50 GMT Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v4] In-Reply-To: References: Message-ID: <6MClf7up0tikZCf-1JAmKXaNMstf2aELFl3ArqQU7DE=.50c1fa2a-93e5-4501-973a-84a942e6d409@github.com> On Mon, 19 Oct 2020 18:33:22 GMT, Vladimir Kozlov wrote: >> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: >> >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848 >> - Replacing explicit type checks with existing type checking routines >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848 >> - 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions. > > There is regression after 8252847 changes: 8254890. > It should be fixed before we proceed with these changes. 
@vnkozlov , @neliasso , @nsjian , kindly let me know if there are further review comments on this patch.

-------------

PR: https://git.openjdk.java.net/jdk/pull/302

From jbhateja at openjdk.java.net Thu Oct 29 07:42:44 2020
From: jbhateja at openjdk.java.net (Jatin Bhateja)
Date: Thu, 29 Oct 2020 07:42:44 GMT
Subject: RFR: 8255565: [Vector API] Add missing format strings for extract instructs in x86.ad
In-Reply-To:
References:
Message-ID:

On Thu, 29 Oct 2020 02:07:03 GMT, Jie Fu wrote:

> Hi all,
>
> Extract instructs in x86.ad like extractI/vextractI/extractL missed the format strings (format %{ ... %}).
> When analyzing or debugging with PrintOptoAssembly, it's hard to map the generated assembly code to the MachNode instructs without the format strings.
> So it would be better to fix it.
>
> Thanks a lot.
> Best regards,
> Jie

Hi @DamonFool , thanks for fixing these. Format strings get emitted for debug builds with +PrintOptoAssembly. A few of them are multi-match patterns; maybe introducing a new format specifier like %t, which is replaced by the BasicType of the ideal node through an ADLC extension, could make it clearer.

LGTM

-------------

PR: https://git.openjdk.java.net/jdk/pull/920

From njian at openjdk.java.net Thu Oct 29 08:02:44 2020
From: njian at openjdk.java.net (Ningsheng Jian)
Date: Thu, 29 Oct 2020 08:02:44 GMT
Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v9]
In-Reply-To:
References: <8Ryyxuf5P2D6WNyj4riYCTgN0U6WLrLpBmxhNbnmPpQ=.b2ed5660-99d0-49d1-83e0-8b2de518d7b8@github.com>
Message-ID:

On Fri, 23 Oct 2020 12:00:55 GMT, Jatin Bhateja wrote:

>> src/hotspot/share/opto/vectornode.cpp line 775:
>>
>>> 773: VectorMaskGenNode* make(int opc, Node* src, const Type* ty, const Type* ety) {
>>> 774: return new VectorMaskGenNode(src, ty, ety);
>>> 775: }
>>
>> These are not used?
>
> This is just a helper routine; it is not used currently, though.
So maybe the nodes creation code in generate_partial_inlining_block() can use these helper functions? ------------- PR: https://git.openjdk.java.net/jdk/pull/302 From njian at openjdk.java.net Thu Oct 29 08:05:43 2020 From: njian at openjdk.java.net (Ningsheng Jian) Date: Thu, 29 Oct 2020 08:05:43 GMT Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v9] In-Reply-To: References: <8Ryyxuf5P2D6WNyj4riYCTgN0U6WLrLpBmxhNbnmPpQ=.b2ed5660-99d0-49d1-83e0-8b2de518d7b8@github.com> Message-ID: On Fri, 23 Oct 2020 12:00:46 GMT, Jatin Bhateja wrote: > As currently there is no support for mask registers in RA, for X86 long ideal type is sufficient for a mask producing node (def operand is a mask register) ; But for complete support returning Op_RegVMask as an ideal_reg() type for masked Ideal node should do the trick without creating an explicit new ideal Type for mask generating nodes. Spill sizes and number of slots may be different for X86 and ARM (SVE). So, do you have a plan to support Op_RegVMask? In SVE, we will use this kind of node for mask/predicate type. > Shallow copy during Node::clone should be sufficient here since encapsulated element type will be preserved. Checking the code in Node::clone() again, I think the object copy relies on size_of(), so you need to override that to get the correct object size for copying. ------------- PR: https://git.openjdk.java.net/jdk/pull/302 From jbhateja at openjdk.java.net Thu Oct 29 10:28:00 2020 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Thu, 29 Oct 2020 10:28:00 GMT Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v11] In-Reply-To: References: Message-ID: > Summary: > > 1) Partial in-lining technique avoids call overhead penalty for small array copy operations with size less than 32 bytes. 
> 2) At runtime, a conditional check based on copy length either calls an array-copy stub or executes an optimized instruction sequence using AVX-512 masked instructions emitted at the call site.
> 3) New runtime flag ArrayCopyPartialInlineSize=0/32(default)/64 bytes determines the maximum size for partial in-lining.
> 4) Based on the perf results seen in benchmarks currently partial in-lining is performed only for arraycopy involving sub-word types (bool/byte/char/short). Once PR-61 gets integrated we can extend this patch to cover all the primitive types.
>
> Performance Results:
> System : CascadeLake Server, Intel(R) Xeon(R) Platinum 8280L CPU @ 2.70GHz
> Micros : test/micro/org/openjdk/bench/java/lang/ArrayCopy*.java
> ArrayCopyPartialInlineSize : 32
>
> JMH | Block Size | Baseline (ns/op) | Partial Inlining (ns/op) | Gain
> -- | -- | -- | -- | --
> ArrayCopyAligned.testByte | 1 | 5.417 | 2.696 | 2.009272997
> ArrayCopyAligned.testByte | 3 | 5.494 | 2.702 | 2.03330866
> ArrayCopyAligned.testByte | 5 | 5.417 | 2.637 | 2.05422829
> ArrayCopyAligned.testByte | 10 | 5.343 | 2.703 | 1.976692564
> ArrayCopyAligned.testByte | 20 | 5.837 | 2.636 | 2.214339909
> ArrayCopyAligned.testByte | 70 | 5.86 | 6 | 0.976666667
> ArrayCopyAligned.testByte | 150 | 6.766 | 6.906 | 0.979727773
> ArrayCopyAligned.testByte | 300 | 7.605 | 7.952 | 0.956363179
> ArrayCopyAligned.testByte | 600 | 11.989 | 12.007 | 0.998500874
> ArrayCopyAligned.testByte | 1200 | 16.447 | 16.585 | 0.991679228
> ArrayCopyAligned.testChar | 1 | 5.02 | 2.828 | 1.775106082
> ArrayCopyAligned.testChar | 3 | 5.129 | 2.762 | 1.85698769
> ArrayCopyAligned.testChar | 5 | 5.041 | 2.762 | 1.82512672
> ArrayCopyAligned.testChar | 10 | 5.716 | 2.762 | 2.069514844
> ArrayCopyAligned.testChar | 20 | 5.111 | 5.399 | 0.946656788
> ArrayCopyAligned.testChar | 70 | 6.271 | 6.242 | 1.004645947
> ArrayCopyAligned.testChar | 150 | 7.45 | 7.599 | 0.980392157
> ArrayCopyAligned.testChar | 300 | 9.904 | 10.112 | 0.97943038
> ArrayCopyAligned.testChar | 600 | 17.131 | 17.167 | 0.997902953
> ArrayCopyAligned.testChar | 1200 | 29.556 | 29.851 | 0.990117584
> ArrayCopyUnalignedBoth.testByte | 1 | 5.419 | 2.702 | 2.005551443
> ArrayCopyUnalignedBoth.testByte | 3 | 5.558 | 2.636 | 2.108497724
> ArrayCopyUnalignedBoth.testByte | 5 | 5.43 | 2.636 | 2.059939302
> ArrayCopyUnalignedBoth.testByte | 10 | 5.378 | 2.637 | 2.039438756
> ArrayCopyUnalignedBoth.testByte | 20 | 5.914 | 2.636 | 2.243550836
> ArrayCopyUnalignedBoth.testByte | 70 | 5.882 | 5.954 | 0.987907289
> ArrayCopyUnalignedBoth.testByte | 150 | 6.784 | 6.88 | 0.986046512
> ArrayCopyUnalignedBoth.testByte | 300 | 7.635 | 7.968 | 0.958207831
> ArrayCopyUnalignedBoth.testByte | 600 | 12.226 | 12.129 | 1.007997362
> ArrayCopyUnalignedBoth.testByte | 1200 | 16.992 | 20.717 | 0.820195974
> ArrayCopyUnalignedBoth.testChar | 1 | 5.019 | 2.828 | 1.774752475
> ArrayCopyUnalignedBoth.testChar | 3 | 5.163 | 2.763 | 1.868621064
> ArrayCopyUnalignedBoth.testChar | 5 | 5.042 | 2.827 | 1.783516095
> ArrayCopyUnalignedBoth.testChar | 10 | 5.718 | 2.828 | 2.021923621
> ArrayCopyUnalignedBoth.testChar | 20 | 5.111 | 5.404 | 0.945780903
> ArrayCopyUnalignedBoth.testChar | 70 | 6.367 | 6.235 | 1.02117081
> ArrayCopyUnalignedBoth.testChar | 150 | 7.367 | 8.269 | 0.890917886
> ArrayCopyUnalignedBoth.testChar | 300 | 10.358 | 10.642 | 0.973313287
> ArrayCopyUnalignedBoth.testChar | 600 | 20.84 | 17.522 | 1.189361945
> ArrayCopyUnalignedBoth.testChar | 1200 | 31.895 | 31.892 | 1.000094067
> ArrayCopyUnalignedDst.testByte | 1 | 5.455 | 2.637 | 2.068638604
> ArrayCopyUnalignedDst.testByte | 3 | 5.562 | 2.702 | 2.058475204
> ArrayCopyUnalignedDst.testByte | 5 | 5.427 | 2.702 | 2.008512213
> ArrayCopyUnalignedDst.testByte | 10 | 5.367 | 2.696 | 1.990727003
> ArrayCopyUnalignedDst.testByte | 20 | 5.839 | 2.637 | 2.214258627
> ArrayCopyUnalignedDst.testByte | 70 | 5.888 | 5.968 | 0.986595174
> ArrayCopyUnalignedDst.testByte | 150 | 6.785 | 6.773 | 1.001771741
> ArrayCopyUnalignedDst.testByte | 300 | 7.606 | 7.972 | 0.954089313
> ArrayCopyUnalignedDst.testByte | 600 | 11.986 | 21.195 | 0.565510734
> ArrayCopyUnalignedDst.testByte | 1200 | 16.54 | 16.784 | 0.985462345
> ArrayCopyUnalignedDst.testChar | 1 | 5.02 | 2.827 | 1.775733994
> ArrayCopyUnalignedDst.testChar | 3 | 5.131 | 2.762 | 1.857711803
> ArrayCopyUnalignedDst.testChar | 5 | 5.038 | 2.762 | 1.82404055
> ArrayCopyUnalignedDst.testChar | 10 | 5.718 | 2.762 | 2.070238957
> ArrayCopyUnalignedDst.testChar | 20 | 5.113 | 5.401 | 0.946676541
> ArrayCopyUnalignedDst.testChar | 70 | 6.222 | 6.214 | 1.001287416
> ArrayCopyUnalignedDst.testChar | 150 | 7.367 | 8.125 | 0.906707692
> ArrayCopyUnalignedDst.testChar | 300 | 10.204 | 10.082 | 1.012100774
> ArrayCopyUnalignedDst.testChar | 600 | 16.978 | 17.135 | 0.990837467
> ArrayCopyUnalignedDst.testChar | 1200 | 32.351 | 31.996 | 1.011095137
> ArrayCopyUnalignedSrc.testByte | 1 | 5.414 | 2.696 | 2.008160237
> ArrayCopyUnalignedSrc.testByte | 3 | 5.494 | 2.637 | 2.083428138
> ArrayCopyUnalignedSrc.testByte | 5 | 5.431 | 2.637 | 2.059537353
> ArrayCopyUnalignedSrc.testByte | 10 | 5.344 | 2.703 | 1.977062523
> ArrayCopyUnalignedSrc.testByte | 20 | 5.834 | 2.696 | 2.163946588
> ArrayCopyUnalignedSrc.testByte | 70 | 5.883 | 6.009 | 0.979031453
> ArrayCopyUnalignedSrc.testByte | 150 | 6.729 | 6.87 | 0.979475983
> ArrayCopyUnalignedSrc.testByte | 300 | 7.603 | 7.97 | 0.953952321
> ArrayCopyUnalignedSrc.testByte | 600 | 12.004 | 12.16 | 0.987171053
> ArrayCopyUnalignedSrc.testByte | 1200 | 16.534 | 16.643 | 0.9934507
> ArrayCopyUnalignedSrc.testChar | 1 | 5.021 | 2.762 | 1.81788559
> ArrayCopyUnalignedSrc.testChar | 3 | 5.13 | 2.762 | 1.857349747
> ArrayCopyUnalignedSrc.testChar | 5 | 5.042 | 2.827 | 1.783516095
> ArrayCopyUnalignedSrc.testChar | 10 | 5.726 | 2.761 | 2.073886273
> ArrayCopyUnalignedSrc.testChar | 20 | 5.112 | 5.401 | 0.94649139
> ArrayCopyUnalignedSrc.testChar | 70 | 6.113 | 6.227 | 0.981692629
> ArrayCopyUnalignedSrc.testChar | 150 | 7.493 | 7.888 | 0.949923935
> ArrayCopyUnalignedSrc.testChar | 300 | 10.234 | 10.501 | 0.97457385
> ArrayCopyUnalignedSrc.testChar | 600 | 17.175 | 17.142 | 1.001925096
> ArrayCopyUnalignedSrc.testChar | 1200 | 31.926 | 31.987 | 0.998092975
>
> Detailed Reports:
> Baseline : [http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt)
> WithOpt : [http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt)

Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits:

- JDK-8252848: Review comments addressed.
- Merge remote-tracking branch 'origin' into JDK-8252848
- JDK-8252848 : Replacing generic assembler routine evmovdqu with macro assembly routine calling type specific leaf level assembly functions.
- Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848
- Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848
- JDK-8252848 : Review comments resolution.
- Merge remote-tracking branch 'upstream' into JDK-8252848
- Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848
- Replacing explicit type checks with existing type checking routines
- Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848
- ...
and 1 more: https://git.openjdk.java.net/jdk/compare/4031cb41...9e85592a ------------- Changes: https://git.openjdk.java.net/jdk/pull/302/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=302&range=10 Stats: 527 lines in 27 files changed: 477 ins; 23 del; 27 mod Patch: https://git.openjdk.java.net/jdk/pull/302.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/302/head:pull/302 PR: https://git.openjdk.java.net/jdk/pull/302 From jbhateja at openjdk.java.net Thu Oct 29 10:28:00 2020 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Thu, 29 Oct 2020 10:28:00 GMT Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v9] In-Reply-To: References: <8Ryyxuf5P2D6WNyj4riYCTgN0U6WLrLpBmxhNbnmPpQ=.b2ed5660-99d0-49d1-83e0-8b2de518d7b8@github.com> Message-ID: On Thu, 29 Oct 2020 08:03:23 GMT, Ningsheng Jian wrote: > > As currently there is no support for mask registers in RA, for X86 long ideal type is sufficient for a mask producing node (def operand is a mask register) ; But for complete support returning Op_RegVMask as an ideal_reg() type for masked Ideal node should do the trick without creating an explicit new ideal Type for mask generating nodes. Spill sizes and number of slots may be different for X86 and ARM (SVE). > > So, do you have a plan to support Op_RegVMask? In SVE, we will use this kind of node for mask/predicate type. > Not as the part of this patch but as a separate RFE, we may benefit from decoupling b/w ideal type(bottom_type) and ideal_reg() for a given Ideal node; this should allow us to build any future extension on top of these masked generating nodes (VectorMaskGen). > > Shallow copy during Node::clone should be sufficient here since encapsulated element type will be preserved. > > Checking the code in Node::clone() again, I think the object copy relies on size_of(), so you need to override that to get the correct object size for copying. 
Thanks for pointing out; I missed this earlier.

-------------

PR: https://git.openjdk.java.net/jdk/pull/302

From mdoerr at openjdk.java.net Thu Oct 29 11:25:43 2020
From: mdoerr at openjdk.java.net (Martin Doerr)
Date: Thu, 29 Oct 2020 11:25:43 GMT
Subject: RFR: 8255553: [PPC64] Introduce and use setbc and setnbc P10 instructions
In-Reply-To:
References:
Message-ID:

On Wed, 28 Oct 2020 17:00:43 GMT, Ziviani wrote:

> - setbc RT,BI: sets RT to 1 if CR(BI) is 1, otherwise 0.
> - setnbc RT,BI: sets RT to -1 if CR(BI) is 1, otherwise 0.
> Ref: PowerISA 3.1, page 129.
>
> These instructions are particularly interesting to improve the following
> pattern `(src1<src2)? -1: ((src1>src2)? 1: 0)`, which can be found in
> `instruct cmpL3_reg_reg_ExEx()@ppc.ad`, by removing its branches.
>
> Long.toString, which generates such a pattern in getChars, has shown a
> good performance gain by using these new instructions.
>
> Example:
> for (int i = 0; i < 200_000; i++)
> res = Long.toString((long)i);
>
> java -Xcomp -XX:CompileThreshold=1 -XX:-TieredCompilation TestToString
>
> Without setbc (average): 0.1178 seconds
> With setbc (average): 0.0396 seconds

Hi Jose,

thanks for improving this. Looks correct. I have some ideas to share:

Note that it's also possible to implement it branch-free for < Power10: See LIR_Assembler::comp_fl2i in c1_LIRAssembler_ppc.cpp. This could also be used for C2 with flagsRegCR0.

Maybe you would like to clean up the existing C2 code and remove the old cmovI_conIvalueMinus1_conIvalue0_conIvalue1_Ex and cmovI_conIvalueMinus1_conIvalue1?

You could also optimize C1 and the template interpreter (TemplateTable::lcmp + float_cmp, but interpreter is not so critical) for Power10.

But we can also just take your C2 improvement for Power10 if you don't have time for additional parts.
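
For readers following the thread, the three-way compare discussed above can be modelled in plain Java. This is only a minimal sketch of the semantics, not the code from the patch or from ppc.ad: the class name `ThreeWayCompare` and the helpers `setbc`, `setnbc` and `viaSetbc` are hypothetical names chosen for illustration, and the way the two results are combined (a bitwise OR) is an assumption. The Java helpers still use conditional expressions at source level; the point is only that each one corresponds to a single set-on-condition-bit result rather than a branch in the generated code.

```java
public class ThreeWayCompare {
    // The branchy pattern quoted in the thread: -1, 0 or 1 depending on ordering.
    public static int branchy(long a, long b) {
        return (a < b) ? -1 : ((a > b) ? 1 : 0);
    }

    // Models setbc as described above: 1 if the tested condition bit is set, otherwise 0.
    public static int setbc(boolean crBit) {
        return crBit ? 1 : 0;
    }

    // Models setnbc as described above: -1 if the tested condition bit is set, otherwise 0.
    public static int setnbc(boolean crBit) {
        return crBit ? -1 : 0;
    }

    // Hypothetical combination (an assumption, not taken from the patch):
    // "less than" and "greater than" are mutually exclusive, so OR-ing the
    // two results yields -1, 0 or 1.
    public static int viaSetbc(long a, long b) {
        return setnbc(a < b) | setbc(a > b);
    }

    public static void main(String[] args) {
        long[] samples = {Long.MIN_VALUE, -1L, 0L, 1L, Long.MAX_VALUE};
        for (long a : samples) {
            for (long b : samples) {
                int expected = Integer.signum(Long.compare(a, b));
                if (branchy(a, b) != expected || viaSetbc(a, b) != expected) {
                    throw new AssertionError(a + " vs " + b);
                }
            }
        }
        System.out.println("all comparisons agree");
    }
}
```

Running `main` checks both forms against `Long.compare` over a handful of boundary values, which is a quick way to convince oneself that the two formulations are interchangeable.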
------------- PR: https://git.openjdk.java.net/jdk/pull/907 From shade at openjdk.java.net Thu Oct 29 12:32:49 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 29 Oct 2020 12:32:49 GMT Subject: RFR: 8255550: x86: Assembler::cmpq(Address dst, Register src) encoding is incorrect [v2] In-Reply-To: References: Message-ID: > Compare: > > void Assembler::cmpq(Address dst, Register src) { > InstructionMark im(this); > emit_int16(get_prefixq(dst, src), 0x3B); > emit_operand(src, dst); > } > > void Assembler::cmpq(Register dst, Address src) { > InstructionMark im(this); > emit_int16(get_prefixq(src, dst), 0x3B); > emit_operand(dst, src); > } > > They use the same opcode -- `0x3B`, which is for `CMP r, r/m`. While `cmpq(Address,Register)` actually should be using `0x39` for `CMP r/m, r`. I also suspect they emit basically the same instruction, because the `get_prefixq` and `emit_operand` argument order is irrelevant. > > AFAIU, it does not break horribly, because the `cmpq(Address,Register)` is not used anywhere except the new code in `MacroAssembler::safepoint_poll`, added by [JDK-8253180](https://bugs.openjdk.java.net/browse/JDK-8253180). This was found by Zhengyu, when he tried to enable that new code on x86_32 by inverting `cmpq(addr, reg); jcc(above, slow_path)` to `cmpptr(reg, addr); jcc(belowEquals, slow_path)`. Then, everything blew up, because the semantics of `cmpq(addr,reg)` was wrong, and this inversion was subtly broken. > > Current candidate patch encodes this `cmpq` properly. Since that changes the semantics, I had to flip the condition code in its only use. I opted to do this, because _maybe_ some code in downstream projects want to use this odd `cmpq`. Although even if so, the uses could be trivially rewritten. > > Alternatives: > - I considered removing `cmpq(Address,Register)` altogether, but it would require more work to untangle `cmpptr(Address,Register)` and `cmpptr(Address,AddressLiteral)` for x86_32. 
> - We can also split out `MacroAssembler::safepoint_poll` change to use `cmpq(Register,Address)` to begin with, but current shape gives us a way to test the encoding.
>
> Additional testing:
> - [x] tier1 with Shenandoah (a few failures are pre-existing)
> - [x] tier1 with Z (AFAICS, all failing tests are OOME'ing or break SA, and probably are problem-listed)

Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits:

- Merge branch 'master' into JDK-8255550-cmpq-incorrect
- 8255550: x86: Assembler::cmpq(Address dst, Register src) encoding is incorrect

-------------

Changes: https://git.openjdk.java.net/jdk/pull/910/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=910&range=01
Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
Patch: https://git.openjdk.java.net/jdk/pull/910.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/910/head:pull/910

PR: https://git.openjdk.java.net/jdk/pull/910

From shade at openjdk.java.net Thu Oct 29 12:32:50 2020
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Thu, 29 Oct 2020 12:32:50 GMT
Subject: RFR: 8255550: x86: Assembler::cmpq(Address dst, Register src) encoding is incorrect [v2]
In-Reply-To:
References:
Message-ID:

On Thu, 29 Oct 2020 07:28:29 GMT, Erik Österlund wrote:

>> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits:
>>
>> - Merge branch 'master' into JDK-8255550-cmpq-incorrect
>> - 8255550: x86: Assembler::cmpq(Address dst, Register src) encoding is incorrect
>
> Thanks for poking me. I would prefer to change to the cmpq instruction that has the opposite order in the stack watermark barrier instead. Everywhere in the code I talk about the condition being sp being "above" watermark. Changing it to less makes me twist my head in ways that heads should not twist.

Dropped the `safepoint_poll` hunk after #924 integration.
@fisk, this must be fine with you then? ------------- PR: https://git.openjdk.java.net/jdk/pull/910 From zgu at openjdk.java.net Thu Oct 29 12:36:43 2020 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Thu, 29 Oct 2020 12:36:43 GMT Subject: RFR: 8255564: InterpreterMacroAssembler::remove_activation() needs to restore thread right after VM call on x86_32 In-Reply-To: References: Message-ID: On Thu, 29 Oct 2020 06:25:34 GMT, Aleksey Shipilev wrote: >> Currently, it restores thread register a bit too late for reset_last_Java_frame(). >> >> It is probably not a problem right now, cause there is no 32-bit GC that supports concurrent stack processing. It crashes Shenandoah GC with concurrent stack processing on x86_32, which I am working on. > > Thanks! This is a regression since JDK-8255233, I linked the bugs. Thanks, David and Aleksey! ------------- PR: https://git.openjdk.java.net/jdk/pull/919 From zgu at openjdk.java.net Thu Oct 29 12:36:44 2020 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Thu, 29 Oct 2020 12:36:44 GMT Subject: Integrated: 8255564: InterpreterMacroAssembler::remove_activation() needs to restore thread right after VM call on x86_32 In-Reply-To: References: Message-ID: On Wed, 28 Oct 2020 23:25:48 GMT, Zhengyu Gu wrote: > Currently, it restores thread register a bit too late for reset_last_Java_frame(). > > It is probably not a problem right now, cause there is no 32-bit GC that supports concurrent stack processing. It crashes Shenandoah GC with concurrent stack processing on x86_32, which I am working on. This pull request has now been integrated. 

Changeset: 579e50bb
Author:    Zhengyu Gu
URL:       https://git.openjdk.java.net/jdk/commit/579e50bb
Stats:     2 lines in 1 file changed: 1 ins; 1 del; 0 mod

8255564: InterpreterMacroAssembler::remove_activation() needs to restore thread right after VM call on x86_32

Reviewed-by: dholmes, shade

-------------

PR: https://git.openjdk.java.net/jdk/pull/919

From eosterlund at openjdk.java.net Thu Oct 29 13:46:45 2020
From: eosterlund at openjdk.java.net (Erik Österlund)
Date: Thu, 29 Oct 2020 13:46:45 GMT
Subject: RFR: 8255550: x86: Assembler::cmpq(Address dst, Register src) encoding is incorrect [v2]
In-Reply-To:
References:
Message-ID:

On Thu, 29 Oct 2020 12:32:49 GMT, Aleksey Shipilev wrote:

>> Compare:
>>
>>     void Assembler::cmpq(Address dst, Register src) {
>>       InstructionMark im(this);
>>       emit_int16(get_prefixq(dst, src), 0x3B);
>>       emit_operand(src, dst);
>>     }
>>
>>     void Assembler::cmpq(Register dst, Address src) {
>>       InstructionMark im(this);
>>       emit_int16(get_prefixq(src, dst), 0x3B);
>>       emit_operand(dst, src);
>>     }
>>
>> They use the same opcode -- `0x3B`, which is for `CMP r, r/m`. While `cmpq(Address,Register)` actually should be using `0x39` for `CMP r/m, r`. I also suspect they emit basically the same instruction, because the `get_prefixq` and `emit_operand` argument order is irrelevant.
>>
>> AFAIU, it does not break horribly, because the `cmpq(Address,Register)` is not used anywhere except the new code in `MacroAssembler::safepoint_poll`, added by [JDK-8253180](https://bugs.openjdk.java.net/browse/JDK-8253180). This was found by Zhengyu, when he tried to enable that new code on x86_32 by inverting `cmpq(addr, reg); jcc(above, slow_path)` to `cmpptr(reg, addr); jcc(belowEquals, slow_path)`. Then, everything blew up, because the semantics of `cmpq(addr,reg)` was wrong, and this inversion was subtly broken.
>>
>> Current candidate patch encodes this `cmpq` properly. Since that changes the semantics, I had to flip the condition code in its only use.
I opted to do this, because _maybe_ some code in downstream projects want to use this odd `cmpq`. Although even if so, the uses could be trivially rewritten. >> >> Alternatives: >> - I considered removing `cmpq(Address,Register)` altogether, but it would require more work to untangle `cmpptr(Address,Register)` and `cmpptr(Address,AddressLiteral)` for x86_32. >> - We can also split out `MacroAssembler::safepoint_poll` change to use `cmpq(Register,Address)` to begin with, but current shape gives us a way to test the encoding. >> >> Additional testing: >> - [x] tier1 with Shenandoah (a few failures are pre-existing) >> - [x] tier1 with Z (AFAICS, all failing tests are OOME'ing or break SA, and probably are problem-listed) > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: > > - Merge branch 'master' into JDK-8255550-cmpq-incorrect > - 8255550: x86: Assembler::cmpq(Address dst, Register src) encoding is incorrect Yep, looks all good to me. ------------- Marked as reviewed by eosterlund (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/910 From shade at openjdk.java.net Thu Oct 29 14:24:45 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 29 Oct 2020 14:24:45 GMT Subject: Integrated: 8255550: x86: Assembler::cmpq(Address dst, Register src) encoding is incorrect In-Reply-To: References: Message-ID: On Wed, 28 Oct 2020 18:24:10 GMT, Aleksey Shipilev wrote: > Compare: > > void Assembler::cmpq(Address dst, Register src) { > InstructionMark im(this); > emit_int16(get_prefixq(dst, src), 0x3B); > emit_operand(src, dst); > } > > void Assembler::cmpq(Register dst, Address src) { > InstructionMark im(this); > emit_int16(get_prefixq(src, dst), 0x3B); > emit_operand(dst, src); > } > > They use the same opcode -- `0x3B`, which is for `CMP r, r/m`. While `cmpq(Address,Register)` actually should be using `0x39` for `CMP r/m, r`. 
I also suspect they emit basically the same instruction, because the `get_prefixq` and `emit_operand` argument order is irrelevant. > > AFAIU, it does not break horribly, because the `cmpq(Address,Register)` is not used anywhere except the new code in `MacroAssembler::safepoint_poll`, added by [JDK-8253180](https://bugs.openjdk.java.net/browse/JDK-8253180). This was found by Zhengyu, when he tried to enable that new code on x86_32 by inverting `cmpq(addr, reg); jcc(above, slow_path)` to `cmpptr(reg, addr); jcc(belowEquals, slow_path)`. Then, everything blew up, because the semantics of `cmpq(addr,reg)` was wrong, and this inversion was subtly broken. > > Current candidate patch encodes this `cmpq` properly. Since that changes the semantics, I had to flip the condition code in its only use. I opted to do this, because _maybe_ some code in downstream projects want to use this odd `cmpq`. Although even if so, the uses could be trivially rewritten. > > Alternatives: > - I considered removing `cmpq(Address,Register)` altogether, but it would require more work to untangle `cmpptr(Address,Register)` and `cmpptr(Address,AddressLiteral)` for x86_32. > - We can also split out `MacroAssembler::safepoint_poll` change to use `cmpq(Register,Address)` to begin with, but current shape gives us a way to test the encoding. > > Additional testing: > - [x] tier1 with Shenandoah (a few failures are pre-existing) > - [x] tier1 with Z (AFAICS, all failing tests are OOME'ing or break SA, and probably are problem-listed) This pull request has now been integrated. 
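
For illustration only, the opcode difference described in the quoted text can be sketched in a few lines of C++. This is not HotSpot's `Assembler`: `CmpForm` and `encode_cmpq` are hypothetical names, and the sketch assumes a plain `[base]` memory operand with no displacement and only low registers that need neither a SIB byte nor REX extension bits. Under those assumptions the two forms share the REX.W prefix and the ModRM byte and differ only in the opcode byte, which fits the observation that both methods emitted basically the same instruction.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper, not HotSpot code. The two 64-bit CMP forms that mix a
// register with a memory operand differ only in the opcode byte.
enum class CmpForm : uint8_t {
  MemReg = 0x39,  // CMP r/m64, r64: flags are set from [mem] - reg
  RegMem = 0x3B   // CMP r64, r/m64: flags are set from reg - [mem]
};

// Encodes CMP between register `reg` and the memory operand [base].
// Assumes reg and base are register numbers 0..7 and that base is not
// rsp (4) or rbp (5), which would need a SIB byte or a displacement.
std::array<uint8_t, 3> encode_cmpq(CmpForm form, unsigned reg, unsigned base) {
  const uint8_t rex_w = 0x48;  // REX prefix with W = 1 (64-bit operand size)
  // ModRM: mod = 00 (memory, no displacement), reg field, r/m field.
  const uint8_t modrm = static_cast<uint8_t>(((reg & 7u) << 3) | (base & 7u));
  return {rex_w, static_cast<uint8_t>(form), modrm};
}
```

With rcx as register 1 and rax as register 0, `encode_cmpq(CmpForm::MemReg, 1, 0)` gives `48 39 08` (`cmp [rax], rcx`) and `encode_cmpq(CmpForm::RegMem, 1, 0)` gives `48 3B 08` (`cmp rcx, [rax]`). Only the middle byte changes, but the operand order of the subtraction is swapped, so a condition such as `above` means the opposite comparison.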
Changeset: 9e5bbff5 Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/9e5bbff5 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8255550: x86: Assembler::cmpq(Address dst, Register src) encoding is incorrect Reviewed-by: kvn, eosterlund ------------- PR: https://git.openjdk.java.net/jdk/pull/910 From rrich at openjdk.java.net Thu Oct 29 14:53:54 2020 From: rrich at openjdk.java.net (Richard Reingruber) Date: Thu, 29 Oct 2020 14:53:54 GMT Subject: RFR: 8255072: [TESTBUG] com/sun/jdi/EATests.java should not fail if expected VMOutOfMemoryException is not thrown [v4] In-Reply-To: References: Message-ID: <4RqPScJJcCjXJM4JafcdYZsiitxLeLpi5lfcEN4afrg=.088a9144-9413-4654-b76a-0c5b37530952@github.com> On Wed, 28 Oct 2020 20:13:15 GMT, Vladimir Kozlov wrote: > > > Please change @requires for testing with Graal to `vm.graal.enabled` instead of `vm.jvmci` Sure. I've done that now. ------------- PR: https://git.openjdk.java.net/jdk/pull/775 From rrich at openjdk.java.net Thu Oct 29 14:53:54 2020 From: rrich at openjdk.java.net (Richard Reingruber) Date: Thu, 29 Oct 2020 14:53:54 GMT Subject: RFR: 8255072: [TESTBUG] com/sun/jdi/EATests.java should not fail if expected VMOutOfMemoryException is not thrown [v5] In-Reply-To: References: Message-ID: > The following test cases try to provoke VMOutOfMemoryException during object reallocation because of JVMTI PopFrame / ForceEarlyReturn: > > EAPopFrameNotInlinedReallocFailure > EAPopInlinedMethodWithScalarReplacedObjectsReallocFailure > EAForceEarlyReturnOfInlinedMethodWithScalarReplacedObjectsReallocFailure > > For ZGC (so far) this is not 100% reliable. > > Just ignoring the runs where the expected OOME was not raised was not accepted. > > Summary of the now accepted solution: > > - The 3 problematic test cases are skipped if ZGC is selected. > > - They are also skipped if no OOME during object reallocation can be expected because allocations are not eliminated. 
> > - In consumeAllMemory, as a last step, empty LinkedList nodes are created without long array to fill up small blocks of free memory. > > - EATests.java is removed from the problem list for ZGC. Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: Replaced vm.jvmci with vm.graal.enabled in @requires clause. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/775/files - new: https://git.openjdk.java.net/jdk/pull/775/files/4676f1da..4e878e8e Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=775&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=775&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/775.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/775/head:pull/775 PR: https://git.openjdk.java.net/jdk/pull/775 From kvn at openjdk.java.net Thu Oct 29 15:37:43 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 29 Oct 2020 15:37:43 GMT Subject: RFR: 8255072: [TESTBUG] com/sun/jdi/EATests.java should not fail if expected VMOutOfMemoryException is not thrown [v5] In-Reply-To: References: Message-ID: On Thu, 29 Oct 2020 14:53:54 GMT, Richard Reingruber wrote: >> The following test cases try to provoke VMOutOfMemoryException during object reallocation because of JVMTI PopFrame / ForceEarlyReturn: >> >> EAPopFrameNotInlinedReallocFailure >> EAPopInlinedMethodWithScalarReplacedObjectsReallocFailure >> EAForceEarlyReturnOfInlinedMethodWithScalarReplacedObjectsReallocFailure >> >> For ZGC (so far) this is not 100% reliable. >> >> Just ignoring the runs where the expected OOME was not raised was not accepted. >> >> Summary of the now accepted solution: >> >> - The 3 problematic test cases are skipped if ZGC is selected. >> >> - They are also skipped if no OOME during object reallocation can be expected because allocations are not eliminated. 
>> >> - In consumeAllMemory, as a last step, empty LinkedList nodes are created without long array to fill up small blocks of free memory. >> >> - EATests.java is removed from the problem list for ZGC. > > Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: > > Replaced vm.jvmci with vm.graal.enabled in @requires clause. Good ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/775 From phh at openjdk.java.net Thu Oct 29 17:41:44 2020 From: phh at openjdk.java.net (Paul Hohensee) Date: Thu, 29 Oct 2020 17:41:44 GMT Subject: RFR: 8241495: Make more compiler related flags available on a per method level [v3] In-Reply-To: References: Message-ID: <4WerWhkoSm3Unu9nppz_sIliuFUPVVq5P-jNU_C7xGI=.a501ca94-4d3c-4d19-876d-16fd4416f5da@github.com> On Tue, 27 Oct 2020 03:22:29 GMT, Xin Liu wrote: >> 8241495: Make more compiler related flags available on a per method level > > Xin Liu has updated the pull request incrementally with one additional commit since the last revision: > > 8241495: Make more compiler related flags available on a per method level > > rollback others but only leave breakAtCompile and breakAtExecution. Lgtm. ------------- Marked as reviewed by phh (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/796 From xliu at openjdk.java.net Thu Oct 29 17:44:46 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Thu, 29 Oct 2020 17:44:46 GMT Subject: Integrated: 8241495: Make more compiler related flags available on a per method level In-Reply-To: References: Message-ID: <0CpReZSBaAXcQGmLzgbxI1fzH-mAqtNGETCrBDjAsvg=.15e8b72d-9df4-4c8c-a843-39b92690d1b2@github.com> On Thu, 22 Oct 2020 07:28:01 GMT, Xin Liu wrote: > 8241495: Make more compiler related flags available on a per method level This pull request has now been integrated. 
Changeset: 2a50c3f8 Author: Xin Liu Committer: Paul Hohensee URL: https://git.openjdk.java.net/jdk/commit/2a50c3f8 Stats: 4 lines in 2 files changed: 1 ins; 1 del; 2 mod 8241495: Make more compiler related flags available on a per method level add more method-level options for -XX:CompileCommand eg. -XX:CompileCommand=option,java.lang.String::startsWith,BreakAtCompile directs JIT compilers to hit BREAKPOINT when they compile the method java.lang.String::startsWith. Reviewed-by: neliasso, azeemj, phh ------------- PR: https://git.openjdk.java.net/jdk/pull/796 From shade at openjdk.java.net Thu Oct 29 19:19:51 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 29 Oct 2020 19:19:51 GMT Subject: RFR: 8255615: Zero: demote ZeroStack::abi_stack_available guarantee to assert Message-ID: It is currently guarantee(), which slows down release bits unnecessarily: inline int ZeroStack::abi_stack_available(Thread *thread) const { guarantee(Thread::current() == thread, "should run in the same thread"); ------------- Commit messages: - 8255615: Zero: demote ZeroStack::abi_stack_available Thread check to assert Changes: https://git.openjdk.java.net/jdk/pull/943/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=943&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8255615 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/943.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/943/head:pull/943 PR: https://git.openjdk.java.net/jdk/pull/943 From github.com+670087+jrziviani at openjdk.java.net Thu Oct 29 21:52:43 2020 From: github.com+670087+jrziviani at openjdk.java.net (Ziviani) Date: Thu, 29 Oct 2020 21:52:43 GMT Subject: RFR: 8255553: [PPC64] Introduce and use setbc and setnbc P10 instructions In-Reply-To: References: Message-ID: On Thu, 29 Oct 2020 11:22:34 GMT, Martin Doerr wrote: >> - setbc RT,BI: sets RT to 1 if CR(BI) is 1, otherwise 0. 
>> - setnbc RT,BI: sets RT to -1 if CR(BI) is 1, otherwise 0.
>> Ref: PowerISA 3.1, page 129.
>>
>> These instructions are particularly interesting to improve the following
>> pattern `(src1<src2)? -1: ((src1>src2)? 1: 0)`, which can be found in
>> `instruct cmpL3_reg_reg_ExEx()@ppc.ad`, by removing its branches.
>>
>> Long.toString, that generate such pattern in getChars, has showed a
>> good performance gain by using these new instructions.
>>
>> Example:
>> for (int i = 0; i < 200_000; i++)
>>   res = Long.toString((long)i);
>>
>> java -Xcomp -XX:CompileThreshold=1 -XX:-TieredCompilation TestToString
>>
>> Without setbc (average): 0.1178 seconds
>> With setbc (average): 0.0396 seconds
>
> Hi Jose,
> thanks for improving this. Looks correct. I have some ideas to share:
> Note that it's also possible to implement it branch free for < Power10: See LIR_Assembler::comp_fl2i in c1_LIRAssembler_ppc.cpp.
> This could also be used for C2 with flagsRegCR0. Maybe you would like to clean up the existing C2 code and remove the old cmovI_conIvalueMinus1_conIvalue0_conIvalue1_Ex and cmovI_conIvalueMinus1_conIvalue1?
> You could also optimize C1 and the template interpreter (TemplateTable::lcmp + float_cmp, but interpreter is not so critical) for Power10.
> But we can also just take your C2 improvement for Power10 if you don't have time for additional parts.

@TheRealMDoerr Hallo!!!
Thanks for reviewing and for your suggestion (I really appreciate it, I'm discovering more parts of C1). I'm implementing it and will send a patch. Do you prefer me to create another task to handle that or do you prefer these changes in this same task?
------------- PR: https://git.openjdk.java.net/jdk/pull/907 From kvn at openjdk.java.net Thu Oct 29 22:40:48 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 29 Oct 2020 22:40:48 GMT Subject: Integrated: 8255466: C2 crashes at ciObject::get_oop() const+0x0 In-Reply-To: References: Message-ID: <4UEBVGOF21Jvdf8qFtFHgJNqg6c1p9Op4NmWScYub34=.7c4ba652-eda7-4082-9005-9f45a3a2486c@github.com> On Wed, 28 Oct 2020 02:19:42 GMT, Vladimir Kozlov wrote: > Graal testing hit this issue with product VM. Tom R. suggested that it could be the case of reflective unsafe static field access that would eventually be optimized away because the Class is null: > `if (staticFieldBase != null) { > return Unsafe.getInt(staticFieldBase, Unsafe.staticFieldOffset(field)); > }` > > I suggest to replace assert with runtime check. Note, `o` value is assigned to `_const_oop` so semantically new code is the same except additional runtime check. > > I also noticed that const_oop is accessed without check for NULL in new Vector API code. I added check there too. > > Passed tier1-3 testing. This pull request has now been integrated. 
Changeset: 56eb5f54 Author: Vladimir Kozlov URL: https://git.openjdk.java.net/jdk/commit/56eb5f54 Stats: 71 lines in 3 files changed: 67 ins; 0 del; 4 mod 8255466: C2 crashes at ciObject::get_oop() const+0x0 Reviewed-by: vlivanov ------------- PR: https://git.openjdk.java.net/jdk/pull/890 From ngasson at openjdk.java.net Fri Oct 30 02:29:45 2020 From: ngasson at openjdk.java.net (Nick Gasson) Date: Fri, 30 Oct 2020 02:29:45 GMT Subject: RFR: 8254723: add diagnostic command to write Linux perf map file [v5] In-Reply-To: References: <7T_M6C-3WpLwXYH3RuRCuDQUW0qMyKIWAs8RaPW7D0s=.d659e5a0-e8a2-4816-8f60-1dd7653f4c7b@github.com> Message-ID: On Wed, 21 Oct 2020 04:35:14 GMT, Yasumasa Suenaga wrote: >> Nick Gasson has updated the pull request incrementally with one additional commit since the last revision: >> >> Make DumpPerfMapAtExit a diagnostic option > > Changes requested by ysuenaga (Reviewer). @YaSuenag could you re-review the latest changes? ------------- PR: https://git.openjdk.java.net/jdk/pull/760 From ysuenaga at openjdk.java.net Fri Oct 30 04:36:45 2020 From: ysuenaga at openjdk.java.net (Yasumasa Suenaga) Date: Fri, 30 Oct 2020 04:36:45 GMT Subject: RFR: 8254723: add diagnostic command to write Linux perf map file [v5] In-Reply-To: References: <7T_M6C-3WpLwXYH3RuRCuDQUW0qMyKIWAs8RaPW7D0s=.d659e5a0-e8a2-4816-8f60-1dd7653f4c7b@github.com> Message-ID: On Wed, 21 Oct 2020 04:35:14 GMT, Yasumasa Suenaga wrote: >> Nick Gasson has updated the pull request incrementally with one additional commit since the last revision: >> >> Make DumpPerfMapAtExit a diagnostic option > > Changes requested by ysuenaga (Reviewer). > @YaSuenag could you re-review the latest changes? Sure, the change looks good to me. However I don't understand why CSR is not needed. It introduces new dcmd for Linux. 
-------------

PR: https://git.openjdk.java.net/jdk/pull/760

From xliu at openjdk.java.net Fri Oct 30 05:51:50 2020
From: xliu at openjdk.java.net (Xin Liu)
Date: Fri, 30 Oct 2020 05:51:50 GMT
Subject: RFR: 8255562: delete UseRDPCForConstantTableBase
Message-ID: <4Uz5ndhxS0KyAkpqwOw0A7yNDj3XvxE46JV5n7F8nFo=.406aac45-16c8-4161-bac0-64d3a8feaa8d@github.com>

UseRDPCForConstantTableBase was a SPARC-exclusive flag. Sparc has been removed
from hotspot, so remove this flag.

-------------

Commit messages:
 - delete UseRDPCForConstantTableBase

Changes: https://git.openjdk.java.net/jdk/pull/949/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=949&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8255562
  Stats: 4 lines in 2 files changed: 0 ins; 3 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/949.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/949/head:pull/949

PR: https://git.openjdk.java.net/jdk/pull/949

From jiefu at openjdk.java.net Fri Oct 30 06:51:43 2020
From: jiefu at openjdk.java.net (Jie Fu)
Date: Fri, 30 Oct 2020 06:51:43 GMT
Subject: RFR: 8255565: [Vector API] Add missing format strings for extract instructs in x86.ad
In-Reply-To:
References:
Message-ID: <7yhiOXgRulwjQPrYn6paG2VPUmB3UJi3K9RfAyYMXmM=.9bb31dac-9b3e-4d70-baad-8727ecdbda1f@github.com>

On Thu, 29 Oct 2020 02:07:03 GMT, Jie Fu wrote:

> Hi all,
>
> Extract instructs in x86.ad like extractI/vextractI/extractL missed the format strings (format %{ ... %}).
> When analyzing or debugging with PrintOptoAssembly, it's hard to map the generated assembly code to the MachNode instructs without the format strings.
> So it would be better to fix it.
>
> Thanks a lot.
> Best regards,
> Jie

> Hi @DamonFool , thanks for fixing these, format string gets emitted for debug builds with +PrintOptoAssembly, few of them are multi-match patterns, maybe introducing a new format specifier like %t which is replaced by BasicType of ideal node through ADLC extension could make it more clear.
> > LGTM Thanks @jatin-bhateja for your review. May I get one more review from a reviewer? Thanks. ------------- PR: https://git.openjdk.java.net/jdk/pull/920 From vlivanov at openjdk.java.net Fri Oct 30 08:31:44 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Fri, 30 Oct 2020 08:31:44 GMT Subject: RFR: 8255565: [Vector API] Add missing format strings for extract instructs in x86.ad In-Reply-To: References: Message-ID: On Thu, 29 Oct 2020 02:07:03 GMT, Jie Fu wrote: > Hi all, > > Extract instructs in x86.ad like extractI/vextractI/extractL missed the format strings (format %{ ... %}). > When analyzing or debugging with PrintOptoAssembly, it's hard to map the generated assembly code to the MachNode instructs without the format strings. > So it would be better to fix it. > > Thanks a lot. > Best regards, > Jie Looks good. ------------- Marked as reviewed by vlivanov (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/920 From thartmann at openjdk.java.net Fri Oct 30 08:49:49 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Fri, 30 Oct 2020 08:49:49 GMT Subject: RFR: 8255562: delete UseRDPCForConstantTableBase In-Reply-To: <4Uz5ndhxS0KyAkpqwOw0A7yNDj3XvxE46JV5n7F8nFo=.406aac45-16c8-4161-bac0-64d3a8feaa8d@github.com> References: <4Uz5ndhxS0KyAkpqwOw0A7yNDj3XvxE46JV5n7F8nFo=.406aac45-16c8-4161-bac0-64d3a8feaa8d@github.com> Message-ID: On Fri, 30 Oct 2020 05:43:25 GMT, Xin Liu wrote: > UseRDPCForConstantTableBase was a SPARC-exclusive flag. Sparc has been removed > from hotspot, so remove this flag. Changes requested by thartmann (Reviewer). src/hotspot/share/opto/machnode.hpp line 437: > 435: virtual void emit(CodeBuffer& cbuf, PhaseRegAlloc* ra_) const; > 436: virtual uint size(PhaseRegAlloc* ra_) const; > 437: virtual bool pinned() const { return false; } The method can be removed because `Node::pinned` returns false. 
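
As a small illustration of that review remark (a sketch with hypothetical stand-in classes, not the real HotSpot declarations): when a base class already provides a virtual method returning `false`, a derived override that also returns `false` can be deleted without changing what callers observe.

```cpp
// Hypothetical stand-ins for the classes named in the review; the real
// HotSpot Node hierarchy has many more members.
struct Node {
  virtual ~Node() = default;
  virtual bool pinned() const { return false; }
};

// Before the cleanup: the override repeats the base-class answer.
struct ConstantBaseWithOverride : Node {
  bool pinned() const override { return false; }
};

// After the cleanup: no override, Node::pinned() is inherited.
struct ConstantBaseWithoutOverride : Node {};

// Callers go through the virtual call either way.
inline bool is_pinned(const Node& n) { return n.pinned(); }
```

Both derived types answer `false` through `is_pinned`, so removing the override is behavior-preserving as long as the base implementation keeps returning `false`.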
------------- PR: https://git.openjdk.java.net/jdk/pull/949 From jiefu at openjdk.java.net Fri Oct 30 08:53:46 2020 From: jiefu at openjdk.java.net (Jie Fu) Date: Fri, 30 Oct 2020 08:53:46 GMT Subject: RFR: 8255565: [Vector API] Add missing format strings for extract instructs in x86.ad In-Reply-To: References: Message-ID: On Fri, 30 Oct 2020 08:29:07 GMT, Vladimir Ivanov wrote: > Looks good. Thanks @iwanowww for your review. ------------- PR: https://git.openjdk.java.net/jdk/pull/920 From jiefu at openjdk.java.net Fri Oct 30 08:53:47 2020 From: jiefu at openjdk.java.net (Jie Fu) Date: Fri, 30 Oct 2020 08:53:47 GMT Subject: Integrated: 8255565: [Vector API] Add missing format strings for extract instructs in x86.ad In-Reply-To: References: Message-ID: On Thu, 29 Oct 2020 02:07:03 GMT, Jie Fu wrote: > Hi all, > > Extract instructs in x86.ad like extractI/vextractI/extractL missed the format strings (format %{ ... %}). > When analyzing or debugging with PrintOptoAssembly, it's hard to map the generated assembly code to the MachNode instructs without the format strings. > So it would be better to fix it. > > Thanks a lot. > Best regards, > Jie This pull request has now been integrated. Changeset: e48016b1 Author: Jie Fu URL: https://git.openjdk.java.net/jdk/commit/e48016b1 Stats: 8 lines in 1 file changed: 8 ins; 0 del; 0 mod 8255565: [Vector API] Add missing format strings for extract instructs in x86.ad Reviewed-by: vlivanov, jbhateja ------------- PR: https://git.openjdk.java.net/jdk/pull/920 From rrich at openjdk.java.net Fri Oct 30 09:53:45 2020 From: rrich at openjdk.java.net (Richard Reingruber) Date: Fri, 30 Oct 2020 09:53:45 GMT Subject: RFR: 8255072: [TESTBUG] com/sun/jdi/EATests.java should not fail if expected VMOutOfMemoryException is not thrown [v5] In-Reply-To: References: Message-ID: On Thu, 29 Oct 2020 15:34:58 GMT, Vladimir Kozlov wrote: > Good Thanks for reviewing. Will integrate beginning of next week. 
-------------

PR: https://git.openjdk.java.net/jdk/pull/775

From mdoerr at openjdk.java.net Fri Oct 30 10:21:44 2020
From: mdoerr at openjdk.java.net (Martin Doerr)
Date: Fri, 30 Oct 2020 10:21:44 GMT
Subject: RFR: 8255553: [PPC64] Introduce and use setbc and setnbc P10 instructions
In-Reply-To:
References:
Message-ID:

On Wed, 28 Oct 2020 17:00:43 GMT, Ziviani wrote:

> - setbc RT,BI: sets RT to 1 if CR(BI) is 1, otherwise 0.
> - setnbc RT,BI: sets RT to -1 if CR(BI) is 1, otherwise 0.
> Ref: PowerISA 3.1, page 129.
>
> These instructions are particularly interesting to improve the following
> pattern `(src1<src2)? -1: ((src1>src2)? 1: 0)`, which can be found in
> `instruct cmpL3_reg_reg_ExEx()@ppc.ad`, by removing its branches.
>
> Long.toString, that generate such pattern in getChars, has showed a
> good performance gain by using these new instructions.
>
> Example:
> for (int i = 0; i < 200_000; i++)
>   res = Long.toString((long)i);
>
> java -Xcomp -XX:CompileThreshold=1 -XX:-TieredCompilation TestToString
>
> Without setbc (average): 0.1178 seconds
> With setbc (average): 0.0396 seconds

If you just want to add C1 code, I suggest to do that in this PR. It's up to you if you prefer a new PR for further changes.
I still think cleaning up the ad file a bit and using branch free code for all Power processors would be nice.
I'd also be ok with replacing the "expand" code by assembly code directly in "ins_encode". Should be good enough for modern out-of-order processors.

src/hotspot/cpu/ppc/ppc.ad line 11521:

> 11519:     __ setbc(R0, (($crx$$reg << 2) | 1) /* greater than */);
> 11520:     __ setnbc($dst$$Register, ($crx$$reg << 2) /* less than */);
> 11521:     __ or_unchecked($dst$$Register, $dst$$Register, R0);

In general, I think it'd be better to use orr which makes sure we never unintentionally emit an instruction which modifies smt priority "smt_prio_...".
In this case dst != R0, so this doesn't happen.
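
To make the three-instruction sequence above easier to follow, here is a minimal C++ model of it. It is a sketch, not HotSpot or ppc.ad code: `CrField`, `cmpd`, `cmp_l3` and `cr_bit_index` are hypothetical names, and the C++ functions only model the documented results of `setbc` and `setnbc` (the conditional expressions inside them are not meant to show branch-free machine code). It assumes the usual layout of a condition register field, with LT at bit 0, GT at bit 1 and EQ at bit 2.

```cpp
#include <cstdint>

// Hypothetical model of one condition register field after a signed compare.
struct CrField {
  bool lt;
  bool gt;
  bool eq;
};

// Models a signed 64-bit compare that writes a CR field.
inline CrField cmpd(int64_t a, int64_t b) {
  return CrField{a < b, a > b, a == b};
}

// setbc RT,BI: RT = 1 if the selected CR bit is 1, otherwise 0.
inline int64_t setbc(bool cr_bit) { return cr_bit ? 1 : 0; }

// setnbc RT,BI: RT = -1 if the selected CR bit is 1, otherwise 0.
inline int64_t setnbc(bool cr_bit) { return cr_bit ? -1 : 0; }

// Three-way compare in the shape of the quoted sequence:
// setbc on GT into a scratch value, setnbc on LT into dst, then OR them.
inline int64_t cmp_l3(int64_t a, int64_t b) {
  const CrField cr = cmpd(a, b);
  const int64_t r0 = setbc(cr.gt);    // 1 if a > b, else 0
  const int64_t dst = setnbc(cr.lt);  // -1 if a < b, else 0
  return dst | r0;                    // at most one of the two is non-zero
}

// Bit index of a condition inside CR field `crx` (LT = 0, GT = 1, EQ = 2),
// mirroring the `(crx << 2) | 1` operand used for "greater than".
inline int cr_bit_index(int crx, int bit) { return (crx << 2) | bit; }
```

Because LT and GT are never set together, the OR simply selects whichever of -1 or 1 was produced, and yields 0 when the operands are equal. For CR field 1, `cr_bit_index(1, 1)` is 5, the GT bit of that field.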
------------- PR: https://git.openjdk.java.net/jdk/pull/907 From chagedorn at openjdk.java.net Fri Oct 30 11:35:57 2020 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Fri, 30 Oct 2020 11:35:57 GMT Subject: RFR: 8251925: C2: RenaissanceStressTest fails with assert(!had_error): bad dominance Message-ID: The dominance failures start to occur after the fix for [JDK-8249749](https://bugs.openjdk.java.net/browse/JDK-8249749) which enabled the method `SWPointer::scaled_iv_plus_offset` to call itself recursively and walk the graph to match more instead of stopping immediately (no recursion): https://github.com/openjdk/jdk/commit/092389e3c91822b1e3f56f203cb7b90e84673f8e#diff-8f29dd005a0f949d108687dabb7379c73dfd85cd782da453509dc9b6cb8c9f81L3789-R3812 We check in `SWPointer::offset_plus_k` if a node is invariant and if so then we choose it as invariant. However, we now have cases in the Renaissance benchmarks where we select an invariant that is pinned to a `CastIINode` between the main and pre loop. An example is shown in the attached image. 5913 SubI is found as an invariant with the improved recursive search enabled by JDK-8249749. The control of 5913 SubI (with `get_ctrl`) is 5298 CastII. The problem is now that we use the invariant 5913 SubI in the pre loop limit check of 5281 CountedLoopEnd (done in `SuperWord::align_initial_loop_index`) because we assume that since the invariant is not part of the main loop, it can float into the pre loop. But this is prevented by 5298 CastII. This results in the dominance assertion failure when checking if the earliest control of 5270 Bool in the pre loop (5297 IfTrue because of 5913 SubI that is used by 5270 Bool) dominates the LCA of 5270 Bool (the pre loop header node). My suggestion is to improve the invariant check in `SWPointer::offset_plus_k` to also check if the found invariant is not dominated by the pre loop end node. 
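
The suggested extra condition can be pictured with a toy dominator tree. This is only a sketch under stated assumptions: `DomTree` and `usable_in_pre_loop` are hypothetical names, nodes are plain integers, and the real check in the patch goes through C2's `PhaseIdealLoop` rather than anything like this.

```cpp
#include <vector>

// Hypothetical miniature of a dominator tree: idom[n] is the immediate
// dominator of node n, and the root is its own immediate dominator.
struct DomTree {
  std::vector<int> idom;

  // True if `d` dominates `n`, i.e. `d` lies on the idom chain that leads
  // from `n` up to the root (a node dominates itself).
  bool is_dominator(int d, int n) const {
    while (true) {
      if (n == d) return true;
      if (idom[n] == n) return false;  // reached the root without meeting d
      n = idom[n];
    }
  }
};

// The proposed additional condition: a candidate invariant whose control
// node is dominated by the pre-loop end cannot float up into the pre loop.
inline bool usable_in_pre_loop(const DomTree& t, int pre_loop_end, int n_ctrl) {
  return !t.is_dominator(pre_loop_end, n_ctrl);
}
```

For example, with node 0 as the root, 1 as the pre-loop head, 2 as the pre-loop end, 3 as a control node between the pre and main loop (the role played by the `CastII` in the description) and 4 as a node hanging directly off the root, `usable_in_pre_loop(t, 2, 3)` is false while `usable_in_pre_loop(t, 2, 4)` is true.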
Repeated testing of the RenaissanceStressTest has not resulted in any dominance failures anymore.

![dominance_failure](https://user-images.githubusercontent.com/17833009/97696669-3752d200-1aa6-11eb-9a42-2e36550e2b8b.png)

-------------

Commit messages:
 - 8251925: C2: RenaissanceStressTest fails with assert(!had_error): bad dominance

Changes: https://git.openjdk.java.net/jdk/pull/954/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=954&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8251925
  Stats: 73 lines in 2 files changed: 42 ins; 2 del; 29 mod
  Patch: https://git.openjdk.java.net/jdk/pull/954.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/954/head:pull/954

PR: https://git.openjdk.java.net/jdk/pull/954

From xliu at openjdk.java.net Fri Oct 30 14:57:55 2020
From: xliu at openjdk.java.net (Xin Liu)
Date: Fri, 30 Oct 2020 14:57:55 GMT
Subject: RFR: 8255562: delete UseRDPCForConstantTableBase
In-Reply-To:
References: <4Uz5ndhxS0KyAkpqwOw0A7yNDj3XvxE46JV5n7F8nFo=.406aac45-16c8-4161-bac0-64d3a8feaa8d@github.com>
Message-ID:

On Fri, 30 Oct 2020 08:47:00 GMT, Tobias Hartmann wrote:

>> UseRDPCForConstantTableBase was a SPARC-exclusive flag. Sparc has been removed
>> from hotspot, so remove this flag.
>
> src/hotspot/share/opto/machnode.hpp line 437:
>
>> 435:   virtual void emit(CodeBuffer& cbuf, PhaseRegAlloc* ra_) const;
>> 436:   virtual uint size(PhaseRegAlloc* ra_) const;
>> 437:   virtual bool pinned() const { return false; }
>
> The method can be removed because `Node::pinned` returns false.

Thank you for the heads-up. Confirmed that MachConstantBaseNode uses `Node::pinned() const { return false; }`, so it's safe to remove it.
I will update it in the next revision along with the obsolete flag.
------------- PR: https://git.openjdk.java.net/jdk/pull/949 From kvn at openjdk.java.net Fri Oct 30 17:16:57 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 30 Oct 2020 17:16:57 GMT Subject: RFR: 8251925: C2: RenaissanceStressTest fails with assert(!had_error): bad dominance In-Reply-To: References: Message-ID: On Fri, 30 Oct 2020 11:28:10 GMT, Christian Hagedorn wrote: > The dominance failures start to occur after the fix for [JDK-8249749](https://bugs.openjdk.java.net/browse/JDK-8249749) which enabled the method `SWPointer::scaled_iv_plus_offset` to call itself recursively and walk the graph to match more instead of stopping immediately (no recursion): > https://github.com/openjdk/jdk/commit/092389e3c91822b1e3f56f203cb7b90e84673f8e#diff-8f29dd005a0f949d108687dabb7379c73dfd85cd782da453509dc9b6cb8c9f81L3789-R3812 > > We check in `SWPointer::offset_plus_k` if a node is invariant and if so then we choose it as invariant. However, we now have cases in the Renaissance benchmarks where we select an invariant that is pinned to a `CastIINode` between the main and pre loop. An example is shown in the attached image. 5913 SubI is found as an invariant with the improved recursive search enabled by JDK-8249749. The control of 5913 SubI (with `get_ctrl`) is 5298 CastII. The problem is now that we use the invariant 5913 SubI in the pre loop limit check of 5281 CountedLoopEnd (done in `SuperWord::align_initial_loop_index`) because we assume that since the invariant is not part of the main loop, it can float into the pre loop. But this is prevented by 5298 CastII. This results in the dominance assertion failure when checking if the earliest control of 5270 Bool in the pre loop (5297 IfTrue because of 5913 SubI that is used by 5270 Bool) dominates the LCA of 5270 Bool (the pre loop header node). > > My suggestion is to improve the invariant check in `SWPointer::offset_plus_k` to also check if the found invariant is not dominated by the pre loop end node. 
Repeated testing of the RenaissanceStressTest has not resulted in any dominance failures anymore. > ![dominance_failure](https://user-images.githubusercontent.com/17833009/97696669-3752d200-1aa6-11eb-9a42-2e36550e2b8b.png) I would also suggest to run locally next jtreg command and compare number of created vector nodes to make sure your changes did not affect common cases: `$ jtreg -testjdk:/my_jdk/ -va -javaoptions:'-server -Xbatch -XX:-TieredCompilation -XX:CICompilerCount=1 -XX:+TraceNewVectors' -J-Djavatest.maxOutputSize=1000000 compiler/c2/cr6340864/ compiler/codegen/ compiler/loopopts/superword/ compiler/vectorization >new_vects.log` `$ grep "new Vector node:" new_vects.log|wc` src/hotspot/share/opto/superword.cpp line 3443: > 3441: assert(lp()->is_main_loop(), ""); > 3442: CountedLoopEndNode* pre_end = cached_pre_loop_end(); > 3443: assert(get_pre_loop_end(lp()), "pre loop end must still be found"); You already have similar assert check inside cached_pre_loop_end(). src/hotspot/share/opto/superword.cpp line 921: > 919: } > 920: CountedLoopEndNode* pre_end = cached_pre_loop_end(); > 921: assert(get_pre_loop_end(lp()), "pre loop end must still be found"); You already have similar assert check inside cached_pre_loop_end(). src/hotspot/share/opto/superword.hpp line 345: > 343: #ifdef ASSERT > 344: Node* pre_end = get_pre_loop_end(_lp); > 345: assert(_lp != NULL && (pre_end == NULL || pre_end == _cached_pre_loop_end) , "real CLE either not found anymore (NULL) or unchanged"); Check _lp != NULL should be before it use in previous line. src/hotspot/share/opto/superword.hpp line 346: > 344: Node* pre_end = get_pre_loop_end(_lp); > 345: assert(_lp != NULL && (pre_end == NULL || pre_end == _cached_pre_loop_end) , "real CLE either not found anymore (NULL) or unchanged"); > 346: assert(_cached_pre_loop_end != NULL, "should be set when fetched"); This should be very first assert in this method. 
src/hotspot/share/opto/superword.hpp line 354: > 352: } > 353: > 354: int iv_stride() { return lp()->stride_con(); } Can you move this method right after set_lp() as originally? src/hotspot/share/opto/superword.cpp line 3801: > 3799: } > 3800: > 3801: bool SWPointer::invariant_not_dominated_by_pre_loop_end(Node* n) { I think we should have only one invariant() method which does this additional check. And have separate method is_main_loop_member() where currently we check !invariant(). src/hotspot/share/opto/superword.cpp line 3810: > 3808: // This happens, for example, when n_c is a CastII node that prevents data nodes to flow above the main loop and into > 3809: // the pre loop. Use the cached version as the real pre loop end might not be found anymore with get_pre_loop_end(). > 3810: return !phase()->is_dominator(_slp->cached_pre_loop_end(), n_c); I think it is not enough. We don't want invariant be inside pre-loop. Invariant should be node outside original loop (before splitting and unrolling). We should check that n_c dominates pre-loop head. What do you think? ------------- Changes requested by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/954 From burban at openjdk.java.net Fri Oct 30 22:16:09 2020 From: burban at openjdk.java.net (Bernhard Urban-Forster) Date: Fri, 30 Oct 2020 22:16:09 GMT Subject: RFR: 8255703: jaotc: Add Windows+Arm64 support Message-ID: Quite bad timing given https://github.com/openjdk/jdk/pull/960 is happening, but I'm gonna publish those changes anyway. 
Tests aren't passing yet, that's the current result of `test/hotspot/jtreg:tier1_compiler_aot_jvmci`: Test results: passed: 120; failed: 22; error: 8 ------------- Commit messages: - Graal: Emit clean branch immediate in HotSpotConstantRetrievalOp - jaotc: add Windows+Arm64 support Changes: https://git.openjdk.java.net/jdk/pull/972/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=972&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8255703 Stats: 337 lines in 9 files changed: 258 ins; 70 del; 9 mod Patch: https://git.openjdk.java.net/jdk/pull/972.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/972/head:pull/972 PR: https://git.openjdk.java.net/jdk/pull/972 From burban at openjdk.java.net Fri Oct 30 22:16:09 2020 From: burban at openjdk.java.net (Bernhard Urban-Forster) Date: Fri, 30 Oct 2020 22:16:09 GMT Subject: RFR: 8255703: jaotc: Add Windows+Arm64 support In-Reply-To: References: Message-ID: On Fri, 30 Oct 2020 22:07:34 GMT, Bernhard Urban-Forster wrote: > Quite bad timing given https://github.com/openjdk/jdk/pull/960 is happening, but I'm gonna publish those changes anyway. 
> > Tests aren't passing yet, that's the current result of `test/hotspot/jtreg:tier1_compiler_aot_jvmci`: > Test results: passed: 120; failed: 22; error: 8 src/jdk.internal.vm.compiler/share/classes/org.graalvm.compiler.hotspot.aarch64/src/org/graalvm/compiler/hotspot/aarch64/AArch64HotSpotConstantRetrievalOp.java line 119: > 117: > 118: final int before = masm.position(); > 119: masm.bl(0); results of `test/hotspot/jtreg:tier1_compiler_aot_jvmci` before this change: Test results: passed: 102; failed: 40; error: 8 after: Test results: passed: 120; failed: 22; error: 8 I had quite some fun tracking this down, here are some more details: https://gist.github.com/lewurm/bdc85a15a9e735ac16aa23c1a6d0c8c7 ------------- PR: https://git.openjdk.java.net/jdk/pull/972 From kvn at openjdk.java.net Fri Oct 30 22:41:55 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 30 Oct 2020 22:41:55 GMT Subject: RFR: 8255703: jaotc: Add Windows+Arm64 support In-Reply-To: References: Message-ID: <0kuY-CWrhC0HciZuOkVAxe-Ixai1PP2rFF-GY1ovjFw=.86cdaaa3-fe43-4e69-ad35-f43d15c933c0@github.com> On Fri, 30 Oct 2020 22:07:34 GMT, Bernhard Urban-Forster wrote: > Quite bad timing given https://github.com/openjdk/jdk/pull/960 is happening, but I'm gonna publish those changes anyway. > > Tests aren't passing yet, that's the current result of `test/hotspot/jtreg:tier1_compiler_aot_jvmci`: > Test results: passed: 120; failed: 22; error: 8 > > Depends on https://github.com/openjdk/jdk/pull/685 I understand that you spent time working on it and don't want throw it out. But it is dead end. JAOTC and Graal sources in JDK may go away soon. I strongly suggest to withdraw this PR. You can contribute Graal (jdk.internal.vm.compiler) changes to GraalVM. ------------- Changes requested by kvn (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/972 From github.com+670087+jrziviani at openjdk.java.net Sat Oct 31 03:53:01 2020 From: github.com+670087+jrziviani at openjdk.java.net (Ziviani) Date: Sat, 31 Oct 2020 03:53:01 GMT Subject: RFR: 8255553: [PPC64] Introduce and use setbc and setnbc P10 instructions [v2] In-Reply-To: References: Message-ID: > - setbc RT,BI: sets RT to 1 if CR(BI) is 1, otherwise 0. > - setnbc RT,BI: sets RT to -1 if CR(BI) is 1, otherwise 0. > Ref: PowerISA 3.1, page 129. > > These instructions are particularly interesting to improve the following > pattern `(src1 < src2) ? -1 : ((src1 > src2) ? 1 : 0)`, which can be found in > `instruct cmpL3_reg_reg_ExEx()@ppc.ad`, by removing its branches. > > Long.toString, that generate such pattern in getChars, has showed a > good performance gain by using these new instructions. > > Example: > for (int i = 0; i < 200_000; i++) > res = Long.toString((long)i); > > java -Xcomp -XX:CompileThreshold=1 -XX:-TieredCompilation TestToString > > Without setbc (average): 0.1178 seconds > With setbc (average): 0.0396 seconds Ziviani has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: - 8255553: [PPC64] Exploit branchless comparison in C2 - 8255553: [PPC64] Introduce and use setbc and setnbc P10 instructions - setbc RT,BI: sets RT to 1 if CR(BI) is 1, otherwise 0. - setnbc RT,BI: sets RT to -1 if CR(BI) is 1, otherwise 0. Ref: PowerISA 3.1, page 129. These instructions are particularly interesting to improve the following pattern `(src1 < src2) ? -1 : ((src1 > src2) ? 1 : 0)`, which can be found in `instruct cmpL3_reg_reg_ExEx()@ppc.ad`, by removing its branches. Long.toString, that generate such pattern in getChars, has showed a good performance gain by using these new instructions.
Example: for (int i = 0; i < 200_000; i++) res = Long.toString((long)i); java -Xcomp -XX:CompileThreshold=1 -XX:-TieredCompilation TestToString Without setbc (average): 0.1178 seconds With setbc (average): 0.0396 seconds ------------- Changes: https://git.openjdk.java.net/jdk/pull/907/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=907&range=01 Stats: 145 lines in 3 files changed: 43 ins; 73 del; 29 mod Patch: https://git.openjdk.java.net/jdk/pull/907.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/907/head:pull/907 PR: https://git.openjdk.java.net/jdk/pull/907 From github.com+670087+jrziviani at openjdk.java.net Sat Oct 31 03:53:02 2020 From: github.com+670087+jrziviani at openjdk.java.net (Ziviani) Date: Sat, 31 Oct 2020 03:53:02 GMT Subject: RFR: 8255553: [PPC64] Introduce and use setbc and setnbc P10 instructions [v2] In-Reply-To: References: Message-ID: <9WvlyTQKeSt1xwubpXmYX3aFA1SrPAW7FbVCeNkavq8=.b2125f9f-704d-4d59-90a7-715f4cbf2caf@github.com> On Fri, 30 Oct 2020 10:12:51 GMT, Martin Doerr wrote: >> Ziviani has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: >> >> - 8255553: [PPC64] Exploit branchless comparison in C2 >> - 8255553: [PPC64] Introduce and use setbc and setnbc P10 instructions >> >> - setbc RT,BI: sets RT to 1 if CR(BI) is 1, otherwise 0. >> - setnbc RT,BI: sets RT to -1 if CR(BI) is 1, otherwise 0. >> Ref: PowerISA 3.1, page 129. >> >> These instructions are particularly interesting to improve the following >> pattern `(src1 < src2) ? -1 : ((src1 > src2) ? 1 : 0)`, which can be found in >> `instruct cmpL3_reg_reg_ExEx()@ppc.ad`, by removing its branches. >> >> Long.toString, that generate such pattern in getChars, has showed a >> good performance gain by using these new instructions.
>> >> Example: >> for (int i = 0; i < 200_000; i++) >> res = Long.toString((long)i); >> >> java -Xcomp -XX:CompileThreshold=1 -XX:-TieredCompilation TestToString >> >> Without setbc (average): 0.1178 seconds >> With setbc (average): 0.0396 seconds > > src/hotspot/cpu/ppc/ppc.ad line 11521: > >> 11519: __ setbc(R0, (($crx$$reg << 2) | 1) /* greater than */); >> 11520: __ setnbc($dst$$Register, ($crx$$reg << 2) /* less than */); >> 11521: __ or_unchecked($dst$$Register, $dst$$Register, R0); > > In general, I think it'd be better to use orr which makes sure we never unintentionally emit an instruction which modifies smt priority "smt_prio_...". In this case dst != R0, so this doesn't happen. Fixed! ------------- PR: https://git.openjdk.java.net/jdk/pull/907 From github.com+670087+jrziviani at openjdk.java.net Sat Oct 31 04:01:54 2020 From: github.com+670087+jrziviani at openjdk.java.net (Ziviani) Date: Sat, 31 Oct 2020 04:01:54 GMT Subject: RFR: 8255553: [PPC64] Introduce and use setbc and setnbc P10 instructions [v2] In-Reply-To: References: Message-ID: On Fri, 30 Oct 2020 10:19:03 GMT, Martin Doerr wrote: >> Ziviani has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: >> >> - 8255553: [PPC64] Exploit branchless comparison in C2 >> - 8255553: [PPC64] Introduce and use setbc and setnbc P10 instructions >> >> - setbc RT,BI: sets RT to 1 if CR(BI) is 1, otherwise 0. >> - setnbc RT,BI: sets RT to -1 if CR(BI) is 1, otherwise 0. >> Ref: PowerISA 3.1, page 129. >> >> These instructions are particularly interesting to improve the following >> pattern `(src1 < src2) ? -1 : ((src1 > src2) ? 1 : 0)`, which can be found in >> `instruct cmpL3_reg_reg_ExEx()@ppc.ad`, by removing its branches. >> >> Long.toString, that generate such pattern in getChars, has showed a >> good performance gain by using these new instructions.
>> >> Example: >> for (int i = 0; i < 200_000; i++) >> res = Long.toString((long)i); >> >> java -Xcomp -XX:CompileThreshold=1 -XX:-TieredCompilation TestToString >> >> Without setbc (average): 0.1178 seconds >> With setbc (average): 0.0396 seconds > > If you just want to add C1 code, I suggest to do that in this PR. > It's up to you if you prefer a new PR for further changes. I still think cleaning up the ad file a bit and using branch free code for all Power processors would be nice. I'd also be ok with replacing the "expand" code by assembly code directly in "ins_encode". Should be good enough for modern out-of-order processors. Hello Martin, I added a new [commit](https://github.com/openjdk/jdk/pull/907/commits/fed1b46bb776d0bd63a9a4b1b9d55d248d3fecfc) here just to make sure I'm following your idea. About C1, my experiences didn't make much difference. I think I need to write a better load to stress the code. ------------- PR: https://git.openjdk.java.net/jdk/pull/907 From xliu at openjdk.java.net Sat Oct 31 04:42:08 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Sat, 31 Oct 2020 04:42:08 GMT Subject: RFR: 8254807: Optimize startsWith() for String.substring() Message-ID: 8254807: Optimize startsWith() for String.substring() ------------- Commit messages: - fix a regression test on x86_32 - 8254807: Optimize startsWith() for String.substring() - 8254807: Optimize startsWith() for String.substring() - 8254807: Optimize startsWith() for String.substring() - 8254807: Optimize startsWith() for String.substring() - 8254807: Optimize startsWith() for String.substring() - 8254807: Optimize startsWith() for String.substring() Changes: https://git.openjdk.java.net/jdk/pull/704/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=704&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8254807 Stats: 538 lines in 15 files changed: 472 ins; 56 del; 10 mod Patch: https://git.openjdk.java.net/jdk/pull/704.diff Fetch: git fetch https://git.openjdk.java.net/jdk 
pull/704/head:pull/704 PR: https://git.openjdk.java.net/jdk/pull/704 From xliu at openjdk.java.net Sat Oct 31 04:42:08 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Sat, 31 Oct 2020 04:42:08 GMT Subject: RFR: 8254807: Optimize startsWith() for String.substring() In-Reply-To: References: Message-ID: On Fri, 16 Oct 2020 17:15:16 GMT, Xin Liu wrote: > 8254807: Optimize startsWith() for String.substring() I am fixing 4 regressions. 2 of them are reproducible on my Linux host. Both crash in loopopt; it seems that the substring optimization ends up with a broken ideal graph. ------------- PR: https://git.openjdk.java.net/jdk/pull/704 From xliu at openjdk.java.net Sat Oct 31 04:54:54 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Sat, 31 Oct 2020 04:54:54 GMT Subject: RFR: 8254807: Optimize startsWith() for String.substring() In-Reply-To: References: Message-ID: On Tue, 20 Oct 2020 08:26:54 GMT, Xin Liu wrote: >> 8254807: Optimize startsWith() for String.substring() > > I am fixing 4 regressions. 2 of them are reproducible on my Linux host. > Both crash in loopopt; it seems that the substring optimization ends up with a broken ideal graph. I add a microbenchmark. currently, api-level ------------- PR: https://git.openjdk.java.net/jdk/pull/704 From xliu at openjdk.java.net Sat Oct 31 04:54:55 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Sat, 31 Oct 2020 04:54:55 GMT Subject: Withdrawn: 8254807: Optimize startsWith() for String.substring() In-Reply-To: References: Message-ID: On Fri, 16 Oct 2020 17:15:16 GMT, Xin Liu wrote: > 8254807: Optimize startsWith() for String.substring() This pull request has been closed without being integrated.
------------- PR: https://git.openjdk.java.net/jdk/pull/704 From github.com+670087+jrziviani at openjdk.java.net Sat Oct 31 05:02:06 2020 From: github.com+670087+jrziviani at openjdk.java.net (Ziviani) Date: Sat, 31 Oct 2020 05:02:06 GMT Subject: RFR: 8255553: [PPC64] Introduce and use setbc and setnbc P10 instructions [v3] In-Reply-To: References: Message-ID: > - setbc RT,BI: sets RT to 1 if CR(BI) is 1, otherwise 0. > - setnbc RT,BI: sets RT to -1 if CR(BI) is 1, otherwise 0. > Ref: PowerISA 3.1, page 129. > > These instructions are particularly interesting to improve the following > pattern `(src1 < src2) ? -1 : ((src1 > src2) ? 1 : 0)`, which can be found in > `instruct cmpL3_reg_reg_ExEx()@ppc.ad`, by removing its branches. > > Long.toString, that generate such pattern in getChars, has showed a > good performance gain by using these new instructions. > > Example: > for (int i = 0; i < 200_000; i++) > res = Long.toString((long)i); > > java -Xcomp -XX:CompileThreshold=1 -XX:-TieredCompilation TestToString > > Without setbc (average): 0.1178 seconds > With setbc (average): 0.0396 seconds Ziviani has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR.
The pull request contains one new commit since the last revision: 8255553: [PPC64] Exploit branchless comparison in C2 ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/907/files - new: https://git.openjdk.java.net/jdk/pull/907/files/fed1b46b..41502730 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=907&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=907&range=01-02 Stats: 35 lines in 3 files changed: 18 ins; 3 del; 14 mod Patch: https://git.openjdk.java.net/jdk/pull/907.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/907/head:pull/907 PR: https://git.openjdk.java.net/jdk/pull/907 From xliu at openjdk.java.net Sat Oct 31 05:28:07 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Sat, 31 Oct 2020 05:28:07 GMT Subject: RFR: 8254807: Optimize startsWith() for String.substring() Message-ID: the optimization transforms code from s=substring(base, beg, end); s.startsWith(prefix) to substring(base, beg, end) | base.startsWith(prefix, beg). it reduces an use of substring. 
hopefully c2 optimizer can remove the used substring() ------------- Commit messages: - fix a regression test on x86_32 - 8254807: Optimize startsWith() for String.substring() - 8254807: Optimize startsWith() for String.substring() - 8254807: Optimize startsWith() for String.substring() - 8254807: Optimize startsWith() for String.substring() - 8254807: Optimize startsWith() for String.substring() - 8254807: Optimize startsWith() for String.substring() Changes: https://git.openjdk.java.net/jdk/pull/974/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=974&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8254807 Stats: 538 lines in 15 files changed: 472 ins; 56 del; 10 mod Patch: https://git.openjdk.java.net/jdk/pull/974.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/974/head:pull/974 PR: https://git.openjdk.java.net/jdk/pull/974 From xliu at openjdk.java.net Sat Oct 31 05:30:55 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Sat, 31 Oct 2020 05:30:55 GMT Subject: RFR: 8254807: Optimize startsWith() for String.substring() In-Reply-To: References: Message-ID: On Sat, 31 Oct 2020 05:22:39 GMT, Xin Liu wrote: > the optimization transforms code from s=substring(base, beg, end); s.startsWith(prefix) > to substring(base, beg, end) | base.startsWith(prefix, beg). > > it reduces an use of substring. hopefully c2 optimizer can remove the used substring()

$ make test TEST="micro:SubstringAndStartsWith" CONF=linux-x86_64-server-release

Benchmark                                         (substrLength)   Mode  Cnt       Score      Error   Units
SubstringAndStartsWith.substr2StartsWith                       1  thrpt  100  140630.609 ±    9.404  ops/ms
SubstringAndStartsWith.substr2StartsWith                       8  thrpt  100  140634.988 ±    7.704  ops/ms
SubstringAndStartsWith.substr2StartsWith                      32  thrpt  100  135812.111 ± 4687.410  ops/ms
SubstringAndStartsWith.substr2StartsWith                     128  thrpt  100  140644.499 ±    7.856  ops/ms
SubstringAndStartsWith.substr2StartsWith                     256  thrpt  100  140640.309 ±    8.535  ops/ms
SubstringAndStartsWith.substr2StartsWith                     512  thrpt  100  140643.976 ±    8.168  ops/ms
SubstringAndStartsWith.substr2StartsWith_noalloc               1  thrpt  100  158672.117 ±    8.510  ops/ms
SubstringAndStartsWith.substr2StartsWith_noalloc               8  thrpt  100  158673.989 ±   10.198  ops/ms
SubstringAndStartsWith.substr2StartsWith_noalloc              32  thrpt  100  158677.660 ±    8.987  ops/ms
SubstringAndStartsWith.substr2StartsWith_noalloc             128  thrpt  100  153677.095 ± 5418.718  ops/ms
SubstringAndStartsWith.substr2StartsWith_noalloc             256  thrpt  100  158668.766 ±    9.138  ops/ms
SubstringAndStartsWith.substr2StartsWith_noalloc             512  thrpt  100  152544.827 ± 5922.157  ops/ms

Finished running test 'micro:SubstringAndStartsWith' Test report is stored in build/linux-x86_64-server-release/test-results/micro_SubstringAndStartsWith

The micro demonstrates that the optimization removes substring() and that the throughput stays constant across substring lengths. It's still slightly lower than the handcrafted version `substr2StartsWith_noalloc` because the latter doesn't need the complete bounds check. ------------- PR: https://git.openjdk.java.net/jdk/pull/974 From redestad at openjdk.java.net Sat Oct 31 15:01:54 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Sat, 31 Oct 2020 15:01:54 GMT Subject: RFR: 8254807: Optimize startsWith() for String.substring() In-Reply-To: References: Message-ID: On Sat, 31 Oct 2020 05:22:39 GMT, Xin Liu wrote: > The optimization transforms code from s=substring(base, beg, end); s.startsWith(prefix) > to substring(base, beg, end) | base.startsWith(prefix, beg). > > it reduces uses of substring. hopefully c2 optimizer can remove the used substring. Some comments and nits on the microbenchmark. A general comment is that I think it would be good to add variants exercising UTF16 Strings: one where `sample` has some UTF-16 chars, and one where both `sample` and `prefix` do (latin-1 `sample` and UTF-16 `prefix` could be interesting too, to ensure this variant shortcuts quickly).
Should the `prefix` be something a bit more complex than a single char string? `startsWith("a", off)` is a case that'd be tempting to optimize down to `charAt(off) == 'a'` and then this micro might no longer do what it intends to do. test/micro/org/openjdk/bench/vm/compiler/SubstringAndStartsWith.java line 44: > 42: @Measurement(iterations = 20, time = 200, timeUnit = TimeUnit.MILLISECONDS) > 43: @State(Scope.Benchmark) > 44: public class SubstringAndStartsWith { I'd put this micro in org.openjdk.bench.java.lang and call it SubstringStartsWith test/micro/org/openjdk/bench/vm/compiler/SubstringAndStartsWith.java line 45: > 43: @State(Scope.Benchmark) > 44: public class SubstringAndStartsWith { > 45: @Param({"1", "8", "32", "128", "256", "512"}) Does each param value pull its weight? I'd consider cutting down the default list to 2-3 variants (you can always specify more values on the command line, etc.) test/micro/org/openjdk/bench/vm/compiler/SubstringAndStartsWith.java line 68: > 66: // compare prefix length with the length of substring > 67: if (prefix.length() > substrLength) return false; > 68: return sample.startsWith(prefix, substrLength); // substrLength here is actually the beginIdex of substring Suggestion: return sample.startsWith(prefix, substrLength); // substrLength here is actually the beginIndex of substring ------------- Changes requested by redestad (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/974
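
For readers following the thread, here is a minimal sketch of the source-level equivalence that the PR description relies on: `base.substring(beg, end).startsWith(prefix)` gives the same answer as `base.startsWith(prefix, beg)` once the prefix is known to fit within `end - beg`. The class and method names (`SubstringStartsWithSketch`, `viaSubstring`, `viaOffset`) are hypothetical and are not taken from the patch or the microbenchmark. The actual change works on the C2 ideal graph, which this plain-Java sketch does not attempt to model.

```java
class SubstringStartsWithSketch {
    // Original shape: allocate the substring, then test its prefix.
    static boolean viaSubstring(String base, int beg, int end, String prefix) {
        return base.substring(beg, end).startsWith(prefix);
    }

    // Rewritten shape (assumed, for illustration): no substring is allocated.
    static boolean viaOffset(String base, int beg, int end, String prefix) {
        // Keep the same failure behaviour as substring() for bad ranges.
        if (beg < 0 || end > base.length() || beg > end) {
            throw new StringIndexOutOfBoundsException(
                "begin " + beg + ", end " + end + ", length " + base.length());
        }
        // Without this guard a prefix could match characters past 'end'.
        if (prefix.length() > end - beg) {
            return false;
        }
        return base.startsWith(prefix, beg);
    }

    public static void main(String[] args) {
        String base = "hello world";
        // Both forms agree when the prefix fits inside the range.
        System.out.println(viaSubstring(base, 6, 11, "wor") + " " + viaOffset(base, 6, 11, "wor"));
        // Both forms agree when the prefix is longer than the range.
        System.out.println(viaSubstring(base, 6, 8, "wor") + " " + viaOffset(base, 6, 8, "wor"));
    }
}
```

The length guard is the part that is easy to miss: `"hello world".startsWith("wor", 6)` is true on its own, while the substring `"wo"` taken from the range [6, 8) does not start with `"wor"`.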