From neliasso at openjdk.java.net Fri Oct 1 07:37:31 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Fri, 1 Oct 2021 07:37:31 GMT Subject: RFR: 8271459: C2: Missing NegativeArraySizeException when creating StringBuilder with negative capacity [v2] In-Reply-To: References: <-_kFk0kfF5npDXL-qyMSIFfglZwDHlV-jyMgBc7GmXI=.178d9895-ce14-414a-b07e-c2060f8ab9b2@github.com> Message-ID: On Mon, 27 Sep 2021 07:37:34 GMT, Christian Hagedorn wrote: >> Stringopts does not take into account that a negative `int` argument for `StringBuilder(int)` results in a `NegativeArraySizeException` when optimizing away `StringBuilder` usages into single strings. >> >> The suggested fix does the following: >> - Bailout of Stringopts if C2 knows that an `int` argument is always negative. >> - Apply stringopts but insert an additional runtime check with an UCT if C2 cannot tell if an `int` argument is positive or negative. >> >> I added some IR tests to verify the fix and also ran some standard benchmarks. >> >> I also updated `TestIRMatching` to test the new and updated default regexes. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Update UCT to Action_maybe_recompile Looks good! ------------- Marked as reviewed by neliasso (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/5652 From chagedorn at openjdk.java.net Fri Oct 1 07:37:32 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Fri, 1 Oct 2021 07:37:32 GMT Subject: RFR: 8271459: C2: Missing NegativeArraySizeException when creating StringBuilder with negative capacity [v2] In-Reply-To: References: <-_kFk0kfF5npDXL-qyMSIFfglZwDHlV-jyMgBc7GmXI=.178d9895-ce14-414a-b07e-c2060f8ab9b2@github.com> Message-ID: On Mon, 27 Sep 2021 07:37:34 GMT, Christian Hagedorn wrote: >> Stringopts does not take into account that a negative `int` argument for `StringBuilder(int)` results in a `NegativeArraySizeException` when optimizing away `StringBuilder` usages into single strings. >> >> The suggested fix does the following: >> - Bailout of Stringopts if C2 knows that an `int` argument is always negative. >> - Apply stringopts but insert an additional runtime check with an UCT if C2 cannot tell if an `int` argument is positive or negative. >> >> I added some IR tests to verify the fix and also ran some standard benchmarks. >> >> I also updated `TestIRMatching` to test the new and updated default regexes. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Update UCT to Action_maybe_recompile Thanks Nils for your review! 
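The Java-level contract that this fix preserves can be checked with a small standalone program. This is a minimal sketch: the class and method names are hypothetical. It only exercises the `StringBuilder(int)` specification, which requires a `NegativeArraySizeException` for a capacity below 0. It does not force C2 to compile `build` or to apply Stringopts to it; that still depends on the JIT's own heuristics.

```java
public class NegativeCapacity {
    // Hypothetical helper: a StringBuilder(int).append(...).toString() chain
    // is the kind of pattern Stringopts tries to fold into a single String.
    static String build(int capacity, String a, String b) {
        return new StringBuilder(capacity).append(a).append(b).toString();
    }

    public static void main(String[] args) {
        System.out.println(build(16, "foo", "bar"));
        try {
            build(-1, "foo", "bar");
            System.out.println("no exception: StringBuilder(int) contract violated");
        } catch (NegativeArraySizeException e) {
            // Required by the StringBuilder(int) specification when capacity < 0,
            // whether the code runs in the interpreter or as C2-compiled code.
            System.out.println("NegativeArraySizeException");
        }
    }
}
```

Calling `build` in a hot loop with a varying capacity is one way to give C2 a chance to compile it. Whether Stringopts then bails out or emits the runtime check depends on what C2 can prove about the sign of the argument.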
------------- PR: https://git.openjdk.java.net/jdk/pull/5652 From github.com+2249648+johntortugo at openjdk.java.net Sat Oct 2 00:14:43 2021 From: github.com+2249648+johntortugo at openjdk.java.net (John Tortugo) Date: Sat, 2 Oct 2021 00:14:43 GMT Subject: RFR: 8267265: Use new IR Test Framework to create tests for C2 IGV transformations [v4] In-Reply-To: References: <8Ce6bZtHwGEw8_wXZz4ak3obprd1YmZDi4cItcXB4bA=.a7162709-7aad-4709-a585-d2391392f49b@github.com> Message-ID: <841LgVYYimcsO_aZRCDU_6EBvNVXXeGbVkGfJ6gAQrg=.5978eea7-4753-465d-a627-ec1625a504ce@github.com> On Thu, 16 Sep 2021 07:40:58 GMT, Christian Hagedorn wrote: >> John Tortugo has updated the pull request incrementally with 146 additional commits since the last revision: >> >> - Fix merge mistake. >> - Merge branch 'jdk-8267265' of https://github.com/JohnTortugo/jdk into jdk-8267265 >> - Addressing PR feedback: move tests to other directory, add custom tests, add tests for other optimizations, rename some tests. >> - 8273197: ProblemList 2 jtools tests due to JDK-8273187 >> 8273198: ProblemList java/lang/instrument/BootClassPath/BootClassPathTest.sh due to JDK-8273188 >> >> Reviewed-by: naoto >> - 8262186: Call X509KeyManager.chooseClientAlias once for all key types >> >> Reviewed-by: xuelei >> - 8273186: Remove leftover comment about sparse remembered set in G1 HeapRegionRemSet >> >> Reviewed-by: ayang >> - 8273169: java/util/regex/NegativeArraySize.java failed after JDK-8271302 >> >> Reviewed-by: jiefu, serb >> - 8273092: Sort classlist in JDK image >> >> Reviewed-by: redestad, ihse, dfuchs >> - 8273144: Remove unused top level "Sample Collection Set Candidates" logging >> >> Reviewed-by: iwalulya, ayang >> - 8262095: NPE in Flow$FlowAnalyzer.visitApply: Cannot invoke getThrownTypes because tree.meth.type is null >> >> Co-authored-by: Jan Lahoda >> Co-authored-by: Vicente Romero >> Reviewed-by: jlahoda >> - ... 
and 136 more: https://git.openjdk.java.net/jdk/compare/ac430bf7...463102e2 > > test/hotspot/jtreg/compiler/c2/irTests/AddINodeIdealizationTests.java line 40: > >> 38: >> 39: @Test >> 40: @IR(failOn = {IRNode.LOAD, IRNode.STORE, IRNode.MUL, IRNode.DIV, IRNode.SUB}) > > In this test and all the following ones (including the other files), I think you can remove unrelated `failOn` regexes on operations that are not part of the test. For example, in this test you can safely remove `IRNode.MUL, DIV, and SUB`. Do you think I can remove the "LOAD" and "STORE" as well? ------------- PR: https://git.openjdk.java.net/jdk/pull/5135 From jiefu at openjdk.java.net Sat Oct 2 03:10:31 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Sat, 2 Oct 2021 03:10:31 GMT Subject: RFR: 8274329: Fix non-portable HotSpot code in MethodMatcher::parse_method_pattern In-Reply-To: References: <4RQ7uX1dWVQYPu8WFnqt0bx4nFoq3ijIUQ6WMSP-INc=.e1274b05-a6e9-4227-9588-08310c3cffac@github.com> Message-ID: On Thu, 30 Sep 2021 23:41:20 GMT, Jie Fu wrote: > > I will do your experiment next week. This is because it's already our National Day week and I can't find an English Windows machine until next week. I'll let you know the result as soon as possible. Thanks. > > No need to hurry :-). In case you can't find an English Windows, I think you can use the `chcp 65001` command mentioned in https://stackoverflow.com/questions/388490/how-to-use-unicode-characters-in-windows-command-line to change your command-line window to use the UTF8 codepage. Hi @iklam, methodMatcher.obj [1] was built with `System Locale: zh-cn;Chinese (China)`, and methodMatcher.obj [2] was built with `System Locale: en-us;English (United States)`. There seems to be no difference when checking them with `strings methodMatcher.obj`. The warnings disappear when the system locale is `en-us;English (United States)`.
But unfortunately, I can't reproduce the "CJK" test example, which means non-ASCII chars for CompileCommand still fail for both jdk images (even when built with en-us locale, no warnings at all). So it's far more complicated than I had thought. I will just close this pr since we can't remove the non-ASCII code, which works in some countries. Thank you all for your help and valuable comments. Best regards, Jie [1] https://github.com/DamonFool/experiment/blob/main/JDK-8274329/ch-methodMatcher.obj [2] https://github.com/DamonFool/experiment/blob/main/JDK-8274329/en-methodMatcher.obj ------------- PR: https://git.openjdk.java.net/jdk/pull/5704 From simonis at openjdk.java.net Mon Oct 4 07:51:27 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Mon, 4 Oct 2021 07:51:27 GMT Subject: RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow [v3] In-Reply-To: References: Message-ID: > Currently, if running with `-XX:-OmitStackTraceInFastThrow`, C2 has no possibility to create implicit exceptions like AIOOBE, NullPointerExceptions, etc. in compiled code. This means that such methods will always be deoptimized and re-executed in the interpreter if such exceptions are happening. > > If implicit exceptions are used for normal control flow, that can have a dramatic impact on performance. A prominent example for such code is [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274): > > public static boolean isAlpha(int c) { > try { > return IS_ALPHA[c]; > } catch (ArrayIndexOutOfBoundsException ex) { > return false; > } > } > > > ### Solution > > Instead of deoptimizing and resorting to the interpreter, we can generate code which allocates and initializes the corresponding exceptions right in compiled code. 
> This results in a ten-times performance improvement for the above code:
>
>     -XX:-OmitStackTraceInFastThrow -XX:-OptimizeImplicitExceptions
>     Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>     ImplicitExceptions.bench                     0.0  avgt    5      1.430 ±    0.353  ns/op
>     ImplicitExceptions.bench                    0.33  avgt    5   3563.038 ±   77.358  ns/op
>     ImplicitExceptions.bench                    0.66  avgt    5   8609.693 ± 1205.104  ns/op
>     ImplicitExceptions.bench                    1.00  avgt    5  12842.401 ± 1022.728  ns/op
>
>     -XX:-OmitStackTraceInFastThrow -XX:+OptimizeImplicitExceptions
>     Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>     ImplicitExceptions.bench                     0.0  avgt    5      1.432 ±    0.352  ns/op
>     ImplicitExceptions.bench                    0.33  avgt    5    355.723 ±   16.641  ns/op
>     ImplicitExceptions.bench                    0.66  avgt    5    887.068 ±  166.728  ns/op
>     ImplicitExceptions.bench                    1.00  avgt    5   1274.418 ±   88.235  ns/op
>
> ### Implementation details
>
> - The new optimization is guarded by the option `OptimizeImplicitExceptions`, which is on by default.
> - In `GraphKit::builtin_throw()` we can't simply use `CallGenerator::for_direct_call()` to create a `DirectCallGenerator` for the call to the exception's `` function, because `DirectCallGenerator` assumes in various places that calls are only issued at `invoke*` bytecodes. This is not true in general for bytecodes that can cause an implicit exception.
> - Instead, we manually wire up the call based on the code in `DirectCallGenerator::generate()`.
> - We use a trick similar to the one used for method handle intrinsics, where the callee from the bytecode is replaced by a direct call and this fact is recorded in the call's `_override_symbolic_info` field. For calling constructors of implicit exceptions, I've introduced the new field `_implicit_exception_init`. This field is also used in various assertions to prevent queries for the bytecode's symbolic method information, which doesn't exist because we're not at an `invoke*` bytecode at the place where we generate the call.
> - The PR contains a micro-benchmark which compares the old and the new implementation for [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274). Except for the trivial case where the exception probability is 0 (i.e. no exceptions are happening at all) the new implementation is about 10 times faster. Volker Simonis has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: - Fix special case where we're creating an implicit exception for a regular invoke* bytecode - Minor updates as requested by @TheRealMDoerr - 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow ------------- Changes: https://git.openjdk.java.net/jdk/pull/5488/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5488&range=02 Stats: 208 lines in 10 files changed: 200 ins; 0 del; 8 mod Patch: https://git.openjdk.java.net/jdk/pull/5488.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5488/head:pull/5488 PR: https://git.openjdk.java.net/jdk/pull/5488 From chagedorn at openjdk.java.net Mon Oct 4 08:38:10 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Mon, 4 Oct 2021 08:38:10 GMT Subject: RFR: 8267265: Use new IR Test Framework to create tests for C2 IGV transformations [v4] In-Reply-To: <841LgVYYimcsO_aZRCDU_6EBvNVXXeGbVkGfJ6gAQrg=.5978eea7-4753-465d-a627-ec1625a504ce@github.com> References: <8Ce6bZtHwGEw8_wXZz4ak3obprd1YmZDi4cItcXB4bA=.a7162709-7aad-4709-a585-d2391392f49b@github.com> <841LgVYYimcsO_aZRCDU_6EBvNVXXeGbVkGfJ6gAQrg=.5978eea7-4753-465d-a627-ec1625a504ce@github.com> Message-ID: <5ELOnKEloqyJk_gHgnvU2zJWDhxtjkByCQZgPLCHptA=.f26a20ac-fc1f-4a58-91dd-7a02994a5e7d@github.com> On Sat, 2 Oct 2021 00:11:47 GMT, John Tortugo wrote: >> test/hotspot/jtreg/compiler/c2/irTests/AddINodeIdealizationTests.java line 40: >> >>> 38: >>> 
39: @Test >>> 40: @IR(failOn = {IRNode.LOAD, IRNode.STORE, IRNode.MUL, IRNode.DIV, IRNode.SUB}) >> >> In this test and all the following ones (including the other files), I think you can remove unrelated `failOn` regexes on operations that are not part of the test. For example, in this test you can safely remove `IRNode.MUL, DIV, and SUB`. > > Do you think I can remove the "LOAD" and "STORE" as well? Yes, I think you can remove them, too, as long as there are no fields involved. ------------- PR: https://git.openjdk.java.net/jdk/pull/5135 From iveresov at openjdk.java.net Tue Oct 5 03:30:18 2021 From: iveresov at openjdk.java.net (Igor Veresov) Date: Tue, 5 Oct 2021 03:30:18 GMT Subject: RFR: 8273612: Fix for JDK-8272873 causes timeout in running some tests with -Xcomp Message-ID: With tiered compilation, it just so happened that profiles are reported as mature with -Xcomp. For some tests this leads to pathological profiles that cause excessive execution time. The fix is to make profiles immature when running with -Xcomp. ------------- Commit messages: - Fix the tests - Report profiles as immature with -Xcomp Changes: https://git.openjdk.java.net/jdk/pull/5815/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5815&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8273612 Stats: 9 lines in 3 files changed: 4 ins; 2 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/5815.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5815/head:pull/5815 PR: https://git.openjdk.java.net/jdk/pull/5815 From iklam at openjdk.java.net Tue Oct 5 06:41:08 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Tue, 5 Oct 2021 06:41:08 GMT Subject: RFR: 8274329: Fix non-portable HotSpot code in MethodMatcher::parse_method_pattern In-Reply-To: References: Message-ID: On Sun, 26 Sep 2021 09:55:00 GMT, Jie Fu wrote: > Hi all, > > I tried to build OpenJDK on Cygwin (Windows 2016 + VS2019).
> However, I failed with C4474 and C4778 warnings as below: > > Compiling 100 properties into resource bundles for java.desktop > Compiling 3038 files for java.base > e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): error C2220: the following warning is treated as an error > e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning C4778: 'sscanf' : unterminated format string '%255[*\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e\x97%n' > e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning C4474: 'sscanf' : too many arguments passed for format string > e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): note: placeholders and their parameters expect 1 variadic arguments, but 3 were provided > e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning C4778: 'sscanf' : unterminated format string '%1022[[);/\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e%n' > e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning C4474: 'sscanf' : too many arguments passed for format string > e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): note: placeholders and their parameters expect 0 variadic arguments, but 2 were provided > > > The failure is caused by non-ASCII chars in the format string of sscanf [1][2], which is non-portable on our Windows platform. > In fact, these non-ASCII coding also triggers C4819 warning, which had been disabled in JDK-8216154 [3]. > And I also found an article showing that sscanf may fail with non-ASCII in the format string [4]. 
> > So it would be nice to remove these non-ASCII chars (`\x80 ~ \xef`). > And I think it's safe to do so. > > This is because: > 1) There are actually no non-ASCII chars for package/class/method/signature names. > 2) I don't think there is a use case in which people will input non-ASCII for `CompileCommand`. > > You may argue that the non-ASCII chars may be used by the parser itself. > But I didn't find that usage at all. (Please let me know if I missed something.) > > So I suggest removing this non-ASCII code to make HotSpot more portable. > And if we do so, we can also remove the only `PRAGMA_DISABLE_MSVC_WARNING(4819)` [5]. > > Testing: > - Build tests on Windows > - tier1~3 on Linux/x64 > > Thanks. > Best regards, > Jie > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L269 > [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L319 > [3] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-January/032014.html > [4] https://jeffpar.github.io/kbarchive/kb/047/Q47369/ > [5] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L246

My experiments above with `-XX:CompileCommand='compileonly,*::??'` were done on Linux. I tried doing the same on Windows. On US-English Windows, the default codepage is 437 (DOS Latin US). If I change it to 65001 (UTF-8), then Java is able to output CJK characters to the console.

    public class CJK {
        public static void main(String args[]) {
            System.out.println(args[0]);
            \u722a\u54c7();
        }
        static void \u722a\u54c7() {
            // Chinese word for "Java"
            Thread.dumpStack();
        }
    }

    c:\ade>chcp
    Active code page: 437

    c:\ade>jdk-17\bin\java -cp . CJK 123
    123
    java.lang.Exception: Stack trace
            at java.base/java.lang.Thread.dumpStack(Thread.java:1380)
            at CJK.??(CJK.java:8)
            at CJK.main(CJK.java:4)

    c:\ade>chcp 65001
    Active code page: 65001

    c:\ade>jdk-17\bin\java -cp . CJK ??
    ??
    java.lang.Exception: Stack trace
            at java.base/java.lang.Thread.dumpStack(Thread.java:1380)
            at CJK.??(CJK.java:8)
            at CJK.main(CJK.java:4)

As you can see, the CJK characters in the command-line arguments can't even be correctly passed as arguments to the Java main class. If that doesn't work, I can't see how we can get `-XX:CompileCommand` to work with CJK characters. ------------- PR: https://git.openjdk.java.net/jdk/pull/5704 From chagedorn at openjdk.java.net Tue Oct 5 07:06:15 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Tue, 5 Oct 2021 07:06:15 GMT Subject: Integrated: 8271459: C2: Missing NegativeArraySizeException when creating StringBuilder with negative capacity In-Reply-To: <-_kFk0kfF5npDXL-qyMSIFfglZwDHlV-jyMgBc7GmXI=.178d9895-ce14-414a-b07e-c2060f8ab9b2@github.com> References: <-_kFk0kfF5npDXL-qyMSIFfglZwDHlV-jyMgBc7GmXI=.178d9895-ce14-414a-b07e-c2060f8ab9b2@github.com> Message-ID: On Thu, 23 Sep 2021 13:25:51 GMT, Christian Hagedorn wrote: > Stringopts does not take into account that a negative `int` argument for `StringBuilder(int)` results in a `NegativeArraySizeException` when optimizing away `StringBuilder` usages into single strings. > > The suggested fix does the following: > - Bailout of Stringopts if C2 knows that an `int` argument is always negative. > - Apply stringopts but insert an additional runtime check with an UCT if C2 cannot tell if an `int` argument is positive or negative. > > I added some IR tests to verify the fix and also ran some standard benchmarks. > > I also updated `TestIRMatching` to test the new and updated default regexes. > > Thanks, > Christian This pull request has now been integrated.
Changeset: 3953e077 Author: Christian Hagedorn URL: https://git.openjdk.java.net/jdk/commit/3953e0774c59c5e936e752aa08b6b6778e232994 Stats: 276 lines in 4 files changed: 269 ins; 0 del; 7 mod 8271459: C2: Missing NegativeArraySizeException when creating StringBuilder with negative capacity Reviewed-by: roland, thartmann, neliasso ------------- PR: https://git.openjdk.java.net/jdk/pull/5652 From magnus.ihse.bursie at oracle.com Tue Oct 5 08:09:46 2021 From: magnus.ihse.bursie at oracle.com (Magnus Ihse Bursie) Date: Tue, 5 Oct 2021 10:09:46 +0200 Subject: RFR: 8274329: Fix non-portable HotSpot code in MethodMatcher::parse_method_pattern In-Reply-To: References: Message-ID: On 2021-10-05 08:41, Ioi Lam wrote: > As you can see, the CJK characters in the command-line arguments can't > even be correctly passed as arguments to the Java main class. If that > doesn't work, I can't see how we can get `-XX:CompileCommand` to work > with CJK characters. So, what does that mean? That we should explicitly limit `-XX:CompileCommand` to ASCII-only arguments? I accept that we might not get all characters to work in all circumstances due to limitations in Windows, but the current state of affairs still feels unsatisfactory. We should at least have a better failure mode, and document any limitations. /Magnus From aph at openjdk.java.net Tue Oct 5 10:05:24 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Tue, 5 Oct 2021 10:05:24 GMT Subject: RFR: 8274730: AArch64: AES/GCM acceleration is broken by the fix for JDK-8273297 Message-ID: The recent AES/GCM acceleration on AArch64 was broken by https://bugs.openjdk.java.net/browse/JDK-8273297 . This was entirely expected, and I approved the patch, but now we must make AArch64 acceleration work again. The only significant change from the point of view of this patch is that one argument was added to the call to the intrinsic, and that argument caused another argument to spill onto the stack.
------------- Commit messages: - 8274730: AArch64: AES/GCM acceleration is broken by the fix for JDK-8273297 Changes: https://git.openjdk.java.net/jdk/pull/5819/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5819&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8274730 Stats: 20 lines in 3 files changed: 12 ins; 1 del; 7 mod Patch: https://git.openjdk.java.net/jdk/pull/5819.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5819/head:pull/5819 PR: https://git.openjdk.java.net/jdk/pull/5819 From adinn at openjdk.java.net Tue Oct 5 11:10:06 2021 From: adinn at openjdk.java.net (Andrew Dinn) Date: Tue, 5 Oct 2021 11:10:06 GMT Subject: RFR: 8274730: AArch64: AES/GCM acceleration is broken by the fix for JDK-8273297 In-Reply-To: References: Message-ID: On Tue, 5 Oct 2021 09:58:38 GMT, Andrew Haley wrote: > The recent AES/GCM acceleration on AArch64 was broken by https://bugs.openjdk.java.net/browse/JDK-8273297 . This was entirely expected, and I approved the patch, but now we must make AArch64 acceleration work again. > The only significant change from the point of view of this patch is that one argument was added to the call to the intrinsic, and that argument caused another argument to spill onto the stack. Changes look good. ------------- Marked as reviewed by adinn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5819 From mdoerr at openjdk.java.net Tue Oct 5 12:35:22 2021 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Tue, 5 Oct 2021 12:35:22 GMT Subject: RFR: 8274773: [TESTBUG] UnsafeIntrinsicsTest intermittently fails on weak memory model platform Message-ID: The test creates new Nodes and publishes them to concurrent readers. This requires at least release semantics for the publishing store and load_consume for the reading load. A clean fix would be to make `Node.next` volatile, but that would be a sledgehammer. A minimal fix for our supported weak memory model platforms is to insert a `storeFence`. Which is better? (See JBS for failure description.)
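The ordering requirement Martin describes is the usual safe-publication pattern. The sketch below shows it in plain Java with `VarHandle` release/acquire accesses. It assumes a hypothetical `Node` class with plain `value` and `next` fields; this is not the code of `UnsafeIntrinsicsTest`. Java exposes no consume mode, so the reader uses acquire, which is the nearest available mode and is stronger than consume. Declaring `next` as `volatile` (the "clean fix") would also be correct, only stronger than needed. `VarHandle.releaseFence()` is the public-API fence closest to the `storeFence` option.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

public class ReleasePublish {
    // Hypothetical node type; names are illustrative, not taken from the test.
    static final class Node {
        int value;  // plain, non-final field written before publication
        Node next;  // plain field; ordering comes from the VarHandle accesses below
        Node(int value) { this.value = value; }
    }

    private static final VarHandle NEXT;
    static {
        try {
            NEXT = MethodHandles.lookup().findVarHandle(Node.class, "next", Node.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Writer side: the release store keeps the constructor's write to `value`
    // from being reordered after the store that makes `fresh` reachable.
    static void publish(Node prev, Node fresh) {
        NEXT.setRelease(prev, fresh);
    }

    // Reader side: the acquire load pairs with setRelease, so a reader that
    // observes `fresh` also observes its initialized `value`.
    static Node readNext(Node n) {
        return (Node) NEXT.getAcquire(n);
    }

    // Walks the list while it grows, spinning until `expected` nodes are seen.
    static int consume(Node head, int expected) {
        Node cur = head;
        int seen = 0;
        while (seen < expected) {
            Node nxt = readNext(cur);
            if (nxt == null) {
                Thread.onSpinWait();
                continue;
            }
            if (nxt.value != seen + 1) {
                throw new IllegalStateException("observed unexpected value " + nxt.value);
            }
            cur = nxt;
            seen++;
        }
        return seen;
    }

    public static void main(String[] args) throws InterruptedException {
        Node head = new Node(0);
        Thread writer = new Thread(() -> {
            Node cur = head;
            for (int i = 1; i <= 1000; i++) {
                Node fresh = new Node(i);
                publish(cur, fresh);
                cur = fresh;
            }
        });
        writer.start();
        int seen = consume(head, 1000); // runs concurrently with the writer
        writer.join();
        System.out.println("consumed " + seen + " nodes");
    }
}
```

If the writer used plain stores, a weakly ordered CPU or the JIT could let a reader see a node before its `value` store is visible. That is consistent with the failure appearing only on weak memory model platforms.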
------------- Commit messages: - 8274773: [TESTBUG] UnsafeIntrinsicsTest intermittently fails on weak memory model platform Changes: https://git.openjdk.java.net/jdk/pull/5823/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5823&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8274773 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/5823.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5823/head:pull/5823 PR: https://git.openjdk.java.net/jdk/pull/5823 From aph-open at littlepinkcloud.com Tue Oct 5 13:55:07 2021 From: aph-open at littlepinkcloud.com (Andrew Haley) Date: Tue, 5 Oct 2021 14:55:07 +0100 Subject: Please have a quick look at 8274730: AArch64: AES/GCM acceleration is broken by the fix for JDK-8273297 Message-ID: <933c7ca3-2168-74a5-9f3d-68202005d1ec@littlepinkcloud.com> I had to touch common code in C2 to unbreak AArch64, so I'd like another reviewer, please. It's pretty simple, thanks. https://github.com/openjdk/jdk/pull/5819 -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From roland at openjdk.java.net Tue Oct 5 14:02:08 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Tue, 5 Oct 2021 14:02:08 GMT Subject: RFR: 8274730: AArch64: AES/GCM acceleration is broken by the fix for JDK-8273297 In-Reply-To: References: Message-ID: On Tue, 5 Oct 2021 09:58:38 GMT, Andrew Haley wrote: > The recent AES/GCM acceleration on AArch64 was broken by https://bugs.openjdk.java.net/browse/JDK-8273297 . This was entirely expected, and I approved the patch, but now we must make AArch64 acceleration work again. > The only significant change from the point of view of this patch is that one argument was added to the call to the intrinsic, and that argument caused another argument to spill onto the stack. c2 change looks good to me. ------------- Marked as reviewed by roland (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/5819 From mdoerr at openjdk.java.net Tue Oct 5 15:39:08 2021 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Tue, 5 Oct 2021 15:39:08 GMT Subject: RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow [v3] In-Reply-To: References: Message-ID: On Mon, 4 Oct 2021 07:51:27 GMT, Volker Simonis wrote: >> Currently, if running with `-XX:-OmitStackTraceInFastThrow`, C2 has no possibility to create implicit exceptions like AIOOBE, NullPointerExceptions, etc. in compiled code. This means that such methods will always be deoptimized and re-executed in the interpreter if such exceptions are happening. >> >> If implicit exceptions are used for normal control flow, that can have a dramatic impact on performance. A prominent example for such code is [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274): >> >> public static boolean isAlpha(int c) { >> try { >> return IS_ALPHA[c]; >> } catch (ArrayIndexOutOfBoundsException ex) { >> return false; >> } >> } >> >> >> ### Solution >> >> Instead of deoptimizing and resorting to the interpreter, we can generate code which allocates and initializes the corresponding exceptions right in compiled code. This results in a ten-times performance improvement for the above code:
>>
>>     -XX:-OmitStackTraceInFastThrow -XX:-OptimizeImplicitExceptions
>>     Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>>     ImplicitExceptions.bench                     0.0  avgt    5      1.430 ±    0.353  ns/op
>>     ImplicitExceptions.bench                    0.33  avgt    5   3563.038 ±   77.358  ns/op
>>     ImplicitExceptions.bench                    0.66  avgt    5   8609.693 ± 1205.104  ns/op
>>     ImplicitExceptions.bench                    1.00  avgt    5  12842.401 ± 1022.728  ns/op
>>
>>     -XX:-OmitStackTraceInFastThrow -XX:+OptimizeImplicitExceptions
>>     Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>>     ImplicitExceptions.bench                     0.0  avgt    5      1.432 ±    0.352  ns/op
>>     ImplicitExceptions.bench                    0.33  avgt    5    355.723 ±   16.641  ns/op
>>     ImplicitExceptions.bench                    0.66  avgt    5    887.068 ±  166.728  ns/op
>>     ImplicitExceptions.bench                    1.00  avgt    5   1274.418 ±   88.235  ns/op
>>
>> ### Implementation details
>>
>> - The new optimization is guarded by the option `OptimizeImplicitExceptions`, which is on by default.
>> - In `GraphKit::builtin_throw()` we can't simply use `CallGenerator::for_direct_call()` to create a `DirectCallGenerator` for the call to the exception's `` function, because `DirectCallGenerator` assumes in various places that calls are only issued at `invoke*` bytecodes. This is not true in general for bytecodes that can cause an implicit exception.
>> - Instead, we manually wire up the call based on the code in `DirectCallGenerator::generate()`.
>> - We use a trick similar to the one used for method handle intrinsics, where the callee from the bytecode is replaced by a direct call and this fact is recorded in the call's `_override_symbolic_info` field. For calling constructors of implicit exceptions, I've introduced the new field `_implicit_exception_init`. This field is also used in various assertions to prevent queries for the bytecode's symbolic method information, which doesn't exist because we're not at an `invoke*` bytecode at the place where we generate the call.
>> - The PR contains a micro-benchmark which compares the old and the new implementation for [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274). Except for the trivial case where the exception probability is 0 (i.e. no exceptions are happening at all), the new implementation is about 10 times faster.
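Below is a self-contained version of the quoted `isAlpha()` method. It makes the exceptions-as-control-flow pattern concrete. The `IS_ALPHA` table size (128 entries) and its contents are assumptions for illustration; Tomcat's actual table may differ. `isAlphaChecked` is a hypothetical source-level alternative that never throws. It is not what this PR changes: the PR speeds up the exception path in C2-compiled code, which helps code that cannot easily be rewritten.

```java
public class IsAlphaDemo {
    // Assumed lookup table covering ASCII letters only (illustrative).
    private static final boolean[] IS_ALPHA = new boolean[128];
    static {
        for (int c = 'a'; c <= 'z'; c++) IS_ALPHA[c] = true;
        for (int c = 'A'; c <= 'Z'; c++) IS_ALPHA[c] = true;
    }

    // Same shape as the quoted Tomcat method: an out-of-range index throws an
    // ArrayIndexOutOfBoundsException, which is used as the "false" path.
    public static boolean isAlpha(int c) {
        try {
            return IS_ALPHA[c];
        } catch (ArrayIndexOutOfBoundsException ex) {
            return false;
        }
    }

    // Hypothetical equivalent with an explicit bounds check; never throws.
    public static boolean isAlphaChecked(int c) {
        return c >= 0 && c < IS_ALPHA.length && IS_ALPHA[c];
    }

    public static void main(String[] args) {
        int[] inputs = {'a', 'Z', '1', 200, -5};
        for (int c : inputs) {
            System.out.println(c + " -> " + isAlpha(c) + " / " + isAlphaChecked(c));
        }
    }
}
```

When `isAlpha` is hot and often called with out-of-range values under `-XX:-OmitStackTraceInFastThrow`, each exception previously sent compiled code back to the interpreter. The `exceptionProbability` parameter in the benchmark above varies exactly that fraction of calls.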
> > Volker Simonis has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: > > - Fix special case where we're creating an implicit exception for a regular invoke* bytecode > - Minor updates as requested by @TheRealMDoerr > - 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow Thanks for the update! The new version looks good to me, and our tests with disabled OmitStackTraceInFastThrow have passed. I think you should choose a couple of jtreg tests and add an additional run with OmitStackTraceInFastThrow disabled. What do you think? ------------- PR: https://git.openjdk.java.net/jdk/pull/5488 From iveresov at openjdk.java.net Tue Oct 5 16:06:28 2021 From: iveresov at openjdk.java.net (Igor Veresov) Date: Tue, 5 Oct 2021 16:06:28 GMT Subject: RFR: 8273612: Fix for JDK-8272873 causes timeout in running some tests with -Xcomp [v2] In-Reply-To: References: Message-ID: > With tiered compilation, it just so happened that profiles are reported as mature with -Xcomp. For some tests this leads to pathological profiles that cause excessive execution time. The fix is to make profiles immature when running with -Xcomp. Igor Veresov has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains two commits: - Fix the tests - Report profiles as immature with -Xcomp ------------- Changes: https://git.openjdk.java.net/jdk/pull/5815/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5815&range=01 Stats: 9 lines in 3 files changed: 4 ins; 2 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/5815.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5815/head:pull/5815 PR: https://git.openjdk.java.net/jdk/pull/5815 From kvn at openjdk.java.net Tue Oct 5 16:14:11 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 5 Oct 2021 16:14:11 GMT Subject: RFR: 8274730: AArch64: AES/GCM acceleration is broken by the fix for JDK-8273297 In-Reply-To: References: Message-ID: On Tue, 5 Oct 2021 09:58:38 GMT, Andrew Haley wrote: > The recent AES/GCM acceleration on AArch64 was broken by https://bugs.openjdk.java.net/browse/JDK-8273297 . This was entirely expected, and I approved the patch, but now we must make AArch64 acceleration work again. > The only significant change from the point of view of this patch is that one argument was added to the call to the intrinsic, and that argument caused another argument to spill onto the stack. Looks good with just 2 small comments. I will test changes and approve it after that.

src/hotspot/share/opto/library_call.cpp line 6800:

> 6798: if (Matcher::htbl_entries == -1) return false;
> 6799:
> 6800: // new array to hold 48 computed htbl entries

Move this comment under the `(Matcher::htbl_entries != 0)` check.

src/hotspot/share/opto/library_call.cpp line 6806:

> 6804: if (subkeyHtbl_48_entries == NULL) return false;
> 6805: subkeyHtbl_48_entries_start
> 6806: = array_element_address(subkeyHtbl_48_entries, intcon(0), T_LONG);

Please don't split such lines.
------------- PR: https://git.openjdk.java.net/jdk/pull/5819 From aph at openjdk.java.net Tue Oct 5 17:26:29 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Tue, 5 Oct 2021 17:26:29 GMT Subject: RFR: 8274730: AArch64: AES/GCM acceleration is broken by the fix for JDK-8273297 [v2] In-Reply-To: References: Message-ID: > The recent AES/GCM acceleration on AArch64 was broken by https://bugs.openjdk.java.net/browse/JDK-8273297 . This was entirely expected, and I approved the patch, but now we must make AArch64 acceleration work again. > The only significant change from the point of view of this patch is that one argument was added to the call to the intrinsic, and that argument caused another argument to spill onto the stack. Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: Cleanup, no functional change. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5819/files - new: https://git.openjdk.java.net/jdk/pull/5819/files/adfcc75c..89e4655b Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5819&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5819&range=00-01 Stats: 4 lines in 1 file changed: 1 ins; 2 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5819.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5819/head:pull/5819 PR: https://git.openjdk.java.net/jdk/pull/5819 From kvn at openjdk.java.net Tue Oct 5 18:28:12 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 5 Oct 2021 18:28:12 GMT Subject: RFR: 8274730: AArch64: AES/GCM acceleration is broken by the fix for JDK-8273297 [v2] In-Reply-To: References: Message-ID: On Tue, 5 Oct 2021 17:26:29 GMT, Andrew Haley wrote: >> The recent AES/GCM acceleration on AArch64 was broken by https://bugs.openjdk.java.net/browse/JDK-8273297 . This was entirely expected, and I approved the patch, but now we must make AArch64 acceleration work again. 
>> The only significant change from the point of view of this patch is that one argument was added to the call to the intrinsic, and that argument caused another argument to spill onto the stack. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > Cleanup, no functional change. Good. And my testing passed. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5819 From neliasso at openjdk.java.net Tue Oct 5 19:44:12 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Tue, 5 Oct 2021 19:44:12 GMT Subject: RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow [v3] In-Reply-To: References: Message-ID: On Mon, 4 Oct 2021 07:51:27 GMT, Volker Simonis wrote: >> Currently, if running with `-XX:-OmitStackTraceInFastThrow`, C2 has no possibility to create implicit exceptions like AIOOBE, NullPointerExceptions, etc. in compiled code. This means that such methods will always be deoptimized and re-executed in the interpreter if such exceptions are happening. >> >> If implicit exceptions are used for normal control flow, that can have a dramatic impact on performance. A prominent example for such code is [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274): >> >> public static boolean isAlpha(int c) { >> try { >> return IS_ALPHA[c]; >> } catch (ArrayIndexOutOfBoundsException ex) { >> return false; >> } >> } >> >> >> ### Solution >> >> Instead of deoptimizing and resorting to the interpreter, we can generate code which allocates and initializes the corresponding exceptions right in compiled code. 
This results in a tenfold performance improvement for the above code: >> >> -XX:-OmitStackTraceInFastThrow -XX:-OptimizeImplicitExceptions >> Benchmark (exceptionProbability) Mode Cnt Score Error Units >> ImplicitExceptions.bench 0.0 avgt 5 1.430 ± 0.353 ns/op >> ImplicitExceptions.bench 0.33 avgt 5 3563.038 ± 77.358 ns/op >> ImplicitExceptions.bench 0.66 avgt 5 8609.693 ± 1205.104 ns/op >> ImplicitExceptions.bench 1.00 avgt 5 12842.401 ± 1022.728 ns/op >> >> -XX:-OmitStackTraceInFastThrow -XX:+OptimizeImplicitExceptions >> Benchmark (exceptionProbability) Mode Cnt Score Error Units >> ImplicitExceptions.bench 0.0 avgt 5 1.432 ± 0.352 ns/op >> ImplicitExceptions.bench 0.33 avgt 5 355.723 ± 16.641 ns/op >> ImplicitExceptions.bench 0.66 avgt 5 887.068 ± 166.728 ns/op >> ImplicitExceptions.bench 1.00 avgt 5 1274.418 ± 88.235 ns/op >> >> >> ### Implementation details >> >> - The new optimization is guarded by the option `OptimizeImplicitExceptions` which is on by default. >> - In `GraphKit::builtin_throw()` we can't simply use `CallGenerator::for_direct_call()` to create a `DirectCallGenerator` for the call to the exception's `<init>` function because `DirectCallGenerator` assumes in various places that calls are only issued at `invoke*` bytecodes. This is not true in general for bytecodes which can cause an implicit exception. >> - Instead, we manually wire up the call based on the code in `DirectCallGenerator::generate()`. >> - We use a trick similar to the one for method handle intrinsics, where the callee from the bytecode is replaced by a direct call and this fact is recorded in the call's `_override_symbolic_info` field. For calling constructors of implicit exceptions I've introduced the new field `_implicit_exception_init`. This field is also used in various assertions to prevent queries for the bytecode's symbolic method information, which doesn't exist because we're not at an `invoke*` bytecode at the place where we generate the call.
>> - The PR contains a micro-benchmark which compares the old and the new implementation for [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274). Except for the trivial case where the exception probability is 0 (i.e. no exceptions are happening at all) the new implementation is about 10 times faster. > > Volker Simonis has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Fix special case where we're creating an implicit exception for a regular invoke* bytecode > - Minor updates as requested by @TheRealMDoerr > - 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow In the benchmark you are testing a case where the exception isn't used - will the allocation be eliminated then? In addition to the benchmark you have provided, I would like to see functional tests for both when the allocation is used, and when it isn't. Have you looked at the new IR Test Framework? ------------- Changes requested by neliasso (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5488 From kvn at openjdk.java.net Tue Oct 5 19:47:10 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 5 Oct 2021 19:47:10 GMT Subject: RFR: 8273612: Fix for JDK-8272873 causes timeout in running some tests with -Xcomp [v2] In-Reply-To: References: Message-ID: On Tue, 5 Oct 2021 16:06:28 GMT, Igor Veresov wrote: >> With tiered it just so happened that profiles are reported as mature with -Xcomp. For some tests this leads to pathological profiles that causes excessive execution time. The fix it make profiles immature when running with -Xcomp. > > Igor Veresov has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains two commits: > > - Fix the tests > - Report profiles as immature with -Xcomp Looks good. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5815 From neliasso at openjdk.java.net Tue Oct 5 19:47:10 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Tue, 5 Oct 2021 19:47:10 GMT Subject: RFR: 8273612: Fix for JDK-8272873 causes timeout in running some tests with -Xcomp [v2] In-Reply-To: References: Message-ID: <-AmXEj04As6SyEo5Pwxw60Vtv9-UbxRi5zZ-8E3Rkiw=.916136f9-199c-46fc-861a-463bf7d5e273@github.com> On Tue, 5 Oct 2021 16:06:28 GMT, Igor Veresov wrote: >> With tiered it just so happened that profiles are reported as mature with -Xcomp. For some tests this leads to pathological profiles that causes excessive execution time. The fix it make profiles immature when running with -Xcomp. > > Igor Veresov has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: > > - Fix the tests > - Report profiles as immature with -Xcomp Looks good! ------------- Marked as reviewed by neliasso (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5815 From iveresov at openjdk.java.net Tue Oct 5 19:47:11 2021 From: iveresov at openjdk.java.net (Igor Veresov) Date: Tue, 5 Oct 2021 19:47:11 GMT Subject: RFR: 8273612: Fix for JDK-8272873 causes timeout in running some tests with -Xcomp [v2] In-Reply-To: References: Message-ID: On Tue, 5 Oct 2021 16:06:28 GMT, Igor Veresov wrote: >> With tiered it just so happened that profiles are reported as mature with -Xcomp. For some tests this leads to pathological profiles that causes excessive execution time. The fix it make profiles immature when running with -Xcomp. > > Igor Veresov has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: > > - Fix the tests > - Report profiles as immature with -Xcomp Thanks for the review! 
Thanks, Nils! ------------- PR: https://git.openjdk.java.net/jdk/pull/5815 From iveresov at openjdk.java.net Tue Oct 5 19:47:12 2021 From: iveresov at openjdk.java.net (Igor Veresov) Date: Tue, 5 Oct 2021 19:47:12 GMT Subject: Integrated: 8273612: Fix for JDK-8272873 causes timeout in running some tests with -Xcomp In-Reply-To: References: Message-ID: On Tue, 5 Oct 2021 03:22:27 GMT, Igor Veresov wrote: > With tiered it just so happened that profiles are reported as mature with -Xcomp. For some tests this leads to pathological profiles that causes excessive execution time. The fix it make profiles immature when running with -Xcomp. This pull request has now been integrated. Changeset: 83b22192 Author: Igor Veresov URL: https://git.openjdk.java.net/jdk/commit/83b2219220266c1365466970d08606fef766c4fa Stats: 9 lines in 3 files changed: 4 ins; 2 del; 3 mod 8273612: Fix for JDK-8272873 causes timeout in running some tests with -Xcomp Reviewed-by: kvn, neliasso ------------- PR: https://git.openjdk.java.net/jdk/pull/5815 From neliasso at openjdk.java.net Tue Oct 5 19:51:09 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Tue, 5 Oct 2021 19:51:09 GMT Subject: RFR: 8274145: C2: Incorrect computation after JDK-8269752 In-Reply-To: <5B9h2DzETeX6X6cg-xH0jNgWxbFrzw4Xlfxv01pTzkA=.b19b9e3e-ef57-44f1-976a-42f0e0929b73@github.com> References: <5B9h2DzETeX6X6cg-xH0jNgWxbFrzw4Xlfxv01pTzkA=.b19b9e3e-ef57-44f1-976a-42f0e0929b73@github.com> Message-ID: On Mon, 27 Sep 2021 08:57:25 GMT, Roland Westrelin wrote: > The bug happens because an If node that follows a CountedLoop is > replaced by the CountedLoopEnd node of the main loop. Further > unrolling happens after the If is replaced which causes the condition > of the CountedLoopEnd node to change. This is made possible by > JDK-8269752. The fix I propose is to detect that corner case and > prevent the If to be replaced in that case. Looks good! Please change the bug title to something more descriptive. 
------------- Marked as reviewed by neliasso (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5712 From github.com+2249648+johntortugo at openjdk.java.net Tue Oct 5 23:12:09 2021 From: github.com+2249648+johntortugo at openjdk.java.net (John Tortugo) Date: Tue, 5 Oct 2021 23:12:09 GMT Subject: RFR: 8267265: Use new IR Test Framework to create tests for C2 IGV transformations [v4] In-Reply-To: References: <8Ce6bZtHwGEw8_wXZz4ak3obprd1YmZDi4cItcXB4bA=.a7162709-7aad-4709-a585-d2391392f49b@github.com> Message-ID: On Fri, 17 Sep 2021 10:23:51 GMT, Christian Hagedorn wrote: >> John Tortugo has updated the pull request incrementally with 146 additional commits since the last revision: >> >> - Fix merge mistake. >> - Merge branch 'jdk-8267265' of https://github.com/JohnTortugo/jdk into jdk-8267265 >> - Addressing PR feedback: move tests to other directory, add custom tests, add tests for other optimizations, rename some tests. >> - 8273197: ProblemList 2 jtools tests due to JDK-8273187 >> 8273198: ProblemList java/lang/instrument/BootClassPath/BootClassPathTest.sh due to JDK-8273188 >> >> Reviewed-by: naoto >> - 8262186: Call X509KeyManager.chooseClientAlias once for all key types >> >> Reviewed-by: xuelei >> - 8273186: Remove leftover comment about sparse remembered set in G1 HeapRegionRemSet >> >> Reviewed-by: ayang >> - 8273169: java/util/regex/NegativeArraySize.java failed after JDK-8271302 >> >> Reviewed-by: jiefu, serb >> - 8273092: Sort classlist in JDK image >> >> Reviewed-by: redestad, ihse, dfuchs >> - 8273144: Remove unused top level "Sample Collection Set Candidates" logging >> >> Reviewed-by: iwalulya, ayang >> - 8262095: NPE in Flow$FlowAnalyzer.visitApply: Cannot invoke getThrownTypes because tree.meth.type is null >> >> Co-authored-by: Jan Lahoda >> Co-authored-by: Vicente Romero >> Reviewed-by: jlahoda >> - ... 
and 136 more: https://git.openjdk.java.net/jdk/compare/ac430bf7...463102e2 > > test/hotspot/jtreg/compiler/c2/irTests/MulINodeIdealizationTests.java line 45: > >> 43: //Checks Max(a,b) * min(a,b) => a*b >> 44: public int excludeMaxMin(int x, int y){ >> 45: return Math.max(x, y) * Math.min(x, y); > > `Math.min/max()` is intrinsified and HotSpot generates `CMove` nodes (see `LibraryCallKit::generate_min_max()`) for them. But it looks like `MulNode::Ideal` misses this check for `CMove` nodes. That could be done in a separate RFE (and then this test could be improved to check if the `CMove` node was removed). > > Anyways, min/max nodes are mainly used for loop limit computations, so it's harder to test this transformation in an easy way. Created this work item to address the suggestion: https://bugs.openjdk.java.net/browse/JDK-8274799 ------------- PR: https://git.openjdk.java.net/jdk/pull/5135 From jiefu at openjdk.java.net Wed Oct 6 02:33:30 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Wed, 6 Oct 2021 02:33:30 GMT Subject: RFR: 8274329: Fix non-portable HotSpot code in MethodMatcher::parse_method_pattern [v2] In-Reply-To: References: Message-ID: > Hi all, > > I tried to build OpenJDK on Cygwin (Windows 2016 + VS2019). 
> However, I failed with C4474 and C4778 warnings as below: > > Compiling 100 properties into resource bundles for java.desktop > Compiling 3038 files for java.base > e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): error C2220: the following warning is treated as an error > e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning C4778: 'sscanf' : unterminated format string '%255[*\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e\x97%n' > e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning C4474: 'sscanf' : too many arguments passed for format string > e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): note: placeholders and their parameters expect 1 variadic arguments, but 3 were provided > e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning C4778: 'sscanf' : unterminated format string '%1022[[);/\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e%n' > e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning C4474: 'sscanf' : too many arguments passed for format string > e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): note: placeholders and their parameters expect 0 variadic arguments, but 2 were provided > > > The failure is caused by non-ASCII chars in the format string of sscanf [1][2], which is non-portable on our Windows platform. > In fact, these non-ASCII coding also triggers C4819 warning, which had been disabled in JDK-8216154 [3]. > And I also found an article showing that sscanf may fail with non-ASCII in the format string [4]. 
> > So it would be nice to remove these non-ASCII chars (`\x80 ~ \xef`). > And I think it's safe to do so. > > This is because: > 1) There are actually no non-ASCII chars for package/class/method/signature names. > 2) I don't think there is a use case in which people will input non-ASCII for `CompileCommand`. > > You may argue that the non-ASCII may be used by the parser itself. > But I didn't find that usage at all. (Please let me know if I missed something.) > > So I suggest removing this non-ASCII code to make HotSpot more portable. > And if we do so, we can also remove the only use of `PRAGMA_DISABLE_MSVC_WARNING(4819)` [5]. > > Testing: > - Build tests on Windows > - tier1~3 on Linux/x64 > > Thanks. > Best regards, > Jie > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L269 > [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L319 > [3] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-January/032014.html > [4] https://jeffpar.github.io/kbarchive/kb/047/Q47369/ > [5] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L246 Jie Fu has updated the pull request incrementally with one additional commit since the last revision: Disable non-ASCII for Windows only ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5704/files - new: https://git.openjdk.java.net/jdk/pull/5704/files/d4b84f2b..e1271085 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5704&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5704&range=00-01 Stats: 30 lines in 1 file changed: 30 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/5704.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5704/head:pull/5704 PR: https://git.openjdk.java.net/jdk/pull/5704 From jiefu at openjdk.java.net Wed Oct 6 02:39:05 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Wed, 6 Oct 2021 02:39:05 GMT Subject:
RFR: 8274329: Fix non-portable HotSpot code in MethodMatcher::parse_method_pattern In-Reply-To: References: Message-ID: <0KVMr9z6dJBABtm8QAMoBNm_pEYWHl61ZMnI8n8WxYo=.2cafcf48-db78-4a61-b400-d5c22307bd2b@github.com> On Tue, 5 Oct 2021 06:38:05 GMT, Ioi Lam wrote: >> Hi all, >> >> I tried to build OpenJDK on Cygwin (Windows 2016 + VS2019). >> However, I failed with C4474 and C4778 warnings as below: >> >> Compiling 100 properties into resource bundles for java.desktop >> Compiling 3038 files for java.base >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): error C2220: the following warning is treated as an error >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning C4778: 'sscanf' : unterminated format string '%255[*\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e\x97%n' >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning C4474: 'sscanf' : too many arguments passed for format string >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): note: placeholders and their parameters expect 1 variadic arguments, but 3 were provided >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning C4778: 'sscanf' : unterminated format string '%1022[[);/\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e%n' >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning C4474: 'sscanf' : too many arguments passed for format string >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): note: placeholders and their parameters expect 0 variadic arguments, but 2 were provided >> >> >> 
The failure is caused by non-ASCII chars in the format string of sscanf [1][2], which is non-portable on our Windows platform. >> In fact, these non-ASCII coding also triggers C4819 warning, which had been disabled in JDK-8216154 [3]. >> And I also found an article showing that sscanf may fail with non-ASCII in the format string [4]. >> >> So it would be nice to remove these non-ASCII chars (`\x80 ~ \xef`). >> And I think it's safe to do so. >> >> This is because: >> 1) There are actually no non-ASCII chars for package/class/method/signature names. >> 2) I don't think there is a use case, in which people will input non-ASCII for `CompileCommand`. >> >> You may argue that the non-ASCII may be used by the parser itself. >> But I didn't find that usage at all. (Please let me know if I miss something.) >> >> So I suggest to remove these non-ASCII code to make HotSpot to be more portable. >> And if we do so, we can also remove the only one `PRAGMA_DISABLE_MSVC_WARNING(4819)` [5]. >> >> Testing: >> - Build tests on Windows >> - tier1~3 on Linux/x64 >> >> Thanks. >> Best regards, >> Jie >> >> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L269 >> [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L319 >> [3] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-January/032014.html >> [4] https://jeffpar.github.io/kbarchive/kb/047/Q47369/ >> [5] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L246 > > My experiments above with ` -XX:CompileCommand='compileonly,*::??'` was done on Linux. I tried doing the same on Windows. On US-English Windows, the default codepage is 437 (DOS Latin US). If I change it to 65001 (UTF8) then Java is able to output CJK characters to the console. 
> > > public class CJK { > public static void main(String args[]) { > System.out.println(args[0]); > \u722a\u54c7(); > } > > static void \u722a\u54c7() { // Chinese word for "Java" > Thread.dumpStack(); > } > } > > > > c:\ade>chcp > Active code page: 437 > > c:\ade>jdk-17\bin\java -cp . CJK 123 > 123 > java.lang.Exception: Stack trace > at java.base/java.lang.Thread.dumpStack(Thread.java:1380) > at CJK.??(CJK.java:8) > at CJK.main(CJK.java:4) > > c:\ade>chcp 65001 > Active code page: 65001 > > c:\ade>jdk-17\bin\java -cp . CJK ?? > ?? > java.lang.Exception: Stack trace > at java.base/java.lang.Thread.dumpStack(Thread.java:1380) > at CJK.??(CJK.java:8) > at CJK.main(CJK.java:4) > > > As you can see, the CJK characters in the command-line arguments can't even be correctly passed as arguments to the Java main class. If that doesn't work, I can't see how we can get `-XX:CompileCommand` to work with CJK characters. Thanks @iklam and @magicus for your experiments and comments. My experiments show that CompileCommand doesn't work on Windows with a non-US-English locale. And @iklam's experiments show that it doesn't work on US-English Windows either. So I suggest we disable non-ASCII chars on Windows. The patch has been updated. 1. On non-Windows platforms, CompileCommand still works as before. 2. On Windows, it is limited to ASCII-only arguments. For non-ASCII chars, the parser fails like this:

```
>java -XX:CompileCommand=compileonly,*::?? -version
CompileCommand: An error occurred during parsing
Error: Non-ASCII characters are not supported on Windows.
Line: 'compileonly,*::??'
```

What do you think? Thanks.
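A minimal Java sketch of the Windows-only rule described in this reply, assuming the restriction amounts to "reject any character outside 7-bit ASCII". The actual change is C++ in methodMatcher.cpp (the sscanf scanset built from RANGEBASE); the class and method names below (`AsciiPatternCheck`, `isAsciiOnly`, `validate`) are hypothetical, and the error text only imitates the message quoted above.

```java
/**
 * Hypothetical illustration, not HotSpot code: mirrors the proposed behavior
 * where CompileCommand patterns must be ASCII-only on Windows.
 */
final class AsciiPatternCheck {

    /** Returns true if every UTF-16 code unit of s is in the 7-bit ASCII range. */
    static boolean isAsciiOnly(String s) {
        return s.chars().allMatch(c -> c < 0x80);
    }

    /**
     * Accepts the pattern unchanged, except on Windows where any non-ASCII
     * character is rejected (modeled here as an exception).
     */
    static String validate(String pattern, boolean onWindows) {
        if (onWindows && !isAsciiOnly(pattern)) {
            throw new IllegalArgumentException(
                "Non-ASCII characters are not supported on Windows. Line: '" + pattern + "'");
        }
        return pattern;
    }

    public static void main(String[] args) {
        System.out.println(isAsciiOnly("compileonly,*::foo"));            // true
        System.out.println(isAsciiOnly("compileonly,*::\u722a\u54c7"));   // false
    }
}
```

Checking UTF-16 code units against 0x80 is enough here: every non-ASCII code point, including supplementary characters encoded as surrogate pairs, produces code units at or above 0x80.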
------------- PR: https://git.openjdk.java.net/jdk/pull/5704 From iklam at openjdk.java.net Wed Oct 6 04:35:07 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Wed, 6 Oct 2021 04:35:07 GMT Subject: RFR: 8274329: Fix non-portable HotSpot code in MethodMatcher::parse_method_pattern [v2] In-Reply-To: References: Message-ID: <9ECYcKawGi-4z-z8pPMHpIokPABv6jr1JUwbnSyTH3Q=.df279737-30dc-473d-b460-eb841e409570@github.com> On Wed, 6 Oct 2021 02:33:30 GMT, Jie Fu wrote: >> Hi all, >> >> I tried to build OpenJDK on Cygwin (Windows 2016 + VS2019). >> However, I failed with C4474 and C4778 warnings as below: >> >> Compiling 100 properties into resource bundles for java.desktop >> Compiling 3038 files for java.base >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): error C2220: the following warning is treated as an error >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning C4778: 'sscanf' : unterminated format string '%255[*\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e\x97%n' >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning C4474: 'sscanf' : too many arguments passed for format string >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): note: placeholders and their parameters expect 1 variadic arguments, but 3 were provided >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning C4778: 'sscanf' : unterminated format string '%1022[[);/\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e%n' >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning C4474: 'sscanf' : too 
many arguments passed for format string >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): note: placeholders and their parameters expect 0 variadic arguments, but 2 were provided >> >> >> The failure is caused by non-ASCII chars in the format string of sscanf [1][2], which is non-portable on our Windows platform. >> In fact, these non-ASCII coding also triggers C4819 warning, which had been disabled in JDK-8216154 [3]. >> And I also found an article showing that sscanf may fail with non-ASCII in the format string [4]. >> >> So it would be nice to remove these non-ASCII chars (`\x80 ~ \xef`). >> And I think it's safe to do so. >> >> This is because: >> 1) There are actually no non-ASCII chars for package/class/method/signature names. >> 2) I don't think there is a use case, in which people will input non-ASCII for `CompileCommand`. >> >> You may argue that the non-ASCII may be used by the parser itself. >> But I didn't find that usage at all. (Please let me know if I miss something.) >> >> So I suggest to remove these non-ASCII code to make HotSpot to be more portable. >> And if we do so, we can also remove the only one `PRAGMA_DISABLE_MSVC_WARNING(4819)` [5]. >> >> Testing: >> - Build tests on Windows >> - tier1~3 on Linux/x64 >> >> Thanks. >> Best regards, >> Jie >> >> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L269 >> [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L319 >> [3] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-January/032014.html >> [4] https://jeffpar.github.io/kbarchive/kb/047/Q47369/ >> [5] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L246 > > Jie Fu has updated the pull request incrementally with one additional commit since the last revision: > > Disable non-ASCII for Windows only The idea looks good to me. I just have a suggestion to make the code more readable. 
src/hotspot/share/compiler/methodMatcher.cpp line 77: > 75: "\x60\x61\x62\x63\x64\x65\x66\x67\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f" \ > 76: "\x70\x71\x72\x73\x74\x75\x76\x77\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f" > 77: #endif It's hard to tell what's the difference between these two RANGEBASE definitions. How about doing it like this to make the code more readable? #define RANGEBASE_ASCII "....." #define RANGEBASE_NON_ASCII "....." #ifdef WINDOWS #define RANGEBASE RANGEBASE_ASCII #else #define RANGEBASE RANGEBASE_ASCII RANGEBASE_NON_ASCII #endif ------------- Marked as reviewed by iklam (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5704 From jiefu at openjdk.java.net Wed Oct 6 05:17:28 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Wed, 6 Oct 2021 05:17:28 GMT Subject: RFR: 8274329: Fix non-portable HotSpot code in MethodMatcher::parse_method_pattern [v2] In-Reply-To: <9ECYcKawGi-4z-z8pPMHpIokPABv6jr1JUwbnSyTH3Q=.df279737-30dc-473d-b460-eb841e409570@github.com> References: <9ECYcKawGi-4z-z8pPMHpIokPABv6jr1JUwbnSyTH3Q=.df279737-30dc-473d-b460-eb841e409570@github.com> Message-ID: On Wed, 6 Oct 2021 04:30:12 GMT, Ioi Lam wrote: > It's hard to tell what's the difference between these two RANGEBASE definitions. How about doing it like this to make the code more readable? > > ``` > #define RANGEBASE_ASCII "....." > #define RANGEBASE_NON_ASCII "....." > #ifdef WINDOWS > #define RANGEBASE RANGEBASE_ASCII > #else > #define RANGEBASE RANGEBASE_ASCII RANGEBASE_NON_ASCII > #endif > ``` Good suggestion! Updated. Thanks. ------------- PR: https://git.openjdk.java.net/jdk/pull/5704 From jiefu at openjdk.java.net Wed Oct 6 05:17:28 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Wed, 6 Oct 2021 05:17:28 GMT Subject: RFR: 8274329: Fix non-portable HotSpot code in MethodMatcher::parse_method_pattern [v3] In-Reply-To: References: Message-ID: > Hi all, > > I tried to build OpenJDK on Cygwin (Windows 2016 + VS2019). 
> However, I failed with C4474 and C4778 warnings as below: > > Compiling 100 properties into resource bundles for java.desktop > Compiling 3038 files for java.base > e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): error C2220: the following warning is treated as an error > e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning C4778: 'sscanf' : unterminated format string '%255[*\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e\x97%n' > e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning C4474: 'sscanf' : too many arguments passed for format string > e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): note: placeholders and their parameters expect 1 variadic arguments, but 3 were provided > e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning C4778: 'sscanf' : unterminated format string '%1022[[);/\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e%n' > e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning C4474: 'sscanf' : too many arguments passed for format string > e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): note: placeholders and their parameters expect 0 variadic arguments, but 2 were provided > > > The failure is caused by non-ASCII chars in the format string of sscanf [1][2], which is non-portable on our Windows platform. > In fact, these non-ASCII coding also triggers C4819 warning, which had been disabled in JDK-8216154 [3]. > And I also found an article showing that sscanf may fail with non-ASCII in the format string [4]. 
> > So it would be nice to remove these non-ASCII chars (`\x80 ~ \xef`). > And I think it's safe to do so. > > This is because: > 1) There are actually no non-ASCII chars for package/class/method/signature names. > 2) I don't think there is a use case, in which people will input non-ASCII for `CompileCommand`. > > You may argue that the non-ASCII may be used by the parser itself. > But I didn't find that usage at all. (Please let me know if I miss something.) > > So I suggest to remove these non-ASCII code to make HotSpot to be more portable. > And if we do so, we can also remove the only one `PRAGMA_DISABLE_MSVC_WARNING(4819)` [5]. > > Testing: > - Build tests on Windows > - tier1~3 on Linux/x64 > > Thanks. > Best regards, > Jie > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L269 > [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L319 > [3] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-January/032014.html > [4] https://jeffpar.github.io/kbarchive/kb/047/Q47369/ > [5] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L246 Jie Fu has updated the pull request incrementally with one additional commit since the last revision: Split with RANGEBASE_ASCII and RANGEBASE_NON_ASCII ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5704/files - new: https://git.openjdk.java.net/jdk/pull/5704/files/e1271085..d0070680 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5704&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5704&range=01-02 Stats: 15 lines in 1 file changed: 1 ins; 9 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/5704.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5704/head:pull/5704 PR: https://git.openjdk.java.net/jdk/pull/5704 From iklam at openjdk.java.net Wed Oct 6 06:44:08 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Wed, 6 Oct 2021 
06:44:08 GMT Subject: RFR: 8274329: Fix non-portable HotSpot code in MethodMatcher::parse_method_pattern [v3] In-Reply-To: References: Message-ID: On Wed, 6 Oct 2021 05:17:28 GMT, Jie Fu wrote: >> Hi all, >> >> I tried to build OpenJDK on Cygwin (Windows 2016 + VS2019). >> However, I failed with C4474 and C4778 warnings as below: >> >> Compiling 100 properties into resource bundles for java.desktop >> Compiling 3038 files for java.base >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): error C2220: the following warning is treated as an error >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning C4778: 'sscanf' : unterminated format string '%255[*\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e\x97%n' >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning C4474: 'sscanf' : too many arguments passed for format string >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): note: placeholders and their parameters expect 1 variadic arguments, but 3 were provided >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning C4778: 'sscanf' : unterminated format string '%1022[[);/\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e%n' >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning C4474: 'sscanf' : too many arguments passed for format string >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): note: placeholders and their parameters expect 0 variadic arguments, but 2 were provided >> >> >> The failure is caused by non-ASCII chars in the format string of 
sscanf [1][2], which is non-portable on our Windows platform. >> In fact, these non-ASCII coding also triggers C4819 warning, which had been disabled in JDK-8216154 [3]. >> And I also found an article showing that sscanf may fail with non-ASCII in the format string [4]. >> >> So it would be nice to remove these non-ASCII chars (`\x80 ~ \xef`). >> And I think it's safe to do so. >> >> This is because: >> 1) There are actually no non-ASCII chars for package/class/method/signature names. >> 2) I don't think there is a use case, in which people will input non-ASCII for `CompileCommand`. >> >> You may argue that the non-ASCII may be used by the parser itself. >> But I didn't find that usage at all. (Please let me know if I miss something.) >> >> So I suggest to remove these non-ASCII code to make HotSpot to be more portable. >> And if we do so, we can also remove the only one `PRAGMA_DISABLE_MSVC_WARNING(4819)` [5]. >> >> Testing: >> - Build tests on Windows >> - tier1~3 on Linux/x64 >> >> Thanks. >> Best regards, >> Jie >> >> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L269 >> [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L319 >> [3] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-January/032014.html >> [4] https://jeffpar.github.io/kbarchive/kb/047/Q47369/ >> [5] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L246 > > Jie Fu has updated the pull request incrementally with one additional commit since the last revision: > > Split with RANGEBASE_ASCII and RANGEBASE_NON_ASCII Marked as reviewed by iklam (Reviewer). 
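Editor's sketch of the macro split discussed in this review. The compiler concatenates adjacent string literals, so `RANGEBASE` can be built from `RANGEBASE_ASCII` and, outside Windows, `RANGEBASE_NON_ASCII`, and then spliced into a `sscanf` scanset. This is a minimal illustration under stated assumptions, not the code from methodMatcher.cpp: the character tables are drastically shortened, `_WIN32` stands in for HotSpot's own `WINDOWS` macro, and `RANGE0` and `scan_method_name` are hypothetical names.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Hypothetical, heavily shortened stand-ins for the real character tables.
// '-' is deliberately left out: inside a scanset it could denote a range.
#define RANGEBASE_ASCII \
  "abcdefghijklmnopqrstuvwxyz" "ABCDEFGHIJKLMNOPQRSTUVWXYZ" "0123456789" "_$"
#define RANGEBASE_NON_ASCII "\x80\x81\x82"

// Assumption: _WIN32 plays the role of HotSpot's WINDOWS macro.
#ifdef _WIN32
#define RANGEBASE RANGEBASE_ASCII  // keep the format string pure ASCII
#else
#define RANGEBASE RANGEBASE_ASCII RANGEBASE_NON_ASCII
#endif

// Hypothetical scanset: '*' (wildcard) plus every RANGEBASE character.
#define RANGE0 "[*" RANGEBASE "]"

// Copies the longest prefix of `in` made of RANGE0 characters into `out`
// (at most 255 chars plus the terminating NUL) and stores the number of
// bytes consumed in *consumed. Returns false if not even one char matched.
bool scan_method_name(const char* in, char out[256], int* consumed) {
  assert(in != nullptr && out != nullptr && consumed != nullptr);
  *consumed = 0;
  // Expands to "%255[*abc...]%n"; %n records how far sscanf got.
  return std::sscanf(in, "%255" RANGE0 "%n", out, consumed) == 1;
}
```

For example, scanning `java_lang_String::length` stops at the first `:` and yields `java_lang_String` with 16 bytes consumed. On Windows only the ASCII table reaches the format string, which is the portability point of the patch.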
------------- PR: https://git.openjdk.java.net/jdk/pull/5704 From chagedorn at openjdk.java.net Wed Oct 6 07:23:09 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Wed, 6 Oct 2021 07:23:09 GMT Subject: RFR: 8273410: IR verification framework fails with "Should find method name in validIrRulesMap" [v2] In-Reply-To: References: Message-ID: <7mphLGbhiSb3Lt7OJDJ6tpPZbBbwXCRMIlq5itm18G8=.9f302109-d3ed-4cf6-bed9-9e49eca613a7@github.com> On Mon, 27 Sep 2021 07:41:59 GMT, Christian Hagedorn wrote: >> The IR framework treated a `@Check` method as `@Test` method instead of the `@Test` method itself at IR matching time resulting in an internal framework exception. >> >> While writing some tests for checked test I've noticed that a missing `@Arguments` annotation is not reported as `TestFormatException` but with a `RuntimeException` later when invoking the method in question. I also added the missing check for it. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Add @bug Anyone for a second review? ------------- PR: https://git.openjdk.java.net/jdk/pull/5678 From neliasso at openjdk.java.net Wed Oct 6 08:10:08 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Wed, 6 Oct 2021 08:10:08 GMT Subject: RFR: 8273410: IR verification framework fails with "Should find method name in validIrRulesMap" [v2] In-Reply-To: References: Message-ID: On Mon, 27 Sep 2021 07:41:59 GMT, Christian Hagedorn wrote: >> The IR framework treated a `@Check` method as `@Test` method instead of the `@Test` method itself at IR matching time resulting in an internal framework exception. >> >> While writing some tests for checked test I've noticed that a missing `@Arguments` annotation is not reported as `TestFormatException` but with a `RuntimeException` later when invoking the method in question. I also added the missing check for it. 
>> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Add @bug Looks good! ------------- Marked as reviewed by neliasso (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5678 From aph at openjdk.java.net Wed Oct 6 08:21:13 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Wed, 6 Oct 2021 08:21:13 GMT Subject: Integrated: 8274730: AArch64: AES/GCM acceleration is broken by the fix for JDK-8273297 In-Reply-To: References: Message-ID: On Tue, 5 Oct 2021 09:58:38 GMT, Andrew Haley wrote: > The recent AES/GCM acceleration on AArch64 was broken by https://bugs.openjdk.java.net/browse/JDK-8273297 . This was entirely expected, and I approved the patch, but now we must make AArch64 acceleration work again. > The only significant change from the point of view of this patch is that one argument was added to the call to the intrinsic, and that argument caused another argument to spill onto the stack. This pull request has now been integrated. Changeset: c74726db Author: Andrew Haley URL: https://git.openjdk.java.net/jdk/commit/c74726dbd0767d02abf9535361a86ffb69b646d9 Stats: 20 lines in 3 files changed: 11 ins; 1 del; 8 mod 8274730: AArch64: AES/GCM acceleration is broken by the fix for JDK-8273297 Reviewed-by: adinn, roland, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/5819 From chagedorn at openjdk.java.net Wed Oct 6 08:25:12 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Wed, 6 Oct 2021 08:25:12 GMT Subject: RFR: 8273410: IR verification framework fails with "Should find method name in validIrRulesMap" [v2] In-Reply-To: References: Message-ID: On Mon, 27 Sep 2021 07:41:59 GMT, Christian Hagedorn wrote: >> The IR framework treated a `@Check` method as `@Test` method instead of the `@Test` method itself at IR matching time resulting in an internal framework exception. 
>> >> While writing some tests for checked test I've noticed that a missing `@Arguments` annotation is not reported as `TestFormatException` but with a `RuntimeException` later when invoking the method in question. I also added the missing check for it. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Add @bug Thanks Nils for your review! ------------- PR: https://git.openjdk.java.net/jdk/pull/5678 From chagedorn at openjdk.java.net Wed Oct 6 08:25:14 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Wed, 6 Oct 2021 08:25:14 GMT Subject: Integrated: 8273410: IR verification framework fails with "Should find method name in validIrRulesMap" In-Reply-To: References: Message-ID: On Fri, 24 Sep 2021 10:36:34 GMT, Christian Hagedorn wrote: > The IR framework treated a `@Check` method as `@Test` method instead of the `@Test` method itself at IR matching time resulting in an internal framework exception. > > While writing some tests for checked test I've noticed that a missing `@Arguments` annotation is not reported as `TestFormatException` but with a `RuntimeException` later when invoking the method in question. I also added the missing check for it. > > Thanks, > Christian This pull request has now been integrated. 
Changeset: df125f68 Author: Christian Hagedorn URL: https://git.openjdk.java.net/jdk/commit/df125f680b6a4517109be80512a113064ca6281d Stats: 233 lines in 3 files changed: 232 ins; 0 del; 1 mod 8273410: IR verification framework fails with "Should find method name in validIrRulesMap" Reviewed-by: thartmann, neliasso ------------- PR: https://git.openjdk.java.net/jdk/pull/5678 From ihse at openjdk.java.net Wed Oct 6 10:42:10 2021 From: ihse at openjdk.java.net (Magnus Ihse Bursie) Date: Wed, 6 Oct 2021 10:42:10 GMT Subject: RFR: 8274329: Fix non-portable HotSpot code in MethodMatcher::parse_method_pattern [v3] In-Reply-To: References: Message-ID: On Wed, 6 Oct 2021 05:17:28 GMT, Jie Fu wrote: >> Hi all, >> >> I tried to build OpenJDK on Cygwin (Windows 2016 + VS2019). >> However, I failed with C4474 and C4778 warnings as below: >> >> Compiling 100 properties into resource bundles for java.desktop >> Compiling 3038 files for java.base >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): error C2220: the following warning is treated as an error >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning C4778: 'sscanf' : unterminated format string '%255[*\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e\x97%n' >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning C4474: 'sscanf' : too many arguments passed for format string >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): note: placeholders and their parameters expect 1 variadic arguments, but 3 were provided >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning C4778: 'sscanf' : unterminated format string 
'%1022[[);/\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e%n' >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning C4474: 'sscanf' : too many arguments passed for format string >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): note: placeholders and their parameters expect 0 variadic arguments, but 2 were provided >> >> >> The failure is caused by non-ASCII chars in the format string of sscanf [1][2], which is non-portable on our Windows platform. >> In fact, these non-ASCII coding also triggers C4819 warning, which had been disabled in JDK-8216154 [3]. >> And I also found an article showing that sscanf may fail with non-ASCII in the format string [4]. >> >> So it would be nice to remove these non-ASCII chars (`\x80 ~ \xef`). >> And I think it's safe to do so. >> >> This is because: >> 1) There are actually no non-ASCII chars for package/class/method/signature names. >> 2) I don't think there is a use case, in which people will input non-ASCII for `CompileCommand`. >> >> You may argue that the non-ASCII may be used by the parser itself. >> But I didn't find that usage at all. (Please let me know if I miss something.) >> >> So I suggest to remove these non-ASCII code to make HotSpot to be more portable. >> And if we do so, we can also remove the only one `PRAGMA_DISABLE_MSVC_WARNING(4819)` [5]. >> >> Testing: >> - Build tests on Windows >> - tier1~3 on Linux/x64 >> >> Thanks. 
>> Best regards, >> Jie >> >> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L269 >> [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L319 >> [3] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-January/032014.html >> [4] https://jeffpar.github.io/kbarchive/kb/047/Q47369/ >> [5] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L246 > > Jie Fu has updated the pull request incrementally with one additional commit since the last revision: > > Split with RANGEBASE_ASCII and RANGEBASE_NON_ASCII I think this was the best possible solution. ------------- Marked as reviewed by ihse (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5704 From kvn at openjdk.java.net Wed Oct 6 15:02:09 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 6 Oct 2021 15:02:09 GMT Subject: RFR: 8274329: Fix non-portable HotSpot code in MethodMatcher::parse_method_pattern [v3] In-Reply-To: References: Message-ID: On Wed, 6 Oct 2021 05:17:28 GMT, Jie Fu wrote: >> Hi all, >> >> I tried to build OpenJDK on Cygwin (Windows 2016 + VS2019). 
>> However, I failed with C4474 and C4778 warnings as below: >> >> Compiling 100 properties into resource bundles for java.desktop >> Compiling 3038 files for java.base >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): error C2220: the following warning is treated as an error >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning C4778: 'sscanf' : unterminated format string '%255[*\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e\x97%n' >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning C4474: 'sscanf' : too many arguments passed for format string >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): note: placeholders and their parameters expect 1 variadic arguments, but 3 were provided >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning C4778: 'sscanf' : unterminated format string '%1022[[);/\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e%n' >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning C4474: 'sscanf' : too many arguments passed for format string >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): note: placeholders and their parameters expect 0 variadic arguments, but 2 were provided >> >> >> The failure is caused by non-ASCII chars in the format string of sscanf [1][2], which is non-portable on our Windows platform. >> In fact, these non-ASCII coding also triggers C4819 warning, which had been disabled in JDK-8216154 [3]. >> And I also found an article showing that sscanf may fail with non-ASCII in the format string [4]. 
>> >> So it would be nice to remove these non-ASCII chars (`\x80 ~ \xef`). >> And I think it's safe to do so. >> >> This is because: >> 1) There are actually no non-ASCII chars for package/class/method/signature names. >> 2) I don't think there is a use case, in which people will input non-ASCII for `CompileCommand`. >> >> You may argue that the non-ASCII may be used by the parser itself. >> But I didn't find that usage at all. (Please let me know if I miss something.) >> >> So I suggest to remove these non-ASCII code to make HotSpot to be more portable. >> And if we do so, we can also remove the only one `PRAGMA_DISABLE_MSVC_WARNING(4819)` [5]. >> >> Testing: >> - Build tests on Windows >> - tier1~3 on Linux/x64 >> >> Thanks. >> Best regards, >> Jie >> >> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L269 >> [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L319 >> [3] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-January/032014.html >> [4] https://jeffpar.github.io/kbarchive/kb/047/Q47369/ >> [5] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L246 > > Jie Fu has updated the pull request incrementally with one additional commit since the last revision: > > Split with RANGEBASE_ASCII and RANGEBASE_NON_ASCII Looks good to me. Let me test it before approval. ------------- PR: https://git.openjdk.java.net/jdk/pull/5704 From kvn at openjdk.java.net Wed Oct 6 17:46:08 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 6 Oct 2021 17:46:08 GMT Subject: RFR: 8274329: Fix non-portable HotSpot code in MethodMatcher::parse_method_pattern [v3] In-Reply-To: References: Message-ID: <8dHIMkQlRDuNctLP7QxzPsMJ7WWmHtWJVZYa4Klr1ck=.39b0f76f-8a0a-4f2f-b0c2-b29aeed18c8d@github.com> On Wed, 6 Oct 2021 05:17:28 GMT, Jie Fu wrote: >> Hi all, >> >> I tried to build OpenJDK on Cygwin (Windows 2016 + VS2019). 
>> However, I failed with C4474 and C4778 warnings as below: >> >> Compiling 100 properties into resource bundles for java.desktop >> Compiling 3038 files for java.base >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): error C2220: the following warning is treated as an error >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning C4778: 'sscanf' : unterminated format string '%255[*\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e\x97%n' >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning C4474: 'sscanf' : too many arguments passed for format string >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): note: placeholders and their parameters expect 1 variadic arguments, but 3 were provided >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning C4778: 'sscanf' : unterminated format string '%1022[[);/\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e%n' >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning C4474: 'sscanf' : too many arguments passed for format string >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): note: placeholders and their parameters expect 0 variadic arguments, but 2 were provided >> >> >> The failure is caused by non-ASCII chars in the format string of sscanf [1][2], which is non-portable on our Windows platform. >> In fact, these non-ASCII coding also triggers C4819 warning, which had been disabled in JDK-8216154 [3]. >> And I also found an article showing that sscanf may fail with non-ASCII in the format string [4]. 
>> >> So it would be nice to remove these non-ASCII chars (`\x80 ~ \xef`). >> And I think it's safe to do so. >> >> This is because: >> 1) There are actually no non-ASCII chars for package/class/method/signature names. >> 2) I don't think there is a use case, in which people will input non-ASCII for `CompileCommand`. >> >> You may argue that the non-ASCII may be used by the parser itself. >> But I didn't find that usage at all. (Please let me know if I miss something.) >> >> So I suggest to remove these non-ASCII code to make HotSpot to be more portable. >> And if we do so, we can also remove the only one `PRAGMA_DISABLE_MSVC_WARNING(4819)` [5]. >> >> Testing: >> - Build tests on Windows >> - tier1~3 on Linux/x64 >> >> Thanks. >> Best regards, >> Jie >> >> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L269 >> [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L319 >> [3] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-January/032014.html >> [4] https://jeffpar.github.io/kbarchive/kb/047/Q47369/ >> [5] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L246 > > Jie Fu has updated the pull request incrementally with one additional commit since the last revision: > > Split with RANGEBASE_ASCII and RANGEBASE_NON_ASCII Passed my tier1-3 testing ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5704 From jiefu at openjdk.java.net Wed Oct 6 23:26:12 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Wed, 6 Oct 2021 23:26:12 GMT Subject: Integrated: 8274329: Fix non-portable HotSpot code in MethodMatcher::parse_method_pattern In-Reply-To: References: Message-ID: <4w6EdU1pLWMhdMbjG4HVon7jXmTqel12ED_OfsxCW4E=.86c166d0-eb1c-486b-bf9c-a61aaea2b467@github.com> On Sun, 26 Sep 2021 09:55:00 GMT, Jie Fu wrote: > Hi all, > > I tried to build OpenJDK on Cygwin (Windows 2016 + VS2019). 
> However, I failed with C4474 and C4778 warnings as below: > > Compiling 100 properties into resource bundles for java.desktop > Compiling 3038 files for java.base > e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): error C2220: the following warning is treated as an error > e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning C4778: 'sscanf' : unterminated format string '%255[*\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e\x97%n' > e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning C4474: 'sscanf' : too many arguments passed for format string > e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): note: placeholders and their parameters expect 1 variadic arguments, but 3 were provided > e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning C4778: 'sscanf' : unterminated format string '%1022[[);/\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e%n' > e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning C4474: 'sscanf' : too many arguments passed for format string > e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): note: placeholders and their parameters expect 0 variadic arguments, but 2 were provided > > > The failure is caused by non-ASCII chars in the format string of sscanf [1][2], which is non-portable on our Windows platform. > In fact, these non-ASCII coding also triggers C4819 warning, which had been disabled in JDK-8216154 [3]. > And I also found an article showing that sscanf may fail with non-ASCII in the format string [4]. 
> > So it would be nice to remove these non-ASCII chars (`\x80 ~ \xef`). > And I think it's safe to do so. > > This is because: > 1) There are actually no non-ASCII chars for package/class/method/signature names. > 2) I don't think there is a use case, in which people will input non-ASCII for `CompileCommand`. > > You may argue that the non-ASCII may be used by the parser itself. > But I didn't find that usage at all. (Please let me know if I miss something.) > > So I suggest to remove these non-ASCII code to make HotSpot to be more portable. > And if we do so, we can also remove the only one `PRAGMA_DISABLE_MSVC_WARNING(4819)` [5]. > > Testing: > - Build tests on Windows > - tier1~3 on Linux/x64 > > Thanks. > Best regards, > Jie > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L269 > [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L319 > [3] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-January/032014.html > [4] https://jeffpar.github.io/kbarchive/kb/047/Q47369/ > [5] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L246 This pull request has now been integrated. Changeset: c833b4d1 Author: Jie Fu URL: https://git.openjdk.java.net/jdk/commit/c833b4d130fabfa6a6f3a38313f76eb7e392c6a5 Stats: 23 lines in 1 file changed: 15 ins; 5 del; 3 mod 8274329: Fix non-portable HotSpot code in MethodMatcher::parse_method_pattern Reviewed-by: iklam, ihse, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/5704 From jiefu at openjdk.java.net Wed Oct 6 23:26:12 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Wed, 6 Oct 2021 23:26:12 GMT Subject: RFR: 8274329: Fix non-portable HotSpot code in MethodMatcher::parse_method_pattern In-Reply-To: References: Message-ID: On Tue, 5 Oct 2021 06:38:05 GMT, Ioi Lam wrote: >> Hi all, >> >> I tried to build OpenJDK on Cygwin (Windows 2016 + VS2019). 
>> However, I failed with C4474 and C4778 warnings as below: >> >> Compiling 100 properties into resource bundles for java.desktop >> Compiling 3038 files for java.base >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): error C2220: the following warning is treated as an error >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning C4778: 'sscanf' : unterminated format string '%255[*\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e\x97%n' >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning C4474: 'sscanf' : too many arguments passed for format string >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): note: placeholders and their parameters expect 1 variadic arguments, but 3 were provided >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning C4778: 'sscanf' : unterminated format string '%1022[[);/\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e%n' >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning C4474: 'sscanf' : too many arguments passed for format string >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): note: placeholders and their parameters expect 0 variadic arguments, but 2 were provided >> >> >> The failure is caused by non-ASCII chars in the format string of sscanf [1][2], which is non-portable on our Windows platform. >> In fact, these non-ASCII coding also triggers C4819 warning, which had been disabled in JDK-8216154 [3]. >> And I also found an article showing that sscanf may fail with non-ASCII in the format string [4]. 
>> >> So it would be nice to remove these non-ASCII chars (`\x80 ~ \xef`). >> And I think it's safe to do so. >> >> This is because: >> 1) There are actually no non-ASCII chars for package/class/method/signature names. >> 2) I don't think there is a use case, in which people will input non-ASCII for `CompileCommand`. >> >> You may argue that the non-ASCII may be used by the parser itself. >> But I didn't find that usage at all. (Please let me know if I miss something.) >> >> So I suggest to remove these non-ASCII code to make HotSpot to be more portable. >> And if we do so, we can also remove the only one `PRAGMA_DISABLE_MSVC_WARNING(4819)` [5]. >> >> Testing: >> - Build tests on Windows >> - tier1~3 on Linux/x64 >> >> Thanks. >> Best regards, >> Jie >> >> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L269 >> [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L319 >> [3] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-January/032014.html >> [4] https://jeffpar.github.io/kbarchive/kb/047/Q47369/ >> [5] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L246 > > My experiments above with ` -XX:CompileCommand='compileonly,*::??'` was done on Linux. I tried doing the same on Windows. On US-English Windows, the default codepage is 437 (DOS Latin US). If I change it to 65001 (UTF8) then Java is able to output CJK characters to the console. > > > public class CJK { > public static void main(String args[]) { > System.out.println(args[0]); > \u722a\u54c7(); > } > > static void \u722a\u54c7() { // Chinese word for "Java" > Thread.dumpStack(); > } > } > > > > c:\ade>chcp > Active code page: 437 > > c:\ade>jdk-17\bin\java -cp . 
CJK 123 > 123 > java.lang.Exception: Stack trace > at java.base/java.lang.Thread.dumpStack(Thread.java:1380) > at CJK.??(CJK.java:8) > at CJK.main(CJK.java:4) > > c:\ade>chcp 65001 > Active code page: 65001 > > c:\ade>jdk-17\bin\java -cp . CJK ?? > ?? > java.lang.Exception: Stack trace > at java.base/java.lang.Thread.dumpStack(Thread.java:1380) > at CJK.??(CJK.java:8) > at CJK.main(CJK.java:4) > > > As you can see, the CJK characters in the command-line arguments can't even be correctly passed as arguments to the Java main class. If that doesn't work, I can't see how we can get `-XX:CompileCommand` to work with CJK characters. Thanks @iklam @magicus and @vnkozlov . ------------- PR: https://git.openjdk.java.net/jdk/pull/5704 From mdoerr at openjdk.java.net Thu Oct 7 08:51:09 2021 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Thu, 7 Oct 2021 08:51:09 GMT Subject: RFR: 8271202: C1: assert(false) failed: live_in set of first block must be empty In-Reply-To: References: Message-ID: On Wed, 28 Jul 2021 06:57:26 GMT, Yi Yang wrote: > Hi, I'm trying to fix [JDK-8271202](https://bugs.openjdk.java.net/browse/JDK-8271202). A local variable(smallinvoc) is defined in B3 and only used in B14, so it oughts to have a short lifetime. But its lifetime has been unconditionally extended since -XX:+DeoptimizeALot(**Just removing this may be also a simpler and safer fix? Not sure if it's acceptable**), making it propagate to almost the whole remaing IR. > > https://github.com/openjdk/jdk/blob/ecd445562f8355704a041f9eca0e87dc85a7f44c/src/hotspot/share/ci/ciMethod.cpp#L373-L379 > > ![image](https://user-images.githubusercontent.com/5010047/127277954-2a64d87e-2981-4d74-8001-c7efeb000a10.png) > > > A virtual register(v603) that represents this variable is located in B13 live_in set, which propagated to B1 live_out set. > > When B1 merges state with B16 and B19, it found that this variable in new_state(B16) was empty, so B1 invalidates the corresponding local slot. 
>
> https://github.com/openjdk/jdk/blob/ecd445562f8355704a041f9eca0e87dc85a7f44c/src/hotspot/share/c1/c1_Instruction.cpp#L826-L838
>
> I think we should invalidate this slot only when the types are mismatched. Otherwise no Phi is generated and B19's live_gen set does not contain this variable, so the variable stays alive in B1's live_in. B1's live_in eventually propagates backward to B20's live_in set, escaping being killed by B19's live_gen, which causes the crash.
>
>
> Block 1
> live_in:
> 603 616 617 618 619 620 621 622
> live_out:
> 603 616 617 618 619 620 621 622
> live_gen:
> 620
> live_kill:
> 648 649 650
>
> Block 16
> live_in:
> 603 616 617 618 619 620 621 622
> live_out:
> 603 616 617 618 619 620 621 622
> live_gen:
> 616 617 618 619 620 621 622
> live_kill:
> 620 654 655 656 657
>
> Block 19
> live_in:
> 603
> live_out:
> 603 616 617 618 619 620 621 622
> live_gen:
>
> live_kill:
> 0 1 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647
>
>
> Block 20
> live_in:
> 603
> live_out:
> 603
> live_gen:
>
> live_kill:
> 577 578

Thanks for your investigation. Unfortunately, the proposed fix is not correct:

# Internal Error (jdk-dev/src/hotspot/share/c1/c1_GraphBuilder.cpp:2551), pid=10122, tid=10177
# assert(opd != __null) failed: Operand must exist!

Found by compiler/c1/ExtendLocalVarLifetime.java on PPC64le.

I think the JVM doesn't have a real problem in product builds: C1 bails out instead of asserting, which is fine. Can we bail out in debug builds as well, or adapt the assertion?
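The assert in the subject line concerns C1's global live sets. As a rough illustration (a simplified Java model, not C1's actual `LinearScan` code; `Liveness`, `Block` and its fields are hypothetical names), the sets are computed backward with live_out(B) = union of live_in(S) over all successors S, and live_in(B) = live_gen(B) ∪ (live_out(B) minus live_kill(B)). A value that is used on some path from the entry block before any definition on that path ends up in the live_in set of the first block, which is exactly the situation the assert reports:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified model of backward liveness over virtual register numbers.
// Not C1 code: Block and its fields are hypothetical stand-ins.
public final class Liveness {
    static final class Block {
        final List<Block> succs = new ArrayList<>();
        final Set<Integer> gen = new HashSet<>();   // used before any definition in the block
        final Set<Integer> kill = new HashSet<>();  // defined in the block
        final Set<Integer> liveIn = new HashSet<>();
        final Set<Integer> liveOut = new HashSet<>();
    }

    // Iterates to a fixed point. The sets only grow, so addAll() reports changes.
    static void compute(List<Block> blocks) {
        boolean changed = true;
        while (changed) {
            changed = false;
            // Reverse order tends to converge faster for a backward problem.
            for (int i = blocks.size() - 1; i >= 0; i--) {
                Block b = blocks.get(i);
                for (Block s : b.succs) {
                    changed |= b.liveOut.addAll(s.liveIn); // live_out = union of successors' live_in
                }
                Set<Integer> in = new HashSet<>(b.liveOut);
                in.removeAll(b.kill);                      // live_out minus live_kill
                in.addAll(b.gen);                          // ... union live_gen
                changed |= b.liveIn.addAll(in);
            }
        }
    }

    public static void main(String[] args) {
        // B0 -> B1 -> B2, where B2 uses v603 but no block defines it.
        Block b0 = new Block(), b1 = new Block(), b2 = new Block();
        b0.succs.add(b1);
        b1.succs.add(b2);
        b2.gen.add(603);
        compute(List.of(b0, b1, b2));
        // A non-empty live_in of the first block is what the C1 assert checks for.
        System.out.println("live_in(B0) = " + b0.liveIn);
    }
}
```

Adding 603 to `b1.kill` (a definition between entry and use) leaves `live_in(B0)` empty again, which is why a use without a dominating definition, as discussed for B3 and B14, is the suspicious part.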
------------- PR: https://git.openjdk.java.net/jdk/pull/4916 From mdoerr at openjdk.java.net Thu Oct 7 11:02:10 2021 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Thu, 7 Oct 2021 11:02:10 GMT Subject: RFR: 8271202: C1: assert(false) failed: live_in set of first block must be empty In-Reply-To: References: Message-ID: On Wed, 28 Jul 2021 06:57:26 GMT, Yi Yang wrote: > A local variable(smallinvoc) is defined in B3 and only used in B14 This sounds like a bug. B3 doesn't dominate B14, so B14 can't use the variable. Shouldn't the local slot have been invalidated on the path between the two blocks? Invalidation of local slots usually occurs when the control flow predecessors have different types or a dead value (indicated by `new_value == NULL` in `BlockBegin::try_merge`). Maybe it's a case which C1 can't handle. If so, bailing out would be the right choice IMHO. test/hotspot/jtreg/compiler/c1/ExtendLocalVarLifetime.java line 72: > 70: } > 71: } > 72: } newline missing ------------- PR: https://git.openjdk.java.net/jdk/pull/4916 From simonis at openjdk.java.net Thu Oct 7 12:42:08 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Thu, 7 Oct 2021 12:42:08 GMT Subject: RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow [v3] In-Reply-To: References: Message-ID: On Tue, 5 Oct 2021 15:35:57 GMT, Martin Doerr wrote: >> Volker Simonis has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: >> >> - Fix special case where we're creating an implicit exception for a regular invoke* bytecode >> - Minor updates as requested by @TheRealMDoerr >> - 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow > > Thanks for the update! New version looks good to me and our tests with disabled OmitStackTraceInFastThrow have passed. 
> I think you should choose a couple of jtreg tests and add an additional run with OmitStackTraceInFastThrow disabled. What do you think? Thanks for running the test @TheRealMDoerr. I've also run tier1/tier2/jck_runtime/jck_compiler on both linux/amd64 and linux/aarch64 with a special build where `OmitStackTraceInFastThrow` was disabled. I haven't seen any problems except for `compiler/loopopts/TestDivZeroCheckControl.java`, which times out. But that's expected because that test creates `ArithmeticException`s in a hot loop. My change makes it run in half the time, but that's still not enough to prevent timeouts :) Is anything left before you can formally review the change? ------------- PR: https://git.openjdk.java.net/jdk/pull/5488 From simonis at openjdk.java.net Thu Oct 7 12:46:12 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Thu, 7 Oct 2021 12:46:12 GMT Subject: RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow [v3] In-Reply-To: References: Message-ID: On Tue, 5 Oct 2021 19:40:55 GMT, Nils Eliasson wrote: > In the benchmark you are testing a case where the exception isn't used - will the allocation be eliminated then? > > In addition to the benchmark you have provided, I would like to see functional tests for both when the allocation is used, and when it isn't. Have you looked at the new IR Test Framework? Good idea! I'll take a look. I've now also understood what @TheRealMDoerr meant by "*choose a couple of jtreg tests and add an additional run with OmitStackTraceInFastThrow disabled*" :) I'll do both: add an explicit test, and investigate whether it makes sense to run some tests with `-XX:-OmitStackTraceInFastThrow` to exercise the new functionality.
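For the functional tests requested above, one possible shape is a small program that provokes the same implicit exception many times and counts how many caught instances have an empty stack trace, which is what the preallocated exceptions that C2 may throw with `OmitStackTraceInFastThrow` enabled (the default) look like. This is a hedged sketch, not the test added to the PR: `ImplicitNpeTraces` and its methods are hypothetical, and whether C2 switches to fast throws at all depends on compilation and profiling. Run with `-XX:-OmitStackTraceInFastThrow`, every exception should carry a trace, so the count should stay at zero.

```java
// Hypothetical test class, not part of the PR. Suggested runs:
//   java ImplicitNpeTraces                                             (fast throw may kick in)
//   java -XX:-OmitStackTraceInFastThrow ImplicitNpeTraces expectTraces
public class ImplicitNpeTraces {
    // Non-final, so the JIT cannot treat the null as a constant.
    static int[] nullArray = null;

    // Loading from a null array raises an implicit NullPointerException.
    static int load(int[] a) {
        return a[0];
    }

    // Provokes the NPE 'iterations' times and counts the caught exceptions
    // whose stack trace is empty. The exception is used, so it must exist.
    static int countEmptyTraces(int iterations) {
        int empty = 0;
        for (int i = 0; i < iterations; i++) {
            try {
                load(nullArray);
            } catch (NullPointerException e) {
                if (e.getStackTrace().length == 0) {
                    empty++;
                }
            }
        }
        return empty;
    }

    public static void main(String[] args) {
        int n = 200_000;
        int empty = countEmptyTraces(n);
        System.out.println(empty + " of " + n + " NullPointerExceptions had no stack trace");
        if (args.length > 0 && args[0].equals("expectTraces") && empty != 0) {
            throw new AssertionError("got exceptions without a stack trace");
        }
    }
}
```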
------------- PR: https://git.openjdk.java.net/jdk/pull/5488 From github.com+71546117+tobiasholenstein at openjdk.java.net Thu Oct 7 13:31:27 2021 From: github.com+71546117+tobiasholenstein at openjdk.java.net (Tobias Holenstein) Date: Thu, 7 Oct 2021 13:31:27 GMT Subject: RFR: JDK-8251513: Code around Parse::do_lookupswitch/do_tableswitch should be cleaned up Message-ID: <4xMeWOd5dV5wtEPDp1UuLCLJDKcj9seD9_IpiXS8PrA=.f172e00f-7060-40d7-a52a-f94428c27083@github.com> - `default_cnt` can be computed without using a loop: An example of how `defaults` was computed before at parse2.cpp:521-533 with switch labels `-10`, `0`, `10`, `42` and `200`: defaults = 0 defaults += -10 - (-2147483648) defaults += 0 - (-10 + 1) defaults += 10 - (0 + 1) defaults += 42 - (10 + 1) defaults += 200 - (42 + 1) defaults += 2147483647 - (200 + 1) + 1 => `defaults` = -10 - (-2147483648) + 0 - (-10 + 1) + 10 - (0 + 1) + 42 - (10 + 1) + 200 - (42 + 1) + 2147483647 - (200 + 1) + 1 = 4294967291 = 2147483648 + 2147483648 - 5 BUT actually `defaults` was : `defaults` = 2147483648 + 2147483648 The reason has to do with using floats: ((float)match_int - (float)prev) == (-(float)prev) is True for match_int=-10, prev=-2147483648 BUT actually `defaults` (2147483648 + 2147483648 - 5) can also be computed without using a loop with `juint defaults = max_juint - len` - also made some casts explicit - A lot of casts could be avoided by making `_cnt` in `SwitchRange` a uint. 
Unfortunately, the Range for the default values of a switch in `do_lookupswitch` calculates the count by scaling the average cnt/label up to cnt/range which needs a float to store an accurate result ------------- Commit messages: - JDK-8251513 - JDK-8251513: Code around Parse::do_lookupswitch/do_tableswitch should be cleaned up Changes: https://git.openjdk.java.net/jdk/pull/5837/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5837&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8251513 Stats: 38 lines in 2 files changed: 3 ins; 12 del; 23 mod Patch: https://git.openjdk.java.net/jdk/pull/5837.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5837/head:pull/5837 PR: https://git.openjdk.java.net/jdk/pull/5837 From github.com+71546117+tobiasholenstein at openjdk.java.net Thu Oct 7 13:31:27 2021 From: github.com+71546117+tobiasholenstein at openjdk.java.net (Tobias Holenstein) Date: Thu, 7 Oct 2021 13:31:27 GMT Subject: RFR: JDK-8251513: Code around Parse::do_lookupswitch/do_tableswitch should be cleaned up In-Reply-To: <4xMeWOd5dV5wtEPDp1UuLCLJDKcj9seD9_IpiXS8PrA=.f172e00f-7060-40d7-a52a-f94428c27083@github.com> References: <4xMeWOd5dV5wtEPDp1UuLCLJDKcj9seD9_IpiXS8PrA=.f172e00f-7060-40d7-a52a-f94428c27083@github.com> Message-ID: On Wed, 6 Oct 2021 09:27:15 GMT, Tobias Holenstein wrote: > - `default_cnt` can be computed without using a loop: > > An example of how `defaults` was computed before at parse2.cpp:521-533 with switch labels `-10`, `0`, `10`, `42` and `200`: > > defaults = 0 > defaults += -10 - (-2147483648) > defaults += 0 - (-10 + 1) > defaults += 10 - (0 + 1) > defaults += 42 - (10 + 1) > defaults += 200 - (42 + 1) > defaults += 2147483647 - (200 + 1) + 1 > > => `defaults` = > -10 - (-2147483648) + 0 - (-10 + 1) + 10 - (0 + 1) + 42 - (10 + 1) + 200 - (42 + 1) + 2147483647 - (200 + 1) + 1 = > 4294967291 = 2147483648 + 2147483648 - 5 > BUT actually `defaults` was : `defaults` = 2147483648 + 2147483648 > > The reason has to do 
with using floats: > ((float)match_int - (float)prev) == (-(float)prev) is True > for match_int=-10, prev=-2147483648 > > BUT actually `defaults` (2147483648 + 2147483648 - 5) can also be computed without using a loop with `juint defaults = max_juint - len` > > > - also made some casts explicit > > - A lot of casts could be avoided by making `_cnt` in `SwitchRange` a uint. Unfortunately, the Range for the default values of a switch in `do_lookupswitch` calculates the count by scaling the average cnt/label up to cnt/range which needs a float to store an accurate result. In `do_lookupswitch`, the ranges for the default values have very accurate values (computed by taking the average), whereas the ranges for the default values in `do_tableswitch` are just calculated by taking `profile->default_count() / 2` for the lower and upper default ranges. This can be very imbalanced if the lower and upper default ranges have very different sizes. Should this be changed? ------------- PR: https://git.openjdk.java.net/jdk/pull/5837 From chagedorn at openjdk.java.net Thu Oct 7 15:03:24 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Thu, 7 Oct 2021 15:03:24 GMT Subject: RFR: 8274785: ciReplay: Potential crash due to uninitialized Compile::_ilt variable Message-ID: While working on JDK-8272912 and inserting `assert(false)` on various places for testing purposes, I noticed the following segmentation fault in one case: The inline tree `Compile::_ilt` variable is not initialized directly by the initializer list but only later in `Compile::Compile()` when calling _ilt = InlineTree::build_inline_tree_root(); Before this assignment, `_ilt` can contain garbage (i.e. `!= NULL`). When hitting an assert or crash before returning from `build_inline_tree_root()`, replay compilation is trying to dump the inline tree and fails to notice that the inline tree is still uninitialized. This can result in a segmentation fault when accessing `_ilt`.
Thanks, Christian ------------- Commit messages: - 8274785: ciReplay: Potential crash due to uninitialized Compile::_ilt variable Changes: https://git.openjdk.java.net/jdk/pull/5852/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5852&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8274785 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/5852.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5852/head:pull/5852 PR: https://git.openjdk.java.net/jdk/pull/5852 From dcubed at openjdk.java.net Thu Oct 7 17:04:30 2021 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Thu, 7 Oct 2021 17:04:30 GMT Subject: RFR: 8274920: ProblemList 2 VectorAPI tests failing due to "assert(!vbox->is_Phi()) failed" Message-ID: A trivial fix to ProblemList 2 VectorAPI tests failing due to "assert(!vbox->is_Phi()) failed". ------------- Commit messages: - 8274920: ProblemList 2 VectorAPI tests failing due to "assert(!vbox->is_Phi()) failed" Changes: https://git.openjdk.java.net/jdk/pull/5853/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5853&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8274920 Stats: 3 lines in 1 file changed: 3 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/5853.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5853/head:pull/5853 PR: https://git.openjdk.java.net/jdk/pull/5853 From kvn at openjdk.java.net Thu Oct 7 17:12:08 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 7 Oct 2021 17:12:08 GMT Subject: RFR: 8274920: ProblemList 2 VectorAPI tests failing due to "assert(!vbox->is_Phi()) failed" In-Reply-To: References: Message-ID: On Thu, 7 Oct 2021 16:56:47 GMT, Daniel D. Daugherty wrote: > A trivial fix to ProblemList 2 VectorAPI tests failing due to "assert(!vbox->is_Phi()) failed". Good. ------------- Marked as reviewed by kvn (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/5853 From dcubed at openjdk.java.net Thu Oct 7 17:18:10 2021 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Thu, 7 Oct 2021 17:18:10 GMT Subject: RFR: 8274920: ProblemList 2 VectorAPI tests failing due to "assert(!vbox->is_Phi()) failed" In-Reply-To: References: Message-ID: On Thu, 7 Oct 2021 17:09:04 GMT, Vladimir Kozlov wrote: >> A trivial fix to ProblemList 2 VectorAPI tests failing due to "assert(!vbox->is_Phi()) failed". > > Good. @vnkozlov - Thanks for the fast review! ------------- PR: https://git.openjdk.java.net/jdk/pull/5853 From dcubed at openjdk.java.net Thu Oct 7 17:18:10 2021 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Thu, 7 Oct 2021 17:18:10 GMT Subject: Integrated: 8274920: ProblemList 2 VectorAPI tests failing due to "assert(!vbox->is_Phi()) failed" In-Reply-To: References: Message-ID: On Thu, 7 Oct 2021 16:56:47 GMT, Daniel D. Daugherty wrote: > A trivial fix to ProblemList 2 VectorAPI tests failing due to "assert(!vbox->is_Phi()) failed". This pull request has now been integrated. Changeset: 920e7070 Author: Daniel D. 
Daugherty URL: https://git.openjdk.java.net/jdk/commit/920e70701da9699765c993e11feba3cc0fd0362c Stats: 3 lines in 1 file changed: 3 ins; 0 del; 0 mod 8274920: ProblemList 2 VectorAPI tests failing due to "assert(!vbox->is_Phi()) failed" Reviewed-by: kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/5853 From neliasso at openjdk.java.net Thu Oct 7 18:23:06 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Thu, 7 Oct 2021 18:23:06 GMT Subject: RFR: 8274785: ciReplay: Potential crash due to uninitialized Compile::_ilt variable In-Reply-To: References: Message-ID: On Thu, 7 Oct 2021 14:52:57 GMT, Christian Hagedorn wrote: > While working on JDK-8272912 and inserting `assert(false)` on various places for testing purposes, I noticed the following segmentation fault in one case: > > The inline tree `Compile::_ilt` variable is not initialized directly by the initializer list but only later in `Compile::Compile()` when calling > > _ilt = InlineTree::build_inline_tree_root(); > > Before this assignment, `_ilt` can contain garbage (i.e. `!= NULL`). When hitting an assert or crash before returning from `build_inline_tree_root()`, replay compilation is trying to dump the inline tree and fails to notice that the inline tree is still uninitialized. This can result in a segmentation fault when accessing `_ilt`. > > Thanks, > Christian Good! ------------- Marked as reviewed by neliasso (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/5852 From kvn at openjdk.java.net Thu Oct 7 18:40:07 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 7 Oct 2021 18:40:07 GMT Subject: RFR: 8274785: ciReplay: Potential crash due to uninitialized Compile::_ilt variable In-Reply-To: References: Message-ID: On Thu, 7 Oct 2021 14:52:57 GMT, Christian Hagedorn wrote: > While working on JDK-8272912 and inserting `assert(false)` on various places for testing purposes, I noticed the following segmentation fault in one case: > > The inline tree `Compile::_ilt` variable is not initialized directly by the initializer list but only later in `Compile::Compile()` when calling > > _ilt = InlineTree::build_inline_tree_root(); > > Before this assignment, `_ilt` can contain garbage (i.e. `!= NULL`). When hitting an assert or crash before returning from `build_inline_tree_root()`, replay compilation is trying to dump the inline tree and fails to notice that the inline tree is still uninitialized. This can result in a segmentation fault when accessing `_ilt`. > > Thanks, > Christian Good. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5852 From chagedorn at openjdk.java.net Fri Oct 8 06:26:06 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Fri, 8 Oct 2021 06:26:06 GMT Subject: RFR: 8274785: ciReplay: Potential crash due to uninitialized Compile::_ilt variable In-Reply-To: References: Message-ID: On Thu, 7 Oct 2021 14:52:57 GMT, Christian Hagedorn wrote: > While working on JDK-8272912 and inserting `assert(false)` on various places for testing purposes, I noticed the following segmentation fault in one case: > > The inline tree `Compile::_ilt` variable is not initialized directly by the initializer list but only later in `Compile::Compile()` when calling > > _ilt = InlineTree::build_inline_tree_root(); > > Before this assignment, `_ilt` can contain garbage (i.e. `!= NULL`). 
When hitting an assert or crash before returning from `build_inline_tree_root()`, replay compilation is trying to dump the inline tree and fails to notice that the inline tree is still uninitialized. This can result in a segmentation fault when accessing `_ilt`. > > Thanks, > Christian Thanks Nils and Vladimir for your reviews! ------------- PR: https://git.openjdk.java.net/jdk/pull/5852 From roland at openjdk.java.net Fri Oct 8 14:45:12 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Fri, 8 Oct 2021 14:45:12 GMT Subject: RFR: 8274145: C2: Incorrect computation after JDK-8269752 In-Reply-To: References: <5B9h2DzETeX6X6cg-xH0jNgWxbFrzw4Xlfxv01pTzkA=.b19b9e3e-ef57-44f1-976a-42f0e0929b73@github.com> Message-ID: On Tue, 5 Oct 2021 19:47:47 GMT, Nils Eliasson wrote: >> The bug happens because an If node that follows a CountedLoop is >> replaced by the CountedLoopEnd node of the main loop. Further >> unrolling happens after the If is replaced which causes the condition >> of the CountedLoopEnd node to change. This is made possible by >> JDK-8269752. The fix I propose is to detect that corner case and >> prevent the If to be replaced in that case. > > Please change the bug title to something more descriptive. 
thanks for the reviews @neliasso @vnkozlov ------------- PR: https://git.openjdk.java.net/jdk/pull/5712 From roland at openjdk.java.net Fri Oct 8 14:52:20 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Fri, 8 Oct 2021 14:52:20 GMT Subject: Integrated: 8274145: C2: condition incorrectly made redundant with dominating main loop exit condition In-Reply-To: <5B9h2DzETeX6X6cg-xH0jNgWxbFrzw4Xlfxv01pTzkA=.b19b9e3e-ef57-44f1-976a-42f0e0929b73@github.com> References: <5B9h2DzETeX6X6cg-xH0jNgWxbFrzw4Xlfxv01pTzkA=.b19b9e3e-ef57-44f1-976a-42f0e0929b73@github.com> Message-ID: On Mon, 27 Sep 2021 08:57:25 GMT, Roland Westrelin wrote: > The bug happens because an If node that follows a CountedLoop is > replaced by the CountedLoopEnd node of the main loop. Further > unrolling happens after the If is replaced which causes the condition > of the CountedLoopEnd node to change. This is made possible by > JDK-8269752. The fix I propose is to detect that corner case and > prevent the If to be replaced in that case. This pull request has now been integrated. 
Changeset: 2aacd422 Author: Roland Westrelin URL: https://git.openjdk.java.net/jdk/commit/2aacd4220a01b467de671212c7a74e6c81a2ad3c Stats: 149 lines in 3 files changed: 148 ins; 0 del; 1 mod 8274145: C2: condition incorrectly made redundant with dominating main loop exit condition Reviewed-by: kvn, neliasso ------------- PR: https://git.openjdk.java.net/jdk/pull/5712 From chagedorn at openjdk.java.net Fri Oct 8 14:54:09 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Fri, 8 Oct 2021 14:54:09 GMT Subject: Integrated: 8274785: ciReplay: Potential crash due to uninitialized Compile::_ilt variable In-Reply-To: References: Message-ID: On Thu, 7 Oct 2021 14:52:57 GMT, Christian Hagedorn wrote: > While working on JDK-8272912 and inserting `assert(false)` on various places for testing purposes, I noticed the following segmentation fault in one case: > > The inline tree `Compile::_ilt` variable is not initialized directly by the initializer list but only later in `Compile::Compile()` when calling > > _ilt = InlineTree::build_inline_tree_root(); > > Before this assignment, `_ilt` can contain garbage (i.e. `!= NULL`). When hitting an assert or crash before returning from `build_inline_tree_root()`, replay compilation is trying to dump the inline tree and fails to notice that the inline tree is still uninitialized. This can result in a segmentation fault when accessing `_ilt`. > > Thanks, > Christian This pull request has now been integrated. 
Changeset: 36b89a18 Author: Christian Hagedorn URL: https://git.openjdk.java.net/jdk/commit/36b89a18931d42b8002a843ec8218b5c1ba54374 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod 8274785: ciReplay: Potential crash due to uninitialized Compile::_ilt variable Reviewed-by: neliasso, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/5852 From eosterlund at openjdk.java.net Fri Oct 8 15:03:06 2021 From: eosterlund at openjdk.java.net (Erik Österlund) Date: Fri, 8 Oct 2021 15:03:06 GMT Subject: RFR: 8274773: [TESTBUG] UnsafeIntrinsicsTest intermittently fails on weak memory model platform In-Reply-To: References: Message-ID: <5MvlD693ypnPYigEOGRR2xzHTZ-mYFKZKjPSy0_lcOs=.187455c9-00e5-4abe-8c5d-a746195ca0e2@github.com> On Tue, 5 Oct 2021 12:27:05 GMT, Martin Doerr wrote: > The test creates new Nodes and publishes them to concurrent readers. This requires at least release and load_consume. A clean fix would be to make `Node.next` volatile. But that would be a sledgehammer. A minimalistic fix for our supported weak memory model platforms is to insert a `storeFence`. What is better? > (See JBS for failure description.) Yeah I have the same fix in the generational ZGC repository. Got bitten by the same issue and used the same solution. Looks good. ------------- Marked as reviewed by eosterlund (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5823 From whuang at openjdk.java.net Sat Oct 9 08:22:19 2021 From: whuang at openjdk.java.net (Wang Huang) Date: Sat, 9 Oct 2021 08:22:19 GMT Subject: Integrated: 8268231: Aarch64: Use ldp in intrinsics for String.compareTo In-Reply-To: References: Message-ID: <2xHfCPldLbGsYQFPvnLK89Y_DQ0urt0VfroqrbJcTP0=.bf116946-e40a-4616-8ca6-aefac6493a47@github.com> On Thu, 8 Jul 2021 11:50:36 GMT, Wang Huang wrote: > Dear all, > Could you do me a favor and review this patch? This patch uses `ldp` to implement String.compareTo.
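The general idea behind such an intrinsic, comparing several bytes per step and only locating the exact mismatch once a wide comparison differs, can be modeled in Java. This is a hedged, simplified sketch for Latin-1 (one byte per char) arrays using 8-byte words; the real AArch64 intrinsic is hand-written assembly that uses `ldp` to load register pairs, and `WordCompare`/`compareLatin1` are hypothetical names. It follows the documented `String.compareTo` result: the difference of the first mismatching chars, otherwise the difference of the lengths.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

public final class WordCompare {
    // Views a byte[] as little-endian 64-bit words; plain get() allows unaligned indices.
    private static final VarHandle LONGS =
        MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.LITTLE_ENDIAN);

    // Compares two Latin-1 encoded strings with String.compareTo semantics.
    static int compareLatin1(byte[] a, byte[] b) {
        int min = Math.min(a.length, b.length);
        int i = 0;
        // Wide loop: 8 bytes per step (the intrinsic discussed here loads 16 with ldp).
        for (; i + Long.BYTES <= min; i += Long.BYTES) {
            long x = (long) LONGS.get(a, i);
            long y = (long) LONGS.get(b, i);
            if (x != y) {
                // Little-endian: the lowest set bit of x ^ y lies in the first differing byte.
                i += Long.numberOfTrailingZeros(x ^ y) >>> 3;
                return (a[i] & 0xff) - (b[i] & 0xff);
            }
        }
        // Byte-by-byte tail for the remaining (< 8) bytes.
        for (; i < min; i++) {
            if (a[i] != b[i]) {
                return (a[i] & 0xff) - (b[i] & 0xff);
            }
        }
        return a.length - b.length;
    }
}
```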
>
> * We add a JMH test case
> * Here is the result of this test case
>
> Benchmark | (size) | Mode | Cnt | Score | Error | Units
> ---|---|---|---|---|---|---
> StringCompare.compareLL | 64 | avgt | 5 | 7.992 | ± 0.005 | us/op
> StringCompare.compareLL | 72 | avgt | 5 | 15.029 | ± 0.006 | us/op
> StringCompare.compareLL | 80 | avgt | 5 | 14.655 | ± 0.011 | us/op
> StringCompare.compareLL | 91 | avgt | 5 | 16.363 | ± 0.12 | us/op
> StringCompare.compareLL | 101 | avgt | 5 | 16.966 | ± 0.007 | us/op
> StringCompare.compareLL | 121 | avgt | 5 | 19.276 | ± 0.006 | us/op
> StringCompare.compareLL | 181 | avgt | 5 | 19.002 | ± 0.417 | us/op
> StringCompare.compareLL | 256 | avgt | 5 | 24.707 | ± 0.041 | us/op
> StringCompare.compareLLWithLdp | 64 | avgt | 5 | 8.001 | ± 0.121 | us/op
> StringCompare.compareLLWithLdp | 72 | avgt | 5 | 11.573 | ± 0.003 | us/op
> StringCompare.compareLLWithLdp | 80 | avgt | 5 | 6.861 | ± 0.004 | us/op
> StringCompare.compareLLWithLdp | 91 | avgt | 5 | 12.774 | ± 0.201 | us/op
> StringCompare.compareLLWithLdp | 101 | avgt | 5 | 8.691 | ± 0.004 | us/op
> StringCompare.compareLLWithLdp | 121 | avgt | 5 | 11.091 | ± 1.342 | us/op
> StringCompare.compareLLWithLdp | 181 | avgt | 5 | 14.64 | ± 0.581 | us/op
> StringCompare.compareLLWithLdp | 256 | avgt | 5 | 25.879 | ± 1.775 | us/op
> StringCompare.compareUU | 64 | avgt | 5 | 13.476 | ± 0.01 | us/op
> StringCompare.compareUU | 72 | avgt | 5 | 15.078 | ± 0.006 | us/op
> StringCompare.compareUU | 80 | avgt | 5 | 23.512 | ± 0.011 | us/op
> StringCompare.compareUU | 91 | avgt | 5 | 24.284 | ± 0.008 | us/op
> StringCompare.compareUU | 101 | avgt | 5 | 20.707 | ± 0.017 | us/op
> StringCompare.compareUU | 121 | avgt | 5 | 29.302 | ± 0.011 | us/op
> StringCompare.compareUU | 181 | avgt | 5 | 39.31 | ± 0.016 | us/op
> StringCompare.compareUU | 256 | avgt | 5 | 54.592 | ± 0.392 | us/op
> StringCompare.compareUUWithLdp | 64 | avgt | 5 | 16.389 | ± 0.008 | us/op
> StringCompare.compareUUWithLdp | 72 | avgt | 5 | 10.71 | ± 0.158 | us/op
> StringCompare.compareUUWithLdp | 80 | avgt | 5 | 11.488 | ± 0.024 | us/op
> StringCompare.compareUUWithLdp | 91 | avgt | 5 | 13.412 | ± 0.006 | us/op
> StringCompare.compareUUWithLdp | 101 | avgt | 5 | 16.245 | ± 0.434 | us/op
> StringCompare.compareUUWithLdp | 121 | avgt | 5 | 16.597 | ± 0.016 | us/op
> StringCompare.compareUUWithLdp | 181 | avgt | 5 | 27.373 | ± 0.017 | us/op
> StringCompare.compareUUWithLdp | 256 | avgt | 5 | 41.74 | ± 3.5 | us/op
>
> From this table, we can see that in most cases, our patch is better than the old one.
>
> Thank you for your review. Any suggestions are welcome.

This pull request has now been integrated. Changeset: 6d1d4d52 Author: Wang Huang Committer: Hamlin Li URL: https://git.openjdk.java.net/jdk/commit/6d1d4d52928ed38bbc73ddcbede5389995a8e65f Stats: 106 lines in 1 file changed: 39 ins; 40 del; 27 mod 8268231: Aarch64: Use ldp in intrinsics for String.compareTo Co-authored-by: Wang Huang Co-authored-by: Sun Jianye Co-authored-by: Wu Yan Reviewed-by: ngasson, aph ------------- PR: https://git.openjdk.java.net/jdk/pull/4722 From goetz at openjdk.java.net Sat Oct 9 12:05:12 2021 From: goetz at openjdk.java.net (Goetz Lindenmaier) Date: Sat, 9 Oct 2021 12:05:12 GMT Subject: RFR: 8274773: [TESTBUG] UnsafeIntrinsicsTest intermittently fails on weak memory model platform In-Reply-To: References: Message-ID: On Tue, 5 Oct 2021 12:27:05 GMT, Martin Doerr wrote: > The test creates new Nodes and publishes them to concurrent readers. This requires at least release and load_consume. A clean fix would be to make `Node.next` volatile. But that would be a sledgehammer. A minimalistic fix for our supported weak memory model platforms is to insert a `storeFence`. What is better? > (See JBS for failure description.) LGTM ------------- Marked as reviewed by goetz (Reviewer).
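The writer-side ordering discussed for 8274773 can be illustrated with public API. The test itself uses a `storeFence`; this hedged sketch instead uses `VarHandle.releaseFence()` (Java 9+), whose documented ordering is the same: loads and stores before the fence are not reordered with stores after it. `FencedPublish` and its `Node` are hypothetical stand-ins for the test's classes, and `push` assumes a single writer. The fence only orders the writer: a reader following `head` and `next` relies on address-dependency ordering, which weakly ordered hardware such as AArch64 and PPC64 preserves but the Java Memory Model does not formally guarantee; a volatile `next` (or `setRelease`/`getAcquire`) is the fully specified, heavier alternative.

```java
import java.lang.invoke.VarHandle;

// Sketch of publishing linked nodes to concurrent readers with a release fence.
public final class FencedPublish {
    // Non-final fields: final fields would get the JMM's final-field
    // publication guarantee on their own, hiding the problem.
    static final class Node {
        int value;
        Node next;
    }

    // Plain (non-volatile) head, read by other threads without locking.
    static Node head;

    // Single writer assumed; concurrent pushes would need a CAS.
    static void push(int value) {
        Node n = new Node();
        n.value = value;
        n.next = head;
        // Keep the initializing stores above from being reordered after the
        // publishing store below (same ordering as Unsafe.storeFence()).
        VarHandle.releaseFence();
        head = n;
    }

    // Reader: walks the list starting from a plain read of head.
    static int sum() {
        int s = 0;
        for (Node n = head; n != null; n = n.next) {
            s += n.value;
        }
        return s;
    }
}
```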
PR: https://git.openjdk.java.net/jdk/pull/5823 From ngasson at openjdk.java.net Mon Oct 11 06:35:06 2021 From: ngasson at openjdk.java.net (Nick Gasson) Date: Mon, 11 Oct 2021 06:35:06 GMT Subject: RFR: 8272968: AArch64: Remove redundant matching rules for commutative ops In-Reply-To: References: Message-ID: On Thu, 23 Sep 2021 03:29:44 GMT, Fei Gao wrote: > Match rules for commutative operations mnegI/mnegL/smnegL might > become redundant after function matchrule_clone_and_swap(), > and hence can be reduced. > > In the adlc, while parsing the contents of an instruction > definition, function instr_parse always checks > for commutative operations with subtree operands, creates > clones and swaps operands via function matchrule_clone_and_swap. > This means that another operand-swapped and partially > symmetrical match rule should be generated automatically for > these commutative operations. > > The pattern to construct mnegI, mnegL or smnegL consists of > a subtraction with zero and then a multiplication. In function > count_commutative_op, both mulI and mulL are recognized as > commutative opcodes. Therefore, we need only one match rule > to specify that a multiplication consists of a number and > a subtraction with zero for these three instructions, and the > extra one can be deleted. > > Take mnegL as an example. > > Without my patch, four match rules are eventually created for > instruction selection.
> > Two of them are defined in the ad files: > > Match Rule 1: > dst = MulL (SubL zero src1) src2 > ===> > dst = mnegl src1 src2 > > Match Rule 2: > dst = MulL src1 (SubL zero src2) > ===> > dst = mnegl src1 src2 > > The other two are automatically generated by function > matchrule_clone_and_swap based on the two rules above: > > Match Rule 3 (generated by match rule 1): > dst = MulL src2 (SubL zero src1) > ===> > dst = mnegl src1 src2 > > Match Rule 4 (generated by match rule 2): > dst = MulL (SubL zero src2) src1 > ===> > dst = mnegl src1 src2 > > As mnegl is commutative, Rule 3 is equivalent to > Rule 2, and Rule 1 is equivalent to Rule 4. Also, if we only > keep the original Match Rule 1, as shown above, Rule 3 will > be generated automatically later. In this way, Rule 2 and Rule 4 > are redundant and hence Rule 2 can be eliminated. > > With my patch, Rule 2 is removed and Rule 4 won't be generated either. > Only Rules 1 and 3 are kept in the final rule chain. In my local release > build, as redundant code got removed, the size of libjvm.so decreased > from 23.30 MB to 23.27 MB, a reduction of 33.11 KB (around 0.14%). Marked as reviewed by ngasson (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/5646 From github.com+39403138+fg1417 at openjdk.java.net Mon Oct 11 07:03:13 2021 From: github.com+39403138+fg1417 at openjdk.java.net (Fei Gao) Date: Mon, 11 Oct 2021 07:03:13 GMT Subject: Integrated: 8272968: AArch64: Remove redundant matching rules for commutative ops In-Reply-To: References: Message-ID: On Thu, 23 Sep 2021 03:29:44 GMT, Fei Gao wrote: > Match rules for commutative operations mnegI/mnegL/smnegL might > become redundant after function matchrule_clone_and_swap(), > and hence can be reduced. > > In the adlc, while parsing the contents of an instruction > definition, function instr_parse always checks > for commutative operations with subtree operands, creates > clones and swaps operands via function matchrule_clone_and_swap.
> This means that another operand-swapped and partially > symmetrical match rule should be generated automatically for > these commutative operations. > > The pattern to construct mnegI, mnegL or smnegL consists of > a subtraction with zero and then a multiplication. In function > count_commutative_op, both mulI and mulL are recognized as > commutative opcodes. Therefore, we need only one match rule > to specify that a multiplication consists of a number and > a subtraction with zero for these three instructions, and the > extra one can be deleted. > > Take mnegL as an example. > > Without my patch, four match rules are eventually created for > instruction selection. > > Two of them are defined in the ad files: > > Match Rule 1: > dst = MulL (SubL zero src1) src2 > ===> > dst = mnegl src1 src2 > > Match Rule 2: > dst = MulL src1 (SubL zero src2) > ===> > dst = mnegl src1 src2 > > The other two are automatically generated by function > matchrule_clone_and_swap based on the two rules above: > > Match Rule 3 (generated by match rule 1): > dst = MulL src2 (SubL zero src1) > ===> > dst = mnegl src1 src2 > > Match Rule 4 (generated by match rule 2): > dst = MulL (SubL zero src2) src1 > ===> > dst = mnegl src1 src2 > > As mnegl is commutative, Rule 3 is equivalent to > Rule 2, and Rule 1 is equivalent to Rule 4. Also, if we only > keep the original Match Rule 1, as shown above, Rule 3 will > be generated automatically later. In this way, Rule 2 and Rule 4 > are redundant and hence Rule 2 can be eliminated. > > With my patch, Rule 2 is removed and Rule 4 won't be generated either. > Only Rules 1 and 3 are kept in the final rule chain. In my local release > build, as redundant code got removed, the size of libjvm.so decreased > from 23.30 MB to 23.27 MB, a reduction of 33.11 KB (around 0.14%). This pull request has now been integrated.
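The redundancy argument for 8272968 can be checked mechanically on a toy model. The following hedged Java sketch represents match trees as small records, erases operand names (so rules differing only in naming compare equal), and, loosely following the description of `matchrule_clone_and_swap`, adds one operand-swapped clone when the root is a commutative op with a subtree operand. It is a simplification: the real adlc works on full ADL match rules and may handle more cases than this root-only model, and `CommutativeRules`, `Node` and `shapesWithClones` are hypothetical names. With only Rule 1 as input, the generated set of shapes equals the one produced from Rules 1 and 2 together.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public final class CommutativeRules {
    // A match-tree node: a leaf operand (left == right == null) or a binary op.
    record Node(String op, Node left, Node right) {
        static Node leaf(String name) {
            return new Node(name, null, null);
        }

        boolean isLeaf() {
            return left == null;
        }

        // Tree shape with operand names erased ("zero" is kept as a constant).
        String shape() {
            if (isLeaf()) {
                return op.equals("zero") ? "zero" : "_";
            }
            return "(" + op + " " + left.shape() + " " + right.shape() + ")";
        }
    }

    static final Set<String> COMMUTATIVE = Set.of("MulI", "MulL");

    // Collects the shapes of the given rules plus, for a commutative root that
    // has at least one subtree operand, the shape of the operand-swapped clone.
    static Set<String> shapesWithClones(List<Node> rules) {
        Set<String> shapes = new TreeSet<>();
        for (Node r : rules) {
            shapes.add(r.shape());
            boolean hasSubtree = !r.left().isLeaf() || !r.right().isLeaf();
            if (COMMUTATIVE.contains(r.op()) && hasSubtree) {
                shapes.add(new Node(r.op(), r.right(), r.left()).shape());
            }
        }
        return shapes;
    }

    public static void main(String[] args) {
        Node zero = Node.leaf("zero");
        // Rule 1: MulL (SubL zero src1) src2
        Node rule1 = new Node("MulL", new Node("SubL", zero, Node.leaf("src1")), Node.leaf("src2"));
        // Rule 2: MulL src1 (SubL zero src2)
        Node rule2 = new Node("MulL", Node.leaf("src1"), new Node("SubL", zero, Node.leaf("src2")));
        System.out.println(shapesWithClones(List.of(rule1)));
        System.out.println(shapesWithClones(List.of(rule1, rule2)));
    }
}
```

Both printed sets contain the same two shapes, mirroring the point that Rule 2 (and its clone, Rule 4) adds nothing once Rule 1 and its generated Rule 3 exist.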
Changeset: c032186b Author: Fei Gao Committer: Pengfei Li URL: https://git.openjdk.java.net/jdk/commit/c032186b421c64b44397cb7aa101b40e5f93dfff Stats: 3 lines in 1 file changed: 0 ins; 3 del; 0 mod 8272968: AArch64: Remove redundant matching rules for commutative ops Reviewed-by: ngasson ------------- PR: https://git.openjdk.java.net/jdk/pull/5646 From enikitin at openjdk.java.net Mon Oct 11 10:03:37 2021 From: enikitin at openjdk.java.net (Evgeny Nikitin) Date: Mon, 11 Oct 2021 10:03:37 GMT Subject: RFR: 8274982: Add a test for 8269574. Message-ID: This PR contains a relatively simple test which verifies that JVMTI agents are correctly informed about exceptions caught in C2-compiled code. 8269574 introduces preallocated exceptions in some paths, so the test tries to produce a number of different exceptions and checks that the provided small JVMTI agent is notified about all of them. ------------- Commit messages: - 8274982: Add a test for 8269574. Changes: https://git.openjdk.java.net/jdk/pull/5889/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5889&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8274982 Stats: 220 lines in 2 files changed: 220 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/5889.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5889/head:pull/5889 PR: https://git.openjdk.java.net/jdk/pull/5889 From mdoerr at openjdk.java.net Mon Oct 11 10:35:19 2021 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Mon, 11 Oct 2021 10:35:19 GMT Subject: Integrated: 8274773: [TESTBUG] UnsafeIntrinsicsTest intermittently fails on weak memory model platform In-Reply-To: References: Message-ID: On Tue, 5 Oct 2021 12:27:05 GMT, Martin Doerr wrote: > The test creates new Nodes and publishes them to concurrent readers. This requires at least release and load_consume. A clean fix would be to make `Node.next` volatile. But that would be a sledgehammer.
A minimalistic fix for our supported weak memory model platforms is to insert a `storeFence`. What is better? > (See JBS for failure description.) This pull request has now been integrated. Changeset: 49f8ce6e Author: Martin Doerr URL: https://git.openjdk.java.net/jdk/commit/49f8ce6e9c797cd11ea586e3cf87398888bc8cf1 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod 8274773: [TESTBUG] UnsafeIntrinsicsTest intermittently fails on weak memory model platform Reviewed-by: eosterlund, goetz ------------- PR: https://git.openjdk.java.net/jdk/pull/5823 From mdoerr at openjdk.java.net Mon Oct 11 10:35:18 2021 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Mon, 11 Oct 2021 10:35:18 GMT Subject: RFR: 8274773: [TESTBUG] UnsafeIntrinsicsTest intermittently fails on weak memory model platform In-Reply-To: References: Message-ID: On Tue, 5 Oct 2021 12:27:05 GMT, Martin Doerr wrote: > The test creates new Nodes and publishes them to concurrent readers. This requires at least release and load_consume. A clean fix would be to make `Node.next` volatile. But that would be a sledgehammer. A minimalistic fix for our supported weak memory model platforms is to insert a `storeFence`. What is better? > (See JBS for failure description.) Thanks for the reviews! ------------- PR: https://git.openjdk.java.net/jdk/pull/5823 From chagedorn at openjdk.java.net Mon Oct 11 12:58:21 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Mon, 11 Oct 2021 12:58:21 GMT Subject: RFR: 8274911: testlibrary_tests/ir_framework/tests/TestIRMatching.java fails with "java.lang.RuntimeException: Should have thrown exception" Message-ID: The bailout fix added by [JDK-8271471](https://bugs.openjdk.java.net/browse/JDK-8271471) does not work correctly for the internal framework tests. The matching of `` in `Utils.java` was done on the test VM output instead of the hotspot_pid file. I fixed that by checking the driver VM message sent on a bailout, which is easier.
I also improved the error reporting of `TestIRMatching` to reduce noise and better format errors. Thanks, Christian ------------- Commit messages: - 8274911: testlibrary_tests/ir_framework/tests/TestIRMatching.java fails with "java.lang.RuntimeException: Should have thrown exception". Changes: https://git.openjdk.java.net/jdk/pull/5893/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5893&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8274911 Stats: 128 lines in 4 files changed: 55 ins; 38 del; 35 mod Patch: https://git.openjdk.java.net/jdk/pull/5893.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5893/head:pull/5893 PR: https://git.openjdk.java.net/jdk/pull/5893 From kvn at openjdk.java.net Mon Oct 11 16:11:06 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 11 Oct 2021 16:11:06 GMT Subject: RFR: 8274911: testlibrary_tests/ir_framework/tests/TestIRMatching.java fails with "java.lang.RuntimeException: Should have thrown exception" In-Reply-To: References: Message-ID: On Mon, 11 Oct 2021 12:47:30 GMT, Christian Hagedorn wrote: > The bailout fix added by [JDK-8271471](https://bugs.openjdk.java.net/browse/JDK-8271471) does not correctly work for the internal framework tests. The matching of `` in `Utils.java` was done on the test VM output instead of the the hotspot_pid file. I fixed that by checking the driver VM message sent on a bailout which is easier. I also improved the error reporting of `TestIRMatching` to reduce noise and better format errors. > > Thanks, > Christian Marked as reviewed by kvn (Reviewer). 
------------- PR: https://git.openjdk.java.net/jdk/pull/5893 From duke at openjdk.java.net Tue Oct 12 02:56:56 2021 From: duke at openjdk.java.net (SUN Guoyun) Date: Tue, 12 Oct 2021 02:56:56 GMT Subject: RFR: 8275086: compiler/c2/irTests/TestPostParseCallDevirtualization.java fails when compiler1 is disabled Message-ID: Hi all, the jtreg test case compiler/c2/irTests/TestPostParseCallDevirtualization.java fails in fastdebug mode on x86/aarch64/mips when the JVM is built with "--with-jvm-features=-compiler1". The failure output is:

One or more @IR rules failed:

Failed IR Rules (1)
------------------
- Method "public int compiler.c2.irTests.TestPostParseCallDevirtualization.testMethodHandleCallWithCCP() throws java.lang.Throwable":
  * @IR rule 1: "@compiler.lib.ir_framework.IR(failOn={"#PRE#(\\\\d+(\\\\s){2}(CallStaticJava.*)+(\\\\s){2}===.*#IS_REPLACED#)", "invokeBasic"}, applyIf={}, applyIfAnd={}, applyIfOr={}, counts={"#PRE#(\\\\d+(\\\\s){2}(CallStaticJava.*)+(\\\\s){2}===.*#IS_REPLACED#)", "invokeStatic", "= 1"}, applyIfNot={})" 
    - failOn: Graph contains forbidden nodes:
        Regex 1: (\\d+(\\s){2}(CallStaticJava.*)+(\\s){2}===.*invokeBasic)
        Matched forbidden node:
          280  CallStaticJava  ===  5  6  7  8  1 ( 188  1  1  1  1  1  1 ) [[ 281  282  283  285 ]] # Static  java.lang.invoke.MethodHandle::invokeBasic
    - counts: Graph contains wrong number of nodes:
        Regex 1: (\\d+(\\s){2}(CallStaticJava.*)+(\\s){2}===.*invokeStatic)
        Expected 1 but found 0 nodes.

>>> Check stdout for compilation output of the failed methods
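For readers unfamiliar with IR matching, the two failures above can be reproduced by applying the printed regexes to the reported node line. This is a simplified sketch under stated assumptions: it assumes each rule is checked with a regex find over the printed ideal graph (the real framework matches against the whole compiled-method dump, not one line), and the class, fields and helper are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo: applies the two IR-rule regexes from the failure log
// to the single CallStaticJava node line that the framework reported.
public class IrRuleRegexDemo {
    // Java string literals; at runtime these are the patterns printed as "Regex 1".
    static final String FORBIDDEN_INVOKE_BASIC =
            "(\\d+(\\s){2}(CallStaticJava.*)+(\\s){2}===.*invokeBasic)";
    static final String EXPECTED_INVOKE_STATIC =
            "(\\d+(\\s){2}(CallStaticJava.*)+(\\s){2}===.*invokeStatic)";

    // Node line copied from the failure log.
    static final String NODE_280 =
            "280  CallStaticJava  ===  5  6  7  8  1 ( 188  1  1  1  1  1  1 ) "
            + "[[ 281  282  283  285 ]] # Static  java.lang.invoke.MethodHandle::invokeBasic";

    // Counts non-overlapping matches of regex in text, similar in spirit to a "counts" check.
    static int countMatches(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // failOn fires: the MethodHandle::invokeBasic call was not devirtualized.
        System.out.println("invokeBasic matches:  "
                + countMatches(FORBIDDEN_INVOKE_BASIC, NODE_280)); // 1
        // counts fails: no direct invokeStatic call ("Expected 1 but found 0").
        System.out.println("invokeStatic matches: "
                + countMatches(EXPECTED_INVOKE_STATIC, NODE_280)); // 0
    }
}
```

In other words, both rule violations describe the same symptom: the `invokeBasic` call in `testMethodHandleCallWithCCP` survived instead of being replaced by a direct `invokeStatic` call.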
This is a patch to fix this problem. Please help review it. Thanks, Sun Guoyun ------------- Commit messages: - 8275086: compiler/c2/irTests/TestPostParseCallDevirtualization.java fails when compiler1 is disabled Changes: https://git.openjdk.java.net/jdk/pull/5903/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5903&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8275086 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/5903.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5903/head:pull/5903 PR: https://git.openjdk.java.net/jdk/pull/5903 From thartmann at openjdk.java.net Tue Oct 12 06:13:52 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 12 Oct 2021 06:13:52 GMT Subject: RFR: 8275086: compiler/c2/irTests/TestPostParseCallDevirtualization.java fails when compiler1 is disabled In-Reply-To: References: Message-ID: <7YDLwJShDer1buEI0yxRzZcvv3trB7w1reHrMAoZZWM=.b0c2c836-33c8-4cb2-a1d9-25a2f51f372e@github.com> On Tue, 12 Oct 2021 02:49:53 GMT, SUN Guoyun wrote: > Hi all, > Jtreg test case compiler/c2/irTests/TestPostParseCallDevirtualization.java fails for fastdebug mode on x86/aarch64/mips architecture when "--with-jvm-features=-compiler1" be used. the failed info is: > >
> > This is a patch to fix this problem. Please help review it. > > Thanks, > Sun Guoyun Since this is a test for a C2 specific optimization, I would suggest to exclude the entire test if C2 is not available via a `@requires vm.compiler2.enabled`. ------------- Changes requested by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5903 From thartmann at openjdk.java.net Tue Oct 12 06:18:52 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 12 Oct 2021 06:18:52 GMT Subject: RFR: 8274911: testlibrary_tests/ir_framework/tests/TestIRMatching.java fails with "java.lang.RuntimeException: Should have thrown exception" In-Reply-To: References: Message-ID: On Mon, 11 Oct 2021 12:47:30 GMT, Christian Hagedorn wrote: > The bailout fix added by [JDK-8271471](https://bugs.openjdk.java.net/browse/JDK-8271471) does not correctly work for the internal framework tests. The matching of `` in `Utils.java` was done on the test VM output instead of the the hotspot_pid file. I fixed that by checking the driver VM message sent on a bailout which is easier. I also improved the error reporting of `TestIRMatching` to reduce noise and better format errors. > > Thanks, > Christian Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5893 From whuang at openjdk.java.net Tue Oct 12 06:32:21 2021 From: whuang at openjdk.java.net (Wang Huang) Date: Tue, 12 Oct 2021 06:32:21 GMT Subject: RFR: 8259948: Aarch64: Add cast nodes for Aarch64 Neon backend [v8] In-Reply-To: References: Message-ID: > * In this issue, we plan to complete all missing implementation for aarch64 neon backend. For example, cast from Byte to Long, cast from Long to Byte, and so on. > * It may be a solver of JDK-8269866, or part of it. 
Wang Huang has updated the pull request incrementally with one additional commit since the last revision: remove useless codes ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/4839/files - new: https://git.openjdk.java.net/jdk/pull/4839/files/c8a0134a..e8e6f014 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4839&range=07 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4839&range=06-07 Stats: 17 lines in 1 file changed: 0 ins; 17 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/4839.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4839/head:pull/4839 PR: https://git.openjdk.java.net/jdk/pull/4839 From wuyan at openjdk.java.net Tue Oct 12 06:35:53 2021 From: wuyan at openjdk.java.net (Wu Yan) Date: Tue, 12 Oct 2021 06:35:53 GMT Subject: RFR: 8259948: Aarch64: Add cast nodes for Aarch64 Neon backend [v5] In-Reply-To: <_Ht7VdOoJ8SVaD0F8IQkeD9sEmndprltgq5KoIwIi24=.31f26ea5-b848-4ddc-8437-77ce156b8042@github.com> References: <6SkOgskSfXuMp1XarC2BO9zBUw_Zj1pcUMKNHffiCQs=.66c156b1-ef01-48a4-8f28-8351089a5646@github.com> <_Ht7VdOoJ8SVaD0F8IQkeD9sEmndprltgq5KoIwIi24=.31f26ea5-b848-4ddc-8437-77ce156b8042@github.com> Message-ID: On Wed, 29 Sep 2021 03:40:21 GMT, Eric Liu wrote: >> You are right, I'll fix it. > > Kindly remind that those workaround code should be removed after the latest merge. Thanks, removed it. ------------- PR: https://git.openjdk.java.net/jdk/pull/4839 From serb at openjdk.java.net Tue Oct 12 07:10:48 2021 From: serb at openjdk.java.net (Sergey Bylokhov) Date: Tue, 12 Oct 2021 07:10:48 GMT Subject: RFR: 8268764: Use Long.hashCode() instead of int-cast where applicable [v4] In-Reply-To: References: Message-ID: On Thu, 1 Jul 2021 12:19:53 GMT, ?????? ??????? 
wrote: >> In some JDK classes there's still the following hashCode() implementation: >> >> long objNum; >> >> public int hashCode() { >> return (int) objNum; >> } >> >> This outdated expression should be replaced with Long.hashCode(long) as it >> >> - uses all bits of the original value, does not discard any information upfront. For example, depending on how you are generating the IDs, the upper bits could change more frequently (or the opposite). >> >> - does not introduce any bias towards values with more ones (zeros), as it would be the case if the two halves were combined with an OR (AND) operation. >> >> See https://stackoverflow.com/a/4045083 >> >> This is related to https://github.com/openjdk/jdk/pull/4309 > > ?????? ??????? has updated the pull request incrementally with one additional commit since the last revision: > > 8268764: Update copy-right year Marked as reviewed by serb (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/4491 From duke at openjdk.java.net Tue Oct 12 07:13:56 2021 From: duke at openjdk.java.net (SUN Guoyun) Date: Tue, 12 Oct 2021 07:13:56 GMT Subject: RFR: 8275086: compiler/c2/irTests/TestPostParseCallDevirtualization.java fails when compiler1 is disabled In-Reply-To: References: Message-ID: On Tue, 12 Oct 2021 02:49:53 GMT, SUN Guoyun wrote: > Hi all, > Jtreg test case compiler/c2/irTests/TestPostParseCallDevirtualization.java fails for fastdebug mode on x86/aarch64/mips architecture when "--with-jvm-features=-compiler1" be used. the failed info is: > >
> > This is a patch to fix this problem. Please help review it. > > Thanks, > Sun Guoyun In fact, I found this problem on the MIPS architecture. At present, we support C2 but not C1 on MIPS architecture. And, if I change it to @requires vm.compiler1.enabled, the test still fails, and the jtreg prompt message is: #Test results: no tests selected ------------- PR: https://git.openjdk.java.net/jdk/pull/5903 From thartmann at openjdk.java.net Tue Oct 12 07:22:53 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 12 Oct 2021 07:22:53 GMT Subject: RFR: 8275086: compiler/c2/irTests/TestPostParseCallDevirtualization.java fails when compiler1 is disabled In-Reply-To: References: Message-ID: <5ex81v2cwnbytTM8PCme_AUVMimUZSkNqwba52RSU74=.5c5f761b-960a-439d-aa73-14312f6876ab@github.com> On Tue, 12 Oct 2021 02:49:53 GMT, SUN Guoyun wrote: > Hi all, > Jtreg test case compiler/c2/irTests/TestPostParseCallDevirtualization.java fails for fastdebug mode on x86/aarch64/mips architecture when "--with-jvm-features=-compiler1" be used. the failed info is: > >
> > This is a patch to fix this problem. Please help review it. > > Thanks, > Sun Guoyun Sorry, I misread your original message and thought the test fails if C2 is **not** available. So the issue is that the test fails if C1 is not available. I would assume this is due to missing profile information. Does increasing the number of warmup iterations via `-DWarmup=10000` help? ------------- PR: https://git.openjdk.java.net/jdk/pull/5903 From duke at openjdk.java.net Tue Oct 12 07:25:49 2021 From: duke at openjdk.java.net (SUN Guoyun) Date: Tue, 12 Oct 2021 07:25:49 GMT Subject: RFR: 8275086: compiler/c2/irTests/TestPostParseCallDevirtualization.java fails when compiler1 is disabled In-Reply-To: References: Message-ID: On Tue, 12 Oct 2021 02:49:53 GMT, SUN Guoyun wrote: > Hi all, > Jtreg test case compiler/c2/irTests/TestPostParseCallDevirtualization.java fails for fastdebug mode on x86/aarch64/mips architecture when "--with-jvm-features=-compiler1" be used. the failed info is: > >
> > This is a patch to fix this problem. Please help review it. > > Thanks, > Sun Guoyun I found the compilation info with C1 is:

 937    b  3       compiler.c2.irTests.TestPostParseCallDevirtualization::testMethodHandleCallWithCCP (50 bytes)
 938    b  1       compiler.c2.irTests.TestPostParseCallDevirtualization::method2 (3 bytes)
 939    b  3       java.lang.reflect.Method::getParameterCount (6 bytes)
 940    b  3       java.lang.reflect.Method::invoke (65 bytes)
 941    b  3       jdk.internal.reflect.DelegatingMethodAccessorImpl::invoke (10 bytes)
 942   !b  3       compiler.lib.ir_framework.test.CustomRunTest::invokeTest (76 bytes)
 943    b  3       jdk.test.lib.Asserts::assertEquals (7 bytes)
 944   !b  3       jdk.internal.reflect.GeneratedMethodAccessor1::invoke (62 bytes)
 945    b  3       java.lang.invoke.LambdaForm$DMH/0x00000008000e3c00::invokeStatic (20 bytes)
 946    b  3       jdk.test.lib.Asserts::assertEquals (42 bytes)
 947    b  4       compiler.c2.irTests.TestPostParseCallDevirtualization::testMethodHandleCallWithCCP (50 bytes)

Here compile id 945 is called. But the info without C1 is:

 254     n       java.lang.invoke.MethodHandle::invokeBasic()I (native)
 255    b        java.lang.invoke.MethodHandleImpl::isCompileConstant (2 bytes)
 256    b        compiler.c2.irTests.TestPostParseCallDevirtualization::method2 (3 bytes)
 257    b        compiler.c2.irTests.TestPostParseCallDevirtualization::testMethodHandleCallWithCCP (50 bytes)

Here compile id 254 is called. ------------- PR: https://git.openjdk.java.net/jdk/pull/5903 From duke at openjdk.java.net Tue Oct 12 07:37:51 2021 From: duke at openjdk.java.net (SUN Guoyun) Date: Tue, 12 Oct 2021 07:37:51 GMT Subject: RFR: 8275086: compiler/c2/irTests/TestPostParseCallDevirtualization.java fails when compiler1 is disabled In-Reply-To: References: Message-ID: On Tue, 12 Oct 2021 02:49:53 GMT, SUN Guoyun wrote: > Hi all, > Jtreg test case compiler/c2/irTests/TestPostParseCallDevirtualization.java fails for fastdebug mode on x86/aarch64/mips architecture when "--with-jvm-features=-compiler1" be used. the failed info is: > >
> > This is a patch to fix this problem. Please help review it. > > Thanks, > Sun Guoyun In TestPostParseCallDevirtualization.java, I use the "applyIf=..." attribute to disable testMethodHandleCallWithLoop and testMethodHandleCallWithCCP; the other methods, such as testDynamicCallWithCCP and testDynamicCallWithLoop, can still be tested normally. ------------- PR: https://git.openjdk.java.net/jdk/pull/5903 From chagedorn at openjdk.java.net Tue Oct 12 07:43:51 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Tue, 12 Oct 2021 07:43:51 GMT Subject: RFR: 8275086: compiler/c2/irTests/TestPostParseCallDevirtualization.java fails when compiler1 is disabled In-Reply-To: References: Message-ID: On Tue, 12 Oct 2021 02:49:53 GMT, SUN Guoyun wrote: > Hi all, > Jtreg test case compiler/c2/irTests/TestPostParseCallDevirtualization.java fails for fastdebug mode on x86/aarch64/mips architecture when "--with-jvm-features=-compiler1" be used. the failed info is: > >
> > This is a patch to fix this problem. Please help review it. > > Thanks, > Sun Guoyun Though this issue is about excluding C1, I think the IR framework generally does not handle the case if C2 is excluded in the build (i.e. client VM). It only bails out of IR matching if C2 is excluded by command line flags. I will file a bug for it. ------------- PR: https://git.openjdk.java.net/jdk/pull/5903 From thartmann at openjdk.java.net Tue Oct 12 07:55:55 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 12 Oct 2021 07:55:55 GMT Subject: RFR: 8275086: compiler/c2/irTests/TestPostParseCallDevirtualization.java fails when compiler1 is disabled In-Reply-To: References: Message-ID: On Tue, 12 Oct 2021 07:22:44 GMT, SUN Guoyun wrote: > here compile id 945 be called. What do you mean by that? IR verification fails because post-parse call devirtualization was not able to replace the `invokeBasic` by an `invokeStatic` in the C2 compiled `testMethodHandleCallWithCCP` method. In theory, that should not be dependent on the availability of C1. So I'm wondering why it fails without C1. Did you check if increasing the number of warmup iterations helps? ------------- PR: https://git.openjdk.java.net/jdk/pull/5903 From duke at openjdk.java.net Tue Oct 12 08:15:57 2021 From: duke at openjdk.java.net (SUN Guoyun) Date: Tue, 12 Oct 2021 08:15:57 GMT Subject: RFR: 8275086: compiler/c2/irTests/TestPostParseCallDevirtualization.java fails when compiler1 is disabled In-Reply-To: References: Message-ID: On Tue, 12 Oct 2021 07:53:17 GMT, Tobias Hartmann wrote: >> I found the compilation info with c1 is: >>

>>  937    b  3       compiler.c2.irTests.TestPostParseCallDevirtualization::testMethodHandleCallWithCCP (50 bytes)
>>  938    b  1       compiler.c2.irTests.TestPostParseCallDevirtualization::method2 (3 bytes)
>>  939    b  3       java.lang.reflect.Method::getParameterCount (6 bytes)
>>  940    b  3       java.lang.reflect.Method::invoke (65 bytes)          
>>  941    b  3       jdk.internal.reflect.DelegatingMethodAccessorImpl::invoke (10 bytes)
>>  942   !b  3       compiler.lib.ir_framework.test.CustomRunTest::invokeTest (76 bytes)
>>  943    b  3       jdk.test.lib.Asserts::assertEquals (7 bytes)         
>>  944   !b  3       jdk.internal.reflect.GeneratedMethodAccessor1::invoke (62 bytes)
>>  945    b  3       java.lang.invoke.LambdaForm$DMH/0x00000008000e3c00::invokeStatic (20 bytes)
>>  946    b  3       jdk.test.lib.Asserts::assertEquals (42 bytes)        
>>  947    b  4       compiler.c2.irTests.TestPostParseCallDevirtualization::testMethodHandleCallWithCCP (50 bytes)
>> 
>> here compile id 945 be called. >> but the info without c1 is: >> >>

>> 254     n       java.lang.invoke.MethodHandle::invokeBasic()I (native)   
>> 255    b        java.lang.invoke.MethodHandleImpl::isCompileConstant (2 bytes)
>> 256    b        compiler.c2.irTests.TestPostParseCallDevirtualization::method2 (3 bytes)
>> 257    b        compiler.c2.irTests.TestPostParseCallDevirtualization::testMethodHandleCallWithCCP (50 bytes)
>> 
>> the compile id 254 be called > >> here compile id 945 be called. > > What do you mean by that? > > IR verification fails because post-parse call devirtualization was not able to replace the `invokeBasic` by an `invokeStatic` in the C2 compiled `testMethodHandleCallWithCCP` method. In theory, that should not be dependent on the availability of C1. So I'm wondering why it fails without C1. Did you check if increasing the number of warmup iterations helps? @TobiHartmann I don't quite understand "post-parse call devirtualization" yet, so I only gathered information from -XX:+PrintIdeal and -XX:+PrintCompilation. I tried again with -DWarmup=10000, which I set in the file TestVMProcess.java (I'm not sure that is the right way to set it). The test still FAILED.

Command Line:

-------------

PR: https://git.openjdk.java.net/jdk/pull/5903

From duke at openjdk.java.net  Tue Oct 12 08:36:49 2021
From: duke at openjdk.java.net (SUN Guoyun)
Date: Tue, 12 Oct 2021 08:36:49 GMT
Subject: RFR: 8275086:
 compiler/c2/irTests/TestPostParseCallDevirtualization.java fails when
 compiler1 is disabled
In-Reply-To: 
References: 
Message-ID: <0c4Q4J2s_6bpg7kt1jADkwdQaDD-dN6pGR5aBVjsdvA=.c694bc6e-b06b-4c2b-aa5e-7fe96cb6d2d3@github.com>

On Tue, 12 Oct 2021 02:49:53 GMT, SUN Guoyun  wrote:

> Hi all,
> Jtreg test case compiler/c2/irTests/TestPostParseCallDevirtualization.java fails for fastdebug mode on x86/aarch64/mips architecture when "--with-jvm-features=-compiler1" be used. the failed info is:
> > This is a patch to fix this problem. Please help review it. > > Thanks, > Sun Guoyun I set -DWarmup=20000, test PASSED ------------- PR: https://git.openjdk.java.net/jdk/pull/5903 From thartmann at openjdk.java.net Tue Oct 12 08:36:49 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 12 Oct 2021 08:36:49 GMT Subject: RFR: 8275086: compiler/c2/irTests/TestPostParseCallDevirtualization.java fails when compiler1 is disabled In-Reply-To: References: Message-ID: On Tue, 12 Oct 2021 02:49:53 GMT, SUN Guoyun wrote: > Hi all, > Jtreg test case compiler/c2/irTests/TestPostParseCallDevirtualization.java fails for fastdebug mode on x86/aarch64/mips architecture when "--with-jvm-features=-compiler1" be used. the failed info is: > >
> > This is a patch to fix this problem. Please help review it. > > Thanks, > Sun Guoyun The test also fails with `-XX:-TieredCompilation` and increasing the warmup does not seem to help. We need to investigate why post-parse call devirtualization does not work in that case. ------------- PR: https://git.openjdk.java.net/jdk/pull/5903 From chagedorn at openjdk.java.net Tue Oct 12 09:59:50 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Tue, 12 Oct 2021 09:59:50 GMT Subject: RFR: 8274911: testlibrary_tests/ir_framework/tests/TestIRMatching.java fails with "java.lang.RuntimeException: Should have thrown exception" In-Reply-To: References: Message-ID: On Mon, 11 Oct 2021 12:47:30 GMT, Christian Hagedorn wrote: > The bailout fix added by [JDK-8271471](https://bugs.openjdk.java.net/browse/JDK-8271471) does not correctly work for the internal framework tests. The matching of `` in `Utils.java` was done on the test VM output instead of the the hotspot_pid file. I fixed that by checking the driver VM message sent on a bailout which is easier. I also improved the error reporting of `TestIRMatching` to reduce noise and better format errors. > > Thanks, > Christian Thanks Vladimir and Tobias for your reviews! ------------- PR: https://git.openjdk.java.net/jdk/pull/5893 From ihse at openjdk.java.net Tue Oct 12 11:33:12 2021 From: ihse at openjdk.java.net (Magnus Ihse Bursie) Date: Tue, 12 Oct 2021 11:33:12 GMT Subject: RFR: 8275128: Build hsdis using normal build system Message-ID: There are multiple bugs related to hsdis, calling both for added simplicity in building, and allowing for multiple backends. The very first step is getting rid of the stand-alone Makefile and integrating the build using standard build-infra tooling. This patch does this, and it also contains OOTB building on Windows (as requested in JDK-8208495), and furthermore it lays the foundation for adding more backends to hsdis.
------------- Commit messages: - Fix merge error - Update README - Add dllcrt2.o to LIBS. Now it works on Windows. - Fix -shared for mingw gcc, and workaround problem by modifying OPENJDK_TARGET_OS - Enable macos-aarch64 and better safety check when building - Remove hsdis-demo.c - Make a "fake" mingw-gcc toolchain on Windows - Now binutils configure passes on Windows - Fix sysroot for windows - Fix mingw compiler detection on windows - ... and 5 more: https://git.openjdk.java.net/jdk/compare/722d639f...48e11dd8 Changes: https://git.openjdk.java.net/jdk/pull/5908/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5908&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8275128 Stats: 665 lines in 8 files changed: 299 ins; 344 del; 22 mod Patch: https://git.openjdk.java.net/jdk/pull/5908.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5908/head:pull/5908 PR: https://git.openjdk.java.net/jdk/pull/5908 From erikj at openjdk.java.net Tue Oct 12 12:53:56 2021 From: erikj at openjdk.java.net (Erik Joelsson) Date: Tue, 12 Oct 2021 12:53:56 GMT Subject: RFR: 8275128: Build hsdis using normal build system In-Reply-To: References: Message-ID: On Tue, 12 Oct 2021 11:24:01 GMT, Magnus Ihse Bursie wrote: > There are multiple bugs related to hsdis, calling both for added simplicity in building, and allowing for multiple backends. > > The very first step is getting rid of the stand-alone Makefile and integrate the build using standard build-infra tooling. > > This patch does this, and it also contains OOTB building on Windows (as requested in JDK-8208495, and furthermore it lays the foundation for adding more backends to hsdis. Nice to see this fixed! 
make/Hsdis.gmk line 70: > 68: HSDIS_TOOLCHAIN_CFLAGS := > 69: HSDIS_TOOLCHAIN_LDFLAGS := -L/usr/lib/gcc/$(MINGW_BASE)/9.2.0 -L/usr/$(MINGW_BASE)/sys-root/mingw/lib > 70: HSDIS_TOOLCHAIN_LIBS := /usr/$(MINGW_BASE)/sys-root/mingw/lib/dllcrt2.o -lmingw32 -lgcc -lgcc_eh -lmoldname -lmingwex -lmsvcrt -lpthread -ladvapi32 -lshell32 -luser32 -lkernel32 Maybe break up this line a bit? make/autoconf/jdk-options.m4 line 803: > 801: if test "x$with_hsdis" = xyes; then > 802: AC_MSG_ERROR([--with-hsdis must have a value]) > 803: elif test "x$with_hsdis" = xnone || test "x$with_hsdis" = x; then Should we accept "no" as value too so we can use --without-hsdis? ------------- PR: https://git.openjdk.java.net/jdk/pull/5908 From chagedorn at openjdk.java.net Tue Oct 12 13:24:57 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Tue, 12 Oct 2021 13:24:57 GMT Subject: Integrated: 8274911: testlibrary_tests/ir_framework/tests/TestIRMatching.java fails with "java.lang.RuntimeException: Should have thrown exception" In-Reply-To: References: Message-ID: On Mon, 11 Oct 2021 12:47:30 GMT, Christian Hagedorn wrote: > The bailout fix added by [JDK-8271471](https://bugs.openjdk.java.net/browse/JDK-8271471) does not correctly work for the internal framework tests. The matching of `` in `Utils.java` was done on the test VM output instead of the hotspot_pid file. I fixed that by checking the driver VM message sent on a bailout which is easier. I also improved the error reporting of `TestIRMatching` to reduce noise and better format errors. > > Thanks, > Christian This pull request has now been integrated. 
Changeset: f6234606 Author: Christian Hagedorn URL: https://git.openjdk.java.net/jdk/commit/f62346066869b681d1cc9f63775393b11a48722a Stats: 128 lines in 4 files changed: 55 ins; 38 del; 35 mod 8274911: testlibrary_tests/ir_framework/tests/TestIRMatching.java fails with "java.lang.RuntimeException: Should have thrown exception" Reviewed-by: kvn, thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/5893 From duke at openjdk.java.net Tue Oct 12 19:16:52 2021 From: duke at openjdk.java.net (Сергей Цыпанов) Date: Tue, 12 Oct 2021 19:16:52 GMT Subject: Integrated: 8268764: Use Long.hashCode() instead of int-cast where applicable In-Reply-To: References: Message-ID: On Tue, 15 Jun 2021 12:15:11 GMT, Сергей Цыпанов wrote: > In some JDK classes there's still the following hashCode() implementation: > > long objNum; > > public int hashCode() { > return (int) objNum; > } > > This outdated expression should be replaced with Long.hashCode(long) as it > > - uses all bits of the original value, does not discard any information upfront. For example, depending on how you are generating the IDs, the upper bits could change more frequently (or the opposite). > > - does not introduce any bias towards values with more ones (zeros), as it would be the case if the two halves were combined with an OR (AND) operation. > > See https://stackoverflow.com/a/4045083 > > This is related to https://github.com/openjdk/jdk/pull/4309 This pull request has now been integrated. 
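The information loss described in the 8268764 description shows up with two ids that differ only in their upper 32 bits. Below is a minimal sketch. The class and method names are made up for illustration, and `Long.hashCode(long)` is the JDK method the PR switches to.

```java
// Hypothetical demo class; only Long.hashCode(long) is a real JDK API here.
public final class LongHashDemo {

    // The pattern 8268764 replaces: the cast keeps only the low 32 bits.
    static int castHash(long objNum) {
        return (int) objNum;
    }

    // The replacement: Long.hashCode(long) is specified as
    // (int) (value ^ (value >>> 32)), so both halves contribute.
    static int longHash(long objNum) {
        return Long.hashCode(objNum);
    }

    public static void main(String[] args) {
        long a = 1L << 32; // upper half 1, lower half 0
        long b = 2L << 32; // upper half 2, lower half 0
        System.out.println(castHash(a) == castHash(b)); // true: both hash to 0
        System.out.println(longHash(a) == longHash(b)); // false: 1 vs 2
    }
}
```

With the cast, every id with the same low 32 bits collides. XOR-folding keeps the upper bits in play without the bias that an OR or AND combination would introduce.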
Changeset: 124f8237 Author: Sergey Tsypanov Committer: Sergey Bylokhov URL: https://git.openjdk.java.net/jdk/commit/124f82377ba93359bc59118ee315ba194080fa92 Stats: 21 lines in 9 files changed: 6 ins; 0 del; 15 mod 8268764: Use Long.hashCode() instead of int-cast where applicable Reviewed-by: kevinw, prr, kizune, serb ------------- PR: https://git.openjdk.java.net/jdk/pull/4491 From ihse at openjdk.java.net Tue Oct 12 21:20:53 2021 From: ihse at openjdk.java.net (Magnus Ihse Bursie) Date: Tue, 12 Oct 2021 21:20:53 GMT Subject: RFR: 8275128: Build hsdis using normal build system [v2] In-Reply-To: References: Message-ID: <1c4S9rMHgwt6r3gZhOX1Fg1XOg4pBnyO_BVdPo7dnAk=.dd5890fa-f989-405b-9e78-dffa1921ef37@github.com> > There are multiple bugs related to hsdis, calling both for added simplicity in building, and allowing for multiple backends. > > The very first step is getting rid of the stand-alone Makefile and integrate the build using standard build-infra tooling. > > This patch does this, and it also contains OOTB building on Windows (as requested in JDK-8208495, and furthermore it lays the foundation for adding more backends to hsdis. 
Magnus Ihse Bursie has updated the pull request incrementally with 46 additional commits since the last revision: - 8274986: max code printed in hs-err logs should be configurable Reviewed-by: never, dholmes - 8274615: Support relaxed atomic add for linux-aarch64 Reviewed-by: aph, dholmes - 8275002: Remove unused AbstractStringBuilder.MAX_ARRAY_SIZE Reviewed-by: prappo, jlaskey, martin - 8177814: jdk/editpad is not in jdk TEST.groups Reviewed-by: serb - 8275031: runtime/ErrorHandling/MachCodeFramesInErrorFile.java fails when hsdis is present Reviewed-by: dholmes, dnsimon - 8274560: JFR: Add test for OldObjectSample event when using Shenandoah Reviewed-by: mgronlun - 8274466: G1: use field directly rather than method in G1CollectorState::in_mixed_phase Reviewed-by: ayang, sjohanss - 8272167: AbsPathsInImage.java should skip *.dSYM directories Reviewed-by: ihse, erikj - 8274945: Cleanup unnecessary calls to Throwable.initCause() in java.desktop Reviewed-by: jdv, serb, pbansal - 8274925: Shenandoah: shenandoah/TestAllocHumongousFragment.java test failed on lock rank check Reviewed-by: shade - ... 
and 36 more: https://git.openjdk.java.net/jdk/compare/48e11dd8...2bbeb63d ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5908/files - new: https://git.openjdk.java.net/jdk/pull/5908/files/48e11dd8..2bbeb63d Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5908&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5908&range=00-01 Stats: 3373 lines in 189 files changed: 1694 ins; 879 del; 800 mod Patch: https://git.openjdk.java.net/jdk/pull/5908.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5908/head:pull/5908 PR: https://git.openjdk.java.net/jdk/pull/5908 From ihse at openjdk.java.net Tue Oct 12 21:58:23 2021 From: ihse at openjdk.java.net (Magnus Ihse Bursie) Date: Tue, 12 Oct 2021 21:58:23 GMT Subject: RFR: 8275128: Build hsdis using normal build system [v3] In-Reply-To: References: Message-ID: > There are multiple bugs related to hsdis, calling both for added simplicity in building, and allowing for multiple backends. > > The very first step is getting rid of the stand-alone Makefile and integrate the build using standard build-infra tooling. > > This patch does this, and it also contains OOTB building on Windows (as requested in JDK-8208495, and furthermore it lays the foundation for adding more backends to hsdis. Magnus Ihse Bursie has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. 
The pull request contains one new commit since the last revision: Fixes after code review remarks ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5908/files - new: https://git.openjdk.java.net/jdk/pull/5908/files/2bbeb63d..063c443d Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5908&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5908&range=01-02 Stats: 3378 lines in 191 files changed: 881 ins; 1693 del; 804 mod Patch: https://git.openjdk.java.net/jdk/pull/5908.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5908/head:pull/5908 PR: https://git.openjdk.java.net/jdk/pull/5908 From redestad at openjdk.java.net Tue Oct 12 21:58:27 2021 From: redestad at openjdk.java.net (Claes Redestad) Date: Tue, 12 Oct 2021 21:58:27 GMT Subject: RFR: 8275128: Build hsdis using normal build system [v2] In-Reply-To: <1c4S9rMHgwt6r3gZhOX1Fg1XOg4pBnyO_BVdPo7dnAk=.dd5890fa-f989-405b-9e78-dffa1921ef37@github.com> References: <1c4S9rMHgwt6r3gZhOX1Fg1XOg4pBnyO_BVdPo7dnAk=.dd5890fa-f989-405b-9e78-dffa1921ef37@github.com> Message-ID: On Tue, 12 Oct 2021 21:20:53 GMT, Magnus Ihse Bursie wrote: >> There are multiple bugs related to hsdis, calling both for added simplicity in building, and allowing for multiple backends. >> >> The very first step is getting rid of the stand-alone Makefile and integrate the build using standard build-infra tooling. >> >> This patch does this, and it also contains OOTB building on Windows (as requested in JDK-8208495, and furthermore it lays the foundation for adding more backends to hsdis. 
> > Magnus Ihse Bursie has updated the pull request incrementally with 46 additional commits since the last revision: > > - 8274986: max code printed in hs-err logs should be configurable > > Reviewed-by: never, dholmes > - 8274615: Support relaxed atomic add for linux-aarch64 > > Reviewed-by: aph, dholmes > - 8275002: Remove unused AbstractStringBuilder.MAX_ARRAY_SIZE > > Reviewed-by: prappo, jlaskey, martin > - 8177814: jdk/editpad is not in jdk TEST.groups > > Reviewed-by: serb > - 8275031: runtime/ErrorHandling/MachCodeFramesInErrorFile.java fails when hsdis is present > > Reviewed-by: dholmes, dnsimon > - 8274560: JFR: Add test for OldObjectSample event when using Shenandoah > > Reviewed-by: mgronlun > - 8274466: G1: use field directly rather than method in G1CollectorState::in_mixed_phase > > Reviewed-by: ayang, sjohanss > - 8272167: AbsPathsInImage.java should skip *.dSYM directories > > Reviewed-by: ihse, erikj > - 8274945: Cleanup unnecessary calls to Throwable.initCause() in java.desktop > > Reviewed-by: jdv, serb, pbansal > - 8274925: Shenandoah: shenandoah/TestAllocHumongousFragment.java test failed on lock rank check > > Reviewed-by: shade > - ... and 36 more: https://git.openjdk.java.net/jdk/compare/48e11dd8...2bbeb63d Great to see this fixed! ------------- PR: https://git.openjdk.java.net/jdk/pull/5908 From ihse at openjdk.java.net Tue Oct 12 21:58:31 2021 From: ihse at openjdk.java.net (Magnus Ihse Bursie) Date: Tue, 12 Oct 2021 21:58:31 GMT Subject: RFR: 8275128: Build hsdis using normal build system [v3] In-Reply-To: References: Message-ID: On Tue, 12 Oct 2021 12:41:40 GMT, Erik Joelsson wrote: >> Magnus Ihse Bursie has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. 
The pull request contains one new commit since the last revision: >> >> Fixes after code review remarks > > make/Hsdis.gmk line 70: > >> 68: HSDIS_TOOLCHAIN_CFLAGS := >> 69: HSDIS_TOOLCHAIN_LDFLAGS := -L/usr/lib/gcc/$(MINGW_BASE)/9.2.0 -L/usr/$(MINGW_BASE)/sys-root/mingw/lib >> 70: HSDIS_TOOLCHAIN_LIBS := /usr/$(MINGW_BASE)/sys-root/mingw/lib/dllcrt2.o -lmingw32 -lgcc -lgcc_eh -lmoldname -lmingwex -lmsvcrt -lpthread -ladvapi32 -lshell32 -luser32 -lkernel32 > > Maybe break up this line a bit? Yes. I also extracted the dllcrt2.o file for added readability. > make/autoconf/jdk-options.m4 line 803: > >> 801: if test "x$with_hsdis" = xyes; then >> 802: AC_MSG_ERROR([--with-hsdis must have a value]) >> 803: elif test "x$with_hsdis" = xnone || test "x$with_hsdis" = x; then > > Should we accept "no" as value too so we can use --without-hsdis? Yeah, that's a good idea. ------------- PR: https://git.openjdk.java.net/jdk/pull/5908 From erikj at openjdk.java.net Tue Oct 12 22:17:46 2021 From: erikj at openjdk.java.net (Erik Joelsson) Date: Tue, 12 Oct 2021 22:17:46 GMT Subject: RFR: 8275128: Build hsdis using normal build system [v3] In-Reply-To: References: Message-ID: On Tue, 12 Oct 2021 21:58:23 GMT, Magnus Ihse Bursie wrote: >> There are multiple bugs related to hsdis, calling both for added simplicity in building, and allowing for multiple backends. >> >> The very first step is getting rid of the stand-alone Makefile and integrate the build using standard build-infra tooling. >> >> This patch does this, and it also contains OOTB building on Windows (as requested in JDK-8208495, and furthermore it lays the foundation for adding more backends to hsdis. > > Magnus Ihse Bursie has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. Marked as reviewed by erikj (Reviewer). 
------------- PR: https://git.openjdk.java.net/jdk/pull/5908 From ihse at openjdk.java.net Tue Oct 12 23:31:58 2021 From: ihse at openjdk.java.net (Magnus Ihse Bursie) Date: Tue, 12 Oct 2021 23:31:58 GMT Subject: Integrated: 8275128: Build hsdis using normal build system In-Reply-To: References: Message-ID: On Tue, 12 Oct 2021 11:24:01 GMT, Magnus Ihse Bursie wrote: > There are multiple bugs related to hsdis, calling both for added simplicity in building, and allowing for multiple backends. > > The very first step is getting rid of the stand-alone Makefile and integrate the build using standard build-infra tooling. > > This patch does this, and it also contains OOTB building on Windows (as requested in JDK-8208495, and furthermore it lays the foundation for adding more backends to hsdis. This pull request has now been integrated. Changeset: 03c2b73e Author: Magnus Ihse Bursie URL: https://git.openjdk.java.net/jdk/commit/03c2b73e2112cdbcbd1230009de0a15a9bd31815 Stats: 668 lines in 8 files changed: 302 ins; 344 del; 22 mod 8275128: Build hsdis using normal build system Reviewed-by: erikj ------------- PR: https://git.openjdk.java.net/jdk/pull/5908 From dcubed at openjdk.java.net Tue Oct 12 23:48:07 2021 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Tue, 12 Oct 2021 23:48:07 GMT Subject: Integrated: 8275171: ProblemList compiler/codegen/aes/TestAESMain.java on linux-x64 and windows-x64 in -Xcomp mode Message-ID: <7y1ci1dfiwWfrBKCFFB6auVOGuTzBJVFuItkCKO6H6I=.d00d539b-4560-4088-b9e9-5c917ed0ff2e@github.com> A trivial fix to ProblemList compiler/codegen/aes/TestAESMain.java on linux-x64 and windows-x64 in -Xcomp mode. 
------------- Commit messages: - 8275171: ProblemList compiler/codegen/aes/TestAESMain.java on linux-x64 and windows-x64 in -Xcomp mode Changes: https://git.openjdk.java.net/jdk/pull/5919/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5919&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8275171 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/5919.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5919/head:pull/5919 PR: https://git.openjdk.java.net/jdk/pull/5919 From iignatyev at openjdk.java.net Tue Oct 12 23:48:07 2021 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Tue, 12 Oct 2021 23:48:07 GMT Subject: Integrated: 8275171: ProblemList compiler/codegen/aes/TestAESMain.java on linux-x64 and windows-x64 in -Xcomp mode In-Reply-To: <7y1ci1dfiwWfrBKCFFB6auVOGuTzBJVFuItkCKO6H6I=.d00d539b-4560-4088-b9e9-5c917ed0ff2e@github.com> References: <7y1ci1dfiwWfrBKCFFB6auVOGuTzBJVFuItkCKO6H6I=.d00d539b-4560-4088-b9e9-5c917ed0ff2e@github.com> Message-ID: On Tue, 12 Oct 2021 23:29:27 GMT, Daniel D. Daugherty wrote: > A trivial fix to ProblemList compiler/codegen/aes/TestAESMain.java on linux-x64 > and windows-x64 in -Xcomp mode. Marked as reviewed by iignatyev (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/5919 From mikael at openjdk.java.net Tue Oct 12 23:48:07 2021 From: mikael at openjdk.java.net (Mikael Vidstedt) Date: Tue, 12 Oct 2021 23:48:07 GMT Subject: Integrated: 8275171: ProblemList compiler/codegen/aes/TestAESMain.java on linux-x64 and windows-x64 in -Xcomp mode In-Reply-To: <7y1ci1dfiwWfrBKCFFB6auVOGuTzBJVFuItkCKO6H6I=.d00d539b-4560-4088-b9e9-5c917ed0ff2e@github.com> References: <7y1ci1dfiwWfrBKCFFB6auVOGuTzBJVFuItkCKO6H6I=.d00d539b-4560-4088-b9e9-5c917ed0ff2e@github.com> Message-ID: <5nwkeNRirKjZWZdB-YyWzDuiYFKpqRZ3ppTmafdVdNg=.d4eb1093-d3e0-4423-a3c6-3934052c8f1b@github.com> On Tue, 12 Oct 2021 23:29:27 GMT, Daniel D. 
Daugherty wrote: > A trivial fix to ProblemList compiler/codegen/aes/TestAESMain.java on linux-x64 > and windows-x64 in -Xcomp mode. Marked as reviewed by mikael (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/5919 From dcubed at openjdk.java.net Tue Oct 12 23:48:08 2021 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Tue, 12 Oct 2021 23:48:08 GMT Subject: Integrated: 8275171: ProblemList compiler/codegen/aes/TestAESMain.java on linux-x64 and windows-x64 in -Xcomp mode In-Reply-To: References: <7y1ci1dfiwWfrBKCFFB6auVOGuTzBJVFuItkCKO6H6I=.d00d539b-4560-4088-b9e9-5c917ed0ff2e@github.com> Message-ID: On Tue, 12 Oct 2021 23:40:52 GMT, Igor Ignatyev wrote: >> A trivial fix to ProblemList compiler/codegen/aes/TestAESMain.java on linux-x64 >> and windows-x64 in -Xcomp mode. > > Marked as reviewed by iignatyev (Reviewer). @iignatev - Thanks for the lightning fast review! ------------- PR: https://git.openjdk.java.net/jdk/pull/5919 From iignatyev at openjdk.java.net Tue Oct 12 23:48:08 2021 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Tue, 12 Oct 2021 23:48:08 GMT Subject: Integrated: 8275171: ProblemList compiler/codegen/aes/TestAESMain.java on linux-x64 and windows-x64 in -Xcomp mode In-Reply-To: <5nwkeNRirKjZWZdB-YyWzDuiYFKpqRZ3ppTmafdVdNg=.d4eb1093-d3e0-4423-a3c6-3934052c8f1b@github.com> References: <7y1ci1dfiwWfrBKCFFB6auVOGuTzBJVFuItkCKO6H6I=.d00d539b-4560-4088-b9e9-5c917ed0ff2e@github.com> <5nwkeNRirKjZWZdB-YyWzDuiYFKpqRZ3ppTmafdVdNg=.d4eb1093-d3e0-4423-a3c6-3934052c8f1b@github.com> Message-ID: <7WhiFlESdh4KsRURKfJoFt4c0t2qua42Y5RxcKJHVUg=.7b59082a-b62e-45bf-8c99-b5b5ca1f8aff@github.com> On Tue, 12 Oct 2021 23:43:49 GMT, Mikael Vidstedt wrote: >> A trivial fix to ProblemList compiler/codegen/aes/TestAESMain.java on linux-x64 >> and windows-x64 in -Xcomp mode. > > Marked as reviewed by mikael (Reviewer). 
@vidmik too slow :) ------------- PR: https://git.openjdk.java.net/jdk/pull/5919 From dcubed at openjdk.java.net Tue Oct 12 23:48:08 2021 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Tue, 12 Oct 2021 23:48:08 GMT Subject: Integrated: 8275171: ProblemList compiler/codegen/aes/TestAESMain.java on linux-x64 and windows-x64 in -Xcomp mode In-Reply-To: <7y1ci1dfiwWfrBKCFFB6auVOGuTzBJVFuItkCKO6H6I=.d00d539b-4560-4088-b9e9-5c917ed0ff2e@github.com> References: <7y1ci1dfiwWfrBKCFFB6auVOGuTzBJVFuItkCKO6H6I=.d00d539b-4560-4088-b9e9-5c917ed0ff2e@github.com> Message-ID: On Tue, 12 Oct 2021 23:29:27 GMT, Daniel D. Daugherty wrote: > A trivial fix to ProblemList compiler/codegen/aes/TestAESMain.java on linux-x64 > and windows-x64 in -Xcomp mode. This pull request has now been integrated. Changeset: b1b83500 Author: Daniel D. Daugherty URL: https://git.openjdk.java.net/jdk/commit/b1b83500a9c3a74bf39894e49eefd031d208b9b9 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod 8275171: ProblemList compiler/codegen/aes/TestAESMain.java on linux-x64 and windows-x64 in -Xcomp mode Reviewed-by: iignatyev ------------- PR: https://git.openjdk.java.net/jdk/pull/5919 From dcubed at openjdk.java.net Tue Oct 12 23:59:51 2021 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Tue, 12 Oct 2021 23:59:51 GMT Subject: Integrated: 8275171: ProblemList compiler/codegen/aes/TestAESMain.java on linux-x64 and windows-x64 in -Xcomp mode In-Reply-To: <5nwkeNRirKjZWZdB-YyWzDuiYFKpqRZ3ppTmafdVdNg=.d4eb1093-d3e0-4423-a3c6-3934052c8f1b@github.com> References: <7y1ci1dfiwWfrBKCFFB6auVOGuTzBJVFuItkCKO6H6I=.d00d539b-4560-4088-b9e9-5c917ed0ff2e@github.com> <5nwkeNRirKjZWZdB-YyWzDuiYFKpqRZ3ppTmafdVdNg=.d4eb1093-d3e0-4423-a3c6-3934052c8f1b@github.com> Message-ID: <1v7HGJvKOjHP_BBOoYeIhS8LW4ijyLbcD9c97XYsJf8=.56b1c4e1-7aae-4ee6-b5ea-4f55be946e94@github.com> On Tue, 12 Oct 2021 23:43:49 GMT, Mikael Vidstedt wrote: >> A trivial fix to ProblemList compiler/codegen/aes/TestAESMain.java on 
linux-x64 >> and windows-x64 in -Xcomp mode. > > Marked as reviewed by mikael (Reviewer). @vidmik - Thanks for the review, but I integrated too quickly... ------------- PR: https://git.openjdk.java.net/jdk/pull/5919 From ihse at openjdk.java.net Wed Oct 13 00:11:01 2021 From: ihse at openjdk.java.net (Magnus Ihse Bursie) Date: Wed, 13 Oct 2021 00:11:01 GMT Subject: RFR: 8253757: Add LLVM-based backend for hsdis Message-ID: This patch expands the newly added system for hsdis backends to include LLVM. The actual code in hsdis-llvm.cpp is based heavily on the work by @luhenry, as published in the never integrated PR https://github.com/openjdk/jdk/pull/392. (I have basically just ripped out the binutils-based part of it.) Unfortunately I have not been able to make this work properly on Windows. With some additional flags I made it compile without complaints, but it caused hotspot to segfault in `LoadLibrary` (!) in `os::dll_load` when I tried to load the library. This is somewhat ironic, since the initial implementation was created by Ludovic for the very purpose of using it on Windows. The lack of Windows support in this patch does not mean it is impossible to get it to work, just that I need to co-operate with someone who has more experience of compiling LLVM on Windows, and/or are more eager to get this combination to work. 
------------- Commit messages: - Create hsdis backend using LLVM Changes: https://git.openjdk.java.net/jdk/pull/5920/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5920&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8253757 Stats: 406 lines in 6 files changed: 398 ins; 0 del; 8 mod Patch: https://git.openjdk.java.net/jdk/pull/5920.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5920/head:pull/5920 PR: https://git.openjdk.java.net/jdk/pull/5920 From jvernee at openjdk.java.net Wed Oct 13 00:34:48 2021 From: jvernee at openjdk.java.net (Jorn Vernee) Date: Wed, 13 Oct 2021 00:34:48 GMT Subject: RFR: 8253757: Add LLVM-based backend for hsdis In-Reply-To: References: Message-ID: <-kktvlQWjVyTNwWTiTCVyW3WA1xiM5P6dcWp96WkD7k=.fc41674b-e558-4718-b820-3c60161eccc6@github.com> On Wed, 13 Oct 2021 00:00:22 GMT, Magnus Ihse Bursie wrote: > but it caused hotspot to segfault in LoadLibrary (!) in os::dll_load when I tried to load the library. I tried compiling the binutils-based hsdis earlier as well, but on WSL instead of cygwin (using the `mingw-w64` package), and ran into the same issue. It kept segfaulting when loading the library. My guess was that it is a problem caused by mixing libraries that are compiled with different toolchains, as the JDK itself is compiled with MSVC. AFAIK binutils can only be built with mingw (based on my earlier experiments), but LLVM can be built with MSVC as well, so maybe the regular MSVC toolchain could be used to build the llvm-based hsdis. ------------- PR: https://git.openjdk.java.net/jdk/pull/5920 From jiefu at openjdk.java.net Wed Oct 13 01:33:59 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Wed, 13 Oct 2021 01:33:59 GMT Subject: RFR: 8275173: testlibrary_tests/ir_framework/tests/TestCheckedTests.java fails after JDK-8274911 Message-ID: Hi all, May I get reviews for this change? 
The fix just follows what is done for test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestRunTests.java in JDK-8274911. Thanks. Best regards, Jie ------------- Commit messages: - 8275173: testlibrary_tests/ir_framework/tests/TestCheckedTests.java fails after JDK-8274911 Changes: https://git.openjdk.java.net/jdk/pull/5921/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5921&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8275173 Stats: 12 lines in 1 file changed: 10 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/5921.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5921/head:pull/5921 PR: https://git.openjdk.java.net/jdk/pull/5921 From chagedorn at openjdk.java.net Wed Oct 13 06:38:47 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Wed, 13 Oct 2021 06:38:47 GMT Subject: RFR: 8275173: testlibrary_tests/ir_framework/tests/TestCheckedTests.java fails after JDK-8274911 In-Reply-To: References: Message-ID: On Wed, 13 Oct 2021 01:17:08 GMT, Jie Fu wrote: > Hi all, > > May I get reviews for this change? > > The fix just follows what is done for test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestRunTests.java in JDK-8274911. > > Thanks. > Best regards, > Jie Looks good and trivial! Thanks for fixing that - I forgot that test because my repo was outdated ------------- Marked as reviewed by chagedorn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5921 From thartmann at openjdk.java.net Wed Oct 13 06:50:49 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Wed, 13 Oct 2021 06:50:49 GMT Subject: RFR: 8275173: testlibrary_tests/ir_framework/tests/TestCheckedTests.java fails after JDK-8274911 In-Reply-To: References: Message-ID: <-7is4_GRToNxACLDWOWKaFEZF4E9Whgc6K-x5dUasMA=.79989eef-7a5a-4c5c-9743-201739e4945d@github.com> On Wed, 13 Oct 2021 01:17:08 GMT, Jie Fu wrote: > Hi all, > > May I get reviews for this change? 
> > The fix just follows what is done for test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestRunTests.java in JDK-8274911. > > Thanks. > Best regards, > Jie Looks good to me too. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5921 From luhenry at openjdk.java.net Wed Oct 13 07:29:52 2021 From: luhenry at openjdk.java.net (Ludovic Henry) Date: Wed, 13 Oct 2021 07:29:52 GMT Subject: RFR: 8253757: Add LLVM-based backend for hsdis In-Reply-To: References: Message-ID: On Wed, 13 Oct 2021 00:00:22 GMT, Magnus Ihse Bursie wrote: > This patch expands the newly added system for hsdis backends to include LLVM. > > The actual code in hsdis-llvm.cpp is based heavily on the work by @luhenry, as published in the never integrated PR https://github.com/openjdk/jdk/pull/392. (I have basically just ripped out the binutils-based part of it.) > > Unfortunately I have not been able to make this work properly on Windows. With some additional flags I made it compile without complaints, but it caused hotspot to segfault in `LoadLibrary` (!) in `os::dll_load` when I tried to load the library. This is somewhat ironic, since the initial implementation was created by Ludovic for the very purpose of using it on Windows. > > The lack of Windows support in this patch does not mean it is impossible to get it to work, just that I need to co-operate with someone who has more experience of compiling LLVM on Windows, and/or are more eager to get this combination to work. Very happy to see it landing :) Thank you! I don't have access to a windows machine, and even less a Windows-AArch64 machine. @lewurm would you be able to take a look? 
------------- PR: https://git.openjdk.java.net/jdk/pull/5920 From roland at openjdk.java.net Wed Oct 13 08:47:51 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Wed, 13 Oct 2021 08:47:51 GMT Subject: RFR: 8268744: Improve sinking algorithm in partial peeling to avoid redundant clones In-Reply-To: References: Message-ID: On Wed, 28 Jul 2021 15:46:13 GMT, Christian Hagedorn wrote: > The algorithm in step 2 in partial peeling to move data nodes from the peel section to the non-peel section uses a straightforward cloning algorithm which creates redundant clones when the IR contains one or more diamonds of data nodes to be cloned. The number of clones grows exponentially, which could lead to a bailout (added by [JDK-8256934](https://bugs.openjdk.java.net/browse/JDK-8256934) for JDK 17 with a testcase). This RFE improves this algorithm to handle node diamonds more efficiently to avoid unnecessary cloning. The testcase for JDK-8256934 does not bail out anymore and uses only a few clones. > > The main idea is to first find all outside of the loop uses `u` of the nodes in the initial peel region to be moved into the non-peel region. We then only need to clone any data node during the algorithm at most `u` times, once for each initial outside of the loop use. If we process a node diamond (following inputs), we can use an already cloned node for the top node of the diamond (node A in the example below). An example with 1 initial outside of the loop use and 4 nodes to be cloned, forming a diamond, is shown as a comment in the code: > https://github.com/openjdk/jdk/blob/8ae0e1a06558a1678521dcb4ed32708a1821b47d/src/hotspot/share/opto/loopopts.cpp#L3605-L3635 > > The algorithm is explained in more detail in the comments in the code (starting in method `move_nodes_to_not_peel()`). > > I also cleaned up the code for step 2 of partial peeling. 
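The clone-count growth described in the 8268744 RFR can be reproduced on a toy graph. The following is a hypothetical model, not C2's `Node` or `PhaseIdealLoop` code. `cloneNaive` clones a node once for every input path that reaches it. `cloneShared` keeps one clone per original node for a single outside-of-loop use, which mirrors the idea of reusing the already cloned top node of a diamond. With `k` stacked diamonds the naive count is `2^(k+2) - 3`, while the shared count is `3k + 1`.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of cloning data nodes along their inputs.
public final class DiamondCloneDemo {

    static final class Node {
        final String name;
        final List<Node> inputs = new ArrayList<>();
        Node(String name, Node... ins) { this.name = name; inputs.addAll(List.of(ins)); }
    }

    static int clonesCreated;

    // Naive: every path that reaches a node produces a fresh clone of it.
    static Node cloneNaive(Node n) {
        clonesCreated++;
        Node copy = new Node(n.name + "'");
        for (Node in : n.inputs) copy.inputs.add(cloneNaive(in));
        return copy;
    }

    // Shared: at most one clone per original node for a given outside use,
    // so both arms of a diamond reuse the clone of its top node.
    static Node cloneShared(Node n, Map<Node, Node> cloneFor) {
        Node existing = cloneFor.get(n);
        if (existing != null) return existing;
        clonesCreated++;
        Node copy = new Node(n.name + "'");
        cloneFor.put(n, copy);
        for (Node in : n.inputs) copy.inputs.add(cloneShared(in, cloneFor));
        return copy;
    }

    // Builds k stacked diamonds: top <- (B, C) <- D, where each D is the next top.
    // Returns the bottom node, i.e. the one with the outside-of-loop use.
    static Node diamondChain(int k) {
        Node top = new Node("A0");
        for (int i = 0; i < k; i++) {
            Node left = new Node("B" + i, top);
            Node right = new Node("C" + i, top);
            top = new Node("D" + i, left, right);
        }
        return top;
    }

    public static void main(String[] args) {
        Node use = diamondChain(10);
        clonesCreated = 0;
        cloneNaive(use);
        System.out.println("naive:  " + clonesCreated); // naive:  4093
        clonesCreated = 0;
        cloneShared(use, new IdentityHashMap<>());
        System.out.println("shared: " + clonesCreated); // shared: 31
    }
}
```

The real change also handles several outside uses (one set of clones per use) and only moves nodes out of the peel region. The toy ignores both and only illustrates the counting argument.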
I left the bailout code added by JDK-8256934 in place, which I think is still required if we enter partial peeling with a huge number of live nodes (quite rare though). > > I additionally ran some standard benchmarks which did not show any improvements but also no regressions. I think it is rather an edge case where the old algorithm creates a huge number of redundant clones. Nevertheless, I think this improved algorithm is still worth having to handle the more uncommon case of node diamonds. What do you think? > > Thanks, > Christian Can you explain, maybe with an example, how PhaseIdealLoop::get_clone_for_outside_use() works? ------------- PR: https://git.openjdk.java.net/jdk/pull/4923 From roland at openjdk.java.net Wed Oct 13 08:51:19 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Wed, 13 Oct 2021 08:51:19 GMT Subject: RFR: 8259609: C2: optimize long range checks in long counted loops [v7] In-Reply-To: References: Message-ID: > JDK-8255150 makes it possible for Java code to explicitly perform a > range check on long values. JDK-8223051 provides a transformation of > long counted loops into loop nests with an inner int counted > loop. With this change I propose transforming long range checks that > operate on the iv of a long counted loop into range checks that > operate on the iv of the int inner loop once it has been > created. Existing range check eliminations can then kick in. > > Transformation of range checks is piggybacked on the loop nest > creation for 2 reasons: > > - pattern matching range checks is easier right before the loop nest > is created > > - the number of iterations of the inner loop is adjusted so scale * > inner_iv doesn't overflow > > C2 has logic to delay some split if transformations so they don't > break the scale * iv + offset pattern. 
I reused that logic for long > range checks and had to relax what's considered a range check because > initially a range check from Objects.checkIndex() may include a test > for range > 0 that needs a round of loop opts to be hoisted. I realize > there's some code duplication but I didn't see a way to share logic > between IdealLoopTree::may_have_range_check() > IdealLoopTree::policy_range_check() that would feel right. > > I realize the comment in PhaseIdealLoop::transform_long_range_checks() > is scary. FWIW, it's not as complicated as it looks. I found that drawing > the range covered by the entire long loop and the range covered by the > inner loop helps to see how range checks can be transformed. Then the > comment helps make sure all cases are covered and verify the generated > code actually covers all of them. > > One issue is overflow. I think the fact that inner_iv * scale doesn't > overflow helps simplify things. One possible overflow is that of scale > * upper + offset which is handled by forcing all range checks in that > case to deoptimize. I don't think other cases of overflow need special > handling. > > This was tested with a Memory Segment micro benchmark (and patched > Memory Segment support to take advantage of the new checkIndex > intrinsic, both provided by Maurizio). Range checks in the micro > benchmark are properly optimized (and performance increases > significantly). Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 16 commits: - merge - Merge branch 'master' into JDK-8259609 - whitespace - rework - Merge branch 'master' into JDK-8259609 - John's review 1 - merge with master - Tobias' comments - Merge branch 'master' into JDK-8259609 - min_jint overflow fix - ...
and 6 more: https://git.openjdk.java.net/jdk/compare/ec199072...903f67d9 ------------- Changes: https://git.openjdk.java.net/jdk/pull/2045/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2045&range=06 Stats: 876 lines in 12 files changed: 701 ins; 67 del; 108 mod Patch: https://git.openjdk.java.net/jdk/pull/2045.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2045/head:pull/2045 PR: https://git.openjdk.java.net/jdk/pull/2045 From chagedorn at openjdk.java.net Wed Oct 13 09:30:09 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Wed, 13 Oct 2021 09:30:09 GMT Subject: RFR: 8262912: ciReplay: replay does not simulate unresolved classes Message-ID: <1_zhAUvJidCFwD_VCEakIEPf0jpaXFyQtcF1t5XvCxE=.0bfa6362-254d-463d-8359-8f9dc851214e@github.com> When trying to replay compile, the JVM will always resolve some classes before actually doing the replay compilation. When finally replay compiling the method, the state of `ciInstanceKlasses` which are resolved/unresolved could be different compared to the state at which the replay file was dumped. This will even be a bigger problem when tackling [JDK-8254110](https://bugs.openjdk.java.net/browse/JDK-8254110). This change intends to fix this by only treating a `ciInstanceKlass` as *not* unresolved if there is a corresponding entry for it in the replay file. This is achieved by a whitelist (`ciInstanceKlassRecord`). All accesses to get a pointer to a `ciInstanceKlass` are eventually routed through `ciEnv::get_metadata()`. This method is hooked to compare it against the replay compilation whitelist. If the corresponding `Klass` is not on the list, an unresolved `ciInstanceKlass` is returned instead. Finding a way to reliably test this feature was difficult. I therefore came up with a test which first creates a replay file with `CICrashAt` and then removes the `ciInstanceKlass` entry for class `Foo` to simulate that `Foo` was unresolved at replay dump time. 
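The whitelist idea can be modeled in a few lines of Java. This is only an illustration: the real hook lives in `ciEnv::get_metadata()` in HotSpot's C++ code, and the class and method names below, as well as the assumed replay-file line shape `ciInstanceKlass <name> ...`, are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ReplayWhitelistSketch {
    // Collect class names from replay-file lines of the assumed form
    // "ciInstanceKlass <name> ..."; all other lines are ignored.
    static Set<String> recordedClasses(List<String> replayLines) {
        Set<String> names = new HashSet<>();
        for (String line : replayLines) {
            String[] tokens = line.trim().split("\\s+");
            if (tokens.length >= 2 && tokens[0].equals("ciInstanceKlass")) {
                names.add(tokens[1]);
            }
        }
        return names;
    }

    // During replay, a class counts as loaded only if the VM resolved it AND
    // the replay file recorded it; otherwise the compiler gets an unresolved
    // view, matching the state at dump time.
    static boolean appearsLoaded(String name, boolean resolvedInVm, Set<String> recorded) {
        return resolvedInVm && recorded.contains(name);
    }
}
```

In this model, deleting the `ciInstanceKlass` line for `Foo` (as the test described above does) makes `Foo` look unresolved during replay even if the VM has already loaded it.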
This will result in a different C2 IR which is verified by checking the `PrintIdeal` output (see comments in test). Thanks, Christian ------------- Commit messages: - 8262912: ciReplay: replay does not simulate unresolved classes Changes: https://git.openjdk.java.net/jdk/pull/5926/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5926&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8262912 Stats: 241 lines in 8 files changed: 226 ins; 0 del; 15 mod Patch: https://git.openjdk.java.net/jdk/pull/5926.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5926/head:pull/5926 PR: https://git.openjdk.java.net/jdk/pull/5926 From ihse at openjdk.java.net Wed Oct 13 12:55:53 2021 From: ihse at openjdk.java.net (Magnus Ihse Bursie) Date: Wed, 13 Oct 2021 12:55:53 GMT Subject: RFR: 8253757: Add LLVM-based backend for hsdis In-Reply-To: <-kktvlQWjVyTNwWTiTCVyW3WA1xiM5P6dcWp96WkD7k=.fc41674b-e558-4718-b820-3c60161eccc6@github.com> References: <-kktvlQWjVyTNwWTiTCVyW3WA1xiM5P6dcWp96WkD7k=.fc41674b-e558-4718-b820-3c60161eccc6@github.com> Message-ID: On Wed, 13 Oct 2021 00:32:03 GMT, Jorn Vernee wrote: >> This patch expands the newly added system for hsdis backends to include LLVM. >> >> The actual code in hsdis-llvm.cpp is based heavily on the work by @luhenry, as published in the never integrated PR https://github.com/openjdk/jdk/pull/392. (I have basically just ripped out the binutils-based part of it.) >> >> Unfortunately I have not been able to make this work properly on Windows. With some additional flags I made it compile without complaints, but it caused hotspot to segfault in `LoadLibrary` (!) in `os::dll_load` when I tried to load the library. This is somewhat ironic, since the initial implementation was created by Ludovic for the very purpose of using it on Windows. 
>> >> The lack of Windows support in this patch does not mean it is impossible to get it to work, just that I need to co-operate with someone who has more experience of compiling LLVM on Windows, and/or are more eager to get this combination to work. > >> but it caused hotspot to segfault in LoadLibrary (!) in os::dll_load when I tried to load the library. > > I tried compiling the binutils-based hsdis earlier as well, but on WSL instead of cygwin (using the `mingw-w64` package), and ran into the same issue. It kept segfaulting when loading the library. > > My guess was that it is a problem caused by mixing libraries that are compiled with different toolchains, as the JDK itself is compiled with MSVC. > > AFAIK binutils can only be built with mingw (based on my earlier experiments), but LLVM can be built with MSVC as well, so maybe the regular MSVC toolchain could be used to build the llvm-based hsdis. @JornVernee It is likely that the LLVM-based backend can be built with MSVC, yes. I did not explore that further in this patch. I suggest that the way forward is to get this patch into mainline, and then experiment with how to get Windows support working properly. The main problem with the MSVS approach is that the LLVM libraries, as returned by llvm-config, are in gcc format (`-lLLFoobarnicator`), which we can't send to CL. (It seems Ludovic tried to work around this by transforming the command line in his original PR.) ------------- PR: https://git.openjdk.java.net/jdk/pull/5920 From erikj at openjdk.java.net Wed Oct 13 13:02:52 2021 From: erikj at openjdk.java.net (Erik Joelsson) Date: Wed, 13 Oct 2021 13:02:52 GMT Subject: RFR: 8253757: Add LLVM-based backend for hsdis In-Reply-To: References: Message-ID: <6qvBXCAckRpSbjiyxx-6sBqrnNbe40Jzk1k7M5rV3Mk=.5daf2183-97c6-40c3-ac25-d7232153c0ad@github.com> On Wed, 13 Oct 2021 00:00:22 GMT, Magnus Ihse Bursie wrote: > This patch expands the newly added system for hsdis backends to include LLVM.
> > The actual code in hsdis-llvm.cpp is based heavily on the work by @luhenry, as published in the never integrated PR https://github.com/openjdk/jdk/pull/392. (I have basically just ripped out the binutils-based part of it.) > > Unfortunately I have not been able to make this work properly on Windows. With some additional flags I made it compile without complaints, but it caused hotspot to segfault in `LoadLibrary` (!) in `os::dll_load` when I tried to load the library. This is somewhat ironic, since the initial implementation was created by Ludovic for the very purpose of using it on Windows. > > The lack of Windows support in this patch does not mean it is impossible to get it to work, just that I need to co-operate with someone who has more experience of compiling LLVM on Windows, and/or are more eager to get this combination to work. Marked as reviewed by erikj (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/5920 From jvernee at openjdk.java.net Wed Oct 13 13:10:51 2021 From: jvernee at openjdk.java.net (Jorn Vernee) Date: Wed, 13 Oct 2021 13:10:51 GMT Subject: RFR: 8253757: Add LLVM-based backend for hsdis In-Reply-To: References: Message-ID: <0iaKIK8JsgkNsRkAqcIoJzxGJiMiHRQfN8kGIvOWOUg=.b07bcb04-f8ce-44af-8e45-f161aea805ba@github.com> On Wed, 13 Oct 2021 00:00:22 GMT, Magnus Ihse Bursie wrote: > This patch expands the newly added system for hsdis backends to include LLVM. > > The actual code in hsdis-llvm.cpp is based heavily on the work by @luhenry, as published in the never integrated PR https://github.com/openjdk/jdk/pull/392. (I have basically just ripped out the binutils-based part of it.) > > Unfortunately I have not been able to make this work properly on Windows. With some additional flags I made it compile without complaints, but it caused hotspot to segfault in `LoadLibrary` (!) in `os::dll_load` when I tried to load the library. 
This is somewhat ironic, since the initial implementation was created by Ludovic for the very purpose of using it on Windows. > > The lack of Windows support in this patch does not mean it is impossible to get it to work, just that I need to co-operate with someone who has more experience of compiling LLVM on Windows, and/or are more eager to get this combination to work. In my experience the output of llvm-config is also not usable. I think the output also depends on the toolchain you use to build llvm FWIW. The output of my locally built llvm-config does contain the MSVC flags, but the paths it points to are incorrect (all pointing to the build directory, instead of the package install location). I have a patch here that gets me a working hsdis based on the llvm package I built manually using MSVC (the [official package](https://github.com/llvm/llvm-project/releases/download/llvmorg-13.0.0/LLVM-13.0.0-win64.exe) doesn't seem to contain the needed header files): https://github.com/openjdk/jdk/compare/pr/5920...JornVernee:hsdis_llvm_windows (The only issue currently is that the code I used to filter out the incorrect `-I` flags from what llvm-config gives me doesn't seem to work, though the build still passes). I built llvm using something like this (according to my notes): git clone https://github.com/llvm/llvm-project.git cd llvm-project mkdir build_llvm cd build_llvm cmake ../llvm -D"LLVM_TARGETS_TO_BUILD:STRING=X86" -D"CMAKE_BUILD_TYPE:STRING=Release" -D"CMAKE_INSTALL_PREFIX=install_local" -A x64 -T host=x64 cmake --build . --config Release --target install This then uses MSVC to build me an llvm 'package' in `build_llvm/install_local`, which I then point to using `--with-llvm`. The only other issue I had is that `install-hsdis` only copies the library to the exploded JDK, so I manually copy it to `images/jdk/bin/server` afterwards. 
HTH ------------- PR: https://git.openjdk.java.net/jdk/pull/5920 From roland at openjdk.java.net Wed Oct 13 13:42:48 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Wed, 13 Oct 2021 13:42:48 GMT Subject: RFR: JDK-8251513: Code around Parse::do_lookupswitch/do_tableswitch should be cleaned up In-Reply-To: <4xMeWOd5dV5wtEPDp1UuLCLJDKcj9seD9_IpiXS8PrA=.f172e00f-7060-40d7-a52a-f94428c27083@github.com> References: <4xMeWOd5dV5wtEPDp1UuLCLJDKcj9seD9_IpiXS8PrA=.f172e00f-7060-40d7-a52a-f94428c27083@github.com> Message-ID: On Wed, 6 Oct 2021 09:27:15 GMT, Tobias Holenstein wrote: > - `default_cnt` can be computed without using a loop: > > An example of how `defaults` was computed before at parse2.cpp:521-533 with switch labels `-10`, `0`, `10`, `42` and `200`: > > defaults = 0 > defaults += -10 - (-2147483648) > defaults += 0 - (-10 + 1) > defaults += 10 - (0 + 1) > defaults += 42 - (10 + 1) > defaults += 200 - (42 + 1) > defaults += 2147483647 - (200 + 1) + 1 > > => `defaults` = > -10 - (-2147483648) + 0 - (-10 + 1) + 10 - (0 + 1) + 42 - (10 + 1) + 200 - (42 + 1) + 2147483647 - (200 + 1) + 1 = > 4294967291 = 2147483648 + 2147483648 - 5 > BUT actually `defaults` was : `defaults` = 2147483648 + 2147483648 > > The reason has to do with using floats: > ((float)match_int - (float)prev) == (-(float)prev) is True > for match_int=-10, prev=-2147483648 > > BUT actually `defaults` (2147483648 + 2147483648 - 5) can also be computed without using a loop with `juint defaults = max_juint - len` > > > - also made some casts explicit > > - A lot of casts could be avoided by making `_cnt` in `SwitchRange` a uint. Unfortunately, the Range for the default values of a switch in `do_lookupswitch` calculates the count by scaling the average cnt/label up to cnt/range which needs a float to store an accurate result That looks good to me. ------------- Marked as reviewed by roland (Reviewer). 
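A small Java check of the arithmetic above: summing the gaps between sorted labels in `long` gives exactly 2^32 minus the number of labels, while accumulating the same gaps in `float`, as the replaced loop did, cannot give that value because a `float` cannot represent 4294967291 exactly. Note that 2^32 − len is one more than `max_juint - len` (4294967291 vs 4294967290 in the example). Presumably the off-by-one is harmless for a profile-count estimate, and `max_juint - len` cannot overflow a `juint` even when `len == 0`. The class and method names are illustrative.

```java
public class SwitchDefaultCount {
    // Exact number of int values not covered by the sorted, distinct labels,
    // summing the gaps like the old loop but in long arithmetic.
    static long defaultsByGaps(int[] sortedLabels) {
        long defaults = 0;
        long prev = Integer.MIN_VALUE;
        for (int label : sortedLabels) {
            defaults += label - prev;          // values in [prev, label)
            prev = (long) label + 1;
        }
        defaults += (long) Integer.MAX_VALUE - prev + 1; // values in [prev, MAX_VALUE]
        return defaults;
    }

    // The same loop with float accumulation, modeled on the code being replaced.
    static float defaultsByGapsFloat(int[] sortedLabels) {
        float defaults = 0;
        int prev = Integer.MIN_VALUE;
        for (int label : sortedLabels) {
            if (label != prev) {
                defaults += (float) label - prev;
            }
            prev = label + 1;
        }
        if (prev - 1 != Integer.MAX_VALUE) {
            defaults += (float) Integer.MAX_VALUE - prev + 1;
        }
        return defaults;
    }

    // Closed form: 2^32 int values minus one per label.
    static long defaultsClosedForm(int labelCount) {
        return (1L << 32) - labelCount;
    }
}
```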
PR: https://git.openjdk.java.net/jdk/pull/5837 From aph at openjdk.java.net Wed Oct 13 13:57:53 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Wed, 13 Oct 2021 13:57:53 GMT Subject: RFR: 8253757: Add LLVM-based backend for hsdis In-Reply-To: References: Message-ID: On Wed, 13 Oct 2021 00:00:22 GMT, Magnus Ihse Bursie wrote: > This patch expands the newly added system for hsdis backends to include LLVM. > > The actual code in hsdis-llvm.cpp is based heavily on the work by @luhenry, as published in the never integrated PR https://github.com/openjdk/jdk/pull/392. (I have basically just ripped out the binutils-based part of it.) > > Unfortunately I have not been able to make this work properly on Windows. With some additional flags I made it compile without complaints, but it caused hotspot to segfault in `LoadLibrary` (!) in `os::dll_load` when I tried to load the library. This is somewhat ironic, since the initial implementation was created by Ludovic for the very purpose of using it on Windows. > > The lack of Windows support in this patch does not mean it is impossible to get it to work, just that I need to co-operate with someone who has more experience of compiling LLVM on Windows, and/or are more eager to get this combination to work. > This patch expands the newly added system for hsdis backends to include LLVM. > > The actual code in hsdis-llvm.cpp is based heavily on the work by @luhenry, as published in the never integrated PR #392. (I have basically just ripped out the binutils-based part of it.) > > Unfortunately I have not been able to make this work properly on Windows. What is it for, then? hsdis builds on AArch64-MacOS-LLVM with cd src/utils/hsdis mkdir build cd build git clone https://github.com/bminor/binutils-gdb ln -s binutils-gdb binutils cd .. 
make ------------- PR: https://git.openjdk.java.net/jdk/pull/5920 From dcubed at openjdk.java.net Wed Oct 13 14:33:51 2021 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Wed, 13 Oct 2021 14:33:51 GMT Subject: RFR: 8275173: testlibrary_tests/ir_framework/tests/TestCheckedTests.java fails after JDK-8274911 In-Reply-To: References: Message-ID: <3ZIuF6RJDbtmhEo1lZqv5Gk6rmaYJjC4na-TpAQDnbA=.e293d91c-43fa-4eea-9b9f-dc0f250e032f@github.com> On Wed, 13 Oct 2021 01:17:08 GMT, Jie Fu wrote: > Hi all, > > May I get reviews for this change? > > The fix just follows what is done for test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestRunTests.java in JDK-8274911. > > Thanks. > Best regards, > Jie Since this bug is causing 4 test failures per Tier5, please do not wait for 24 hours before pushing this fix. ------------- PR: https://git.openjdk.java.net/jdk/pull/5921 From jiefu at openjdk.java.net Wed Oct 13 14:33:51 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Wed, 13 Oct 2021 14:33:51 GMT Subject: RFR: 8275173: testlibrary_tests/ir_framework/tests/TestCheckedTests.java fails after JDK-8274911 In-Reply-To: <3ZIuF6RJDbtmhEo1lZqv5Gk6rmaYJjC4na-TpAQDnbA=.e293d91c-43fa-4eea-9b9f-dc0f250e032f@github.com> References: <3ZIuF6RJDbtmhEo1lZqv5Gk6rmaYJjC4na-TpAQDnbA=.e293d91c-43fa-4eea-9b9f-dc0f250e032f@github.com> Message-ID: On Wed, 13 Oct 2021 14:27:51 GMT, Daniel D. Daugherty wrote: > Since this bug is causing 4 test failures per Tier5, please do not wait for 24 hours before pushing this fix. Okay. Thanks @chhagedorn and @TobiHartmann for your review. 
------------- PR: https://git.openjdk.java.net/jdk/pull/5921 From jiefu at openjdk.java.net Wed Oct 13 14:33:51 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Wed, 13 Oct 2021 14:33:51 GMT Subject: Integrated: 8275173: testlibrary_tests/ir_framework/tests/TestCheckedTests.java fails after JDK-8274911 In-Reply-To: References: Message-ID: On Wed, 13 Oct 2021 01:17:08 GMT, Jie Fu wrote: > Hi all, > > May I get reviews for this change? > > The fix just follows what is done for test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestRunTests.java in JDK-8274911. > > Thanks. > Best regards, > Jie This pull request has now been integrated. Changeset: 451a2965 Author: Jie Fu URL: https://git.openjdk.java.net/jdk/commit/451a296510994ff9fe1e0381900ffa9a8a1caa54 Stats: 12 lines in 1 file changed: 10 ins; 0 del; 2 mod 8275173: testlibrary_tests/ir_framework/tests/TestCheckedTests.java fails after JDK-8274911 Reviewed-by: chagedorn, thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/5921 From dcubed at openjdk.java.net Wed Oct 13 14:45:53 2021 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Wed, 13 Oct 2021 14:45:53 GMT Subject: RFR: 8275173: testlibrary_tests/ir_framework/tests/TestCheckedTests.java fails after JDK-8274911 In-Reply-To: References: <3ZIuF6RJDbtmhEo1lZqv5Gk6rmaYJjC4na-TpAQDnbA=.e293d91c-43fa-4eea-9b9f-dc0f250e032f@github.com> Message-ID: <3R1aAblWJZXM_DDyLCD0rPRAFxgjzYoUtZkUbUjpuJk=.54829b5f-dded-4826-adf2-fcc0d239b990@github.com> On Wed, 13 Oct 2021 14:29:14 GMT, Jie Fu wrote: >> Since this bug is causing 4 test failures per Tier5, please do not wait for 24 hours >> before pushing this fix. > >> Since this bug is causing 4 test failures per Tier5, please do not wait for 24 hours before pushing this fix. > > Okay. > Thanks @chhagedorn and @TobiHartmann for your review. @DamonFool - Just realized I forgot to ask what testing you did... 
------------- PR: https://git.openjdk.java.net/jdk/pull/5921 From jiefu at openjdk.java.net Wed Oct 13 14:45:53 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Wed, 13 Oct 2021 14:45:53 GMT Subject: RFR: 8275173: testlibrary_tests/ir_framework/tests/TestCheckedTests.java fails after JDK-8274911 In-Reply-To: References: <3ZIuF6RJDbtmhEo1lZqv5Gk6rmaYJjC4na-TpAQDnbA=.e293d91c-43fa-4eea-9b9f-dc0f250e032f@github.com> Message-ID: On Wed, 13 Oct 2021 14:29:14 GMT, Jie Fu wrote: >> Since this bug is causing 4 test failures per Tier5, please do not wait for 24 hours >> before pushing this fix. > >> Since this bug is causing 4 test failures per Tier5, please do not wait for 24 hours before pushing this fix. > > Okay. > Thanks @chhagedorn and @TobiHartmann for your review. > @DamonFool - Just realized I forgot to ask what testing you did... I tested the affected test on my mac. ------------- PR: https://git.openjdk.java.net/jdk/pull/5921 From ihse at openjdk.java.net Wed Oct 13 15:08:48 2021 From: ihse at openjdk.java.net (Magnus Ihse Bursie) Date: Wed, 13 Oct 2021 15:08:48 GMT Subject: RFR: 8253757: Add LLVM-based backend for hsdis In-Reply-To: References: Message-ID: On Wed, 13 Oct 2021 13:55:04 GMT, Andrew Haley wrote: >> This patch expands the newly added system for hsdis backends to include LLVM. >> >> The actual code in hsdis-llvm.cpp is based heavily on the work by @luhenry, as published in the never integrated PR https://github.com/openjdk/jdk/pull/392. (I have basically just ripped out the binutils-based part of it.) >> >> Unfortunately I have not been able to make this work properly on Windows. With some additional flags I made it compile without complaints, but it caused hotspot to segfault in `LoadLibrary` (!) in `os::dll_load` when I tried to load the library. This is somewhat ironic, since the initial implementation was created by Ludovic for the very purpose of using it on Windows. 
>> >> The lack of Windows support in this patch does not mean it is impossible to get it to work, just that I need to co-operate with someone who has more experience of compiling LLVM on Windows, and/or are more eager to get this combination to work. > >> This patch expands the newly added system for hsdis backends to include LLVM. >> >> The actual code in hsdis-llvm.cpp is based heavily on the work by @luhenry, as published in the never integrated PR #392. (I have basically just ripped out the binutils-based part of it.) >> >> Unfortunately I have not been able to make this work properly on Windows. > > What is it for, then? > > hsdis builds on AArch64-MacOS-LLVM with > > > cd src/utils/hsdis > mkdir build > cd build > git clone https://github.com/bminor/binutils-gdb > ln -s binutils-gdb binutils > cd .. > make @theRealAph As you might be aware, the licensing terms of binutils make it impossible to distribute a binutils-based hsdis with the JDK. While IANAL, my understanding is that the LLVM license is less problematic in that way. Also, this is to allow a bit of freedom of choice. If you prefer the LLVM backend (I've been told that it generates better disassembly in some cases), you should be able to select it. And finally, I do think that the LLVM backend should be able to work on Windows, too. It's just that this is tricky enough to motivate doing this in a separate, later step.
------------- PR: https://git.openjdk.java.net/jdk/pull/5920 From ihse at openjdk.java.net Wed Oct 13 15:14:47 2021 From: ihse at openjdk.java.net (Magnus Ihse Bursie) Date: Wed, 13 Oct 2021 15:14:47 GMT Subject: RFR: 8253757: Add LLVM-based backend for hsdis In-Reply-To: <0iaKIK8JsgkNsRkAqcIoJzxGJiMiHRQfN8kGIvOWOUg=.b07bcb04-f8ce-44af-8e45-f161aea805ba@github.com> References: <0iaKIK8JsgkNsRkAqcIoJzxGJiMiHRQfN8kGIvOWOUg=.b07bcb04-f8ce-44af-8e45-f161aea805ba@github.com> Message-ID: <2eRpiyPVuQyVLhnC_05bhGbYDEgT4ezSe7--epnSYp0=.6dbb0804-4297-449f-9004-cd17dcb7e7ba@github.com> On Wed, 13 Oct 2021 13:04:45 GMT, Jorn Vernee wrote: > The only other issue I had is that `install-hsdis` only copies the library to the exploded JDK, so I manually copy it to `images/jdk/bin/server` afterwards. @JornVernee Ah, good point! I need to make sure it gets copied to the image as well. And thank you for your patch, I'll have a look at it and see if I can incorporate it in this patch, or if it's better to make it a separate patch. It's worth noting that I used the llvm package from cygwin, not the "official" package or a self-built one. I can imagine that a self-built one works better, so it might be worth adding a --with-llvm-src option to help build it as well. ------------- PR: https://git.openjdk.java.net/jdk/pull/5920 From luhenry at openjdk.java.net Wed Oct 13 15:41:59 2021 From: luhenry at openjdk.java.net (Ludovic Henry) Date: Wed, 13 Oct 2021 15:41:59 GMT Subject: RFR: 8253757: Add LLVM-based backend for hsdis In-Reply-To: References: Message-ID: On Wed, 13 Oct 2021 00:00:22 GMT, Magnus Ihse Bursie wrote: > This patch expands the newly added system for hsdis backends to include LLVM. > > The actual code in hsdis-llvm.cpp is based heavily on the work by @luhenry, as published in the never integrated PR https://github.com/openjdk/jdk/pull/392. (I have basically just ripped out the binutils-based part of it.)
> > Unfortunately I have not been able to make this work properly on Windows. With some additional flags I made it compile without complaints, but it caused hotspot to segfault in `LoadLibrary` (!) in `os::dll_load` when I tried to load the library. This is somewhat ironic, since the initial implementation was created by Ludovic for the very purpose of using it on Windows. > > The lack of Windows support in this patch does not mean it is impossible to get it to work, just that I need to co-operate with someone who has more experience of compiling LLVM on Windows, and/or are more eager to get this combination to work. Marked as reviewed by luhenry (Author). ------------- PR: https://git.openjdk.java.net/jdk/pull/5920 From aph at openjdk.java.net Wed Oct 13 15:58:47 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Wed, 13 Oct 2021 15:58:47 GMT Subject: RFR: 8253757: Add LLVM-based backend for hsdis In-Reply-To: References: Message-ID: On Wed, 13 Oct 2021 13:55:04 GMT, Andrew Haley wrote: >> This patch expands the newly added system for hsdis backends to include LLVM. >> >> The actual code in hsdis-llvm.cpp is based heavily on the work by @luhenry, as published in the never integrated PR https://github.com/openjdk/jdk/pull/392. (I have basically just ripped out the binutils-based part of it.) >> >> Unfortunately I have not been able to make this work properly on Windows. With some additional flags I made it compile without complaints, but it caused hotspot to segfault in `LoadLibrary` (!) in `os::dll_load` when I tried to load the library. This is somewhat ironic, since the initial implementation was created by Ludovic for the very purpose of using it on Windows. >> >> The lack of Windows support in this patch does not mean it is impossible to get it to work, just that I need to co-operate with someone who has more experience of compiling LLVM on Windows, and/or are more eager to get this combination to work. 
> >> This patch expands the newly added system for hsdis backends to include LLVM. >> >> The actual code in hsdis-llvm.cpp is based heavily on the work by @luhenry, as published in the never integrated PR #392. (I have basically just ripped out the binutils-based part of it.) >> >> Unfortunately I have not been able to make this work properly on Windows. > > What is it for, then? > > hsdis builds on AArch64-MacOS-LLVM with > > > cd src/utils/hsdis > mkdir build > cd build > git clone https://github.com/bminor/binutils-gdb > ln -s binutils-gdb binutils > cd .. > make > @theRealAph As you might be aware, the licensing criteria for binutils makes it impossible to distribute a binutils-based hsdis with the JDK. While IANAL, my understanding is that the LLVM license is less problematic in that way. > > Also, this is to allow a bit of freedom of choice. If you prefer the LLVM backend (I've been told that it generates better disassembly in some case) you should be able to select it. > > And finally, I do think that the LLVM backend should be able to work on Windows, too. It's just that this is tricky enough to motivate doing this in a separate, later, step. OK, but how do you build it? I have applied this patch, and the instructions in `hsdis/README` don't mention LLVM, just binutils. Shouldn't the instructions have been updated? 
------------- PR: https://git.openjdk.java.net/jdk/pull/5920 From kvn at openjdk.java.net Wed Oct 13 16:04:47 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 13 Oct 2021 16:04:47 GMT Subject: RFR: 8262912: ciReplay: replay does not simulate unresolved classes In-Reply-To: <1_zhAUvJidCFwD_VCEakIEPf0jpaXFyQtcF1t5XvCxE=.0bfa6362-254d-463d-8359-8f9dc851214e@github.com> References: <1_zhAUvJidCFwD_VCEakIEPf0jpaXFyQtcF1t5XvCxE=.0bfa6362-254d-463d-8359-8f9dc851214e@github.com> Message-ID: On Wed, 13 Oct 2021 09:21:18 GMT, Christian Hagedorn wrote: > When trying to replay compile, the JVM will always resolve some classes before actually doing the replay compilation. When finally replay compiling the method, the state of `ciInstanceKlasses` which are resolved/unresolved could be different compared to the state at which the replay file was dumped. This will even be a bigger problem when tackling [JDK-8254110](https://bugs.openjdk.java.net/browse/JDK-8254110). > > This change intends to fix this by only treating a `ciInstanceKlass` as *not* unresolved if there is a corresponding entry for it in the replay file. This is achieved by a whitelist (`ciInstanceKlassRecord`). All accesses to get a pointer to a `ciInstanceKlass` are eventually routed through `ciEnv::get_metadata()`. This method is hooked to compare it against the replay compilation whitelist. If the corresponding `Klass` is not on the list, an unresolved `ciInstanceKlass` is returned instead. > > Finding a way to reliably test this feature was difficult. I therefore came up with a test which first creates a replay file with `CICrashAt` and then removes the `ciInstanceKlass` entry for class `Foo` to simulate that `Foo` was unresolved at replay dump time. This will result in a different C2 IR which is verified by checking the `PrintIdeal` output (see comments in test). > > Thanks, > Christian Looks good. ------------- Marked as reviewed by kvn (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/5926 From dlong at openjdk.java.net Wed Oct 13 21:19:56 2021 From: dlong at openjdk.java.net (Dean Long) Date: Wed, 13 Oct 2021 21:19:56 GMT Subject: RFR: 8262912: ciReplay: replay does not simulate unresolved classes In-Reply-To: <1_zhAUvJidCFwD_VCEakIEPf0jpaXFyQtcF1t5XvCxE=.0bfa6362-254d-463d-8359-8f9dc851214e@github.com> References: <1_zhAUvJidCFwD_VCEakIEPf0jpaXFyQtcF1t5XvCxE=.0bfa6362-254d-463d-8359-8f9dc851214e@github.com> Message-ID: On Wed, 13 Oct 2021 09:21:18 GMT, Christian Hagedorn wrote: > When trying to replay compile, the JVM will always resolve some classes before actually doing the replay compilation. When finally replay compiling the method, the state of `ciInstanceKlasses` which are resolved/unresolved could be different compared to the state at which the replay file was dumped. This will even be a bigger problem when tackling [JDK-8254110](https://bugs.openjdk.java.net/browse/JDK-8254110). > > This change intends to fix this by only treating a `ciInstanceKlass` as *not* unresolved if there is a corresponding entry for it in the replay file. This is achieved by a whitelist (`ciInstanceKlassRecord`). All accesses to get a pointer to a `ciInstanceKlass` are eventually routed through `ciEnv::get_metadata()`. This method is hooked to compare it against the replay compilation whitelist. If the corresponding `Klass` is not on the list, an unresolved `ciInstanceKlass` is returned instead. > > Finding a way to reliably test this feature was difficult. I therefore came up with a test which first creates a replay file with `CICrashAt` and then removes the `ciInstanceKlass` entry for class `Foo` to simulate that `Foo` was unresolved at replay dump time. This will result in a different C2 IR which is verified by checking the `PrintIdeal` output (see comments in test). > > Thanks, > Christian Looks good. There's a memory leak with the global mirror jobjects that will need to be fixed for JDK-8254110. 
One way to do this might be to prune your replay white list as klasses are added to _ci_metadata. That would mean moving your white list check to after we check for existing klasses in _ci_metadata. Also, you could reduce the number of JNI global refs needed by putting the mirror objects in an array, then storing the array in a single JNI global ref, or adding it to what gets scanned by CompileTask::metadata_do or ciEnv::metadata_do, but I think that can wait until JDK-8254110 too. ------------- Marked as reviewed by dlong (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5926 From duke at openjdk.java.net Wed Oct 13 22:30:07 2021 From: duke at openjdk.java.net (TatWai Chong) Date: Wed, 13 Oct 2021 22:30:07 GMT Subject: RFR: 8269559: AArch64: Implement string_compare intrinsic in SVE [v3] In-Reply-To: References: Message-ID: > This patch implements string_compare intrinsic in SVE. > It supports all LL, LU, UL and UU comparisons. > > As we haven't found an existing benchmark to measure performance impact, > we created a benchmark derived from the test [1] for this evaluation. > This benchmark is attached to this patch. > > Besides, remove the unused temporary register `vtmp3` from the existing > match rules for StrCmp. > > The results below show that all variants benefit significantly.
> Command: make exploded-test TEST="micro:StringCompareToDifferentLength" > > Benchmark (size) Mode Cnt Score Speedup Units > compareToLL 24 avgt 10 1.0x ms/op > compareToLL 36 avgt 10 1.0x ms/op > compareToLL 72 avgt 10 1.0x ms/op > compareToLL 128 avgt 10 1.4x ms/op > compareToLL 256 avgt 10 1.8x ms/op > compareToLL 512 avgt 10 2.7x ms/op > compareToLU 24 avgt 10 1.6x ms/op > compareToLU 36 avgt 10 1.8x ms/op > compareToLU 72 avgt 10 2.3x ms/op > compareToLU 128 avgt 10 3.8x ms/op > compareToLU 256 avgt 10 4.7x ms/op > compareToLU 512 avgt 10 6.3x ms/op > compareToUL 24 avgt 10 1.6x ms/op > compareToUL 36 avgt 10 1.7x ms/op > compareToUL 72 avgt 10 2.2x ms/op > compareToUL 128 avgt 10 3.3x ms/op > compareToUL 256 avgt 10 4.4x ms/op > compareToUL 512 avgt 10 6.1x ms/op > compareToUU 24 avgt 10 1.0x ms/op > compareToUU 36 avgt 10 1.0x ms/op > compareToUU 72 avgt 10 1.4x ms/op > compareToUU 128 avgt 10 2.2x ms/op > compareToUU 256 avgt 10 2.6x ms/op > compareToUU 512 avgt 10 3.7x ms/op > > [1] https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/compiler/intrinsics/string/TestStringCompareToDifferentLength.java TatWai Chong has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: - Merge master - Restore the removal of vtmp3 (=V2) as it is still used by the non-SVE compare-long-strings stub. And remove the assertion in `string_compare` since it won't help as the registers used in the stub are fixed. - 8269559: AArch64: Implement string_compare intrinsic in SVE This patch implements string_compare intrinsic in SVE. It supports all LL, LU, UL and UU comparisons. As we haven't found an existing benchmark to measure performance impact, we created a benchmark derived from the test [1] for this evaluation. This benchmark is attached to this patch. Besides, remove the unused temporary register `vtmp3` from the existing match rules for StrCmp. 
The results below show large speedups for all variants on longer strings. Command: make exploded-test TEST="micro:StringCompareToDifferentLength" Benchmark (size) Mode Cnt Score Speedup Units compareToLL 24 avgt 10 1.0x ms/op compareToLL 36 avgt 10 1.0x ms/op compareToLL 72 avgt 10 1.0x ms/op compareToLL 128 avgt 10 1.4x ms/op compareToLL 256 avgt 10 1.8x ms/op compareToLL 512 avgt 10 2.7x ms/op compareToLU 24 avgt 10 1.6x ms/op compareToLU 36 avgt 10 1.8x ms/op compareToLU 72 avgt 10 2.3x ms/op compareToLU 128 avgt 10 3.8x ms/op compareToLU 256 avgt 10 4.7x ms/op compareToLU 512 avgt 10 6.3x ms/op compareToUL 24 avgt 10 1.6x ms/op compareToUL 36 avgt 10 1.7x ms/op compareToUL 72 avgt 10 2.2x ms/op compareToUL 128 avgt 10 3.3x ms/op compareToUL 256 avgt 10 4.4x ms/op compareToUL 512 avgt 10 6.1x ms/op compareToUU 24 avgt 10 1.0x ms/op compareToUU 36 avgt 10 1.0x ms/op compareToUU 72 avgt 10 1.4x ms/op compareToUU 128 avgt 10 2.2x ms/op compareToUU 256 avgt 10 2.6x ms/op compareToUU 512 avgt 10 3.7x ms/op [1] https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/compiler/intrinsics/string/TestStringCompareToDifferentLength.java ------------- Changes: https://git.openjdk.java.net/jdk/pull/5129/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5129&range=02 Stats: 517 lines in 11 files changed: 421 ins; 1 del; 95 mod Patch: https://git.openjdk.java.net/jdk/pull/5129.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5129/head:pull/5129 PR: https://git.openjdk.java.net/jdk/pull/5129 From duke at openjdk.java.net Thu Oct 14 01:17:17 2021 From: duke at openjdk.java.net (TatWai Chong) Date: Thu, 14 Oct 2021 01:17:17 GMT Subject: RFR: 8269559: AArch64: Implement string_compare intrinsic in SVE [v4] In-Reply-To: References: Message-ID: > This patch implements string_compare intrinsic in SVE. > It supports all LL, LU, UL and UU comparisons.
> > As we haven't found an existing benchmark to measure performance impact, > we created a benchmark derived from the test [1] for this evaluation. > This benchmark is attached to this patch. > > Besides, remove the unused temporary register `vtmp3` from the existing > match rules for StrCmp. > > The results below show large speedups for all variants on longer strings. > Command: make exploded-test TEST="micro:StringCompareToDifferentLength" > > Benchmark (size) Mode Cnt Score Speedup Units > compareToLL 24 avgt 10 1.0x ms/op > compareToLL 36 avgt 10 1.0x ms/op > compareToLL 72 avgt 10 1.0x ms/op > compareToLL 128 avgt 10 1.4x ms/op > compareToLL 256 avgt 10 1.8x ms/op > compareToLL 512 avgt 10 2.7x ms/op > compareToLU 24 avgt 10 1.6x ms/op > compareToLU 36 avgt 10 1.8x ms/op > compareToLU 72 avgt 10 2.3x ms/op > compareToLU 128 avgt 10 3.8x ms/op > compareToLU 256 avgt 10 4.7x ms/op > compareToLU 512 avgt 10 6.3x ms/op > compareToUL 24 avgt 10 1.6x ms/op > compareToUL 36 avgt 10 1.7x ms/op > compareToUL 72 avgt 10 2.2x ms/op > compareToUL 128 avgt 10 3.3x ms/op > compareToUL 256 avgt 10 4.4x ms/op > compareToUL 512 avgt 10 6.1x ms/op > compareToUU 24 avgt 10 1.0x ms/op > compareToUU 36 avgt 10 1.0x ms/op > compareToUU 72 avgt 10 1.4x ms/op > compareToUU 128 avgt 10 2.2x ms/op > compareToUU 256 avgt 10 2.6x ms/op > compareToUU 512 avgt 10 3.7x ms/op > > [1] https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/compiler/intrinsics/string/TestStringCompareToDifferentLength.java TatWai Chong has updated the pull request incrementally with one additional commit since the last revision: Replace `sve_cmpne` with up-to-date `sve_cmp`.
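To make the LL/LU/UL/UU naming concrete: with compact strings enabled (the JDK default), a String whose chars all fit in Latin-1 is stored one byte per char (L) and any other String is stored as UTF-16 (U), so a compare intrinsic has to handle all four pairings and still return exactly what String.compareTo specifies. The Java sketch below is not from the patch or from the jtreg test [1]; the class name and test strings are illustrative assumptions. It checks compareTo against a char-by-char reference on strings of different lengths. It cannot force C2 or SVE to be used, so treat it as a semantic check rather than a test of the intrinsic itself.

```java
// Hypothetical semantic check, not part of the patch: whatever compareTo
// implementation is active, the result must not depend on the string encodings.
public final class CompareToSemantics {

    // Reference semantics of String.compareTo: the difference of the first
    // mismatching chars, otherwise the difference of the lengths.
    static int referenceCompare(String a, String b) {
        int n = Math.min(a.length(), b.length());
        for (int i = 0; i < n; i++) {
            char ca = a.charAt(i);
            char cb = b.charAt(i);
            if (ca != cb) {
                return ca - cb;
            }
        }
        return a.length() - b.length();
    }

    // Pairs covering the four encoding combinations (assuming CompactStrings is on).
    static final String LATIN1 = "abcdefgh";      // every char <= 0xFF: stored as Latin-1 (L)
    static final String UTF16 = "abcdefgh\u20ac"; // U+20AC does not fit: stored as UTF-16 (U)
    static final String[][] PAIRS = {
        {LATIN1, LATIN1 + "xyz"},   // LL, different lengths
        {LATIN1, UTF16},            // LU, common prefix
        {UTF16, LATIN1},            // UL, common prefix
        {UTF16, UTF16 + "z"},       // UU, different lengths
        {"abcZ", "abc\u0101"},      // LU, mismatch at index 3
    };

    public static void main(String[] args) {
        // Repeat so the JIT has a chance to compile the calls; this does not
        // guarantee that any particular intrinsic is used.
        for (int iter = 0; iter < 20_000; iter++) {
            for (String[] p : PAIRS) {
                int expected = referenceCompare(p[0], p[1]);
                int actual = p[0].compareTo(p[1]);
                if (expected != actual) {
                    throw new AssertionError(p[0] + " vs " + p[1] + ": " + actual + " != " + expected);
                }
            }
        }
        for (String[] p : PAIRS) {
            System.out.println(p[0].compareTo(p[1]));
        }
    }
}
```

Run it with `java CompareToSemantics.java`; per the String.compareTo specification it should print -3, -1, 1, -1 and -167.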
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5129/files - new: https://git.openjdk.java.net/jdk/pull/5129/files/4a584089..7799f934 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5129&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5129&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5129.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5129/head:pull/5129 PR: https://git.openjdk.java.net/jdk/pull/5129 From eliu at openjdk.java.net Thu Oct 14 03:09:55 2021 From: eliu at openjdk.java.net (Eric Liu) Date: Thu, 14 Oct 2021 03:09:55 GMT Subject: RFR: 8259948: Aarch64: Add cast nodes for Aarch64 Neon backend [v8] In-Reply-To: References: Message-ID: On Tue, 12 Oct 2021 06:32:21 GMT, Wang Huang wrote: >> * In this issue, we plan to complete all missing implementations for the aarch64 neon backend. For example, cast from Byte to Long, cast from Long to Byte, and so on. >> * It may solve JDK-8269866, or part of it. > > Wang Huang has updated the pull request incrementally with one additional commit since the last revision: > > remove useless codes I ran the tests: hotspot_all (no vmTestBase stress), langtools:tier1, jdk:tier1, tier2 and tier3 all pass on aarch64. ------------- PR: https://git.openjdk.java.net/jdk/pull/4839 From duke at openjdk.java.net Thu Oct 14 05:30:54 2021 From: duke at openjdk.java.net (TatWai Chong) Date: Thu, 14 Oct 2021 05:30:54 GMT Subject: Integrated: 8269559: AArch64: Implement string_compare intrinsic in SVE In-Reply-To: References: Message-ID: On Mon, 16 Aug 2021 20:59:55 GMT, TatWai Chong wrote: > This patch implements string_compare intrinsic in SVE. > It supports all LL, LU, UL and UU comparisons. > > As we haven't found an existing benchmark to measure performance impact, > we created a benchmark derived from the test [1] for this evaluation. > This benchmark is attached to this patch.
> > Besides, remove the unused temporary register `vtmp3` from the existing > match rules for StrCmp. > > The results below show large speedups for all variants on longer strings. > Command: make exploded-test TEST="micro:StringCompareToDifferentLength" > > Benchmark (size) Mode Cnt Score Speedup Units > compareToLL 24 avgt 10 1.0x ms/op > compareToLL 36 avgt 10 1.0x ms/op > compareToLL 72 avgt 10 1.0x ms/op > compareToLL 128 avgt 10 1.4x ms/op > compareToLL 256 avgt 10 1.8x ms/op > compareToLL 512 avgt 10 2.7x ms/op > compareToLU 24 avgt 10 1.6x ms/op > compareToLU 36 avgt 10 1.8x ms/op > compareToLU 72 avgt 10 2.3x ms/op > compareToLU 128 avgt 10 3.8x ms/op > compareToLU 256 avgt 10 4.7x ms/op > compareToLU 512 avgt 10 6.3x ms/op > compareToUL 24 avgt 10 1.6x ms/op > compareToUL 36 avgt 10 1.7x ms/op > compareToUL 72 avgt 10 2.2x ms/op > compareToUL 128 avgt 10 3.3x ms/op > compareToUL 256 avgt 10 4.4x ms/op > compareToUL 512 avgt 10 6.1x ms/op > compareToUU 24 avgt 10 1.0x ms/op > compareToUU 36 avgt 10 1.0x ms/op > compareToUU 72 avgt 10 1.4x ms/op > compareToUU 128 avgt 10 2.2x ms/op > compareToUU 256 avgt 10 2.6x ms/op > compareToUU 512 avgt 10 3.7x ms/op > > [1] https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/compiler/intrinsics/string/TestStringCompareToDifferentLength.java This pull request has now been integrated.
Changeset: 8b1b6f9f Author: TatWai Chong Committer: Nick Gasson URL: https://git.openjdk.java.net/jdk/commit/8b1b6f9fb375bbc2de339ad8f526ca4d5f83dc70 Stats: 517 lines in 11 files changed: 421 ins; 1 del; 95 mod 8269559: AArch64: Implement string_compare intrinsic in SVE Reviewed-by: ngasson, aph ------------- PR: https://git.openjdk.java.net/jdk/pull/5129 From david.holmes at oracle.com Thu Oct 14 06:12:42 2021 From: david.holmes at oracle.com (David Holmes) Date: Thu, 14 Oct 2021 16:12:42 +1000 Subject: Integrated: 8269559: AArch64: Implement string_compare intrinsic in SVE In-Reply-To: References: Message-ID: We are seeing a large number of Aarch64 test failures in our CI after this push. Somewhat bizarre failure modes:

- truncated classfiles
- unexpected EOF encountered
- illegal state reading a stream

I think we will need to back this change out while this is investigated further. David ----- On 14/10/2021 3:30 pm, TatWai Chong wrote: > On Mon, 16 Aug 2021 20:59:55 GMT, TatWai Chong wrote: > >> This patch implements string_compare intrinsic in SVE. >> It supports all LL, LU, UL and UU comparisons. >> >> As we haven't found an existing benchmark to measure performance impact, >> we created a benchmark derived from the test [1] for this evaluation. >> This benchmark is attached to this patch. >> >> Besides, remove the unused temporary register `vtmp3` from the existing >> match rules for StrCmp. >> >> The results below show large speedups for all variants on longer strings.
>> Command: make exploded-test TEST="micro:StringCompareToDifferentLength" >> >> Benchmark (size) Mode Cnt Score Speedup Units >> compareToLL 24 avgt 10 1.0x ms/op >> compareToLL 36 avgt 10 1.0x ms/op >> compareToLL 72 avgt 10 1.0x ms/op >> compareToLL 128 avgt 10 1.4x ms/op >> compareToLL 256 avgt 10 1.8x ms/op >> compareToLL 512 avgt 10 2.7x ms/op >> compareToLU 24 avgt 10 1.6x ms/op >> compareToLU 36 avgt 10 1.8x ms/op >> compareToLU 72 avgt 10 2.3x ms/op >> compareToLU 128 avgt 10 3.8x ms/op >> compareToLU 256 avgt 10 4.7x ms/op >> compareToLU 512 avgt 10 6.3x ms/op >> compareToUL 24 avgt 10 1.6x ms/op >> compareToUL 36 avgt 10 1.7x ms/op >> compareToUL 72 avgt 10 2.2x ms/op >> compareToUL 128 avgt 10 3.3x ms/op >> compareToUL 256 avgt 10 4.4x ms/op >> compareToUL 512 avgt 10 6.1x ms/op >> compareToUU 24 avgt 10 1.0x ms/op >> compareToUU 36 avgt 10 1.0x ms/op >> compareToUU 72 avgt 10 1.4x ms/op >> compareToUU 128 avgt 10 2.2x ms/op >> compareToUU 256 avgt 10 2.6x ms/op >> compareToUU 512 avgt 10 3.7x ms/op >> >> [1] https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/compiler/intrinsics/string/TestStringCompareToDifferentLength.java > > This pull request has now been integrated. > > Changeset: 8b1b6f9f > Author: TatWai Chong > Committer: Nick Gasson > URL: https://git.openjdk.java.net/jdk/commit/8b1b6f9fb375bbc2de339ad8f526ca4d5f83dc70 > Stats: 517 lines in 11 files changed: 421 ins; 1 del; 95 mod > > 8269559: AArch64: Implement string_compare intrinsic in SVE > > Reviewed-by: ngasson, aph > > ------------- > > PR: https://git.openjdk.java.net/jdk/pull/5129 > From ngasson at openjdk.java.net Thu Oct 14 06:28:55 2021 From: ngasson at openjdk.java.net (Nick Gasson) Date: Thu, 14 Oct 2021 06:28:55 GMT Subject: RFR: 8269559: AArch64: Implement string_compare intrinsic in SVE [v4] In-Reply-To: References: Message-ID: On Thu, 14 Oct 2021 01:17:17 GMT, TatWai Chong wrote: >> This patch implements string_compare intrinsic in SVE. 
>> It supports all LL, LU, UL and UU comparisons. >> >> As we haven't found an existing benchmark to measure performance impact, >> we created a benchmark derived from the test [1] for this evaluation. >> This benchmark is attached to this patch. >> >> Besides, remove the unused temporary register `vtmp3` from the existing >> match rules for StrCmp. >> >> The results below show large speedups for all variants on longer strings. >> Command: make exploded-test TEST="micro:StringCompareToDifferentLength" >> >> Benchmark (size) Mode Cnt Score Speedup Units >> compareToLL 24 avgt 10 1.0x ms/op >> compareToLL 36 avgt 10 1.0x ms/op >> compareToLL 72 avgt 10 1.0x ms/op >> compareToLL 128 avgt 10 1.4x ms/op >> compareToLL 256 avgt 10 1.8x ms/op >> compareToLL 512 avgt 10 2.7x ms/op >> compareToLU 24 avgt 10 1.6x ms/op >> compareToLU 36 avgt 10 1.8x ms/op >> compareToLU 72 avgt 10 2.3x ms/op >> compareToLU 128 avgt 10 3.8x ms/op >> compareToLU 256 avgt 10 4.7x ms/op >> compareToLU 512 avgt 10 6.3x ms/op >> compareToUL 24 avgt 10 1.6x ms/op >> compareToUL 36 avgt 10 1.7x ms/op >> compareToUL 72 avgt 10 2.2x ms/op >> compareToUL 128 avgt 10 3.3x ms/op >> compareToUL 256 avgt 10 4.4x ms/op >> compareToUL 512 avgt 10 6.1x ms/op >> compareToUU 24 avgt 10 1.0x ms/op >> compareToUU 36 avgt 10 1.0x ms/op >> compareToUU 72 avgt 10 1.4x ms/op >> compareToUU 128 avgt 10 2.2x ms/op >> compareToUU 256 avgt 10 2.6x ms/op >> compareToUU 512 avgt 10 3.7x ms/op >> >> [1] https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/compiler/intrinsics/string/TestStringCompareToDifferentLength.java > > TatWai Chong has updated the pull request incrementally with one additional commit since the last revision: > > Replace `sve_cmpne` with up-to-date `sve_cmp`. Hm, I didn't see anything like that when we tested this patch internally. I'll create another PR to revert it for now.
------------- PR: https://git.openjdk.java.net/jdk/pull/5129 From david.holmes at oracle.com Thu Oct 14 06:38:26 2021 From: david.holmes at oracle.com (David Holmes) Date: Thu, 14 Oct 2021 16:38:26 +1000 Subject: RFR: 8269559: AArch64: Implement string_compare intrinsic in SVE [v4] In-Reply-To: References: Message-ID: On 14/10/2021 4:28 pm, Nick Gasson wrote: > On Thu, 14 Oct 2021 01:17:17 GMT, TatWai Chong wrote: > >>> This patch implements string_compare intrinsic in SVE. >>> It supports all LL, LU, UL and UU comparisons. >>> >>> As we haven't found an existing benchmark to measure performance impact, >>> we created a benchmark derived from the test [1] for this evaluation. >>> This benchmark is attached to this patch. >>> >>> Besides, remove the unused temporary register `vtmp3` from the existing >>> match rules for StrCmp. >>> >>> The results below show large speedups for all variants on longer strings. >>> Command: make exploded-test TEST="micro:StringCompareToDifferentLength" >>> >>> Benchmark (size) Mode Cnt Score Speedup Units >>> compareToLL 24 avgt 10 1.0x ms/op >>> compareToLL 36 avgt 10 1.0x ms/op >>> compareToLL 72 avgt 10 1.0x ms/op >>> compareToLL 128 avgt 10 1.4x ms/op >>> compareToLL 256 avgt 10 1.8x ms/op >>> compareToLL 512 avgt 10 2.7x ms/op >>> compareToLU 24 avgt 10 1.6x ms/op >>> compareToLU 36 avgt 10 1.8x ms/op >>> compareToLU 72 avgt 10 2.3x ms/op >>> compareToLU 128 avgt 10 3.8x ms/op >>> compareToLU 256 avgt 10 4.7x ms/op >>> compareToLU 512 avgt 10 6.3x ms/op >>> compareToUL 24 avgt 10 1.6x ms/op >>> compareToUL 36 avgt 10 1.7x ms/op >>> compareToUL 72 avgt 10 2.2x ms/op >>> compareToUL 128 avgt 10 3.3x ms/op >>> compareToUL 256 avgt 10 4.4x ms/op >>> compareToUL 512 avgt 10 6.1x ms/op >>> compareToUU 24 avgt 10 1.0x ms/op >>> compareToUU 36 avgt 10 1.0x ms/op >>> compareToUU 72 avgt 10 1.4x ms/op >>> compareToUU 128 avgt 10 2.2x ms/op >>> compareToUU 256 avgt 10 2.6x ms/op >>> compareToUU 512 avgt 10 3.7x ms/op >>> >>> [1]
https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/compiler/intrinsics/string/TestStringCompareToDifferentLength.java >> >> TatWai Chong has updated the pull request incrementally with one additional commit since the last revision: >> >> Replace `sve_cmpne` with up-to-date `sve_cmp`. > > Hm, I didn't see anything like that when we tested this patch internally. I'll create another PR to revert it for now. Filed: https://bugs.openjdk.java.net/browse/JDK-8275263 If a backout is needed then it should be converted to a "backout" issue per: https://openjdk.java.net/guide/index.html#backing-out-a-change Thanks, David > ------------- > > PR: https://git.openjdk.java.net/jdk/pull/5129 > From ngasson at openjdk.java.net Thu Oct 14 06:55:48 2021 From: ngasson at openjdk.java.net (Nick Gasson) Date: Thu, 14 Oct 2021 06:55:48 GMT Subject: RFR: 8275262: Backout JDK-8269559 In-Reply-To: <5AWjOqqXbV_kbpwSAY4vPwRKgHFrhmDe9IBn6q6y7KE=.cb114c0c-8559-4fae-8101-333fb24ac483@github.com> References: <5AWjOqqXbV_kbpwSAY4vPwRKgHFrhmDe9IBn6q6y7KE=.cb114c0c-8559-4fae-8101-333fb24ac483@github.com> Message-ID: On Thu, 14 Oct 2021 06:48:18 GMT, David Holmes wrote: >> This reverts "8269559: AArch64: Implement string_compare intrinsic in SVE" which caused some unknown failures in Oracle's CI. > > Thanks for attending to this so quickly @nick-arm ! > > The issue is only on (some of) our macOS Aarch64 systems. Let me know if I can provide more info on hardware etc. > > David Thanks @dholmes-ora . I'll wait for the GitHub Actions tests to finish. Should I retitle this to "[BACKOUT] AArch64: Implement string_compare intrinsic in SVE" as per the developer's guide? 
------------- PR: https://git.openjdk.java.net/jdk/pull/5941 From david.holmes at oracle.com Thu Oct 14 07:06:29 2021 From: david.holmes at oracle.com (David Holmes) Date: Thu, 14 Oct 2021 17:06:29 +1000 Subject: RFR: 8275262: Backout JDK-8269559 In-Reply-To: References: <5AWjOqqXbV_kbpwSAY4vPwRKgHFrhmDe9IBn6q6y7KE=.cb114c0c-8559-4fae-8101-333fb24ac483@github.com> Message-ID: <990d102e-91ba-5ce5-b365-390ded6f5655@oracle.com> On 14/10/2021 4:55 pm, Nick Gasson wrote: > On Thu, 14 Oct 2021 06:48:18 GMT, David Holmes wrote: > >>> This reverts "8269559: AArch64: Implement string_compare intrinsic in SVE" which caused some unknown failures in Oracle's CI. >> >> Thanks for attending to this so quickly @nick-arm ! >> >> The issue is only on (some of) our macOS Aarch64 systems. Let me know if I can provide more info on hardware etc. >> >> David > > Thanks @dholmes-ora . I'll wait for the GitHub Actions tests to finish. Should I retitle this to "[BACKOUT] AArch64: Implement string_compare intrinsic in SVE" as per the developer's guide? Yes please follow whatever the dev guide states to do. Thanks, David > ------------- > > PR: https://git.openjdk.java.net/jdk/pull/5941 > From tschatzl at openjdk.java.net Thu Oct 14 07:16:54 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Thu, 14 Oct 2021 07:16:54 GMT Subject: RFR: 8275262: [BACKOUT] AArch64: Implement string_compare intrinsic in SVE In-Reply-To: References: Message-ID: On Thu, 14 Oct 2021 06:37:19 GMT, Nick Gasson wrote: > This reverts "8269559: AArch64: Implement string_compare intrinsic in SVE" which caused some unknown failures in Oracle's CI. Lgtm. ------------- Marked as reviewed by tschatzl (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/5941 From duke at openjdk.java.net Thu Oct 14 07:46:52 2021 From: duke at openjdk.java.net (SUN Guoyun) Date: Thu, 14 Oct 2021 07:46:52 GMT Subject: RFR: 8275086: compiler/c2/irTests/TestPostParseCallDevirtualization.java fails when compiler1 is disabled In-Reply-To: References: Message-ID: On Tue, 12 Oct 2021 02:49:53 GMT, SUN Guoyun wrote: > Hi all, > The jtreg test case compiler/c2/irTests/TestPostParseCallDevirtualization.java fails in fastdebug mode on the x86/aarch64/mips architectures when "--with-jvm-features=-compiler1" is used. The failure output is: > >

> One or more @IR rules failed:
> 
> Failed IR Rules (1)
> ------------------
> - Method "public int compiler.c2.irTests.TestPostParseCallDevirtualization.testMethodHandleCallWithCCP() throws java.lang.Throwable":
>   * @IR rule 1: "@compiler.lib.ir_framework.IR(failOn={"#PRE#(\\\\d+(\\\\s){2}(CallStaticJava.*)+(\\\\s){2}===.*#IS_REPLACED#)", "invokeBasic"}, applyIf={}, applyIfAnd={}, applyIfOr={}, counts={"#PRE#(\\\\d+(\\\\s){2}(CallStaticJava.*)+(\\\\s){2}===.*#IS_REPLACED#)", "invokeStatic", "= 1"}, applyIfNot={})" 
>     - failOn: Graph contains forbidden nodes:
>         Regex 1: (\\d+(\\s){2}(CallStaticJava.*)+(\\s){2}===.*invokeBasic)
>         Matched forbidden node:
>           280  CallStaticJava  ===  5  6  7  8  1 ( 188  1  1  1  1  1  1 ) [[ 281  282  283  285 ]] # Static  java.lang.invoke.MethodHandle::invokeBasic
>     - counts: Graph contains wrong number of nodes:
>         Regex 1: (\\d+(\\s){2}(CallStaticJava.*)+(\\s){2}===.*invokeStatic)
>         Expected 1 but found 0 nodes.
> 
>>>> Check stdout for compilation output of the failed methods
> 
> > This is a patch to fix this problem. Please help review it. > > Thanks, > Sun Guoyun Maybe the FAIL caused by `call_site_count = -1`, after the code is modified as follows, then test case passed --- a/src/hotspot/share/opto/callGenerator.cpp +++ b/src/hotspot/share/opto/callGenerator.cpp @@ -1049,7 +1049,8 @@ CallGenerator* CallGenerator::for_method_handle_call(JVMState* jvms, ciMethod* c ciCallProfile profile = caller->call_profile_at_bci(bci); int call_site_count = caller->scale_count(profile.count()); - if (IncrementalInlineMH && call_site_count > 0 && + if (IncrementalInlineMH && /*call_site_count > 0 &&*/ (input_not_const || !C->inlining_incrementally() || C->over_inlining_cutoff())) { return CallGenerator::for_mh_late_inline(caller, callee, input_not_const); } else { ------------- PR: https://git.openjdk.java.net/jdk/pull/5903 From aph at openjdk.java.net Thu Oct 14 08:27:48 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Thu, 14 Oct 2021 08:27:48 GMT Subject: RFR: 8275262: [BACKOUT] AArch64: Implement string_compare intrinsic in SVE In-Reply-To: <990d102e-91ba-5ce5-b365-390ded6f5655@oracle.com> References: <990d102e-91ba-5ce5-b365-390ded6f5655@oracle.com> Message-ID: On Thu, 14 Oct 2021 07:08:20 GMT, David Holmes wrote: > The issue is only on (some of) our macOS Aarch64 systems. Let me know if I can provide more info on hardware etc. Any info about what failed? A reproducer would be nice. 
------------- PR: https://git.openjdk.java.net/jdk/pull/5941 From chagedorn at openjdk.java.net Thu Oct 14 08:35:54 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Thu, 14 Oct 2021 08:35:54 GMT Subject: RFR: 8268744: Improve sinking algorithm in partial peeling to avoid redundant clones In-Reply-To: References: Message-ID: On Wed, 28 Jul 2021 15:46:13 GMT, Christian Hagedorn wrote: > The algorithm in step 2 in partial peeling to move data nodes from the peel section to the non-peel section uses a straightforward cloning algorithm which creates redundant clones when the IR contains one or more diamonds of data nodes to be cloned. The number of clones grows exponentially which could lead to a bailout (added by [JDK-8256934](https://bugs.openjdk.java.net/browse/JDK-8256934) for JDK 17 with a testcase). This RFE improves this algorithm to handle node diamonds more efficiently to avoid unnecessary cloning. The testcase for JDK-8256934 does not bail out anymore and uses only a few clones. > > The main idea is to first find all outside of the loop uses `u` of the nodes in the initial peel region to be moved into the non-peel region. We then only need to clone any data node during the algorithm at most `u` times, once for each initial outside of the loop use. If we process a node diamond (following inputs), we can use an already cloned node for the top node of the diamond (node A in the example below). An example with 1 initial outside of the loop use and 4 nodes to be cloned, forming a diamond, is shown as a comment in the code: > https://github.com/openjdk/jdk/blob/8ae0e1a06558a1678521dcb4ed32708a1821b47d/src/hotspot/share/opto/loopopts.cpp#L3605-L3635 > > The algorithm is explained in more detail in the comments in the code (starting in method `move_nodes_to_not_peel()`). > > I also cleaned up the code for step 2 of partial peeling.
I left the bailout code added by JDK-8256934 in place which I think is still required if we enter partial peeling with a huge number of live nodes (quite rare though). > > I additionally ran some standard benchmarks which did not show any improvements but also no regressions. I think it is rather an edge case where the old algorithm creates a huge number of redundant clones. Nevertheless, I think this improved algorithm is still worth having to handle the more uncommon case of node diamonds. What do you think? > > Thanks, > Christian Hi Roland, thanks for taking a look at it. I have the feeling that I can use the same simpler fix as used in [JDK-8271954](https://bugs.openjdk.java.net/browse/JDK-8271954) which went in later. There, we had a very similar problem of sinking nodes out of a loop with cloning which could have a diamond shape. I will investigate further and then get back to you. ------------- PR: https://git.openjdk.java.net/jdk/pull/4923 From ngasson at openjdk.java.net Thu Oct 14 08:35:54 2021 From: ngasson at openjdk.java.net (Nick Gasson) Date: Thu, 14 Oct 2021 08:35:54 GMT Subject: RFR: 8275262: [BACKOUT] AArch64: Implement string_compare intrinsic in SVE In-Reply-To: References: <990d102e-91ba-5ce5-b365-390ded6f5655@oracle.com> Message-ID: On Thu, 14 Oct 2021 08:24:41 GMT, Andrew Haley wrote: > > The issue is only on (some of) our macOS Aarch64 systems. Let me know if I can provide more info on hardware etc. > > Any info about what failed? A reproducer would be nice. I just ran tier1 on an M1 Mac with no failures - perhaps it's OS version dependent if it only failed on some systems? I have: $ uname -a Darwin [...]
20.2.0 Darwin Kernel Version 20.2.0: Wed Dec 2 20:40:21 PST 2020; root:xnu-7195.60.75~1/RELEASE_ARM64_T8101 arm64 ------------- PR: https://git.openjdk.java.net/jdk/pull/5941 From whuang at openjdk.java.net Thu Oct 14 09:32:26 2021 From: whuang at openjdk.java.net (Wang Huang) Date: Thu, 14 Oct 2021 09:32:26 GMT Subject: RFR: 8259948: Aarch64: Add cast nodes for Aarch64 Neon backend [v9] In-Reply-To: References: Message-ID: > * In this issue, we plan to complete all missing implementation for aarch64 neon backend. For example, cast from Byte to Long, cast from Long to Byte, and so on. > * It may be a solver of JDK-8269866, or part of it. Wang Huang has updated the pull request incrementally with one additional commit since the last revision: sync asmtest.out.h ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/4839/files - new: https://git.openjdk.java.net/jdk/pull/4839/files/e8e6f014..86277136 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4839&range=08 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4839&range=07-08 Stats: 84 lines in 1 file changed: 1 ins; 0 del; 83 mod Patch: https://git.openjdk.java.net/jdk/pull/4839.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4839/head:pull/4839 PR: https://git.openjdk.java.net/jdk/pull/4839 From aph at openjdk.java.net Thu Oct 14 09:33:51 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Thu, 14 Oct 2021 09:33:51 GMT Subject: RFR: 8275262: [BACKOUT] AArch64: Implement string_compare intrinsic in SVE In-Reply-To: References: Message-ID: On Thu, 14 Oct 2021 06:37:19 GMT, Nick Gasson wrote: > This reverts "8269559: AArch64: Implement string_compare intrinsic in SVE" which caused some unknown failures in Oracle's CI. It might be a spurious failure, then. I guess we need to see the test logs. 
------------- PR: https://git.openjdk.java.net/jdk/pull/5941 From ngasson at openjdk.java.net Thu Oct 14 09:39:55 2021 From: ngasson at openjdk.java.net (Nick Gasson) Date: Thu, 14 Oct 2021 09:39:55 GMT Subject: RFR: 8275262: [BACKOUT] AArch64: Implement string_compare intrinsic in SVE In-Reply-To: References: Message-ID: On Thu, 14 Oct 2021 09:31:11 GMT, Andrew Haley wrote: > It might be a spurious failure, then. I guess we need to see the test logs. I'll revert it for now in case we missed something. ------------- PR: https://git.openjdk.java.net/jdk/pull/5941 From ngasson at openjdk.java.net Thu Oct 14 09:39:56 2021 From: ngasson at openjdk.java.net (Nick Gasson) Date: Thu, 14 Oct 2021 09:39:56 GMT Subject: Integrated: 8275262: [BACKOUT] AArch64: Implement string_compare intrinsic in SVE In-Reply-To: References: Message-ID: On Thu, 14 Oct 2021 06:37:19 GMT, Nick Gasson wrote: > This reverts "8269559: AArch64: Implement string_compare intrinsic in SVE" which caused some unknown failures in Oracle's CI. This pull request has now been integrated. Changeset: 333c4692 Author: Nick Gasson URL: https://git.openjdk.java.net/jdk/commit/333c4692d898d582fe162cc9621acd3e1c242d67 Stats: 517 lines in 11 files changed: 1 ins; 421 del; 95 mod 8275262: [BACKOUT] AArch64: Implement string_compare intrinsic in SVE Reviewed-by: dholmes, tschatzl ------------- PR: https://git.openjdk.java.net/jdk/pull/5941 From aph at openjdk.java.net Thu Oct 14 09:49:48 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Thu, 14 Oct 2021 09:49:48 GMT Subject: RFR: 8253757: Add LLVM-based backend for hsdis In-Reply-To: References: Message-ID: On Wed, 13 Oct 2021 00:00:22 GMT, Magnus Ihse Bursie wrote: > This patch expands the newly added system for hsdis backends to include LLVM. > > The actual code in hsdis-llvm.cpp is based heavily on the work by @luhenry, as published in the never integrated PR https://github.com/openjdk/jdk/pull/392.
(I have basically just ripped out the binutils-based part of it.)
>
> Unfortunately I have not been able to make this work properly on Windows. With some additional flags I made it compile without complaints, but it caused hotspot to segfault in `LoadLibrary` (!) in `os::dll_load` when I tried to load the library. This is somewhat ironic, since the initial implementation was created by Ludovic for the very purpose of using it on Windows.
>
> The lack of Windows support in this patch does not mean it is impossible to get it to work, just that I need to co-operate with someone who has more experience of compiling LLVM on Windows, and/or is more eager to get this combination to work.

So I figured out how to build it with `make build-hsdis`. The instructions need to be fixed.

Two problems: firstly, the library seems to be built with the wrong name, so the runtime doesn't find it. I had to use `ln -sf ~/lib/libhsdis.dylib ~/lib/hsdis-aarch64.dylib` to get it to work.

More severely, the disassembly is truncated, so lots of stuff doesn't work. The binutils-based hsdis prints

  0x00000001105665b4: stp wzr, wzr, [x20, #60]  ;*putfield preCounterBlock {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.sun.crypto.provider.GaloisCounterMode$GCMEngine::@80 (line 673)
                                                ; - com.sun.crypto.provider.GaloisCounterMode$GCMDecrypt::@8 (line 1346)
                                                ; - com.sun.crypto.provider.GaloisCounterMode::checkInit@60 (line 329)
  ;; B6: # out( B242 B7 ) <- in( B192 B5 ) Freq: 0.999999
  0x00000001105665b8: cbnz x21, 0x00000001105665c8
  ;; null oop passed to encode_heap_oop_not_null2
  0x00000001105665bc: dcps1 #0xdeae
  0x00000001105665c0: .inst 0x082adca6 ; undefined
  0x00000001105665c4: udf #1
  0x00000001105665c8: lsr x10, x21, #3  ;*invokevirtual getLongUnaligned {reexecute=0 rethrow=0 return_oop=0}
                                        ; - java.lang.invoke.VarHandleByteArrayAsLongs$ArrayHandle::get@32 (line 115)
  ...
lots more

but the llvm-based one stops here:

  0x000000010ed79034: stp wzr, wzr, [x20, #0x3c];*putfield preCounterBlock {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.sun.crypto.provider.GaloisCounterMode$GCMEngine::@80 (line 673)
                                                ; - com.sun.crypto.provider.GaloisCounterMode$GCMDecrypt::@8 (line 1346)
                                                ; - com.sun.crypto.provider.GaloisCounterMode::checkInit@60 (line 329)
  ;; B6: # out( B242 B7 ) <- in( B192 B5 ) Freq: 0.999999
  0x000000010ed79038: cbnz x21, #0x10
  ;; null oop passed to encode_heap_oop_not_null2
  0x000000010ed7903c: dcps1 #0xdeae
  0x000000010ed79040:
  --------------------------------------------------------------------------------
  [/Disassembly]

I think it's giving up as soon as it sees something it doesn't recognize, so it's pretty much useless.

In addition, even when it does work the LLVM disassembly is pretty poor. For example, the unverified entry point looks like

  0x000000010ed78f80: ldr w8, [x1, #0x8]
  0x000000010ed78f84: cmp w9, w8
  0x000000010ed78f88: b.eq #0x10
  0x000000010ed78f8c: adrp x8, #-128524288  ; {runtime_call ic_miss_stub}
  0x000000010ed78f90: add x8, x8, #0x1c0
  0x000000010ed78f94: br x8

instead of

  0x0000000110566500: ldr w8, [x1, #8]
  0x0000000110566504: cmp w9, w8
  0x0000000110566508: b.eq 0x0000000110566518  // b.none
  0x000000011056650c: adrp x8, 0x0000000108ae6000  ; {runtime_call ic_miss_stub}
  0x0000000110566510: add x8, x8, #0x1c0
  0x0000000110566514: br x8

Sure, it's good to have a choice, but the LLVM-based hsdis doesn't look to me like a serious alternative for professional work.
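The relative operands in the LLVM listing still encode the targets that binutils prints as absolute addresses; they are just harder to read. As an illustration only (not part of either hsdis backend; the class and method names are hypothetical), the Java sketch below resolves them by hand. It assumes, as the two listings suggest (cbnz and b.eq show `#0x10` where binutils shows the address 0x10 bytes further on), that LLVM prints direct-branch immediates as byte offsets from the branch itself and ADRP immediates as byte offsets from the current 4 KiB page.

```java
// Hypothetical helper: resolve AArch64 PC-relative operands as printed by the
// LLVM-based hsdis into the absolute addresses the binutils-based hsdis shows.
public final class Aarch64Targets {

    // B.cond, CBZ/CBNZ: target = address of the branch + byte offset.
    static long branchTarget(long pc, long offset) {
        return pc + offset;
    }

    // ADRP: target page = (address with the low 12 bits cleared) + byte offset.
    static long adrpPage(long pc, long byteOffset) {
        return (pc & ~0xFFFL) + byteOffset;
    }

    public static void main(String[] args) {
        // cbnz x21, #0x10 at 0x000000010ed79038
        System.out.printf("0x%016x%n", branchTarget(0x000000010ed79038L, 0x10));
        // b.eq #0x10 at 0x000000010ed78f88
        System.out.printf("0x%016x%n", branchTarget(0x000000010ed78f88L, 0x10));
        // adrp x8, #-128524288 at 0x000000010ed78f8c
        System.out.printf("0x%016x%n", adrpPage(0x000000010ed78f8cL, -128524288L));
    }
}
```

It prints 0x000000010ed79048, 0x000000010ed78f98 and 0x00000001072e6000. The ADRP page ends in ...e6000, like 0x0000000108ae6000 in the binutils run, which is at least consistent with this reading of the offsets.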
------------- PR: https://git.openjdk.java.net/jdk/pull/5920 From david.holmes at oracle.com Thu Oct 14 10:19:27 2021 From: david.holmes at oracle.com (David Holmes) Date: Thu, 14 Oct 2021 20:19:27 +1000 Subject: RFR: 8275262: [BACKOUT] AArch64: Implement string_compare intrinsic in SVE In-Reply-To: References: <990d102e-91ba-5ce5-b365-390ded6f5655@oracle.com> Message-ID: <55e07be6-9915-2228-2ca2-974ae7459f01@oracle.com> On 14/10/2021 6:27 pm, Andrew Haley wrote: > On Thu, 14 Oct 2021 07:08:20 GMT, David Holmes wrote: > >> The issue is only on (some of) our macOS Aarch64 systems. Let me know if I can provide more info on hardware etc. > > Any info about what failed? A reproducer would be nice. See https://bugs.openjdk.java.net/browse/JDK-8275263 for some info on the failures. It seems really bizarre. David > ------------- > > PR: https://git.openjdk.java.net/jdk/pull/5941 > From aph at openjdk.java.net Thu Oct 14 12:35:50 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Thu, 14 Oct 2021 12:35:50 GMT Subject: RFR: 8275262: [BACKOUT] AArch64: Implement string_compare intrinsic in SVE In-Reply-To: <55e07be6-9915-2228-2ca2-974ae7459f01@oracle.com> References: <55e07be6-9915-2228-2ca2-974ae7459f01@oracle.com> Message-ID: On Thu, 14 Oct 2021 10:21:20 GMT, David Holmes wrote: > See https://bugs.openjdk.java.net/browse/JDK-8275263 for some info on the failures. It seems really bizarre. I would understand this a lot better if the affected machines actually used SVE, but they don't have the hardware. That does reduce the bug surface: we only need to look at the affected common code, and the only thing I can immediately see is that a couple of unused register arguments have been added. Unless, of course, the affected Macs think they have SVE... ? I don't think that's possible. I'd definitely be looking at the host toolchain for differences. 
------------- PR: https://git.openjdk.java.net/jdk/pull/5941 From ihse at openjdk.java.net Thu Oct 14 12:54:52 2021 From: ihse at openjdk.java.net (Magnus Ihse Bursie) Date: Thu, 14 Oct 2021 12:54:52 GMT Subject: RFR: 8253757: Add LLVM-based backend for hsdis In-Reply-To: References: Message-ID: On Thu, 14 Oct 2021 09:46:47 GMT, Andrew Haley wrote: >> This patch expands the newly added system for hsdis backends to include LLVM. >> >> The actual code in hsdis-llvm.cpp is based heavily on the work by @luhenry, as published in the never integrated PR https://github.com/openjdk/jdk/pull/392. (I have basically just ripped out the binutils-based part of it.) >> >> Unfortunately I have not been able to make this work properly on Windows. With some additional flags I made it compile without complaints, but it caused hotspot to segfault in `LoadLibrary` (!) in `os::dll_load` when I tried to load the library. This is somewhat ironic, since the initial implementation was created by Ludovic for the very purpose of using it on Windows. >> >> The lack of Windows support in this patch does not mean it is impossible to get it to work, just that I need to co-operate with someone who has more experience of compiling LLVM on Windows, and/or are more eager to get this combination to work. > > So I figured out how to build it with `make build-hsdis`. The instructions need to be fixed. > > Two problems: firstly, the library seems to be built with the wrong name, so the runtime doesn't find it. I had to use > `ln -sf ~/lib/libhsdis.dylib ~/lib/hsdis-aarch64.dylib` > to get it to work. > > More severely, the disassembly is truncated, so lots of stuff doesn't work. 
> The binutils-based hsdis prints
>
>     0x00000001105665b4: stp wzr, wzr, [x20, #60] ;*putfield preCounterBlock {reexecute=0 rethrow=0 return_oop=0}
>                         ; - com.sun.crypto.provider.GaloisCounterMode$GCMEngine::@80 (line 673)
>                         ; - com.sun.crypto.provider.GaloisCounterMode$GCMDecrypt::@8 (line 1346)
>                         ; - com.sun.crypto.provider.GaloisCounterMode::checkInit@60 (line 329)
>                         ;; B6: # out( B242 B7 ) <- in( B192 B5 ) Freq: 0.999999
>     0x00000001105665b8: cbnz x21, 0x00000001105665c8
>                         ;; null oop passed to encode_heap_oop_not_null2
>     0x00000001105665bc: dcps1 #0xdeae
>     0x00000001105665c0: .inst 0x082adca6 ; undefined
>     0x00000001105665c4: udf #1
>     0x00000001105665c8: lsr x10, x21, #3 ;*invokevirtual getLongUnaligned {reexecute=0 rethrow=0 return_oop=0}
>                         ; - java.lang.invoke.VarHandleByteArrayAsLongs$ArrayHandle::get@32 (line 115)
>     ... lots more
>
> but the llvm-based one stops here:
>
>     0x000000010ed79034: stp wzr, wzr, [x20, #0x3c];*putfield preCounterBlock {reexecute=0 rethrow=0 return_oop=0}
>                         ; - com.sun.crypto.provider.GaloisCounterMode$GCMEngine::@80 (line 673)
>                         ; - com.sun.crypto.provider.GaloisCounterMode$GCMDecrypt::@8 (line 1346)
>                         ; - com.sun.crypto.provider.GaloisCounterMode::checkInit@60 (line 329)
>                         ;; B6: # out( B242 B7 ) <- in( B192 B5 ) Freq: 0.999999
>     0x000000010ed79038: cbnz x21, #0x10
>                         ;; null oop passed to encode_heap_oop_not_null2
>     0x000000010ed7903c: dcps1 #0xdeae
>     0x000000010ed79040: --------------------------------------------------------------------------------
>     [/Disassembly]
>
> I think it's giving up as soon as it sees something it doesn't recognize, so it's pretty much useless.
>
> In addition, even when it does work the LLVM disassembly is pretty poor.
> For example, the unverified entry point looks like
>
>     0x000000010ed78f80: ldr w8, [x1, #0x8]
>     0x000000010ed78f84: cmp w9, w8
>     0x000000010ed78f88: b.eq #0x10
>     0x000000010ed78f8c: adrp x8, #-128524288 ; {runtime_call ic_miss_stub}
>     0x000000010ed78f90: add x8, x8, #0x1c0
>     0x000000010ed78f94: br x8
>
> instead of
>
>     0x0000000110566500: ldr w8, [x1, #8]
>     0x0000000110566504: cmp w9, w8
>     0x0000000110566508: b.eq 0x0000000110566518 // b.none
>     0x000000011056650c: adrp x8, 0x0000000108ae6000 ; {runtime_call ic_miss_stub}
>     0x0000000110566510: add x8, x8, #0x1c0
>     0x0000000110566514: br x8
>
> Sure, it's good to have a choice, but the LLVM-based hsdis doesn't look to me like a serious alternative for professional work.

@theRealAph Yes, the build instructions need to be updated. I could have sworn I did that, but maybe that fix is just lying around uncommitted in one of my repos on one of my test computers. I'll have to go around and check, or write it again if I can't fix it. That needs to go with this PR.

You can do `make install-hsdis` to avoid doing the symlinking yourself. As Jorn pointed out, there is currently a bug in that the library does not get copied from the "exploded jdk" to the image. I will fix that in an upcoming commit to this PR.

As for LLVM not giving you a good user experience, I can't really tell what's wrong. Apparently @luhenry (and @JornVernee, I believe) have used this. I'm not really the target audience myself; I'm only trying to make it possible to use. If it is as severely limited as you say, maybe this isn't worth pursuing. Some feedback from those who have tested it would be appreciated here, to help me understand whether this patch should be dropped.
------------- PR: https://git.openjdk.java.net/jdk/pull/5920 From chagedorn at openjdk.java.net Thu Oct 14 13:42:50 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Thu, 14 Oct 2021 13:42:50 GMT Subject: RFR: 8262912: ciReplay: replay does not simulate unresolved classes In-Reply-To: <1_zhAUvJidCFwD_VCEakIEPf0jpaXFyQtcF1t5XvCxE=.0bfa6362-254d-463d-8359-8f9dc851214e@github.com> References: <1_zhAUvJidCFwD_VCEakIEPf0jpaXFyQtcF1t5XvCxE=.0bfa6362-254d-463d-8359-8f9dc851214e@github.com> Message-ID: <61MVVuCtLN6hzqa0xrZp4bgwadEOLgujgsTr3BzW4IM=.f0b5ec21-f44e-45fd-aea5-378f61dec894@github.com> On Wed, 13 Oct 2021 09:21:18 GMT, Christian Hagedorn wrote: > When trying to replay compile, the JVM will always resolve some classes before actually doing the replay compilation. When finally replay compiling the method, the state of `ciInstanceKlasses` which are resolved/unresolved could be different compared to the state at which the replay file was dumped. This will even be a bigger problem when tackling [JDK-8254110](https://bugs.openjdk.java.net/browse/JDK-8254110). > > This change intends to fix this by only treating a `ciInstanceKlass` as *not* unresolved if there is a corresponding entry for it in the replay file. This is achieved by a whitelist (`ciInstanceKlassRecord`). All accesses to get a pointer to a `ciInstanceKlass` are eventually routed through `ciEnv::get_metadata()`. This method is hooked to compare it against the replay compilation whitelist. If the corresponding `Klass` is not on the list, an unresolved `ciInstanceKlass` is returned instead. > > Finding a way to reliably test this feature was difficult. I therefore came up with a test which first creates a replay file with `CICrashAt` and then removes the `ciInstanceKlass` entry for class `Foo` to simulate that `Foo` was unresolved at replay dump time. This will result in a different C2 IR which is verified by checking the `PrintIdeal` output (see comments in test). 
> > Thanks, > Christian Thanks Vladimir and Dean for your reviews! @dean-long: Should we already do some of these changes now or should we move forward and eventually fix these in JDK-8254110? ------------- PR: https://git.openjdk.java.net/jdk/pull/5926 From aph at openjdk.java.net Thu Oct 14 14:06:54 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Thu, 14 Oct 2021 14:06:54 GMT Subject: RFR: 8253757: Add LLVM-based backend for hsdis In-Reply-To: References: Message-ID: On Wed, 13 Oct 2021 07:26:21 GMT, Ludovic Henry wrote: >> This patch expands the newly added system for hsdis backends to include LLVM. >> >> The actual code in hsdis-llvm.cpp is based heavily on the work by @luhenry, as published in the never integrated PR https://github.com/openjdk/jdk/pull/392. (I have basically just ripped out the binutils-based part of it.) >> >> Unfortunately I have not been able to make this work properly on Windows. With some additional flags I made it compile without complaints, but it caused hotspot to segfault in `LoadLibrary` (!) in `os::dll_load` when I tried to load the library. This is somewhat ironic, since the initial implementation was created by Ludovic for the very purpose of using it on Windows. >> >> The lack of Windows support in this patch does not mean it is impossible to get it to work, just that I need to co-operate with someone who has more experience of compiling LLVM on Windows, and/or are more eager to get this combination to work. > > Very happy to see it landing :) Thank you! > > I don't have access to a Windows machine, let alone a Windows-AArch64 machine. @lewurm would you be able to take a look? > As for LLVM not giving you a good user experience, I can't really tell what's wrong. Apparently @luhenry (and @JornVernee, I believe) have used this. I'm not really the target audience myself; I'm only trying to make it possible to use. If it is as severely limited as you say, maybe this isn't worth pursuing.
> Some feedback from those who have tested it would be appreciated here, to help me understand whether this patch should be dropped. I don't think it should be dropped, but I imagine that the bugs can be fixed. If LLVM's disassembler always dies as soon as it sees something it can't recognize, I'm astonished. Maybe the LLVM I'm using is bad. ------------- PR: https://git.openjdk.java.net/jdk/pull/5920 From jvernee at openjdk.java.net Thu Oct 14 14:17:51 2021 From: jvernee at openjdk.java.net (Jorn Vernee) Date: Thu, 14 Oct 2021 14:17:51 GMT Subject: RFR: 8253757: Add LLVM-based backend for hsdis In-Reply-To: References: Message-ID: <2mM80GN7cgb99W91FnCDP13DGr4LUPXCr0iRdO-JqQQ=.25d0f1ce-b4b8-4e2d-811d-c9268c500bf2@github.com> On Thu, 14 Oct 2021 14:04:12 GMT, Andrew Haley wrote: > Some feedback from those who have tested it would be appreciated here I haven't really tested it beyond building the lib and seeing if assembly was output instead of just bytes, so I'm afraid I can't really comment on that. Since the binutils hsdis wasn't buildable on Windows for me in the past, I've always been using the [fcml-based hsdis](https://github.com/swojtasiak/fcml-lib/tree/master/example/hsdis) on Windows. ------------- PR: https://git.openjdk.java.net/jdk/pull/5920 From duke at openjdk.java.net Thu Oct 14 19:48:57 2021 From: duke at openjdk.java.net (duke) Date: Thu, 14 Oct 2021 19:48:57 GMT Subject: Withdrawn: 8267928: Loop predicate gets inexact loop limit before PhaseIdealLoop::rc_predicate In-Reply-To: References: Message-ID: On Fri, 28 May 2021 13:29:36 GMT, Yi Yang wrote: > Loop predication gets an inexact loop limit (LoopLimitNode) from exact_limit (even if the limit is statically known) and does unnecessary overflow checking when generating the lower-bound test (rc_predicate). The reason is rather straightforward: exact_limit fails to see the HasExactTripCount flag, since that flag would only be set after loop predication has been performed (iteration_split).
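As background for the withdrawn change: for a counted loop `for (int i = init; i < limit; i += stride)` with `stride > 0`, the exact limit is the value of `i` when the loop exits, `init + stride * tripCount`, where `tripCount = ceil((limit - init) / stride)` if `limit > init` and 0 otherwise. The Java sketch below illustrates that arithmetic under those assumptions (positive stride, `<` test); it is not C2's `LoopLimitNode` code, and the class and method names are made up. It widens to `long` because the exact limit can exceed `Integer.MAX_VALUE` even when `limit` does not, which is why overflow has to be considered at all; when `init`, `limit` and `stride` are all constants, that question can be answered up front.

```java
public final class ExactLoopLimit {
    /** Iterations of: for (int i = init; i < limit; i += stride), with stride > 0. */
    public static long tripCount(int init, int limit, int stride) {
        if (stride <= 0) {
            throw new IllegalArgumentException("sketch only handles positive strides");
        }
        if (limit <= init) {
            return 0; // loop body never runs
        }
        // ceil((limit - init) / stride), computed in long so it cannot overflow.
        return ((long) limit - init + stride - 1) / stride;
    }

    /** Value of i when the loop exits: init + stride * tripCount. */
    public static long exactLimit(int init, int limit, int stride) {
        return init + (long) stride * tripCount(init, limit, stride);
    }

    public static void main(String[] args) {
        // i takes the values 0, 3, 6, 9 and the loop exits with i == 12.
        System.out.println(exactLimit(0, 10, 3));
        // The exact limit of this loop no longer fits in an int.
        long e = exactLimit(0, Integer.MAX_VALUE, 2);
        System.out.println(e + " fits in int: " + (e <= Integer.MAX_VALUE));
    }
}
```

`main` prints `12` and then `2147483648 fits in int: false`.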
This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.java.net/jdk/pull/4247 From dlong at openjdk.java.net Thu Oct 14 23:28:51 2021 From: dlong at openjdk.java.net (Dean Long) Date: Thu, 14 Oct 2021 23:28:51 GMT Subject: RFR: 8262912: ciReplay: replay does not simulate unresolved classes In-Reply-To: <61MVVuCtLN6hzqa0xrZp4bgwadEOLgujgsTr3BzW4IM=.f0b5ec21-f44e-45fd-aea5-378f61dec894@github.com> References: <1_zhAUvJidCFwD_VCEakIEPf0jpaXFyQtcF1t5XvCxE=.0bfa6362-254d-463d-8359-8f9dc851214e@github.com> <61MVVuCtLN6hzqa0xrZp4bgwadEOLgujgsTr3BzW4IM=.f0b5ec21-f44e-45fd-aea5-378f61dec894@github.com> Message-ID: On Thu, 14 Oct 2021 13:40:02 GMT, Christian Hagedorn wrote: >> When trying to replay compile, the JVM will always resolve some classes before actually doing the replay compilation. When finally replay compiling the method, the state of `ciInstanceKlasses` which are resolved/unresolved could be different compared to the state at which the replay file was dumped. This will even be a bigger problem when tackling [JDK-8254110](https://bugs.openjdk.java.net/browse/JDK-8254110). >> >> This change intends to fix this by only treating a `ciInstanceKlass` as *not* unresolved if there is a corresponding entry for it in the replay file. This is achieved by a whitelist (`ciInstanceKlassRecord`). All accesses to get a pointer to a `ciInstanceKlass` are eventually routed through `ciEnv::get_metadata()`. This method is hooked to compare it against the replay compilation whitelist. If the corresponding `Klass` is not on the list, an unresolved `ciInstanceKlass` is returned instead. >> >> Finding a way to reliably test this feature was difficult. I therefore came up with a test which first creates a replay file with `CICrashAt` and then removes the `ciInstanceKlass` entry for class `Foo` to simulate that `Foo` was unresolved at replay dump time. 
This will result in a different C2 IR which is verified by checking the `PrintIdeal` output (see comments in test). >> >> Thanks, >> Christian > > Thanks Vladimir and Dean for your reviews! > > @dean-long: Should we already do some of these changes now or should we move forward and eventually fix these in JDK-8254110? @chhagedorn I suggest pushing what you have now and fixing the leaks in JDK-8254110. ------------- PR: https://git.openjdk.java.net/jdk/pull/5926 From duke at openjdk.java.net Fri Oct 15 01:14:45 2021 From: duke at openjdk.java.net (SUN Guoyun) Date: Fri, 15 Oct 2021 01:14:45 GMT Subject: RFR: 8275086: compiler/c2/irTests/TestPostParseCallDevirtualization.java fails when compiler1 is disabled In-Reply-To: References: Message-ID: On Tue, 12 Oct 2021 02:49:53 GMT, SUN Guoyun wrote: > Hi all, > Jtreg test case compiler/c2/irTests/TestPostParseCallDevirtualization.java fails in fastdebug mode on the x86/aarch64/mips architectures when "--with-jvm-features=-compiler1" is used. The failure output is: > >

> One or more @IR rules failed:
> 
> Failed IR Rules (1)
> ------------------
> - Method "public int compiler.c2.irTests.TestPostParseCallDevirtualization.testMethodHandleCallWithCCP() throws java.lang.Throwable":
>   * @IR rule 1: "@compiler.lib.ir_framework.IR(failOn={"#PRE#(\\\\d+(\\\\s){2}(CallStaticJava.*)+(\\\\s){2}===.*#IS_REPLACED#)", "invokeBasic"}, applyIf={}, applyIfAnd={}, applyIfOr={}, counts={"#PRE#(\\\\d+(\\\\s){2}(CallStaticJava.*)+(\\\\s){2}===.*#IS_REPLACED#)", "invokeStatic", "= 1"}, applyIfNot={})" 
>     - failOn: Graph contains forbidden nodes:
>         Regex 1: (\\d+(\\s){2}(CallStaticJava.*)+(\\s){2}===.*invokeBasic)
>         Matched forbidden node:
>           280  CallStaticJava  ===  5  6  7  8  1 ( 188  1  1  1  1  1  1 ) [[ 281  282  283  285 ]] # Static  java.lang.invoke.MethodHandle::invokeBasic
>     - counts: Graph contains wrong number of nodes:
>         Regex 1: (\\d+(\\s){2}(CallStaticJava.*)+(\\s){2}===.*invokeStatic)
>         Expected 1 but found 0 nodes.
> 
>>>> Check stdout for compilation output of the failed methods
> 
> > This is a patch to fix this problem. Please help review it. > > Thanks, > Sun Guoyun The following modification also works, but I'm not sure it's a reasonable fix:

diff --git a/src/hotspot/share/ci/ciMethod.cpp b/src/hotspot/share/ci/ciMethod.cpp
index 862824c5b72..ff2c8ecb8e8 100644
--- a/src/hotspot/share/ci/ciMethod.cpp
+++ b/src/hotspot/share/ci/ciMethod.cpp
@@ -458,7 +458,8 @@ int ciMethod::check_overflow(int c, Bytecodes::Code code) {
 ciCallProfile ciMethod::call_profile_at_bci(int bci) {
   ResourceMark rm;
   ciCallProfile result;
-  if (method_data() != NULL && method_data()->is_mature()) {
+  if (ensure_method_data()){
     ciProfileData* data = method_data()->bci_to_data(bci);
     if (data != NULL && data->is_CounterData()) {
       // Every profiled call site has a counter.

------------- PR: https://git.openjdk.java.net/jdk/pull/5903 From thartmann at openjdk.java.net Fri Oct 15 06:32:54 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Fri, 15 Oct 2021 06:32:54 GMT Subject: RFR: 8275086: compiler/c2/irTests/TestPostParseCallDevirtualization.java fails when compiler1 is disabled In-Reply-To: References: Message-ID: <75VqFsOjKZUUeO9VcB8-OWjS_0UkLZIdpjK-htvCtgI=.5a24a609-d9e2-48a9-a8ef-c9588c134622@github.com> On Tue, 12 Oct 2021 02:49:53 GMT, SUN Guoyun wrote: > Hi all, > Jtreg test case compiler/c2/irTests/TestPostParseCallDevirtualization.java fails in fastdebug mode on the x86/aarch64/mips architectures when "--with-jvm-features=-compiler1" is used. The failure output is: > >

> One or more @IR rules failed:
> 
> Failed IR Rules (1)
> ------------------
> - Method "public int compiler.c2.irTests.TestPostParseCallDevirtualization.testMethodHandleCallWithCCP() throws java.lang.Throwable":
>   * @IR rule 1: "@compiler.lib.ir_framework.IR(failOn={"#PRE#(\\\\d+(\\\\s){2}(CallStaticJava.*)+(\\\\s){2}===.*#IS_REPLACED#)", "invokeBasic"}, applyIf={}, applyIfAnd={}, applyIfOr={}, counts={"#PRE#(\\\\d+(\\\\s){2}(CallStaticJava.*)+(\\\\s){2}===.*#IS_REPLACED#)", "invokeStatic", "= 1"}, applyIfNot={})" 
>     - failOn: Graph contains forbidden nodes:
>         Regex 1: (\\d+(\\s){2}(CallStaticJava.*)+(\\s){2}===.*invokeBasic)
>         Matched forbidden node:
>           280  CallStaticJava  ===  5  6  7  8  1 ( 188  1  1  1  1  1  1 ) [[ 281  282  283  285 ]] # Static  java.lang.invoke.MethodHandle::invokeBasic
>     - counts: Graph contains wrong number of nodes:
>         Regex 1: (\\d+(\\s){2}(CallStaticJava.*)+(\\s){2}===.*invokeStatic)
>         Expected 1 but found 0 nodes.
> 
>>>> Check stdout for compilation output of the failed methods
> 
> > This is a patch to fix this problem. Please help review it. > > Thanks, > Sun Guoyun Thanks for the details! I think it would be good if @iwanowww could take a look. He implemented post-parse call devirtualization. ------------- PR: https://git.openjdk.java.net/jdk/pull/5903 From chagedorn at openjdk.java.net Fri Oct 15 07:03:48 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Fri, 15 Oct 2021 07:03:48 GMT Subject: RFR: 8267928: Loop predicate gets inexact loop limit before PhaseIdealLoop::rc_predicate In-Reply-To: References: Message-ID: On Thu, 19 Aug 2021 10:51:08 GMT, Christian Hagedorn wrote: > Maybe you can hold off integrating this change until a decision is made and then change the location of your IR test accordingly. The location for IR tests is `compiler/c2/irTests` for the time being. ------------- PR: https://git.openjdk.java.net/jdk/pull/4247 From chagedorn at openjdk.java.net Fri Oct 15 07:41:59 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Fri, 15 Oct 2021 07:41:59 GMT Subject: RFR: 8262912: ciReplay: replay does not simulate unresolved classes In-Reply-To: <61MVVuCtLN6hzqa0xrZp4bgwadEOLgujgsTr3BzW4IM=.f0b5ec21-f44e-45fd-aea5-378f61dec894@github.com> References: <1_zhAUvJidCFwD_VCEakIEPf0jpaXFyQtcF1t5XvCxE=.0bfa6362-254d-463d-8359-8f9dc851214e@github.com> <61MVVuCtLN6hzqa0xrZp4bgwadEOLgujgsTr3BzW4IM=.f0b5ec21-f44e-45fd-aea5-378f61dec894@github.com> Message-ID: <-OLQQUY-SuHUnUhMA8TQyucNJ5G3fCgNYuOTq1le0BY=.bfc19e31-c4af-483d-a062-3683f933e56f@github.com> On Thu, 14 Oct 2021 13:40:02 GMT, Christian Hagedorn wrote: >> When trying to replay compile, the JVM will always resolve some classes before actually doing the replay compilation. When finally replay compiling the method, the state of `ciInstanceKlasses` which are resolved/unresolved could be different compared to the state at which the replay file was dumped. 
This will even be a bigger problem when tackling [JDK-8254110](https://bugs.openjdk.java.net/browse/JDK-8254110). >> >> This change intends to fix this by only treating a `ciInstanceKlass` as *not* unresolved if there is a corresponding entry for it in the replay file. This is achieved by a whitelist (`ciInstanceKlassRecord`). All accesses to get a pointer to a `ciInstanceKlass` are eventually routed through `ciEnv::get_metadata()`. This method is hooked to compare it against the replay compilation whitelist. If the corresponding `Klass` is not on the list, an unresolved `ciInstanceKlass` is returned instead. >> >> Finding a way to reliably test this feature was difficult. I therefore came up with a test which first creates a replay file with `CICrashAt` and then removes the `ciInstanceKlass` entry for class `Foo` to simulate that `Foo` was unresolved at replay dump time. This will result in a different C2 IR which is verified by checking the `PrintIdeal` output (see comments in test). >> >> Thanks, >> Christian > > Thanks Vladimir and Dean for your reviews! > > @dean-long: Should we already do some of these changes now or should we move forward and eventually fix these in JDK-8254110? > @chhagedorn I suggest pushing what you have now and fixing the leaks in JDK-8254110. Okay, thanks for your assessment! I will add these comments to JDK-8254110 to keep track of them. 
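The mechanism described in this PR (a class counts as resolved during replay only if the replay file has an entry for it, enforced at the single lookup choke point `ciEnv::get_metadata()`) can be modelled in a few lines. This is a hedged Java sketch of the idea only: the real hook is C++ inside HotSpot, replay-file parsing is omitted, and `ReplayKlassFilter`, `KlassView` and `lookup` are hypothetical names.

```java
import java.util.Set;

public final class ReplayKlassFilter {
    /** What the replayed compilation gets to see for a class name. */
    public record KlassView(String name, boolean resolved) {}

    // Class names that had an entry in the replay file (parsing omitted).
    private final Set<String> recordedKlasses;

    public ReplayKlassFilter(Set<String> recordedKlasses) {
        this.recordedKlasses = Set.copyOf(recordedKlasses);
    }

    /**
     * Single choke point for class lookups during replay: a class that the
     * replaying JVM happens to have loaded is still reported as unresolved
     * unless the replay file recorded it at dump time.
     */
    public KlassView lookup(String name, boolean loadedInReplayVm) {
        boolean resolved = loadedInReplayVm && recordedKlasses.contains(name);
        return new KlassView(name, resolved);
    }

    public static void main(String[] args) {
        // Mirrors the test idea: the entry for Foo was removed from the replay file.
        ReplayKlassFilter filter = new ReplayKlassFilter(Set.of("java/lang/String"));
        System.out.println(filter.lookup("java/lang/String", true)); // KlassView[name=java/lang/String, resolved=true]
        System.out.println(filter.lookup("Foo", true));              // KlassView[name=Foo, resolved=false]
    }
}
```

The design choice being illustrated is that the filter sits at one lookup point, so every path that asks for a class sees the same recorded state rather than whatever the replaying JVM loaded on its own.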
------------- PR: https://git.openjdk.java.net/jdk/pull/5926 From chagedorn at openjdk.java.net Fri Oct 15 07:42:01 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Fri, 15 Oct 2021 07:42:01 GMT Subject: Integrated: 8262912: ciReplay: replay does not simulate unresolved classes In-Reply-To: <1_zhAUvJidCFwD_VCEakIEPf0jpaXFyQtcF1t5XvCxE=.0bfa6362-254d-463d-8359-8f9dc851214e@github.com> References: <1_zhAUvJidCFwD_VCEakIEPf0jpaXFyQtcF1t5XvCxE=.0bfa6362-254d-463d-8359-8f9dc851214e@github.com> Message-ID: On Wed, 13 Oct 2021 09:21:18 GMT, Christian Hagedorn wrote: > When trying to replay compile, the JVM will always resolve some classes before actually doing the replay compilation. When finally replay compiling the method, the state of `ciInstanceKlasses` which are resolved/unresolved could be different compared to the state at which the replay file was dumped. This will even be a bigger problem when tackling [JDK-8254110](https://bugs.openjdk.java.net/browse/JDK-8254110). > > This change intends to fix this by only treating a `ciInstanceKlass` as *not* unresolved if there is a corresponding entry for it in the replay file. This is achieved by a whitelist (`ciInstanceKlassRecord`). All accesses to get a pointer to a `ciInstanceKlass` are eventually routed through `ciEnv::get_metadata()`. This method is hooked to compare it against the replay compilation whitelist. If the corresponding `Klass` is not on the list, an unresolved `ciInstanceKlass` is returned instead. > > Finding a way to reliably test this feature was difficult. I therefore came up with a test which first creates a replay file with `CICrashAt` and then removes the `ciInstanceKlass` entry for class `Foo` to simulate that `Foo` was unresolved at replay dump time. This will result in a different C2 IR which is verified by checking the `PrintIdeal` output (see comments in test). > > Thanks, > Christian This pull request has now been integrated. 
Changeset: 4cb7124c Author: Christian Hagedorn URL: https://git.openjdk.java.net/jdk/commit/4cb7124c1e9c5fd1d3a82fd8933cc63fefde9531 Stats: 241 lines in 8 files changed: 226 ins; 0 del; 15 mod 8262912: ciReplay: replay does not simulate unresolved classes Reviewed-by: kvn, dlong ------------- PR: https://git.openjdk.java.net/jdk/pull/5926 From duke at openjdk.java.net Fri Oct 15 08:02:51 2021 From: duke at openjdk.java.net (SUN Guoyun) Date: Fri, 15 Oct 2021 08:02:51 GMT Subject: RFR: 8273317: crash in cmovP_cmpP_zero_zeroNode::bottom_type() In-Reply-To: References: Message-ID: On Sat, 4 Sep 2021 02:58:56 GMT, SUN Guoyun wrote: > Hi all, > > When I implemented a new instruct in the AD file to match CMoveP with a CmpP node, like this: > > match(Set dst (CMoveP (Binary cop (CmpP op1 zero)) (Binary dst zero))); > > which means that the right child of CmpP is an immediate zero and the right child of CMoveP is also an immediate zero, the following crash occurred: > >

> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> # SIGSEGV (0xb) at pc=0x000000fff410fcc4, pid=11130, tid=11146
> #
> # JRE version: OpenJDK Runtime Environment (17.0) (build 17-internal+0-jenkins-slave-20210821140615-jdk-ls-a526852e137)
> # Java VM: OpenJDK 64-Bit Server VM (17-internal+0-jenkins-slave-20210821140615-jdk-ls-a526852e137, compiled mode, compressed oops, compressed class ptrs, g1 gc, linux-loongarch64)
> # Problematic frame:
> # V [libjvm.so+0x21fcc4] cmovP_cmpP_zero_zeroNode::bottom_type() const+0x44
> #
> # No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
> #
> # If you would like to submit a bug report, please visit:
> # https://bugreport.java.com/bugreport/crash.jsp
> #
> 
> > In this case, cmovP_cmpP_zero_zeroNode only has three input nodes, so the crash above is triggered. This is a patch to fix this problem. Please help review it. > > Thanks, > Sun Guoyun What else do I need to do for this patch? ------------- PR: https://git.openjdk.java.net/jdk/pull/5369 From luhenry at openjdk.java.net Fri Oct 15 08:22:48 2021 From: luhenry at openjdk.java.net (Ludovic Henry) Date: Fri, 15 Oct 2021 08:22:48 GMT Subject: RFR: 8253757: Add LLVM-based backend for hsdis In-Reply-To: References: Message-ID: On Wed, 13 Oct 2021 00:00:22 GMT, Magnus Ihse Bursie wrote: > This patch expands the newly added system for hsdis backends to include LLVM. > > The actual code in hsdis-llvm.cpp is based heavily on the work by @luhenry, as published in the never integrated PR https://github.com/openjdk/jdk/pull/392. (I have basically just ripped out the binutils-based part of it.) > > Unfortunately I have not been able to make this work properly on Windows. With some additional flags I made it compile without complaints, but it caused hotspot to segfault in `LoadLibrary` (!) in `os::dll_load` when I tried to load the library. This is somewhat ironic, since the initial implementation was created by Ludovic for the very purpose of using it on Windows. > > The lack of Windows support in this patch does not mean it is impossible to get it to work, just that I need to co-operate with someone who has more experience of compiling LLVM on Windows, and/or are more eager to get this combination to work. The value added by this LLVM-based hsdis is twofold:
- It supports platforms that aren't supported by binutils (Windows-AArch64, for example).
- Its more permissive license would make it easier to build it as part of the OpenJDK build (and maybe even to ship it?).
LLVM has a strong track record of supporting new platforms (Windows-AArch64 and macOS-AArch64, for example, mostly because of investment from Microsoft and Apple respectively), and `hsdis` is a necessary tool for porting the OpenJDK to any new platform. Since the maintenance burden is fairly low (small codebase, small and knowledgeable user base), I would be biased towards including it with appropriate warnings. ------------- PR: https://git.openjdk.java.net/jdk/pull/5920 From jbhateja at openjdk.java.net Fri Oct 15 12:51:59 2021 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Fri, 15 Oct 2021 12:51:59 GMT Subject: RFR: 8275047: Optimize existing fill stubs for AVX-512 target Message-ID: 

Hi All,

This patch optimizes the macro assembly routines used by the fill stubs of various primitive types for the x86 AVX-512 target.
Following are the main changes:
1) Specialized instruction sequences for fill operations over various block sizes.
2) Control flow is sensitive to AVX3Threshold: the generated code operates on 32-byte vectors (YMM) if the block size is less than the threshold; otherwise, instructions operate on 64-byte vectors (ZMM).
3) The bulk fill operation is performed by a destination-aligned fill loop with an appropriate unroll factor; this avoids cache-line split penalties and improves performance.
4) Currently, fill patterns are vectorized by the auto-vectorizer, and the generated code operates on vectors of MaxVectorSize. In addition, the auto-vectorizer is oblivious to AVX3Threshold, which may result in performance degradation on prior generations of x86 servers, where 64-byte vector stores using ZMM registers operate at a reduced CPU frequency. The patch enables the JVM runtime flag -XX:+OptimizeFill by default for x86 targets supporting the AVX-512 feature.
5) The patch also optimizes the mask generation sequence of the fill* macro assembly routines using the BZHI instruction.

Performance measurements of an existing JMH micro on an Ice Lake server show ~1.1-4.0X gains for fill operations with varying block sizes.
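Items 3) and 5) come down to two pieces of arithmetic: splitting a fill into a masked head that reaches a vector-aligned address, a run of aligned full-vector stores, and a masked tail; and building the store mask with BZHI, which clears every bit of its source from a given index upward, so `BZHI(-1, n)` yields a mask with the low `n` bits set (one bit per element lane). The Java sketch below models only that arithmetic, assuming a power-of-two vector size such as 64 bytes (ZMM) and an element-aligned destination; it is not the stub's actual control flow, which also unrolls the main loop and picks YMM or ZMM based on AVX3Threshold. `FillPlanner`, `bzhiAllOnes` and `plan` are hypothetical names.

```java
public final class FillPlanner {
    /** Models BZHI(src = -1, index = n) on a 64-bit operand: a mask of the low n bits. */
    public static long bzhiAllOnes(int n) {
        int index = n & 0xFF;               // BZHI reads its bit index from bits 7:0
        return index >= 64 ? -1L            // index >= operand size: no bits are cleared
                           : (1L << index) - 1;
    }

    /** Element counts for: masked head (up to alignment), aligned full vectors, masked tail. */
    public record Plan(long headElems, long fullVectors, long tailElems) {}

    public static Plan plan(long dstAddr, long count, int elemSize, int vecBytes) {
        if (Integer.bitCount(vecBytes) != 1 || dstAddr % elemSize != 0) {
            throw new IllegalArgumentException("need power-of-two vector size and element-aligned dst");
        }
        long headBytes = (-dstAddr) & (vecBytes - 1);   // bytes up to the next vector boundary
        long head = Math.min(headBytes / elemSize, count);
        long rest = count - head;
        long perVector = vecBytes / elemSize;           // lanes per vector store
        return new Plan(head, rest / perVector, rest % perVector);
    }

    public static void main(String[] args) {
        // Byte fill of 100 elements starting 8 bytes past a 64-byte boundary.
        Plan p = plan(0x1008, 100, 1, 64);
        System.out.println(p); // Plan[headElems=56, fullVectors=0, tailElems=44]
        System.out.printf("head mask = 0x%x%n", bzhiAllOnes((int) p.headElems()));
    }
}
```

Java masks `long` shift distances to 6 bits, so `(1L << 64) - 1` would evaluate to 0 instead of all ones; the explicit branch mirrors BZHI's rule that an index at or beyond the operand size clears nothing.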
Following are detailed results: System Configuration: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (40C 2S) Benchmark | Size | Baseline Auto-vectorized -XX:-OptimizeFill (ops/ms) | New Optimized Fill AVX3 Th=4096 (ops/ms) | Gain Factor (OptFill AVX3Th=4096/Baseline) -- | -- | -- | -- | -- ArrayFill.testByteFill | 16 | 193994.942 | 381142.844 | 1.964705059 ArrayFill.testByteFill | 31 | 99817.403 | 399973.74 | 4.007054161 ArrayFill.testByteFill | 59 | 80759.378 | 342165.394 | 4.236850289 ArrayFill.testByteFill | 89 | 127342.997 | 341696.357 | 2.683275603 ArrayFill.testByteFill | 126 | 72081.809 | 309335.351 | 4.291448221 ArrayFill.testByteFill | 250 | 41419.435 | 166618.264 | 4.022707311 ArrayFill.testByteFill | 511 | 32509.962 | 138595.951 | 4.263184036 ArrayFill.testByteFill | 1021 | 35930.96 | 90622.597 | 2.522131248 ArrayFill.testByteFill | 2047 | 32956.62 | 67252.442 | 2.040635296 ArrayFill.testByteFill | 4095 | 29180.81 | 45508.86 | 1.559547525 ArrayFill.testByteFill | 8195 | 17468.775 | 25072.671 | 1.435285016 ArrayFill.testByteFill | 65536 | 978.482 | 946.377 | 0.967188972 ArrayFill.testCharFill | 16 | 205893.99 | 381151.485 | 1.851202578 ArrayFill.testCharFill | 31 | 90418.278 | 385694.751 | 4.265672379 ArrayFill.testCharFill | 59 | 117391.45 | 310132.477 | 2.641865971 ArrayFill.testCharFill | 89 | 117956.135 | 202314.017 | 1.715163158 ArrayFill.testCharFill | 126 | 70174.917 | 164571.761 | 2.345165025 ArrayFill.testCharFill | 250 | 37243.255 | 141460.648 | 3.798289059 ArrayFill.testCharFill | 511 | 33788.369 | 98578.472 | 2.917526797 ArrayFill.testCharFill | 1021 | 33655.897 | 78305.288 | 2.326643916 ArrayFill.testCharFill | 2047 | 35656.759 | 41973.205 | 1.177145825 ArrayFill.testCharFill | 4095 | 16311.779 | 24724.413 | 1.515739822 ArrayFill.testCharFill | 8195 | 11412.845 | 12599.1 | 1.103940341 ArrayFill.testCharFill | 65536 | 476.138 | 486.723 | 1.02223095 ArrayFill.testDoubleFill | 16 | 222222.265 | 193741.026 | 0.871834449 ArrayFill.testDoubleFill | 
31 | 169693.273 | 155377.031 | 0.915634593
ArrayFill.testDoubleFill | 59 | 101838.606 | 197496.671 | 1.939310432
ArrayFill.testDoubleFill | 89 | 106202.786 | 182813.717 | 1.721364607
ArrayFill.testDoubleFill | 126 | 128696.666 | 123066.432 | 0.956251905
ArrayFill.testDoubleFill | 250 | 81145.924 | 90895.167 | 1.120144581
ArrayFill.testDoubleFill | 511 | 44615.14 | 48668.332 | 1.090847905
ArrayFill.testDoubleFill | 1021 | 25191.332 | 25152.377 | 0.998453635
ArrayFill.testDoubleFill | 2047 | 11337.929 | 12655.112 | 1.11617492
ArrayFill.testDoubleFill | 4095 | 6378.326 | 6378.392 | 1.000010348
ArrayFill.testDoubleFill | 8195 | 885.269 | 882.644 | 0.9970348
ArrayFill.testDoubleFill | 65536 | 121.155 | 121.252 | 1.000800627
ArrayFill.testFloatFill | 16 | 201801.067 | 342214.071 | 1.695799116
ArrayFill.testFloatFill | 31 | 93851.962 | 322681.433 | 3.438195922
ArrayFill.testFloatFill | 59 | 107454.704 | 162266.325 | 1.510090475
ArrayFill.testFloatFill | 89 | 129597.511 | 158890.265 | 1.226028677
ArrayFill.testFloatFill | 126 | 92358.492 | 151423.881 | 1.639523099
ArrayFill.testFloatFill | 250 | 95412.586 | 96269.997 | 1.008986351
ArrayFill.testFloatFill | 511 | 68356.016 | 73395.512 | 1.07372425
ArrayFill.testFloatFill | 1021 | 46040.879 | 42767.414 | 0.928900901
ArrayFill.testFloatFill | 2047 | 23876.684 | 24988.836 | 1.046578997
ArrayFill.testFloatFill | 4095 | 12475.923 | 12598.467 | 1.00982244
ArrayFill.testFloatFill | 8195 | 6286.263 | 6292.858 | 1.001049113
ArrayFill.testFloatFill | 65536 | 230.041 | 248.095 | 1.078481662
ArrayFill.testIntFill | 16 | 188215.196 | 339491.214 | 1.803739662
ArrayFill.testIntFill | 31 | 146425.028 | 321621.325 | 2.19649147
ArrayFill.testIntFill | 59 | 140650.413 | 194907.815 | 1.385760702
ArrayFill.testIntFill | 89 | 78017.244 | 166579.365 | 2.13516085
ArrayFill.testIntFill | 126 | 97645.936 | 142150.475 | 1.455774616
ArrayFill.testIntFill | 250 | 68623.478 | 96538.532 | 1.406785765
ArrayFill.testIntFill | 511 | 57465.869 | 84218.747 | 1.465543782
ArrayFill.testIntFill | 1021 | 46308.298 | 45287.255 | 0.977951187
ArrayFill.testIntFill | 2047 | 24222.479 | 25017.366 | 1.032816088
ArrayFill.testIntFill | 4095 | 12470.853 | 12656.69 | 1.014901707
ArrayFill.testIntFill | 8195 | 6302.584 | 6312.377 | 1.001553807
ArrayFill.testIntFill | 65536 | 227.098 | 248.39 | 1.09375688
ArrayFill.testLongFill | 16 | 229400.195 | 190876.891 | 0.832069437
ArrayFill.testLongFill | 31 | 160433.763 | 161062.288 | 1.00391766
ArrayFill.testLongFill | 59 | 117527.007 | 104990.932 | 0.893334517
ArrayFill.testLongFill | 89 | 106400.533 | 112155.423 | 1.054087041
ArrayFill.testLongFill | 126 | 133428.366 | 141422.605 | 1.059914089
ArrayFill.testLongFill | 250 | 83393.535 | 70419.357 | 0.844422256
ArrayFill.testLongFill | 511 | 48534.407 | 44830.441 | 0.923683708
ArrayFill.testLongFill | 1021 | 25150.503 | 25144.854 | 0.999775392
ArrayFill.testLongFill | 2047 | 12661.581 | 12495.112 | 0.986852432
ArrayFill.testLongFill | 4095 | 6378.589 | 6326.361 | 0.991811982
ArrayFill.testLongFill | 8195 | 884.108 | 883.225 | 0.999001253
ArrayFill.testLongFill | 65536 | 116.544 | 115.809 | 0.993693369
ArrayFill.testShortFill | 16 | 181717.691 | 381160.843 | 2.097543948
ArrayFill.testShortFill | 31 | 99246.669 | 376006.724 | 3.788607999
ArrayFill.testShortFill | 59 | 125435.022 | 308756.585 | 2.461486275
ArrayFill.testShortFill | 89 | 116796.477 | 195568.654 | 1.674439667
ArrayFill.testShortFill | 126 | 37346.482 | 164389.009 | 4.401726754
ArrayFill.testShortFill | 250 | 32537.347 | 140808.889 | 4.327608179
ArrayFill.testShortFill | 511 | 43932.519 | 103200.042 | 2.349058154
ArrayFill.testShortFill | 1021 | 42808.585 | 80777.289 | 1.886941346
ArrayFill.testShortFill | 2047 | 34852.049 | 41482.517 | 1.190246146
ArrayFill.testShortFill | 4095 | 21427.935 | 24971.245 | 1.165359378
ArrayFill.testShortFill | 8195 | 11666.17 | 12655.972 | 1.084843783
ArrayFill.testShortFill | 65536 | 451.299 | 486.96 | 1.079018566

Kindly review and share your
feedback. Best Regards, Jatin ------------- Commit messages: - 8275047: Optimize existing fill stubs for AVX-512 target Changes: https://git.openjdk.java.net/jdk/pull/5967/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5967&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8275047 Stats: 266 lines in 5 files changed: 221 ins; 33 del; 12 mod Patch: https://git.openjdk.java.net/jdk/pull/5967.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5967/head:pull/5967 PR: https://git.openjdk.java.net/jdk/pull/5967 From redestad at openjdk.java.net Fri Oct 15 13:24:53 2021 From: redestad at openjdk.java.net (Claes Redestad) Date: Fri, 15 Oct 2021 13:24:53 GMT Subject: RFR: 8275047: Optimize existing fill stubs for AVX-512 target In-Reply-To: References: Message-ID: On Fri, 15 Oct 2021 12:43:33 GMT, Jatin Bhateja wrote:
> Hi All,
>
> This patch optimizes the macro assembly routines used by the fill stubs of various primitive types for the X86 AVX-512 target.
> Following are the main changes:
> 1) Specialized instruction sequences for fill operations over various block sizes.
> 2) Control flow is sensitive to AVX3Threshold: the generated code operates on 32-byte vectors (YMM) if the block size is less than the threshold; otherwise the instructions operate on 64-byte vectors (ZMM).
> 3) The bulk fill operation is performed by a destination-aligned fill loop with an appropriate unroll factor; this avoids any cache-line split penalty and improves performance.
> 4) Currently, fill patterns are vectorized by the auto-vectorizer, and the generated code operates on vectors of MaxVectorSize. In addition, the auto-vectorizer is oblivious to AVX3Threshold, and this may result in performance degradation on prior generations of X86 servers, where 64-byte vector stores using ZMM registers operate at reduced CPU frequency.
> The patch enables the JVM runtime flag -XX:+OptimizeFill by default for X86 targets supporting the AVX-512 feature.
> 5) The patch also optimizes the mask generation sequence of the fill* macro assembly routines using the BZHI instruction.
>
> Performance measurements of an existing JMH micro on an Icelake server show ~1.1-4.0X gains for fill operations with varying block sizes.
>
> Following are detailed results:
>
> System Configuration: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (40C 2S)
>
> Benchmark | Size | Baseline Auto-vectorized -XX:-OptimizeFill (ops/ms) | New Optimized Fill AVX3 Th=4096 (ops/ms) | Gain Factor (OptFill AVX3Th=4096/Baseline)
> -- | -- | -- | -- | --
> ArrayFill.testByteFill | 16 | 193994.942 | 381142.844 | 1.964705059
> ArrayFill.testByteFill | 31 | 99817.403 | 399973.74 | 4.007054161
> ArrayFill.testByteFill | 59 | 80759.378 | 342165.394 | 4.236850289
> ArrayFill.testByteFill | 89 | 127342.997 | 341696.357 | 2.683275603
> ArrayFill.testByteFill | 126 | 72081.809 | 309335.351 | 4.291448221
> ArrayFill.testByteFill | 250 | 41419.435 | 166618.264 | 4.022707311
> ArrayFill.testByteFill | 511 | 32509.962 | 138595.951 | 4.263184036
> ArrayFill.testByteFill | 1021 | 35930.96 | 90622.597 | 2.522131248
> ArrayFill.testByteFill | 2047 | 32956.62 | 67252.442 | 2.040635296
> ArrayFill.testByteFill | 4095 | 29180.81 | 45508.86 | 1.559547525
> ArrayFill.testByteFill | 8195 | 17468.775 | 25072.671 | 1.435285016
> ArrayFill.testByteFill | 65536 | 978.482 | 946.377 | 0.967188972
> ArrayFill.testCharFill | 16 | 205893.99 | 381151.485 | 1.851202578
> ArrayFill.testCharFill | 31 | 90418.278 | 385694.751 | 4.265672379
> ArrayFill.testCharFill | 59 | 117391.45 | 310132.477 | 2.641865971
> ArrayFill.testCharFill | 89 | 117956.135 | 202314.017 | 1.715163158
> ArrayFill.testCharFill | 126 | 70174.917 | 164571.761 | 2.345165025
> ArrayFill.testCharFill | 250 | 37243.255 | 141460.648 | 3.798289059
> ArrayFill.testCharFill | 511 | 33788.369 | 98578.472 | 2.917526797
> ArrayFill.testCharFill | 1021 | 33655.897 | 78305.288 | 2.326643916
> ArrayFill.testCharFill | 2047 | 35656.759 | 41973.205 | 1.177145825
> ArrayFill.testCharFill | 4095 | 16311.779 | 24724.413 | 1.515739822
> ArrayFill.testCharFill | 8195 | 11412.845 | 12599.1 | 1.103940341
> ArrayFill.testCharFill | 65536 | 476.138 | 486.723 | 1.02223095
> ArrayFill.testDoubleFill | 16 | 222222.265 | 193741.026 | 0.871834449
> ArrayFill.testDoubleFill | 31 | 169693.273 | 155377.031 | 0.915634593
> ArrayFill.testDoubleFill | 59 | 101838.606 | 197496.671 | 1.939310432
> ArrayFill.testDoubleFill | 89 | 106202.786 | 182813.717 | 1.721364607
> ArrayFill.testDoubleFill | 126 | 128696.666 | 123066.432 | 0.956251905
> ArrayFill.testDoubleFill | 250 | 81145.924 | 90895.167 | 1.120144581
> ArrayFill.testDoubleFill | 511 | 44615.14 | 48668.332 | 1.090847905
> ArrayFill.testDoubleFill | 1021 | 25191.332 | 25152.377 | 0.998453635
> ArrayFill.testDoubleFill | 2047 | 11337.929 | 12655.112 | 1.11617492
> ArrayFill.testDoubleFill | 4095 | 6378.326 | 6378.392 | 1.000010348
> ArrayFill.testDoubleFill | 8195 | 885.269 | 882.644 | 0.9970348
> ArrayFill.testDoubleFill | 65536 | 121.155 | 121.252 | 1.000800627
> ArrayFill.testFloatFill | 16 | 201801.067 | 342214.071 | 1.695799116
> ArrayFill.testFloatFill | 31 | 93851.962 | 322681.433 | 3.438195922
> ArrayFill.testFloatFill | 59 | 107454.704 | 162266.325 | 1.510090475
> ArrayFill.testFloatFill | 89 | 129597.511 | 158890.265 | 1.226028677
> ArrayFill.testFloatFill | 126 | 92358.492 | 151423.881 | 1.639523099
> ArrayFill.testFloatFill | 250 | 95412.586 | 96269.997 | 1.008986351
> ArrayFill.testFloatFill | 511 | 68356.016 | 73395.512 | 1.07372425
> ArrayFill.testFloatFill | 1021 | 46040.879 | 42767.414 | 0.928900901
> ArrayFill.testFloatFill | 2047 | 23876.684 | 24988.836 | 1.046578997
> ArrayFill.testFloatFill | 4095 | 12475.923 | 12598.467 | 1.00982244
> ArrayFill.testFloatFill | 8195 | 6286.263 | 6292.858 | 1.001049113
> ArrayFill.testFloatFill | 65536 | 230.041 | 248.095 | 1.078481662
> ArrayFill.testIntFill | 16 | 188215.196 | 339491.214 | 1.803739662
> ArrayFill.testIntFill | 31 | 146425.028 | 321621.325 | 2.19649147
> ArrayFill.testIntFill | 59 | 140650.413 | 194907.815 | 1.385760702
> ArrayFill.testIntFill | 89 | 78017.244 | 166579.365 | 2.13516085
> ArrayFill.testIntFill | 126 | 97645.936 | 142150.475 | 1.455774616
> ArrayFill.testIntFill | 250 | 68623.478 | 96538.532 | 1.406785765
> ArrayFill.testIntFill | 511 | 57465.869 | 84218.747 | 1.465543782
> ArrayFill.testIntFill | 1021 | 46308.298 | 45287.255 | 0.977951187
> ArrayFill.testIntFill | 2047 | 24222.479 | 25017.366 | 1.032816088
> ArrayFill.testIntFill | 4095 | 12470.853 | 12656.69 | 1.014901707
> ArrayFill.testIntFill | 8195 | 6302.584 | 6312.377 | 1.001553807
> ArrayFill.testIntFill | 65536 | 227.098 | 248.39 | 1.09375688
> ArrayFill.testLongFill | 16 | 229400.195 | 190876.891 | 0.832069437
> ArrayFill.testLongFill | 31 | 160433.763 | 161062.288 | 1.00391766
> ArrayFill.testLongFill | 59 | 117527.007 | 104990.932 | 0.893334517
> ArrayFill.testLongFill | 89 | 106400.533 | 112155.423 | 1.054087041
> ArrayFill.testLongFill | 126 | 133428.366 | 141422.605 | 1.059914089
> ArrayFill.testLongFill | 250 | 83393.535 | 70419.357 | 0.844422256
> ArrayFill.testLongFill | 511 | 48534.407 | 44830.441 | 0.923683708
> ArrayFill.testLongFill | 1021 | 25150.503 | 25144.854 | 0.999775392
> ArrayFill.testLongFill | 2047 | 12661.581 | 12495.112 | 0.986852432
> ArrayFill.testLongFill | 4095 | 6378.589 | 6326.361 | 0.991811982
> ArrayFill.testLongFill | 8195 | 884.108 | 883.225 | 0.999001253
> ArrayFill.testLongFill | 65536 | 116.544 | 115.809 | 0.993693369
> ArrayFill.testShortFill | 16 | 181717.691 | 381160.843 | 2.097543948
> ArrayFill.testShortFill | 31 | 99246.669 | 376006.724 | 3.788607999
> ArrayFill.testShortFill | 59 | 125435.022 | 308756.585 | 2.461486275
> ArrayFill.testShortFill | 89 | 116796.477 | 195568.654 | 1.674439667
> ArrayFill.testShortFill | 126 | 37346.482 | 164389.009 | 4.401726754
> ArrayFill.testShortFill | 250 | 32537.347 | 140808.889 | 4.327608179
> ArrayFill.testShortFill | 511 | 43932.519 | 103200.042 | 2.349058154
> ArrayFill.testShortFill | 1021 | 42808.585 | 80777.289 | 1.886941346
> ArrayFill.testShortFill | 2047 | 34852.049 | 41482.517 | 1.190246146
> ArrayFill.testShortFill | 4095 | 21427.935 | 24971.245 | 1.165359378
> ArrayFill.testShortFill | 8195 | 11666.17 | 12655.972 | 1.084843783
> ArrayFill.testShortFill | 65536 | 451.299 | 486.96 | 1.079018566
>
>
> Kindly review and share your feedback.
>
> Best Regards,
> Jatin

src/hotspot/cpu/x86/macroAssembler_x86.cpp line 5311:

> 5309: if (UseAVX >= 2 && UseUnalignedLoadStores) {
> 5310: Label L_check_fill_32_bytes;
> 5311: if (UseAVX > 2) {

Removing this old variant seems fine for the case when `MaxVectorSize >= 32 && VM_Version::supports_avx512vlbw()` (since it'll be handled above), but what happens when those criteria are not met? Looks like such a config would revert to the `AVX < 2` variant below, which seems sub-optimal? ------------- PR: https://git.openjdk.java.net/jdk/pull/5967 From duke at openjdk.java.net Fri Oct 15 14:08:08 2021 From: duke at openjdk.java.net (Vamsi Parasa) Date: Fri, 15 Oct 2021 14:08:08 GMT Subject: RFR: 8275167: x86 intrinsic for unsignedMultiplyHigh Message-ID: <7IzrZdL0elgXbuisyLNYC2wkyOTe1RHUPuGRI7YsAQ4=.aed9dea3-4775-4592-b43e-c3e08e167f90@github.com> Optimize the new Math.unsignedMultiplyHigh using the x86 mul instruction. This change shows a 1.87X improvement on a micro benchmark.
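The sketch below is a pure-Java illustration of the value the `mul`-based intrinsic must produce. It is not the JDK source and not the x86 intrinsic itself. It derives the unsigned high 64 bits of a 64x64-bit product from the signed high half returned by `Math.multiplyHigh` (Java 9+), then cross-checks the result against a slow `BigInteger` reference. The class and method names are hypothetical.

```java
import java.math.BigInteger;

// Hypothetical helper class for illustration; not the JDK implementation.
final class UnsignedMulHighSketch {

    /** High 64 bits of the unsigned 128-bit product of x and y. */
    static long unsignedMultiplyHigh(long x, long y) {
        // Signed high half (Math.multiplyHigh exists since Java 9).
        long result = Math.multiplyHigh(x, y);
        // Reading a negative operand as unsigned adds 2^64 to it, which adds
        // the other operand to the high half (the 2^128 term wraps away).
        result += (y & (x >> 63));
        result += (x & (y >> 63));
        return result;
    }

    /** Slow reference using BigInteger, for cross-checking only. */
    static long referenceUnsignedMultiplyHigh(long x, long y) {
        BigInteger mask64 = BigInteger.ONE.shiftLeft(64).subtract(BigInteger.ONE);
        BigInteger ux = BigInteger.valueOf(x).and(mask64);
        BigInteger uy = BigInteger.valueOf(y).and(mask64);
        // longValue() keeps the low 64 bits, i.e. the two's-complement view.
        return ux.multiply(uy).shiftRight(64).longValue();
    }

    public static void main(String[] args) {
        long[] samples = {0L, 1L, -1L, Long.MIN_VALUE, Long.MAX_VALUE, 0x0123456789ABCDEFL};
        for (long x : samples) {
            for (long y : samples) {
                if (unsignedMultiplyHigh(x, y) != referenceUnsignedMultiplyHigh(x, y)) {
                    throw new AssertionError("mismatch for " + x + ", " + y);
                }
            }
        }
        // (2^64 - 1)^2 = (2^64 - 2) * 2^64 + 1, so the high half is 2^64 - 2, i.e. -2L.
        System.out.println(unsignedMultiplyHigh(-1L, -1L)); // prints -2
    }
}
```

The two correction terms come from viewing each operand as x + 2^64·[x < 0]. Expanding the product adds y to the high half when x is negative, and x when y is negative. On JDK 18 and later, the same samples can also be compared against `Math.unsignedMultiplyHigh` itself.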
------------- Commit messages: - fix typo in function name - 8275167: x86 intrinsic for unsignedMultiplyHigh Changes: https://git.openjdk.java.net/jdk/pull/5933/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5933&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8275167 Stats: 59 lines in 11 files changed: 59 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/5933.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5933/head:pull/5933 PR: https://git.openjdk.java.net/jdk/pull/5933 From monica.beckwith at gmail.com Fri Oct 15 14:53:36 2021 From: monica.beckwith at gmail.com (monica beckwith) Date: Fri, 15 Oct 2021 09:53:36 -0500 Subject: RFR: 8253757: Add LLVM-based backend for hsdis In-Reply-To: References: Message-ID: Hi all, I am on vacation right now, but we can take a look next week and see how we can support the LLVM backend for hsdis. It is much required by Windows-AArch64 and we would hate for all the good work to not make it upstream. Monica Sent from my Arm powered smart device. On Thu, Oct 14, 2021, 9:07 AM Andrew Haley wrote: > On Wed, 13 Oct 2021 07:26:21 GMT, Ludovic Henry > wrote: > > >> This patch expands the newly added system for hsdis backends to include > LLVM. > >> > >> The actual code in hsdis-llvm.cpp is based heavily on the work by > @luhenry, as published in the never integrated PR > https://github.com/openjdk/jdk/pull/392. (I have basically just ripped > out the binutils-based part of it.) > >> > >> Unfortunately I have not been able to make this work properly on > Windows. With some additional flags I made it compile without complaints, > but it caused hotspot to segfault in `LoadLibrary` (!) in `os::dll_load` > when I tried to load the library. This is somewhat ironic, since the > initial implementation was created by Ludovic for the very purpose of using > it on Windows. 
> >> > >> The lack of Windows support in this patch does not mean it is > impossible to get it to work, just that I need to co-operate with someone > who has more experience of compiling LLVM on Windows, and/or is more eager > to get this combination to work. > > > > Very happy to see it landing :) Thank you! > > > > I don't have access to a Windows machine, and even less a > Windows-AArch64 machine. @lewurm would you be able to take a look? > > > As for LLVM not giving you a good user experience, I can't really tell > what's wrong. Apparently @luhenry (and @JornVernee I believe) have used > this. I'm not really the target audience myself; I'm only trying to make it > possible to use. If it is so severely limited as you say, maybe this isn't > worth pursuing. Some feedback from those who have tested it would be > appreciated here, to help me understand if this patch should be dropped. > > I don't think it should be dropped, but I imagine that the bugs can be > fixed. If LLVM's disassembler always dies as soon as it sees something it > can't recognize, I'm astonished. Maybe the LLVM I'm using is bad. > > ------------- > > PR: https://git.openjdk.java.net/jdk/pull/5920 > From aph at openjdk.java.net Fri Oct 15 15:38:48 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Fri, 15 Oct 2021 15:38:48 GMT Subject: RFR: 8253757: Add LLVM-based backend for hsdis In-Reply-To: References: Message-ID: On Fri, 15 Oct 2021 08:20:05 GMT, Ludovic Henry wrote: > Since the maintenance is fairly low (small codebase, small and knowledgeable user base), I would be biased towards including it with appropriate warnings. I don't think we should commit code that we know is broken. I don't believe that this view might be controversial. Maybe someone should try to reproduce the failure I've seen, and then we should look at fixing it. Maybe it's a local problem. Also, this patch breaks all current hsdis builds that follow the installation instructions.
Either we get revised instructions or the build should be fixed. ------------- PR: https://git.openjdk.java.net/jdk/pull/5920 From bpb at openjdk.java.net Fri Oct 15 15:56:51 2021 From: bpb at openjdk.java.net (Brian Burkhalter) Date: Fri, 15 Oct 2021 15:56:51 GMT Subject: RFR: 8275167: x86 intrinsic for unsignedMultiplyHigh In-Reply-To: <7IzrZdL0elgXbuisyLNYC2wkyOTe1RHUPuGRI7YsAQ4=.aed9dea3-4775-4592-b43e-c3e08e167f90@github.com> References: <7IzrZdL0elgXbuisyLNYC2wkyOTe1RHUPuGRI7YsAQ4=.aed9dea3-4775-4592-b43e-c3e08e167f90@github.com> Message-ID: On Wed, 13 Oct 2021 18:55:10 GMT, Vamsi Parasa wrote: > Optimize the new Math.unsignedMultiplyHigh using the x86 mul instruction. This change shows a 1.87X improvement on a micro benchmark. The `java.lang.Math` change looks good, of course. On the rest I cannot comment, but it is good to see something being done here. ------------- PR: https://git.openjdk.java.net/jdk/pull/5933 From aph at openjdk.java.net Fri Oct 15 16:17:51 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Fri, 15 Oct 2021 16:17:51 GMT Subject: RFR: 8275167: x86 intrinsic for unsignedMultiplyHigh In-Reply-To: <7IzrZdL0elgXbuisyLNYC2wkyOTe1RHUPuGRI7YsAQ4=.aed9dea3-4775-4592-b43e-c3e08e167f90@github.com> References: <7IzrZdL0elgXbuisyLNYC2wkyOTe1RHUPuGRI7YsAQ4=.aed9dea3-4775-4592-b43e-c3e08e167f90@github.com> Message-ID: On Wed, 13 Oct 2021 18:55:10 GMT, Vamsi Parasa wrote: > Optimize the new Math.unsignedMultiplyHigh using the x86 mul instruction. This change shows a 1.87X improvement on a micro benchmark. src/hotspot/share/opto/mulnode.cpp line 468: > 466: } > 467: > 468: //============================================================================= MulHiLNode::Value() and UMulHiLNode::Value() seem to be identical. Perhaps some refactoring would be in order, maybe making a common shared routine.
------------- PR: https://git.openjdk.java.net/jdk/pull/5933 From aph at openjdk.java.net Fri Oct 15 16:58:52 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Fri, 15 Oct 2021 16:58:52 GMT Subject: RFR: 8275167: x86 intrinsic for unsignedMultiplyHigh In-Reply-To: <7IzrZdL0elgXbuisyLNYC2wkyOTe1RHUPuGRI7YsAQ4=.aed9dea3-4775-4592-b43e-c3e08e167f90@github.com> References: <7IzrZdL0elgXbuisyLNYC2wkyOTe1RHUPuGRI7YsAQ4=.aed9dea3-4775-4592-b43e-c3e08e167f90@github.com> Message-ID: On Wed, 13 Oct 2021 18:55:10 GMT, Vamsi Parasa wrote: > Optimize the new Math.unsignedMultiplyHigh using the x86 mul instruction. This change shows a 1.87X improvement on a micro benchmark. test/micro/org/openjdk/bench/java/lang/MathBench.java line 547: > 545: return Math.unsignedMultiplyHigh(long747, long13); > 546: } > 547: As far as I can tell, the JMH overhead dominates when trying to measure the latency of events in the nanosecond range. `unsignedMultiplyHigh` should have a latency of maybe 1.5-2ns. Is that what you saw? ------------- PR: https://git.openjdk.java.net/jdk/pull/5933 From kvn at openjdk.java.net Fri Oct 15 17:51:50 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 15 Oct 2021 17:51:50 GMT Subject: RFR: 8275086: compiler/c2/irTests/TestPostParseCallDevirtualization.java fails when compiler1 is disabled In-Reply-To: References: Message-ID: On Tue, 12 Oct 2021 02:49:53 GMT, SUN Guoyun wrote: > Hi all, > Jtreg test case compiler/c2/irTests/TestPostParseCallDevirtualization.java fails in fastdebug mode on x86/aarch64/mips architectures when "--with-jvm-features=-compiler1" is used. The failure output is: > >

> One or more @IR rules failed:
> 
> Failed IR Rules (1)
> ------------------
> - Method "public int compiler.c2.irTests.TestPostParseCallDevirtualization.testMethodHandleCallWithCCP() throws java.lang.Throwable":
>   * @IR rule 1: "@compiler.lib.ir_framework.IR(failOn={"#PRE#(\\\\d+(\\\\s){2}(CallStaticJava.*)+(\\\\s){2}===.*#IS_REPLACED#)", "invokeBasic"}, applyIf={}, applyIfAnd={}, applyIfOr={}, counts={"#PRE#(\\\\d+(\\\\s){2}(CallStaticJava.*)+(\\\\s){2}===.*#IS_REPLACED#)", "invokeStatic", "= 1"}, applyIfNot={})" 
>     - failOn: Graph contains forbidden nodes:
>         Regex 1: (\\d+(\\s){2}(CallStaticJava.*)+(\\s){2}===.*invokeBasic)
>         Matched forbidden node:
>           280  CallStaticJava  ===  5  6  7  8  1 ( 188  1  1  1  1  1  1 ) [[ 281  282  283  285 ]] # Static  java.lang.invoke.MethodHandle::invokeBasic
>     - counts: Graph contains wrong number of nodes:
>         Regex 1: (\\d+(\\s){2}(CallStaticJava.*)+(\\s){2}===.*invokeStatic)
>         Expected 1 but found 0 nodes.
> 
>>>> Check stdout for compilation output of the failed methods
> 
> > This is a patch to fix this problem. Please help review it. > > Thanks, > Sun Guoyun Maybe @veresov should look too. I see that `CompilationPolicy::is_mature()` https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/compilationPolicy.cpp#L889 uses `Tier4InvocationThreshold = 5000` and `ProfileMaturityPercentage=20` values to determine maturity (invocations > 1000): https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/compilationPolicy.cpp#L268 I am not familiar with how many times the test calls the compiled method. Does it use `-Xbatch`? ------------- PR: https://git.openjdk.java.net/jdk/pull/5903 From kvn at openjdk.java.net Fri Oct 15 18:48:48 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 15 Oct 2021 18:48:48 GMT Subject: RFR: 8275167: x86 intrinsic for unsignedMultiplyHigh In-Reply-To: <7IzrZdL0elgXbuisyLNYC2wkyOTe1RHUPuGRI7YsAQ4=.aed9dea3-4775-4592-b43e-c3e08e167f90@github.com> References: <7IzrZdL0elgXbuisyLNYC2wkyOTe1RHUPuGRI7YsAQ4=.aed9dea3-4775-4592-b43e-c3e08e167f90@github.com> Message-ID: On Wed, 13 Oct 2021 18:55:10 GMT, Vamsi Parasa wrote: > Optimize the new Math.unsignedMultiplyHigh using the x86 mul instruction. This change shows a 1.87X improvement on a micro benchmark. How did you verify the correctness of the results? I suggest extending the `test/jdk//java/lang/Math/MultiplicationTests.java` test to cover the unsigned method. ------------- Changes requested by kvn (Reviewer).
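Kozlov's `CompilationPolicy::is_mature()` comment in the 8275086 thread involves some threshold arithmetic. The sketch below makes it concrete using the scaled call predicate that Igor Veresov quotes in the same thread: `(i >= InvocationThreshold * scale) || (i >= MinInvocationThreshold * scale && i + b >= CompileThreshold * scale)`, where `i` is the invocation count and `b` the backedge count. It is a plain-Java model under stated assumptions, not HotSpot code, and the class name is hypothetical. Reading `ProfileMaturityPercentage = 20` as `scale = 0.2` against `Tier4InvocationThreshold = 5000` reproduces the 1000-invocation figure. The other two values, `Tier4MinInvocationThreshold = 600` and `Tier4CompileThreshold = 15000`, are assumed defaults; check them with `-XX:+PrintFlagsFinal` on your build. The separate backedge-only (loop) predicate is not modeled.

```java
// Hypothetical model of a scaled tiered-compilation call predicate; names and
// structure are illustrative, not HotSpot source.
final class TierPredicateSketch {

    /**
     * i = invocation count, b = backedge count, scale = percentage / 100.0.
     * True if the scaled invocation threshold is reached, or if a smaller
     * invocation count is combined with enough total (i + b) activity.
     */
    static boolean callPredicate(int i, int b, double scale,
                                 int invocationThreshold,
                                 int minInvocationThreshold,
                                 int compileThreshold) {
        return i >= invocationThreshold * scale
                || (i >= minInvocationThreshold * scale
                    && i + b >= compileThreshold * scale);
    }

    public static void main(String[] args) {
        double scale = 20 / 100.0;          // ProfileMaturityPercentage = 20
        int inv = 5000;                     // Tier4InvocationThreshold (from the thread)
        int minInv = 600, compile = 15000;  // assumed Tier4 defaults, verify locally
        System.out.println(callPredicate(999, 0, scale, inv, minInv, compile));    // false
        System.out.println(callPredicate(1001, 0, scale, inv, minInv, compile));   // true
        System.out.println(callPredicate(999, 2500, scale, inv, minInv, compile)); // true: 999 >= 120 and 3499 >= 3000
    }
}
```

The last call shows why counting invocations alone can mislead. Under these assumed thresholds, a method with fewer than 1000 invocations still qualifies once its backedges push `i + b` past the scaled compile threshold.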
PR: https://git.openjdk.java.net/jdk/pull/5933 From kvn at openjdk.java.net Fri Oct 15 18:57:48 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 15 Oct 2021 18:57:48 GMT Subject: RFR: 8273317: crash in cmovP_cmpP_zero_zeroNode::bottom_type() In-Reply-To: References: Message-ID: On Sat, 4 Sep 2021 02:58:56 GMT, SUN Guoyun wrote: > Hi all, > > When I implement a new instruct in the AD file to match CMoveP with a Cmp node, like this: > > match(Set dst (CMoveP (Binary cop (CmpP op1 zero)) (Binary dst zero))); > > this means the right child of CmpP is immediate zero and the right child of CMoveP is also immediate zero; then an exception occurs: > >

> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> # SIGSEGV (0xb) at pc=0x000000fff410fcc4, pid=11130, tid=11146
> #
> # JRE version: OpenJDK Runtime Environment (17.0) (build 17-internal+0-jenkins-slave-20210821140615-jdk-ls-a526852e137)
> # Java VM: OpenJDK 64-Bit Server VM (17-internal+0-jenkins-slave-20210821140615-jdk-ls-a526852e137, compiled mode, compressed oops, compressed class ptrs, g1 gc, linux-loongarch64)
> # Problematic frame:
> # V [libjvm.so+0x21fcc4] cmovP_cmpP_zero_zeroNode::bottom_type() const+0x44
> #
> # No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
> #
> # If you would like to submit a bug report, please visit:
> # https://bugreport.java.com/bugreport/crash.jsp
> #
> 
> > In this case, cmovP_cmpP_zero_zeroNode has only three input nodes, so an exception is triggered. This is a patch to fix this problem. Please help review it. > > Thanks, > Sun Guoyun Someone at Oracle has to run this patch through testing to make sure it passes. Changes affect shared code. ------------- PR: https://git.openjdk.java.net/jdk/pull/5369 From duke at openjdk.java.net Fri Oct 15 19:21:53 2021 From: duke at openjdk.java.net (Vamsi Parasa) Date: Fri, 15 Oct 2021 19:21:53 GMT Subject: RFR: 8275167: x86 intrinsic for unsignedMultiplyHigh In-Reply-To: References: <7IzrZdL0elgXbuisyLNYC2wkyOTe1RHUPuGRI7YsAQ4=.aed9dea3-4775-4592-b43e-c3e08e167f90@github.com> Message-ID: On Fri, 15 Oct 2021 18:45:41 GMT, Vladimir Kozlov wrote: > How did you verify the correctness of the results? I suggest extending the `test/jdk//java/lang/Math/MultiplicationTests.java` test to cover the unsigned method. Tests for unsignedMultiplyHigh were already added in test/jdk//java/lang/Math/MultiplicationTests.java (in July 2021 by Brian Burkhalter). I used that test to verify the correctness of the results. ------------- PR: https://git.openjdk.java.net/jdk/pull/5933 From duke at openjdk.java.net Fri Oct 15 19:34:00 2021 From: duke at openjdk.java.net (Vamsi Parasa) Date: Fri, 15 Oct 2021 19:34:00 GMT Subject: RFR: 8275167: x86 intrinsic for unsignedMultiplyHigh In-Reply-To: References: <7IzrZdL0elgXbuisyLNYC2wkyOTe1RHUPuGRI7YsAQ4=.aed9dea3-4775-4592-b43e-c3e08e167f90@github.com> Message-ID: On Fri, 15 Oct 2021 16:14:25 GMT, Andrew Haley wrote: >> Optimize the new Math.unsignedMultiplyHigh using the x86 mul instruction. This change shows a 1.87X improvement on a micro benchmark. > > src/hotspot/share/opto/mulnode.cpp line 468: > >> 466: } >> 467: >> 468: //============================================================================= > > MulHiLNode::Value() and UMulHiLNode::Value() seem to be identical. Perhaps some refactoring would be in order, maybe making a common shared routine.
Sure, I will do the refactoring to use a shared routine. > test/micro/org/openjdk/bench/java/lang/MathBench.java line 547: > >> 545: return Math.unsignedMultiplyHigh(long747, long13); >> 546: } >> 547: > > As far as I can tell, the JMH overhead dominates when trying to measure the latency of events in the nanosecond range. `unsignedMultiplyHigh` should have a latency of maybe 1.5-2ns. Is that what you saw? Yes, the JMH overhead was dominating the measurement of latency. The latency observed for `unsignedMultiplyHigh` was 2.3ns with the intrinsic enabled. ------------- PR: https://git.openjdk.java.net/jdk/pull/5933 From iveresov at openjdk.java.net Fri Oct 15 20:09:48 2021 From: iveresov at openjdk.java.net (Igor Veresov) Date: Fri, 15 Oct 2021 20:09:48 GMT Subject: RFR: 8275086: compiler/c2/irTests/TestPostParseCallDevirtualization.java fails when compiler1 is disabled In-Reply-To: References: Message-ID: <6BxFgAwp3XKvti_urNYIyixN7idmCAGFx8iK9gLryxI=.afaea587-0c33-445e-b76c-1f304733431d@github.com> On Tue, 12 Oct 2021 02:49:53 GMT, SUN Guoyun wrote: > Hi all, > Jtreg test case compiler/c2/irTests/TestPostParseCallDevirtualization.java fails in fastdebug mode on x86/aarch64/mips architectures when "--with-jvm-features=-compiler1" is used. The failure output is: > >

> One or more @IR rules failed:
> 
> Failed IR Rules (1)
> ------------------
> - Method "public int compiler.c2.irTests.TestPostParseCallDevirtualization.testMethodHandleCallWithCCP() throws java.lang.Throwable":
>   * @IR rule 1: "@compiler.lib.ir_framework.IR(failOn={"#PRE#(\\\\d+(\\\\s){2}(CallStaticJava.*)+(\\\\s){2}===.*#IS_REPLACED#)", "invokeBasic"}, applyIf={}, applyIfAnd={}, applyIfOr={}, counts={"#PRE#(\\\\d+(\\\\s){2}(CallStaticJava.*)+(\\\\s){2}===.*#IS_REPLACED#)", "invokeStatic", "= 1"}, applyIfNot={})" 
>     - failOn: Graph contains forbidden nodes:
>         Regex 1: (\\d+(\\s){2}(CallStaticJava.*)+(\\s){2}===.*invokeBasic)
>         Matched forbidden node:
>           280  CallStaticJava  ===  5  6  7  8  1 ( 188  1  1  1  1  1  1 ) [[ 281  282  283  285 ]] # Static  java.lang.invoke.MethodHandle::invokeBasic
>     - counts: Graph contains wrong number of nodes:
>         Regex 1: (\\d+(\\s){2}(CallStaticJava.*)+(\\s){2}===.*invokeStatic)
>         Expected 1 but found 0 nodes.
> 
>>>> Check stdout for compilation output of the failed methods
> 
> > This is a patch to fix this problem. Please help review it. > > Thanks, > Sun Guoyun Sounds like there is no MDO? It should create the MDO (and start profiling) in this configuration after running in the interpreter for `Tier0ProfilingStartPercentage=33` of the
```
scale = Tier0ProfilingStartPercentage / 100.0;
(i >= Tier3InvocationThreshold * scale) || (i >= Tier3MinInvocationThreshold * scale && i + b >= Tier3CompileThreshold * scale);
```
------------- PR: https://git.openjdk.java.net/jdk/pull/5903 From kvn at openjdk.java.net Fri Oct 15 20:22:45 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 15 Oct 2021 20:22:45 GMT Subject: RFR: 8275167: x86 intrinsic for unsignedMultiplyHigh In-Reply-To: <8MuiklM5Nt3VkzyVHbWqwMh_LkVvVY2Mf65_0zTx4Kw=.9351008f-b489-4103-be9d-87e6fc4a8f39@github.com> References: <7IzrZdL0elgXbuisyLNYC2wkyOTe1RHUPuGRI7YsAQ4=.aed9dea3-4775-4592-b43e-c3e08e167f90@github.com> Message-ID: <8MuiklM5Nt3VkzyVHbWqwMh_LkVvVY2Mf65_0zTx4Kw=.9351008f-b489-4103-be9d-87e6fc4a8f39@github.com> On Fri, 15 Oct 2021 19:19:13 GMT, Vamsi Parasa wrote: > > How did you verify the correctness of the results? I suggest extending the `test/jdk//java/lang/Math/MultiplicationTests.java` test to cover the unsigned method. > > Tests for unsignedMultiplyHigh were already added in test/jdk//java/lang/Math/MultiplicationTests.java (in July 2021 by Brian Burkhalter). I used that test to verify the correctness of the results. Good. It seems I have an old version of the test. Did you run it with -Xcomp? How did you verify that the intrinsic is used?
------------- PR: https://git.openjdk.java.net/jdk/pull/5933 From duke at openjdk.java.net Fri Oct 15 21:06:52 2021 From: duke at openjdk.java.net (Vamsi Parasa) Date: Fri, 15 Oct 2021 21:06:52 GMT Subject: RFR: 8275167: x86 intrinsic for unsignedMultiplyHigh In-Reply-To: <8MuiklM5Nt3VkzyVHbWqwMh_LkVvVY2Mf65_0zTx4Kw=.9351008f-b489-4103-be9d-87e6fc4a8f39@github.com> References: <7IzrZdL0elgXbuisyLNYC2wkyOTe1RHUPuGRI7YsAQ4=.aed9dea3-4775-4592-b43e-c3e08e167f90@github.com> <8MuiklM5Nt3VkzyVHbWqwMh_LkVvVY2Mf65_0zTx4Kw=.9351008f-b489-4103-be9d-87e6fc4a8f39@github.com> Message-ID: On Fri, 15 Oct 2021 20:19:31 GMT, Vladimir Kozlov wrote: > > > How did you verify the correctness of the results? I suggest extending the `test/jdk//java/lang/Math/MultiplicationTests.java` test to cover the unsigned method. > > > > > > Tests for unsignedMultiplyHigh were already added in test/jdk//java/lang/Math/MultiplicationTests.java (in July 2021 by Brian Burkhalter). I used that test to verify the correctness of the results. > > Good. It seems I have an old version of the test. Did you run it with -Xcomp? How did you verify that the intrinsic is used? The tests were not run with -Xcomp. I looked at the generated x86 code using the disassembler (hsdis) and verified that the correct instruction (mul) was being generated instead of imul (which is what is emitted without the intrinsic). ------------- PR: https://git.openjdk.java.net/jdk/pull/5933 From jbhateja at openjdk.java.net Sat Oct 16 03:13:14 2021 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Sat, 16 Oct 2021 03:13:14 GMT Subject: RFR: 8275047: Optimize existing fill stubs for AVX-512 target [v2] In-Reply-To: References: Message-ID:
> Hi All,
>
> This patch optimizes the macro assembly routines used by the fill stubs of various primitive types for the X86 AVX-512 target.
> Following are the main changes:
> 1) Specialized instruction sequences for fill operations over various block sizes.
> 2) Control flow is sensitive to AVX3Threshold: the generated code operates on 32-byte vectors (YMM) if the block size is less than the threshold; otherwise the instructions operate on 64-byte vectors (ZMM).
> 3) The bulk fill operation is performed by a destination-aligned fill loop with an appropriate unroll factor; this avoids any cache-line split penalty and improves performance.
> 4) Currently, fill patterns are vectorized by the auto-vectorizer, and the generated code operates on vectors of MaxVectorSize. In addition, the auto-vectorizer is oblivious to AVX3Threshold, and this may result in performance degradation on prior generations of X86 servers, where 64-byte vector stores using ZMM registers operate at reduced CPU frequency.
> The patch enables the JVM runtime flag -XX:+OptimizeFill by default for X86 targets supporting the AVX-512 feature.
> 5) The patch also optimizes the mask generation sequence of the fill* macro assembly routines using the BZHI instruction.
>
> Performance measurements of an existing JMH micro on an Icelake server show ~1.1-4.0X gains for fill operations with varying block sizes.
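Items 2), 3) and 5) above describe a plan: pick a vector width from AVX3Threshold, align the destination, run a full-width body loop, and finish with a BZHI-generated mask. The sketch below models that plan in plain Java for byte fills. It is not the stub code, and it rests on several assumptions:

- AVX3Threshold is compared against the block size in bytes.
- A `long` bitmask stands in for an AVX-512 opmask (k) register.
- The head is also handled with a masked store. This is our guess, not necessarily what the patch does.
- The start address is passed in, because Java cannot observe it.
- Loop unrolling is omitted.

`FillPlanSketch` and all its methods are hypothetical names.

```java
// Hypothetical plain-Java model of a threshold-sensitive, destination-aligned
// fill with masked head/tail stores; not the HotSpot macro assembler code.
final class FillPlanSketch {

    /** Models BZHI: clear bits of src at index >= n (n >= 64 keeps src). */
    static long bzhi(long src, int n) {
        return n >= 64 ? src : src & ((1L << n) - 1);
    }

    /** Lane mask selecting the low `lanes` byte lanes of a vector. */
    static long laneMask(int lanes) {
        return bzhi(-1L, lanes);
    }

    /** 32-byte (YMM) vectors below the threshold, 64-byte (ZMM) otherwise. */
    static int vectorBytes(long blockBytes, long avx3Threshold) {
        return blockBytes < avx3Threshold ? 32 : 64;
    }

    /** {head, bodyVectors, tail}: bytes up to alignment, aligned full stores, leftover bytes. */
    static int[] plan(long startAddress, int lengthBytes, int vecBytes) {
        int misalign = (int) (startAddress & (vecBytes - 1));
        int head = misalign == 0 ? 0 : Math.min(vecBytes - misalign, lengthBytes);
        int rest = lengthBytes - head;
        return new int[] {head, rest / vecBytes, rest % vecBytes};
    }

    /** Simulates a masked byte store: lane k is written only if mask bit k is set. */
    static void maskedStore(byte[] a, int pos, byte value, long mask) {
        for (int lane = 0; lane < 64; lane++) {
            if (((mask >>> lane) & 1L) != 0) {
                a[pos + lane] = value;
            }
        }
    }

    /** Fills all of a; startAddress stands in for the array's (unobservable) address. */
    static void fill(byte[] a, long startAddress, byte value, long avx3Threshold) {
        int vec = vectorBytes(a.length, avx3Threshold);
        int[] p = plan(startAddress, a.length, vec);
        int pos = 0;
        maskedStore(a, pos, value, laneMask(p[0]));   // masked head up to the alignment boundary
        pos += p[0];
        for (int i = 0; i < p[1]; i++) {              // aligned full-width body
            maskedStore(a, pos, value, laneMask(vec));
            pos += vec;
        }
        maskedStore(a, pos, value, laneMask(p[2]));   // masked tail, no scalar cleanup loop
    }
}
```

For example, `plan(0x1004, 100, 32)` splits a 100-byte fill into a 28-byte masked head, two aligned 32-byte stores, and an 8-byte masked tail. Handling both edges with one mask-generating instruction is what makes the BZHI sequence attractive: no scalar pre- or post-loop is needed. Aligning the body keeps every full-width store within a single cache line.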
>
> Following are detailed results:
>
> System Configuration: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (40C 2S)
>
> Benchmark | Size | Baseline Auto-vectorized -XX:-OptimizeFill (ops/ms) | New Optimized Fill AVX3 Th=4096 (ops/ms) | Gain Factor (OptFill AVX3Th=4096/Baseline)
> -- | -- | -- | -- | --
> ArrayFill.testByteFill | 16 | 193994.942 | 381142.844 | 1.964705059
> ArrayFill.testByteFill | 31 | 99817.403 | 399973.74 | 4.007054161
> ArrayFill.testByteFill | 59 | 80759.378 | 342165.394 | 4.236850289
> ArrayFill.testByteFill | 89 | 127342.997 | 341696.357 | 2.683275603
> ArrayFill.testByteFill | 126 | 72081.809 | 309335.351 | 4.291448221
> ArrayFill.testByteFill | 250 | 41419.435 | 166618.264 | 4.022707311
> ArrayFill.testByteFill | 511 | 32509.962 | 138595.951 | 4.263184036
> ArrayFill.testByteFill | 1021 | 35930.96 | 90622.597 | 2.522131248
> ArrayFill.testByteFill | 2047 | 32956.62 | 67252.442 | 2.040635296
> ArrayFill.testByteFill | 4095 | 29180.81 | 45508.86 | 1.559547525
> ArrayFill.testByteFill | 8195 | 17468.775 | 25072.671 | 1.435285016
> ArrayFill.testByteFill | 65536 | 978.482 | 946.377 | 0.967188972
> ArrayFill.testCharFill | 16 | 205893.99 | 381151.485 | 1.851202578
> ArrayFill.testCharFill | 31 | 90418.278 | 385694.751 | 4.265672379
> ArrayFill.testCharFill | 59 | 117391.45 | 310132.477 | 2.641865971
> ArrayFill.testCharFill | 89 | 117956.135 | 202314.017 | 1.715163158
> ArrayFill.testCharFill | 126 | 70174.917 | 164571.761 | 2.345165025
> ArrayFill.testCharFill | 250 | 37243.255 | 141460.648 | 3.798289059
> ArrayFill.testCharFill | 511 | 33788.369 | 98578.472 | 2.917526797
> ArrayFill.testCharFill | 1021 | 33655.897 | 78305.288 | 2.326643916
> ArrayFill.testCharFill | 2047 | 35656.759 | 41973.205 | 1.177145825
> ArrayFill.testCharFill | 4095 | 16311.779 | 24724.413 | 1.515739822
> ArrayFill.testCharFill | 8195 | 11412.845 | 12599.1 | 1.103940341
> ArrayFill.testCharFill | 65536 | 476.138 | 486.723 | 1.02223095
> ArrayFill.testDoubleFill | 16 | 222222.265 | 193741.026 | 0.871834449
> ArrayFill.testDoubleFill | 31 | 169693.273 | 155377.031 | 0.915634593
> ArrayFill.testDoubleFill | 59 | 101838.606 | 197496.671 | 1.939310432
> ArrayFill.testDoubleFill | 89 | 106202.786 | 182813.717 | 1.721364607
> ArrayFill.testDoubleFill | 126 | 128696.666 | 123066.432 | 0.956251905
> ArrayFill.testDoubleFill | 250 | 81145.924 | 90895.167 | 1.120144581
> ArrayFill.testDoubleFill | 511 | 44615.14 | 48668.332 | 1.090847905
> ArrayFill.testDoubleFill | 1021 | 25191.332 | 25152.377 | 0.998453635
> ArrayFill.testDoubleFill | 2047 | 11337.929 | 12655.112 | 1.11617492
> ArrayFill.testDoubleFill | 4095 | 6378.326 | 6378.392 | 1.000010348
> ArrayFill.testDoubleFill | 8195 | 885.269 | 882.644 | 0.9970348
> ArrayFill.testDoubleFill | 65536 | 121.155 | 121.252 | 1.000800627
> ArrayFill.testFloatFill | 16 | 201801.067 | 342214.071 | 1.695799116
> ArrayFill.testFloatFill | 31 | 93851.962 | 322681.433 | 3.438195922
> ArrayFill.testFloatFill | 59 | 107454.704 | 162266.325 | 1.510090475
> ArrayFill.testFloatFill | 89 | 129597.511 | 158890.265 | 1.226028677
> ArrayFill.testFloatFill | 126 | 92358.492 | 151423.881 | 1.639523099
> ArrayFill.testFloatFill | 250 | 95412.586 | 96269.997 | 1.008986351
> ArrayFill.testFloatFill | 511 | 68356.016 | 73395.512 | 1.07372425
> ArrayFill.testFloatFill | 1021 | 46040.879 | 42767.414 | 0.928900901
> ArrayFill.testFloatFill | 2047 | 23876.684 | 24988.836 | 1.046578997
> ArrayFill.testFloatFill | 4095 | 12475.923 | 12598.467 | 1.00982244
> ArrayFill.testFloatFill | 8195 | 6286.263 | 6292.858 | 1.001049113
> ArrayFill.testFloatFill | 65536 | 230.041 | 248.095 | 1.078481662
> ArrayFill.testIntFill | 16 | 188215.196 | 339491.214 | 1.803739662
> ArrayFill.testIntFill | 31 | 146425.028 | 321621.325 | 2.19649147
> ArrayFill.testIntFill | 59 | 140650.413 | 194907.815 | 1.385760702
> ArrayFill.testIntFill | 89 | 78017.244 | 166579.365 | 2.13516085
> ArrayFill.testIntFill | 126 | 97645.936 | 142150.475 | 1.455774616
> ArrayFill.testIntFill | 250 | 68623.478 | 96538.532 | 1.406785765
> ArrayFill.testIntFill | 511 | 57465.869 | 84218.747 | 1.465543782
> ArrayFill.testIntFill | 1021 | 46308.298 | 45287.255 | 0.977951187
> ArrayFill.testIntFill | 2047 | 24222.479 | 25017.366 | 1.032816088
> ArrayFill.testIntFill | 4095 | 12470.853 | 12656.69 | 1.014901707
> ArrayFill.testIntFill | 8195 | 6302.584 | 6312.377 | 1.001553807
> ArrayFill.testIntFill | 65536 | 227.098 | 248.39 | 1.09375688
> ArrayFill.testLongFill | 16 | 229400.195 | 190876.891 | 0.832069437
> ArrayFill.testLongFill | 31 | 160433.763 | 161062.288 | 1.00391766
> ArrayFill.testLongFill | 59 | 117527.007 | 104990.932 | 0.893334517
> ArrayFill.testLongFill | 89 | 106400.533 | 112155.423 | 1.054087041
> ArrayFill.testLongFill | 126 | 133428.366 | 141422.605 | 1.059914089
> ArrayFill.testLongFill | 250 | 83393.535 | 70419.357 | 0.844422256
> ArrayFill.testLongFill | 511 | 48534.407 | 44830.441 | 0.923683708
> ArrayFill.testLongFill | 1021 | 25150.503 | 25144.854 | 0.999775392
> ArrayFill.testLongFill | 2047 | 12661.581 | 12495.112 | 0.986852432
> ArrayFill.testLongFill | 4095 | 6378.589 | 6326.361 | 0.991811982
> ArrayFill.testLongFill | 8195 | 884.108 | 883.225 | 0.999001253
> ArrayFill.testLongFill | 65536 | 116.544 | 115.809 | 0.993693369
> ArrayFill.testShortFill | 16 | 181717.691 | 381160.843 | 2.097543948
> ArrayFill.testShortFill | 31 | 99246.669 | 376006.724 | 3.788607999
> ArrayFill.testShortFill | 59 | 125435.022 | 308756.585 | 2.461486275
> ArrayFill.testShortFill | 89 | 116796.477 | 195568.654 | 1.674439667
> ArrayFill.testShortFill | 126 | 37346.482 | 164389.009 | 4.401726754
> ArrayFill.testShortFill | 250 | 32537.347 | 140808.889 | 4.327608179
> ArrayFill.testShortFill | 511 | 43932.519 | 103200.042 | 2.349058154
> ArrayFill.testShortFill | 1021 | 42808.585 | 80777.289 | 1.886941346
> ArrayFill.testShortFill | 2047 | 34852.049 | 41482.517 | 1.190246146
> ArrayFill.testShortFill | 4095 | 21427.935 | 24971.245 | 1.165359378
> ArrayFill.testShortFill | 8195 | 11666.17 | 12655.972 | 1.084843783
> ArrayFill.testShortFill | 65536 | 451.299 | 486.96 | 1.079018566
>
>
> Kindly review and share your feedback.
>
> Best Regards,
> Jatin

Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:

  8275047: Review comments resolved.

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/5967/files
  - new: https://git.openjdk.java.net/jdk/pull/5967/files/14282a08..ec0ff759

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5967&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5967&range=00-01

  Stats: 28 lines in 1 file changed: 26 ins; 0 del; 2 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5967.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5967/head:pull/5967

PR: https://git.openjdk.java.net/jdk/pull/5967

From ihse at openjdk.java.net Sat Oct 16 08:15:47 2021 From: ihse at openjdk.java.net (Magnus Ihse Bursie) Date: Sat, 16 Oct 2021 08:15:47 GMT Subject: RFR: 8253757: Add LLVM-based backend for hsdis In-Reply-To: References: Message-ID: <0i45tJK6ikOSt-m5vD-t6utK3wvLRCkTBCcY0pIOESI=.3ea75a59-e7ea-4d5e-a9b4-10a17e299c36@github.com>

On Fri, 15 Oct 2021 15:36:14 GMT, Andrew Haley wrote:

>> The value add of this LLVM-based hsdis is two-fold:
>> - It supports platforms that aren't supported by binutils (Windows-AArch64, for example)
>> - The more permissive license would make it easier to build it as part of the OpenJDK build (and maybe even ship it?)
>>
>> LLVM has a strong track record of supporting new platforms (Windows-AArch64 and macOS-AArch64, for example, mostly because of investment from Microsoft and Apple respectively), and `hsdis` is a necessary tool for porting the OpenJDK to any new platform. Since the maintenance is fairly low (small codebase, small and knowledgeable user base), I would be biased towards including it with appropriate warnings.
>
>> Since the maintenance is fairly low (small codebase, small and knowledgeable user base), I would be biased towards including it with appropriate warnings.
>
> I don't think we should commit code that we know is broken. I don't believe that this view is controversial.
> Maybe someone should try to reproduce the failure I've seen, and then we should look at fixing it. Maybe it's a local problem.
>
> Also, this patch breaks all current hsdis builds that follow the installation instructions. Either we get revised instructions or the build should be fixed.

@theRealAph We should not push broken code, and we should not break the existing build of hsdis. I fully agree with this. I will not push this patch until all reviewers are happy, so you don't need to worry about that. :)

My initial plan was to get the Unix platforms working in this push and tackle Windows later on, but it now seems better to keep this PR around for a bit longer and fold Windows support into it as well. (Which means I'll wait for Monica to return and be able to test and help out.)

I need to understand better why things are failing for you. Can you describe a reproducer?

-------------

PR: https://git.openjdk.java.net/jdk/pull/5920

From duke at openjdk.java.net Sat Oct 16 09:24:48 2021 From: duke at openjdk.java.net (SUN Guoyun) Date: Sat, 16 Oct 2021 09:24:48 GMT Subject: RFR: 8275086: compiler/c2/irTests/TestPostParseCallDevirtualization.java fails when compiler1 is disabled In-Reply-To: References: Message-ID:

On Tue, 12 Oct 2021 02:49:53 GMT, SUN Guoyun wrote:

> Hi all,
> The jtreg test case compiler/c2/irTests/TestPostParseCallDevirtualization.java fails in fastdebug mode on x86/aarch64/mips architectures when "--with-jvm-features=-compiler1" is used. The failure output is:
>

> One or more @IR rules failed:
> 
> Failed IR Rules (1)
> ------------------
> - Method "public int compiler.c2.irTests.TestPostParseCallDevirtualization.testMethodHandleCallWithCCP() throws java.lang.Throwable":
>   * @IR rule 1: "@compiler.lib.ir_framework.IR(failOn={"#PRE#(\\\\d+(\\\\s){2}(CallStaticJava.*)+(\\\\s){2}===.*#IS_REPLACED#)", "invokeBasic"}, applyIf={}, applyIfAnd={}, applyIfOr={}, counts={"#PRE#(\\\\d+(\\\\s){2}(CallStaticJava.*)+(\\\\s){2}===.*#IS_REPLACED#)", "invokeStatic", "= 1"}, applyIfNot={})" 
>     - failOn: Graph contains forbidden nodes:
>         Regex 1: (\\d+(\\s){2}(CallStaticJava.*)+(\\s){2}===.*invokeBasic)
>         Matched forbidden node:
>           280  CallStaticJava  ===  5  6  7  8  1 ( 188  1  1  1  1  1  1 ) [[ 281  282  283  285 ]] # Static  java.lang.invoke.MethodHandle::invokeBasic
>     - counts: Graph contains wrong number of nodes:
>         Regex 1: (\\d+(\\s){2}(CallStaticJava.*)+(\\s){2}===.*invokeStatic)
>         Expected 1 but found 0 nodes.
> 
>>>> Check stdout for compilation output of the failed methods
> 
>
> This is a patch to fix this problem. Please help review it.
>
> Thanks,
> Sun Guoyun

The MDO already exists, but it is not loaded. The following modification uses the `ensure_method_data()` function, which ensures that `ciMethodData::load_data()` can be called; after that, `method_data()->is_mature()` will be true.

diff --git a/src/hotspot/share/ci/ciMethod.cpp b/src/hotspot/share/ci/ciMethod.cpp
index 862824c5b72..ff2c8ecb8e8 100644
--- a/src/hotspot/share/ci/ciMethod.cpp
+++ b/src/hotspot/share/ci/ciMethod.cpp
@@ -458,7 +458,8 @@ int ciMethod::check_overflow(int c, Bytecodes::Code code) {
 ciCallProfile ciMethod::call_profile_at_bci(int bci) {
   ResourceMark rm;
   ciCallProfile result;
-  if (method_data() != NULL && method_data()->is_mature()) {
+  if (ensure_method_data()){
     ciProfileData* data = method_data()->bci_to_data(bci);
     if (data != NULL && data->is_CounterData()) {
       // Every profiled call site has a counter.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5903

From iveresov at openjdk.java.net Sat Oct 16 17:37:47 2021 From: iveresov at openjdk.java.net (Igor Veresov) Date: Sat, 16 Oct 2021 17:37:47 GMT Subject: RFR: 8275086: compiler/c2/irTests/TestPostParseCallDevirtualization.java fails when compiler1 is disabled In-Reply-To: References: Message-ID:

On Tue, 12 Oct 2021 02:49:53 GMT, SUN Guoyun wrote:

> Hi all,
> The jtreg test case compiler/c2/irTests/TestPostParseCallDevirtualization.java fails in fastdebug mode on x86/aarch64/mips architectures when "--with-jvm-features=-compiler1" is used. The failure output is:
>

> One or more @IR rules failed:
> 
> Failed IR Rules (1)
> ------------------
> - Method "public int compiler.c2.irTests.TestPostParseCallDevirtualization.testMethodHandleCallWithCCP() throws java.lang.Throwable":
>   * @IR rule 1: "@compiler.lib.ir_framework.IR(failOn={"#PRE#(\\\\d+(\\\\s){2}(CallStaticJava.*)+(\\\\s){2}===.*#IS_REPLACED#)", "invokeBasic"}, applyIf={}, applyIfAnd={}, applyIfOr={}, counts={"#PRE#(\\\\d+(\\\\s){2}(CallStaticJava.*)+(\\\\s){2}===.*#IS_REPLACED#)", "invokeStatic", "= 1"}, applyIfNot={})" 
>     - failOn: Graph contains forbidden nodes:
>         Regex 1: (\\d+(\\s){2}(CallStaticJava.*)+(\\s){2}===.*invokeBasic)
>         Matched forbidden node:
>           280  CallStaticJava  ===  5  6  7  8  1 ( 188  1  1  1  1  1  1 ) [[ 281  282  283  285 ]] # Static  java.lang.invoke.MethodHandle::invokeBasic
>     - counts: Graph contains wrong number of nodes:
>         Regex 1: (\\d+(\\s){2}(CallStaticJava.*)+(\\s){2}===.*invokeStatic)
>         Expected 1 but found 0 nodes.
> 
>>>> Check stdout for compilation output of the failed methods
> 
>
> This is a patch to fix this problem. Please help review it.
>
> Thanks,
> Sun Guoyun

`ensure_method_data()` creates an MDO if it doesn't exist. I think that's what's happening here, because `ciMethod::method_data()` would normally do the load, and it clearly doesn't.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5903

From thartmann at openjdk.java.net Mon Oct 18 06:46:50 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 18 Oct 2021 06:46:50 GMT Subject: RFR: JDK-8251513: Code around Parse::do_lookupswitch/do_tableswitch should be cleaned up In-Reply-To: <4xMeWOd5dV5wtEPDp1UuLCLJDKcj9seD9_IpiXS8PrA=.f172e00f-7060-40d7-a52a-f94428c27083@github.com> References: <4xMeWOd5dV5wtEPDp1UuLCLJDKcj9seD9_IpiXS8PrA=.f172e00f-7060-40d7-a52a-f94428c27083@github.com> Message-ID:

On Wed, 6 Oct 2021 09:27:15 GMT, Tobias Holenstein wrote:

> - `default_cnt` can be computed without using a loop:
>
> An example of how `defaults` was computed before at parse2.cpp:521-533 with switch labels `-10`, `0`, `10`, `42` and `200`:
>
> defaults = 0
> defaults += -10 - (-2147483648)
> defaults += 0 - (-10 + 1)
> defaults += 10 - (0 + 1)
> defaults += 42 - (10 + 1)
> defaults += 200 - (42 + 1)
> defaults += 2147483647 - (200 + 1) + 1
>
> => `defaults` =
> -10 - (-2147483648) + 0 - (-10 + 1) + 10 - (0 + 1) + 42 - (10 + 1) + 200 - (42 + 1) + 2147483647 - (200 + 1) + 1 =
> 4294967291 = 2147483648 + 2147483648 - 5
> BUT the actual value of `defaults` was 2147483648 + 2147483648.
>
> The reason has to do with using floats:
> ((float)match_int - (float)prev) == (-(float)prev) is true
> for match_int=-10, prev=-2147483648
>
> However, `defaults` (2147483648 + 2147483648 - 5) can also be computed without a loop as `juint defaults = max_juint - len`.
>
>
> - also made some casts explicit
>
> - A lot of casts could be avoided by making `_cnt` in `SwitchRange` a uint.
Unfortunately, the Range for the default values of a switch in `do_lookupswitch` calculates the count by scaling the average cnt/label up to cnt/range which needs a float to store an accurate result Looks good to me too. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5837 From duke at openjdk.java.net Mon Oct 18 06:56:53 2021 From: duke at openjdk.java.net (Tobias Holenstein) Date: Mon, 18 Oct 2021 06:56:53 GMT Subject: RFR: JDK-8251513: Code around Parse::do_lookupswitch/do_tableswitch should be cleaned up In-Reply-To: References: <4xMeWOd5dV5wtEPDp1UuLCLJDKcj9seD9_IpiXS8PrA=.f172e00f-7060-40d7-a52a-f94428c27083@github.com> Message-ID: On Wed, 13 Oct 2021 13:40:04 GMT, Roland Westrelin wrote: >> - `default_cnt` can be computed without using a loop: >> >> An example of how `defaults` was computed before at parse2.cpp:521-533 with switch labels `-10`, `0`, `10`, `42` and `200`: >> >> defaults = 0 >> defaults += -10 - (-2147483648) >> defaults += 0 - (-10 + 1) >> defaults += 10 - (0 + 1) >> defaults += 42 - (10 + 1) >> defaults += 200 - (42 + 1) >> defaults += 2147483647 - (200 + 1) + 1 >> >> => `defaults` = >> -10 - (-2147483648) + 0 - (-10 + 1) + 10 - (0 + 1) + 42 - (10 + 1) + 200 - (42 + 1) + 2147483647 - (200 + 1) + 1 = >> 4294967291 = 2147483648 + 2147483648 - 5 >> BUT actually `defaults` was : `defaults` = 2147483648 + 2147483648 >> >> The reason has to do with using floats: >> ((float)match_int - (float)prev) == (-(float)prev) is True >> for match_int=-10, prev=-2147483648 >> >> BUT actually `defaults` (2147483648 + 2147483648 - 5) can also be computed without using a loop with `juint defaults = max_juint - len` >> >> >> - also made some casts explicit >> >> - A lot of casts could be avoided by making `_cnt` in `SwitchRange` a uint. 
Unfortunately, the Range for the default values of a switch in `do_lookupswitch` calculates the count by scaling the average cnt/label up to cnt/range which needs a float to store an accurate result > > That looks good to me. @rwestrel and @TobiHartmann thanks for the reviews! ------------- PR: https://git.openjdk.java.net/jdk/pull/5837 From duke at openjdk.java.net Mon Oct 18 07:36:58 2021 From: duke at openjdk.java.net (Tobias Holenstein) Date: Mon, 18 Oct 2021 07:36:58 GMT Subject: Integrated: JDK-8251513: Code around Parse::do_lookupswitch/do_tableswitch should be cleaned up In-Reply-To: <4xMeWOd5dV5wtEPDp1UuLCLJDKcj9seD9_IpiXS8PrA=.f172e00f-7060-40d7-a52a-f94428c27083@github.com> References: <4xMeWOd5dV5wtEPDp1UuLCLJDKcj9seD9_IpiXS8PrA=.f172e00f-7060-40d7-a52a-f94428c27083@github.com> Message-ID: On Wed, 6 Oct 2021 09:27:15 GMT, Tobias Holenstein wrote: > - `default_cnt` can be computed without using a loop: > > An example of how `defaults` was computed before at parse2.cpp:521-533 with switch labels `-10`, `0`, `10`, `42` and `200`: > > defaults = 0 > defaults += -10 - (-2147483648) > defaults += 0 - (-10 + 1) > defaults += 10 - (0 + 1) > defaults += 42 - (10 + 1) > defaults += 200 - (42 + 1) > defaults += 2147483647 - (200 + 1) + 1 > > => `defaults` = > -10 - (-2147483648) + 0 - (-10 + 1) + 10 - (0 + 1) + 42 - (10 + 1) + 200 - (42 + 1) + 2147483647 - (200 + 1) + 1 = > 4294967291 = 2147483648 + 2147483648 - 5 > BUT actually `defaults` was : `defaults` = 2147483648 + 2147483648 > > The reason has to do with using floats: > ((float)match_int - (float)prev) == (-(float)prev) is True > for match_int=-10, prev=-2147483648 > > BUT actually `defaults` (2147483648 + 2147483648 - 5) can also be computed without using a loop with `juint defaults = max_juint - len` > > > - also made some casts explicit > > - A lot of casts could be avoided by making `_cnt` in `SwitchRange` a uint. 
Unfortunately, the Range for the default values of a switch in `do_lookupswitch` calculates the count by scaling the average cnt/label up to cnt/range which needs a float to store an accurate result This pull request has now been integrated. Changeset: ebb1363e Author: Tobias Holenstein Committer: Tobias Hartmann URL: https://git.openjdk.java.net/jdk/commit/ebb1363e5d6b47daf1badad93490580fedcb0572 Stats: 38 lines in 2 files changed: 3 ins; 12 del; 23 mod 8251513: Code around Parse::do_lookupswitch/do_tableswitch should be cleaned up Reviewed-by: roland, thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/5837 From aph at openjdk.java.net Mon Oct 18 09:13:48 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Mon, 18 Oct 2021 09:13:48 GMT Subject: RFR: 8253757: Add LLVM-based backend for hsdis In-Reply-To: References: Message-ID: <8GQB2s_R1_B2jvcdUQQEw5aJyxZMlG9bb_to6Iw35zw=.6e803183-5e7c-407c-a269-1b92392b800d@github.com> On Wed, 13 Oct 2021 00:00:22 GMT, Magnus Ihse Bursie wrote: > This patch expands the newly added system for hsdis backends to include LLVM. > > The actual code in hsdis-llvm.cpp is based heavily on the work by @luhenry, as published in the never integrated PR https://github.com/openjdk/jdk/pull/392. (I have basically just ripped out the binutils-based part of it.) > > Unfortunately I have not been able to make this work properly on Windows. With some additional flags I made it compile without complaints, but it caused hotspot to segfault in `LoadLibrary` (!) in `os::dll_load` when I tried to load the library. This is somewhat ironic, since the initial implementation was created by Ludovic for the very purpose of using it on Windows. > > The lack of Windows support in this patch does not mean it is impossible to get it to work, just that I need to co-operate with someone who has more experience of compiling LLVM on Windows, and/or are more eager to get this combination to work. 
On 10/16/21 09:13, Magnus Ihse Bursie wrote: > @theRealAph We should not push broken code, and we should not break the existing build of hsdis. I fully agree with this. I will not push this patch until all reviewers are happy, so you don't need to worry about that. :) > > My initial plan was to get the unix platforms working in this push, and tackle Windows later on, but it seems now that it's better to keep this PR around for a bit longer instead, and fold Windows support into it as well. (Which means I'll wait for Monica to return and being able to test and help out.) > > I need to understand better why things are failing for you. Can you describe a reproducer? Sure. Create a .hotspot_compiler file containing print java.lang.String::checkIndex then ./build/macosx-aarch64-server-fastdebug/jdk/bin/java -XX:+PrintAssembly -version GNU disassembly of java.lang.String::isLatin1 prints in full, ending thusly [Exception Handler] 0x000000010a72c340: mov x19, #0xdead // #57005 ; {no_reloc} 0x000000010a72c344: mov x2, #0xa // #10 0x000000010a72c348: mov x4, #0xdead // #57005 0x000000010a72c34c: mov x5, #0xdead // #57005 0x000000010a72c350: adrp x8, 0x000000010a1c3000 ; {runtime_call handle_exception_from_callee Runtime1 stub} 0x000000010a72c354: add x8, x8, #0x1c0 0x000000010a72c358: blr x8 0x000000010a72c35c: dcps1 #0xdeae 0x000000010a72c360: .inst 0x0995f68e ; undefined 0x000000010a72c364: udf #1 [Deopt Handler Code] 0x000000010a72c368: adr x30, 0x000000010a72c368 0x000000010a72c36c: adrp x8, 0x000000010a26c000 ; {runtime_call DeoptimizationBlob} 0x000000010a72c370: add x8, x8, #0xbc0 0x000000010a72c374: br x8 -------------------------------------------------------------------------------- LLVM disassembly dies at [Exception Handler] 0x000000010672c340: mov x19, #0xdead ; {no_reloc} 0x000000010672c344: mov x2, #0xa 0x000000010672c348: mov x4, #0xdead 0x000000010672c34c: mov x5, #0xdead 0x000000010672c350: adrp x8, #-5672960 ; {runtime_call handle_exception_from_callee 
Runtime1 stub} 0x000000010672c354: add x8, x8, #0x1c0 0x000000010672c358: blr x8 0x000000010672c35c: dcps1 #0xdeae 0x000000010672c360: --------------------------------------------------------------------------------

This is LLVM 12.0.1 from Homebrew.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5920

From aph at openjdk.java.net Mon Oct 18 10:12:51 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Mon, 18 Oct 2021 10:12:51 GMT Subject: RFR: 8253757: Add LLVM-based backend for hsdis In-Reply-To: <8GQB2s_R1_B2jvcdUQQEw5aJyxZMlG9bb_to6Iw35zw=.6e803183-5e7c-407c-a269-1b92392b800d@github.com> References: <8GQB2s_R1_B2jvcdUQQEw5aJyxZMlG9bb_to6Iw35zw=.6e803183-5e7c-407c-a269-1b92392b800d@github.com> Message-ID:

On Mon, 18 Oct 2021 09:10:32 GMT, Andrew Haley wrote:

> >Can you describe a reproducer?
> Sure. Create a .hotspot_compiler file containing print java.lang.String::checkIndex then

Sorry, thinko. You don't need the .hotspot_compiler file.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5920

From thartmann at openjdk.java.net Mon Oct 18 11:09:49 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 18 Oct 2021 11:09:49 GMT Subject: RFR: 8273317: crash in cmovP_cmpP_zero_zeroNode::bottom_type() In-Reply-To: References: Message-ID:

On Sat, 4 Sep 2021 02:58:56 GMT, SUN Guoyun wrote:

> Hi all,
>
> When I implemented a new instruct in the AD file to match CMoveP with a Cmp node, like this:
>
> match(Set dst (CMoveP (Binary cop (CmpP op1 zero)) (Binary dst zero)));
>
> which means the right child of CmpP is immediate zero and the right child of CMoveP is also immediate zero, the following exception occurred:
>

> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> # SIGSEGV (0xb) at pc=0x000000fff410fcc4, pid=11130, tid=11146
> #
> # JRE version: OpenJDK Runtime Environment (17.0) (build 17-internal+0-jenkins-slave-20210821140615-jdk-ls-a526852e137)
> # Java VM: OpenJDK 64-Bit Server VM (17-internal+0-jenkins-slave-20210821140615-jdk-ls-a526852e137, compiled mode, compressed oops, compressed class ptrs, g1 gc, linux-loongarch64)
> # Problematic frame:
> # V [libjvm.so+0x21fcc4] cmovP_cmpP_zero_zeroNode::bottom_type() const+0x44
> #
> # No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
> #
> # If you would like to submit a bug report, please visit:
> # https://bugreport.java.com/bugreport/crash.jsp
> #
> 
>
> In this case, cmovP_cmpP_zero_zeroNode only has three input nodes, so an exception is triggered. This is a patch to fix this problem. Please help review it.
>
> Thanks,
> Sun Guoyun

I executed testing, all green.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5369

From neliasso at openjdk.java.net Mon Oct 18 14:40:10 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Mon, 18 Oct 2021 14:40:10 GMT Subject: RFR: 8273277: C2: Move conditional negation into rc_predicate Message-ID:

Hi,

I need some feedback on this patch. This was reported by Tencent and found in internal testing at about the same time. This patch is based on a patch provided by Tencent.

In some very specific circumstances we need to negate the range checks that we create in PhaseIdealLoop::loop_predication_impl_helper. This is done in three places, but that method also calls insert_initial_skeleton_predicate, where this isn't taken into account.

To simplify things, I have moved the negation logic into rc_predicate. This should prevent us from missing this check again.

I do have a concern about negating the condition of the range check in the skeleton predicate, since the skeleton predicate will be used as a clone template, and some range check optimizations seem to assume that range checks always have LT as the condition. On the other hand, it is a really uncommon scenario, since we haven't failed here before.

Feedback appreciated.
Best regards,
Nils

-------------

Commit messages:
 - 8273277 fix rc_predicate

Changes: https://git.openjdk.java.net/jdk/pull/5987/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5987&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8273277
  Stats: 27 lines in 3 files changed: 3 ins; 12 del; 12 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5987.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5987/head:pull/5987

PR: https://git.openjdk.java.net/jdk/pull/5987

From jbhateja at openjdk.java.net Mon Oct 18 14:57:13 2021 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Mon, 18 Oct 2021 14:57:13 GMT Subject: RFR: 8275047: Optimize existing fill stubs for AVX-512 target [v3] In-Reply-To: References: Message-ID:

> Hi All,
>
> This patch optimizes the macro assembly routines used by the fill stubs of various primitive types for the x86 AVX-512 target.
> Following are the main changes:
> 1) Specialized instruction sequences for the fill operation over various block sizes.
> 2) Control flow is sensitive to AVX3Threshold: the generated code operates on 32-byte vectors (YMM) if the
> block size is less than the threshold; otherwise, the instructions operate on 64-byte vectors (ZMM).
> 3) The bulk fill operation is performed by a destination-aligned fill loop with an appropriate unroll factor; this
> avoids any cache-line split penalty and improves performance.
> 4) Currently, fill patterns are vectorized by the auto-vectorizer, and the generated code operates on vectors
> of MaxVectorSize. In addition, the auto-vectorizer is oblivious to AVX3Threshold, which may result in
> performance degradation on prior generations of x86 servers, where 64-byte vector stores using ZMM
> registers run at a reduced CPU frequency.
> The patch enables the JVM runtime flag -XX:+OptimizeFill by default for x86 targets supporting the AVX-512 feature.
> 5) The patch also optimizes the mask generation sequence of the fill* macro assembly routines using the BZHI instruction.
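Item 5 refers to BZHI (BMI2), which copies its source and zeroes every bit at or above a given index. Applied to an all-ones value it yields a mask with the low n bits set, which is the kind of opmask a masked tail store needs. The portable C++ sketch below models that behaviour; `bzhi64` and `tail_mask` are hypothetical helpers, not HotSpot functions, and the sketch does not reproduce the exact instruction sequence used in the patch.

```cpp
#include <cassert>
#include <cstdint>

// Portable model of the 64-bit BZHI instruction: copy `src` and clear
// every bit at position >= n. Only bits 7:0 of n are used, and an index
// of 64 or more leaves `src` unchanged.
std::uint64_t bzhi64(std::uint64_t src, std::uint32_t n) {
  n &= 0xFFu;
  if (n >= 64) {
    return src;
  }
  return src & ((std::uint64_t{1} << n) - 1);
}

// Hypothetical helper (not part of HotSpot): opmask that enables the
// first `remaining` lanes of a vector with `lanes` elements, e.g. for the
// masked tail store of a fill loop.
std::uint64_t tail_mask(std::uint32_t remaining, std::uint32_t lanes) {
  assert(lanes >= 1 && lanes <= 64);
  std::uint32_t n = (remaining < lanes) ? remaining : lanes;
  // A single BZHI of an all-ones value produces the low-n-bit mask.
  return bzhi64(~std::uint64_t{0}, n);
}
```

For instance, a 13-byte tail in a 32-byte vector gives `tail_mask(13, 32) == 0x1FFF`. The `n >= 64` branch mirrors BZHI leaving its source unchanged for out-of-range indices, whereas the plain expression `(1 << n) - 1` on a 64-bit value is undefined in C++ for `n == 64`.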
>
> Performance measurements of an existing JMH micro on an Ice Lake server show ~1.1-4.0X gains for the fill operation with varying block sizes.
>
> Following are detailed results:
>
> System Configuration: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (40C 2S)
>
> Benchmark | Size | Baseline Auto-vectorized -XX:-OptimizeFill (ops/ms) | New Optimized Fill AVX3 Th=4096 (ops/ms) | Gain Factor (OptFill AVX3Th=4096/Baseline)
> -- | -- | -- | -- | --
> ArrayFill.testByteFill | 16 | 193994.942 | 381142.844 | 1.964705059
> ArrayFill.testByteFill | 31 | 99817.403 | 399973.74 | 4.007054161
> ArrayFill.testByteFill | 59 | 80759.378 | 342165.394 | 4.236850289
> ArrayFill.testByteFill | 89 | 127342.997 | 341696.357 | 2.683275603
> ArrayFill.testByteFill | 126 | 72081.809 | 309335.351 | 4.291448221
> ArrayFill.testByteFill | 250 | 41419.435 | 166618.264 | 4.022707311
> ArrayFill.testByteFill | 511 | 32509.962 | 138595.951 | 4.263184036
> ArrayFill.testByteFill | 1021 | 35930.96 | 90622.597 | 2.522131248
> ArrayFill.testByteFill | 2047 | 32956.62 | 67252.442 | 2.040635296
> ArrayFill.testByteFill | 4095 | 29180.81 | 45508.86 | 1.559547525
> ArrayFill.testByteFill | 8195 | 17468.775 | 25072.671 | 1.435285016
> ArrayFill.testByteFill | 65536 | 978.482 | 946.377 | 0.967188972
> ArrayFill.testCharFill | 16 | 205893.99 | 381151.485 | 1.851202578
> ArrayFill.testCharFill | 31 | 90418.278 | 385694.751 | 4.265672379
> ArrayFill.testCharFill | 59 | 117391.45 | 310132.477 | 2.641865971
> ArrayFill.testCharFill | 89 | 117956.135 | 202314.017 | 1.715163158
> ArrayFill.testCharFill | 126 | 70174.917 | 164571.761 | 2.345165025
> ArrayFill.testCharFill | 250 | 37243.255 | 141460.648 | 3.798289059
> ArrayFill.testCharFill | 511 | 33788.369 | 98578.472 | 2.917526797
> ArrayFill.testCharFill | 1021 | 33655.897 | 78305.288 | 2.326643916
> ArrayFill.testCharFill | 2047 | 35656.759 | 41973.205 | 1.177145825
> ArrayFill.testCharFill | 4095 | 16311.779 | 24724.413 | 1.515739822
> ArrayFill.testCharFill | 8195 | 11412.845 | 12599.1 | 1.103940341
> ArrayFill.testCharFill | 65536 | 476.138 | 486.723 | 1.02223095
> ArrayFill.testDoubleFill | 16 | 222222.265 | 193741.026 | 0.871834449
> ArrayFill.testDoubleFill | 31 | 169693.273 | 155377.031 | 0.915634593
> ArrayFill.testDoubleFill | 59 | 101838.606 | 197496.671 | 1.939310432
> ArrayFill.testDoubleFill | 89 | 106202.786 | 182813.717 | 1.721364607
> ArrayFill.testDoubleFill | 126 | 128696.666 | 123066.432 | 0.956251905
> ArrayFill.testDoubleFill | 250 | 81145.924 | 90895.167 | 1.120144581
> ArrayFill.testDoubleFill | 511 | 44615.14 | 48668.332 | 1.090847905
> ArrayFill.testDoubleFill | 1021 | 25191.332 | 25152.377 | 0.998453635
> ArrayFill.testDoubleFill | 2047 | 11337.929 | 12655.112 | 1.11617492
> ArrayFill.testDoubleFill | 4095 | 6378.326 | 6378.392 | 1.000010348
> ArrayFill.testDoubleFill | 8195 | 885.269 | 882.644 | 0.9970348
> ArrayFill.testDoubleFill | 65536 | 121.155 | 121.252 | 1.000800627
> ArrayFill.testFloatFill | 16 | 201801.067 | 342214.071 | 1.695799116
> ArrayFill.testFloatFill | 31 | 93851.962 | 322681.433 | 3.438195922
> ArrayFill.testFloatFill | 59 | 107454.704 | 162266.325 | 1.510090475
> ArrayFill.testFloatFill | 89 | 129597.511 | 158890.265 | 1.226028677
> ArrayFill.testFloatFill | 126 | 92358.492 | 151423.881 | 1.639523099
> ArrayFill.testFloatFill | 250 | 95412.586 | 96269.997 | 1.008986351
> ArrayFill.testFloatFill | 511 | 68356.016 | 73395.512 | 1.07372425
> ArrayFill.testFloatFill | 1021 | 46040.879 | 42767.414 | 0.928900901
> ArrayFill.testFloatFill | 2047 | 23876.684 | 24988.836 | 1.046578997
> ArrayFill.testFloatFill | 4095 | 12475.923 | 12598.467 | 1.00982244
> ArrayFill.testFloatFill | 8195 | 6286.263 | 6292.858 | 1.001049113
> ArrayFill.testFloatFill | 65536 | 230.041 | 248.095 | 1.078481662
> ArrayFill.testIntFill | 16 | 188215.196 | 339491.214 | 1.803739662
> ArrayFill.testIntFill | 31 | 146425.028 | 321621.325 | 2.19649147
> ArrayFill.testIntFill | 59 | 140650.413 | 194907.815 | 1.385760702
> ArrayFill.testIntFill | 89 | 78017.244 | 166579.365 | 2.13516085
> ArrayFill.testIntFill | 126 | 97645.936 | 142150.475 | 1.455774616
> ArrayFill.testIntFill | 250 | 68623.478 | 96538.532 | 1.406785765
> ArrayFill.testIntFill | 511 | 57465.869 | 84218.747 | 1.465543782
> ArrayFill.testIntFill | 1021 | 46308.298 | 45287.255 | 0.977951187
> ArrayFill.testIntFill | 2047 | 24222.479 | 25017.366 | 1.032816088
> ArrayFill.testIntFill | 4095 | 12470.853 | 12656.69 | 1.014901707
> ArrayFill.testIntFill | 8195 | 6302.584 | 6312.377 | 1.001553807
> ArrayFill.testIntFill | 65536 | 227.098 | 248.39 | 1.09375688
> ArrayFill.testLongFill | 16 | 229400.195 | 190876.891 | 0.832069437
> ArrayFill.testLongFill | 31 | 160433.763 | 161062.288 | 1.00391766
> ArrayFill.testLongFill | 59 | 117527.007 | 104990.932 | 0.893334517
> ArrayFill.testLongFill | 89 | 106400.533 | 112155.423 | 1.054087041
> ArrayFill.testLongFill | 126 | 133428.366 | 141422.605 | 1.059914089
> ArrayFill.testLongFill | 250 | 83393.535 | 70419.357 | 0.844422256
> ArrayFill.testLongFill | 511 | 48534.407 | 44830.441 | 0.923683708
> ArrayFill.testLongFill | 1021 | 25150.503 | 25144.854 | 0.999775392
> ArrayFill.testLongFill | 2047 | 12661.581 | 12495.112 | 0.986852432
> ArrayFill.testLongFill | 4095 | 6378.589 | 6326.361 | 0.991811982
> ArrayFill.testLongFill | 8195 | 884.108 | 883.225 | 0.999001253
> ArrayFill.testLongFill | 65536 | 116.544 | 115.809 | 0.993693369
> ArrayFill.testShortFill | 16 | 181717.691 | 381160.843 | 2.097543948
> ArrayFill.testShortFill | 31 | 99246.669 | 376006.724 | 3.788607999
> ArrayFill.testShortFill | 59 | 125435.022 | 308756.585 | 2.461486275
> ArrayFill.testShortFill | 89 | 116796.477 | 195568.654 | 1.674439667
> ArrayFill.testShortFill | 126 | 37346.482 | 164389.009 | 4.401726754
> ArrayFill.testShortFill | 250 | 32537.347 | 140808.889 | 4.327608179
> ArrayFill.testShortFill | 511 | 43932.519 | 103200.042 | 2.349058154
> ArrayFill.testShortFill | 1021 | 42808.585 | 80777.289 | 1.886941346
> ArrayFill.testShortFill | 2047 | 34852.049 | 41482.517 | 1.190246146
> ArrayFill.testShortFill | 4095 | 21427.935 | 24971.245 | 1.165359378
> ArrayFill.testShortFill | 8195 | 11666.17 | 12655.972 | 1.084843783
> ArrayFill.testShortFill | 65536 | 451.299 | 486.96 | 1.079018566
>
>
> Kindly review and share your feedback.
>
> Best Regards,
> Jatin

Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:

  8275047: Aligning the main fill loops and some synthetic changes.

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/5967/files
  - new: https://git.openjdk.java.net/jdk/pull/5967/files/ec0ff759..d599ac2d

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5967&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5967&range=01-02

  Stats: 18 lines in 4 files changed: 5 ins; 2 del; 11 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5967.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5967/head:pull/5967

PR: https://git.openjdk.java.net/jdk/pull/5967

From kvn at openjdk.java.net Mon Oct 18 15:27:50 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 18 Oct 2021 15:27:50 GMT Subject: RFR: 8273317: crash in cmovP_cmpP_zero_zeroNode::bottom_type() In-Reply-To: References: Message-ID: <1VHepO4p3qvyDA9b_vjX8A04TvQ38Slq9Sf-O39KCHI=.63a8f738-3066-4a03-853c-2b2ede6d5e5f@github.com>

On Sat, 4 Sep 2021 02:58:56 GMT, SUN Guoyun wrote:

> Hi all,
>
> When I implemented a new instruct in the AD file to match CMoveP with a Cmp node, like this:
>
> match(Set dst (CMoveP (Binary cop (CmpP op1 zero)) (Binary dst zero)));
>
> which means the right child of CmpP is immediate zero and the right child of CMoveP is also immediate zero, the following exception occurred:
>

> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> # SIGSEGV (0xb) at pc=0x000000fff410fcc4, pid=11130, tid=11146
> #
> # JRE version: OpenJDK Runtime Environment (17.0) (build 17-internal+0-jenkins-slave-20210821140615-jdk-ls-a526852e137)
> # Java VM: OpenJDK 64-Bit Server VM (17-internal+0-jenkins-slave-20210821140615-jdk-ls-a526852e137, compiled mode, compressed oops, compressed class ptrs, g1 gc, linux-loongarch64)
> # Problematic frame:
> # V [libjvm.so+0x21fcc4] cmovP_cmpP_zero_zeroNode::bottom_type() const+0x44
> #
> # No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
> #
> # If you would like to submit a bug report, please visit:
> # https://bugreport.java.com/bugreport/crash.jsp
> #
> 

>
> In this case, cmovP_cmpP_zero_zeroNode only has three input nodes, so an exception is triggered. This is a patch to fix this problem. Please help review it
>
> Thanks,
> Sun Guoyun

Thank you for running testing, Tobias.

I have a few cosmetic comments.

src/hotspot/share/adlc/output_h.cpp line 1947:

> 1945: // Special special hack to see if the Cmp? has been incorporated in the conditional move
> 1946: MatchNode *rl = instr->_matrule->_rChild->_lChild;
> 1947: if( rl && !strcmp(rl->_opType, "Binary") && rl->_rChild && strncmp(rl->_rChild->_opType, "Cmp", 3) == 0) {

Code style. Use `if (rl &&`.

src/hotspot/share/adlc/output_h.cpp line 1958:

> 1956: }
> 1957: else if( instr->_matrule && instr->_matrule->_rChild && !strcmp(instr->_matrule->_rChild->_opType,"CMoveN") ) {
> 1958: int offset = 1;

The following code, AFAIS, matches `CMoveP` code. The only difference is comment: `// CMoveN` vs `// CMoveP` unless I am missing something.
Consider factoring code into separate function and pass node's name as string.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5369

From kvn at openjdk.java.net Mon Oct 18 16:34:02 2021
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Mon, 18 Oct 2021 16:34:02 GMT
Subject: RFR: 8275047: Optimize existing fill stubs for AVX-512 target [v3]
In-Reply-To:
References:
Message-ID: <1GoVv9gDf2moQr5_TULCvRP3MT_mgre7xQ6EariqAFM=.8372f6b8-fd33-43fb-8a10-bb7c2a2fa314@github.com>

On Mon, 18 Oct 2021 14:57:13 GMT, Jatin Bhateja wrote:

>> Hi All,
>>
>> This patch optimizes macro assembly routines used by fill stubs of various primitive types for X86 AVX-512 target.
>> Following are the main changes:-
>> 1) Specialized instruction sequence for fill operation over various block sizes.
>> 2) Control flow is sensitive to AVX3Threshold and generated code operates over 32 byte vector (YMM) if
>> block size is less than threshold else instructions operate on 64 byte vector (ZMM).
>> 3) Bulk fill operation is performed by a destination aligned fill loop with appropriate unroll factor, this
>> avoids any cache line split penalty and improves performance.
>> 4) Currently fill patterns are vectorized by auto-vectorizer and generated code operates over vectors
>> of MaxVectorSize, in addition auto-vectorizer is oblivious to AVX3Thresholds and this may result in
>> performance degradation over prior generation of X86 servers where 64 byte vector stores using ZMM
>> registers operate at reduced CPU frequency.
>> Patch enables JVM runtime flag -XX:+OptimizeFill ON by default for X86 target supporting AVX-512 feature.
>> 5) Patch also optimizes the mask generation sequence of fill* macro assembly routines using BZHI instruction.
>>
>> Performance measurements of an existing JMH micro over Icelake server shows ~1.1-4.0X gains for fill operation with varying block sizes.
>>
>> Following are detailed results:
>>
>> System Configuration: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (40C 2S)
>>
>> Benchmark | Size | Baseline Auto-vectorized -XX:-OptimizeFill (ops/ms) | New Optimized Fill AVX3 Th=4096 (ops/ms) | Gain Factor (OptFill AVX3Th=4096/Baseline)
>> -- | -- | -- | -- | --
>> ArrayFill.testByteFill | 16 | 193994.942 | 381142.844 | 1.964705059
>> ArrayFill.testByteFill | 31 | 99817.403 | 399973.74 | 4.007054161
>> ArrayFill.testByteFill | 59 | 80759.378 | 342165.394 | 4.236850289
>> ArrayFill.testByteFill | 89 | 127342.997 | 341696.357 | 2.683275603
>> ArrayFill.testByteFill | 126 | 72081.809 | 309335.351 | 4.291448221
>> ArrayFill.testByteFill | 250 | 41419.435 | 166618.264 | 4.022707311
>> ArrayFill.testByteFill | 511 | 32509.962 | 138595.951 | 4.263184036
>> ArrayFill.testByteFill | 1021 | 35930.96 | 90622.597 | 2.522131248
>> ArrayFill.testByteFill | 2047 | 32956.62 | 67252.442 | 2.040635296
>> ArrayFill.testByteFill | 4095 | 29180.81 | 45508.86 | 1.559547525
>> ArrayFill.testByteFill | 8195 | 17468.775 | 25072.671 | 1.435285016
>> ArrayFill.testByteFill | 65536 | 978.482 | 946.377 | 0.967188972
>> ArrayFill.testCharFill | 16 | 205893.99 | 381151.485 | 1.851202578
>> ArrayFill.testCharFill | 31 | 90418.278 | 385694.751 | 4.265672379
>> ArrayFill.testCharFill | 59 | 117391.45 | 310132.477 | 2.641865971
>> ArrayFill.testCharFill | 89 | 117956.135 | 202314.017 | 1.715163158
>> ArrayFill.testCharFill | 126 | 70174.917 | 164571.761 | 2.345165025
>> ArrayFill.testCharFill | 250 | 37243.255 | 141460.648 | 3.798289059
>> ArrayFill.testCharFill | 511 | 33788.369 | 98578.472 | 2.917526797
>> ArrayFill.testCharFill | 1021 | 33655.897 | 78305.288 | 2.326643916
>> ArrayFill.testCharFill | 2047 | 35656.759 | 41973.205 | 1.177145825
>> ArrayFill.testCharFill | 4095 | 16311.779 | 24724.413 | 1.515739822
>> ArrayFill.testCharFill | 8195 | 11412.845 | 12599.1 | 1.103940341
>> ArrayFill.testCharFill | 65536 | 476.138 | 486.723 | 1.02223095
>> ArrayFill.testDoubleFill | 16 | 222222.265 | 193741.026 | 0.871834449
>> ArrayFill.testDoubleFill | 31 | 169693.273 | 155377.031 | 0.915634593
>> ArrayFill.testDoubleFill | 59 | 101838.606 | 197496.671 | 1.939310432
>> ArrayFill.testDoubleFill | 89 | 106202.786 | 182813.717 | 1.721364607
>> ArrayFill.testDoubleFill | 126 | 128696.666 | 123066.432 | 0.956251905
>> ArrayFill.testDoubleFill | 250 | 81145.924 | 90895.167 | 1.120144581
>> ArrayFill.testDoubleFill | 511 | 44615.14 | 48668.332 | 1.090847905
>> ArrayFill.testDoubleFill | 1021 | 25191.332 | 25152.377 | 0.998453635
>> ArrayFill.testDoubleFill | 2047 | 11337.929 | 12655.112 | 1.11617492
>> ArrayFill.testDoubleFill | 4095 | 6378.326 | 6378.392 | 1.000010348
>> ArrayFill.testDoubleFill | 8195 | 885.269 | 882.644 | 0.9970348
>> ArrayFill.testDoubleFill | 65536 | 121.155 | 121.252 | 1.000800627
>> ArrayFill.testFloatFill | 16 | 201801.067 | 342214.071 | 1.695799116
>> ArrayFill.testFloatFill | 31 | 93851.962 | 322681.433 | 3.438195922
>> ArrayFill.testFloatFill | 59 | 107454.704 | 162266.325 | 1.510090475
>> ArrayFill.testFloatFill | 89 | 129597.511 | 158890.265 | 1.226028677
>> ArrayFill.testFloatFill | 126 | 92358.492 | 151423.881 | 1.639523099
>> ArrayFill.testFloatFill | 250 | 95412.586 | 96269.997 | 1.008986351
>> ArrayFill.testFloatFill | 511 | 68356.016 | 73395.512 | 1.07372425
>> ArrayFill.testFloatFill | 1021 | 46040.879 | 42767.414 | 0.928900901
>> ArrayFill.testFloatFill | 2047 | 23876.684 | 24988.836 | 1.046578997
>> ArrayFill.testFloatFill | 4095 | 12475.923 | 12598.467 | 1.00982244
>> ArrayFill.testFloatFill | 8195 | 6286.263 | 6292.858 | 1.001049113
>> ArrayFill.testFloatFill | 65536 | 230.041 | 248.095 | 1.078481662
>> ArrayFill.testIntFill | 16 | 188215.196 | 339491.214 | 1.803739662
>> ArrayFill.testIntFill | 31 | 146425.028 | 321621.325 | 2.19649147
>> ArrayFill.testIntFill | 59 | 140650.413 | 194907.815 | 1.385760702
>> ArrayFill.testIntFill | 89 | 78017.244 | 166579.365 | 2.13516085
>> ArrayFill.testIntFill | 126 | 97645.936 | 142150.475 | 1.455774616
>> ArrayFill.testIntFill | 250 | 68623.478 | 96538.532 | 1.406785765
>> ArrayFill.testIntFill | 511 | 57465.869 | 84218.747 | 1.465543782
>> ArrayFill.testIntFill | 1021 | 46308.298 | 45287.255 | 0.977951187
>> ArrayFill.testIntFill | 2047 | 24222.479 | 25017.366 | 1.032816088
>> ArrayFill.testIntFill | 4095 | 12470.853 | 12656.69 | 1.014901707
>> ArrayFill.testIntFill | 8195 | 6302.584 | 6312.377 | 1.001553807
>> ArrayFill.testIntFill | 65536 | 227.098 | 248.39 | 1.09375688
>> ArrayFill.testLongFill | 16 | 229400.195 | 190876.891 | 0.832069437
>> ArrayFill.testLongFill | 31 | 160433.763 | 161062.288 | 1.00391766
>> ArrayFill.testLongFill | 59 | 117527.007 | 104990.932 | 0.893334517
>> ArrayFill.testLongFill | 89 | 106400.533 | 112155.423 | 1.054087041
>> ArrayFill.testLongFill | 126 | 133428.366 | 141422.605 | 1.059914089
>> ArrayFill.testLongFill | 250 | 83393.535 | 70419.357 | 0.844422256
>> ArrayFill.testLongFill | 511 | 48534.407 | 44830.441 | 0.923683708
>> ArrayFill.testLongFill | 1021 | 25150.503 | 25144.854 | 0.999775392
>> ArrayFill.testLongFill | 2047 | 12661.581 | 12495.112 | 0.986852432
>> ArrayFill.testLongFill | 4095 | 6378.589 | 6326.361 | 0.991811982
>> ArrayFill.testLongFill | 8195 | 884.108 | 883.225 | 0.999001253
>> ArrayFill.testLongFill | 65536 | 116.544 | 115.809 | 0.993693369
>> ArrayFill.testShortFill | 16 | 181717.691 | 381160.843 | 2.097543948
>> ArrayFill.testShortFill | 31 | 99246.669 | 376006.724 | 3.788607999
>> ArrayFill.testShortFill | 59 | 125435.022 | 308756.585 | 2.461486275
>> ArrayFill.testShortFill | 89 | 116796.477 | 195568.654 | 1.674439667
>> ArrayFill.testShortFill | 126 | 37346.482 | 164389.009 | 4.401726754
>> ArrayFill.testShortFill | 250 | 32537.347 | 140808.889 | 4.327608179
>> ArrayFill.testShortFill | 511 | 43932.519 | 103200.042 | 2.349058154
>> ArrayFill.testShortFill | 1021 | 42808.585 | 80777.289 | 1.886941346
>> ArrayFill.testShortFill | 2047 | 34852.049 | 41482.517 | 1.190246146
>> ArrayFill.testShortFill | 4095 | 21427.935 | 24971.245 | 1.165359378
>> ArrayFill.testShortFill | 8195 | 11666.17 | 12655.972 | 1.084843783
>> ArrayFill.testShortFill | 65536 | 451.299 | 486.96 | 1.079018566
>>
>>
>> Kindly review and share your feedback.
>>
>> Best Regards,
>> Jatin
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>
>   8275047: Aligning the main fill loops and some synthetic changes.

Why improvement for testLongFill (59, 89) is much worse (not existing) than for testDoubleFill? The element size is the same.

test/micro/org/openjdk/bench/java/util/ArraysFill.java line 44:

> 42: public class ArraysFill {
> 43:
> 44: @Param({"10", "266", "2048"})

You should keep these sizes.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5967

From xxinliu at amazon.com Tue Oct 19 01:04:30 2021
From: xxinliu at amazon.com (Liu, Xin)
Date: Mon, 18 Oct 2021 18:04:30 -0700
Subject: Does UseFPUForSpilling intend to spill a GPR to XMM?
Message-ID: <84ffaa76-c3b4-423a-bd69-4ff2b62c6fc8@amazon.com>

Hello, Experts,

We recently encountered an ABI issue with XMM0. Even though it has only happened on jdk8u Windows x86 (32-bit) so far, it raises my concern about 'UseFPUForSpilling' for both x86 and x86_64. Does UseFPUForSpilling intend to spill GPRs to XMM registers? I came from JDK-6978249, but I can't find the original webrev.

I don't think XMM registers are saved across function calls in any ABIs. Only XMM6-XMM15 are saved by the callee on Microsoft platforms.
https://docs.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-160#callercallee-saved-registers

If nobody saves XMM0~3, how come C2 register allocation uses them as spilling destinations? It seems possible on AMD64 as well, but it's rarer than on x86 given the fact that AMD64 has more GPRs.
https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86_64.ad#L1388

E.g., this is what I have seen on Windows x86:

07c B10: # B26 B11 <- B9 Freq: 0.999992
07c movdl XMM0, EBX # spill (xliu: EBX store an OOP)
080 MOV [ESP + #20],EDI
084 MOV EBX,[ECX + #136] ! Field: java/awt/Component.componentOrientation
08a TEST EBX,EBX
08c Je B26 P=0.000001 C=-1.000000

----

170 B16: # B40 B17 <- B15 B24 Freq: 0.999991
170 movdl EBX, XMM0 # spill
174 MOV EBX,[EBX + #56] ! Field: javax/swing/plaf/basic/BasicSliderUI.thumbRect (xliu: segment fault here EBX=0)
177 MOV EDI,[EBX + #16] # int ! Field: java/awt/Rectangle.width
17a NullCheck EBX

Between BB10 and BB16, control goes to convD2I_reg_reg, which calls SharedRuntime::d2l in the slow path. XMM0 is used as the return value, so it is clobbered.

So the FPU in 'UseFPUForSpilling' doesn't just refer to Intel x87 but also includes the SSE/AVX units, right? If it does intend to use X/Y/ZMM registers as spilling destinations, is there any mechanism to protect them from runtime calls on x86/x86_64? In both the System V and Microsoft ABIs, XMM0~3 can be used for both argument passing and return values, right?
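The slow path mentioned above exists because, per the Intel SDM, the truncating SSE2 conversion `cvttsd2si` returns the "integer indefinite" value 0x80000000 for NaN and out-of-range inputs, whereas the Java Language Specification (§5.1.3) requires NaN to become 0 and out-of-range values to saturate at `Integer.MIN_VALUE`/`Integer.MAX_VALUE`. The sketch below is a hypothetical model, not HotSpot code. It assumes the compiled fast path simply compares the hardware result against that sentinel, and shows which inputs would leave the fast path and call into the runtime, which is the call across which caller-saved XMM registers may be clobbered. The class and method names are illustrative only.

```java
// Hypothetical model (not HotSpot code) of why a double-to-int conversion
// can leave the compiled fast path and call into the runtime on x86.
public class D2ISlowPath {

    // Models the raw hardware result, assuming cvttsd2si semantics:
    // NaN and values whose truncation does not fit in an int produce the
    // "integer indefinite" value 0x80000000 (Integer.MIN_VALUE).
    public static int rawTruncate(double d) {
        if (Double.isNaN(d)) {
            return Integer.MIN_VALUE;
        }
        // Truncate toward zero, as cvttsd2si does.
        double t = d < 0 ? Math.ceil(d) : Math.floor(d);
        if (t < Integer.MIN_VALUE || t > Integer.MAX_VALUE) {
            return Integer.MIN_VALUE;
        }
        return (int) t;
    }

    // Assumed fast-path check: any result equal to the sentinel is handed to
    // a runtime fix-up, even when the sentinel is the correct answer.
    public static boolean needsSlowPath(double d) {
        return rawTruncate(d) == Integer.MIN_VALUE;
    }

    public static void main(String[] args) {
        double[] inputs = {3.9, -3.9, Double.NaN, 1e20, -1e20, -2147483648.0};
        for (double d : inputs) {
            // (int) d gives the result Java requires (JLS 5.1.3).
            System.out.printf("%s -> raw=%d, java=%d, slowPath=%b%n",
                    d, rawTruncate(d), (int) d, needsSlowPath(d));
        }
    }
}
```

Under these assumptions, NaN, 1e20, -1e20, and -2147483648.0 (whose correct result happens to equal the sentinel) are flagged for the slow path, while ordinary values such as 3.9 stay on the fast path.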
I see that other runtime code stubs, such as arraycopy and crc32, use XMM registers.

Btw, there's a hidden mechanism somewhere which prevents C2 from working on x86_32. Even though I have a server VM of jdk17, it only uses C1 to compile methods. Could you tell me what it is?

Thanks,
--lx

From duke at openjdk.java.net Tue Oct 19 03:58:46 2021
From: duke at openjdk.java.net (SUN Guoyun)
Date: Tue, 19 Oct 2021 03:58:46 GMT
Subject: RFR: 8275086: compiler/c2/irTests/TestPostParseCallDevirtualization.java fails when compiler1 is disabled
In-Reply-To:
References:
Message-ID:

On Sat, 16 Oct 2021 17:35:13 GMT, Igor Veresov wrote:

>> Hi all,
>> Jtreg test case compiler/c2/irTests/TestPostParseCallDevirtualization.java fails for fastdebug mode on x86/aarch64/mips architecture when "--with-jvm-features=-compiler1" is used. The failure output is:
>>
>> 

>> One or more @IR rules failed:
>> 
>> Failed IR Rules (1)
>> ------------------
>> - Method "public int compiler.c2.irTests.TestPostParseCallDevirtualization.testMethodHandleCallWithCCP() throws java.lang.Throwable":
>>   * @IR rule 1: "@compiler.lib.ir_framework.IR(failOn={"#PRE#(\\\\d+(\\\\s){2}(CallStaticJava.*)+(\\\\s){2}===.*#IS_REPLACED#)", "invokeBasic"}, applyIf={}, applyIfAnd={}, applyIfOr={}, counts={"#PRE#(\\\\d+(\\\\s){2}(CallStaticJava.*)+(\\\\s){2}===.*#IS_REPLACED#)", "invokeStatic", "= 1"}, applyIfNot={})" 
>>     - failOn: Graph contains forbidden nodes:
>>         Regex 1: (\\d+(\\s){2}(CallStaticJava.*)+(\\s){2}===.*invokeBasic)
>>         Matched forbidden node:
>>           280  CallStaticJava  ===  5  6  7  8  1 ( 188  1  1  1  1  1  1 ) [[ 281  282  283  285 ]] # Static  java.lang.invoke.MethodHandle::invokeBasic
>>     - counts: Graph contains wrong number of nodes:
>>         Regex 1: (\\d+(\\s){2}(CallStaticJava.*)+(\\s){2}===.*invokeStatic)
>>         Expected 1 but found 0 nodes.
>> 
>>>>> Check stdout for compilation output of the failed methods
>> 

>>
>> This is a patch to fix this problem. Please help review it.
>>
>> Thanks,
>> Sun Guoyun
>
> ```ensure_method_data()``` creates an MDO if it doesn't exist. I think that's what's happening. Because ```ciMethod::method_data()``` would normally do the load and it clearly doesn't happen.

Thank you for your reply, @veresov. There are two ways to make this test pass:
1. use `applyIf = {"TieredCompilation", "true"}` to prohibit the testing of testMethodHandleCallWithLoop() and testMethodHandleCallWithCCP().
2. use `-XX:Tier4InvocationThreshold=300` to make sure an MDO can be created.

Which way is more reasonable?

-------------

PR: https://git.openjdk.java.net/jdk/pull/5903

From ngasson at openjdk.java.net Tue Oct 19 04:32:47 2021
From: ngasson at openjdk.java.net (Nick Gasson)
Date: Tue, 19 Oct 2021 04:32:47 GMT
Subject: RFR: 8253757: Add LLVM-based backend for hsdis
In-Reply-To:
References:
Message-ID:

On Wed, 13 Oct 2021 00:00:22 GMT, Magnus Ihse Bursie wrote:

> This patch expands the newly added system for hsdis backends to include LLVM.
>
> The actual code in hsdis-llvm.cpp is based heavily on the work by @luhenry, as published in the never integrated PR https://github.com/openjdk/jdk/pull/392. (I have basically just ripped out the binutils-based part of it.)
>
> Unfortunately I have not been able to make this work properly on Windows. With some additional flags I made it compile without complaints, but it caused hotspot to segfault in `LoadLibrary` (!) in `os::dll_load` when I tried to load the library. This is somewhat ironic, since the initial implementation was created by Ludovic for the very purpose of using it on Windows.
>
> The lack of Windows support in this patch does not mean it is impossible to get it to work, just that I need to co-operate with someone who has more experience of compiling LLVM on Windows, and/or are more eager to get this combination to work.

The problem is `LLVMDisasmInstruction()` returns zero size as soon as it hits an instruction it doesn't understand. Something kludgy like this works:

diff --git a/src/utils/hsdis/llvm/hsdis-llvm.cpp b/src/utils/hsdis/llvm/hsdis-llvm.cpp
index a491082f14fa..3c50ee8e3b40 100644
--- a/src/utils/hsdis/llvm/hsdis-llvm.cpp
+++ b/src/utils/hsdis/llvm/hsdis-llvm.cpp
@@ -307,6 +307,10 @@ class hsdis_backend : public hsdis_backend_base {
   virtual size_t decode_instruction(uintptr_t p, uintptr_t start, uintptr_t end) {
     char buf[128];
     size_t size = LLVMDisasmInstruction(_dcontext, (uint8_t*)p, (uint64_t)(end - start), (uint64_t)p, buf, sizeof(buf));
+    if (size == 0 && end - start >= 4) {
+      snprintf(buf, sizeof(buf), "\t.word\t#0x%08x", *(uint32_t*)p);
+      size = 4;
+    }
     if (size > 0) {
       (*_printf_callback)(_printf_stream, "%s", buf);
     }


0x0000ffff94685454: br x8
0x0000ffff94685458: .word #0x93e3f4c0
0x0000ffff9468545c: udf #0xffff
[Exception Handler]
0x0000ffff94685460: adrp x8, #-0x795000 ; {runtime_call handle_exception_from_callee Runtime1 stub}

-------------

PR: https://git.openjdk.java.net/jdk/pull/5920

From ngasson at openjdk.java.net Tue Oct 19 04:50:48 2021
From: ngasson at openjdk.java.net (Nick Gasson)
Date: Tue, 19 Oct 2021 04:50:48 GMT
Subject: RFR: 8253757: Add LLVM-based backend for hsdis
In-Reply-To:
References:
Message-ID:

On Wed, 13 Oct 2021 00:00:22 GMT, Magnus Ihse Bursie wrote:

> This patch expands the newly added system for hsdis backends to include LLVM.
>
> The actual code in hsdis-llvm.cpp is based heavily on the work by @luhenry, as published in the never integrated PR https://github.com/openjdk/jdk/pull/392. (I have basically just ripped out the binutils-based part of it.)
>
> Unfortunately I have not been able to make this work properly on Windows. With some additional flags I made it compile without complaints, but it caused hotspot to segfault in `LoadLibrary` (!) in `os::dll_load` when I tried to load the library. This is somewhat ironic, since the initial implementation was created by Ludovic for the very purpose of using it on Windows.
>
> The lack of Windows support in this patch does not mean it is impossible to get it to work, just that I need to co-operate with someone who has more experience of compiling LLVM on Windows, and/or are more eager to get this combination to work.

Rather than introduce a new dependency on all of LLVM you might like to take a look at Capstone - https://www.capstone-engine.org/ . AFAIK the disassemblers are generated from the same LLVM architecture description files so the instruction coverage should be the same but the library is much more lightweight. It's packaged in most Linux distributions and there are pre-built Windows binaries available.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5920

From whuang at openjdk.java.net Tue Oct 19 07:01:16 2021
From: whuang at openjdk.java.net (Wang Huang)
Date: Tue, 19 Oct 2021 07:01:16 GMT
Subject: RFR: 8259948: Aarch64: Add cast nodes for Aarch64 Neon backend [v10]
In-Reply-To:
References:
Message-ID:

> * In this issue, we plan to complete all missing implementation for aarch64 neon backend. For example, cast from Byte to Long, cast from Long to Byte, and so on.
> * It may be a solver of JDK-8269866, or part of it.

Wang Huang has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits:

 - Merge branch 'master' into JDK-8259948
 - sync asmtest.out.h
 - remove useless codes
 - merge master & fix conflict
 - code refine
 - fix bugs
 - fix codes
 - fix codes
 - fix code style
 - fix bugs
 - ... and 1 more: https://git.openjdk.java.net/jdk/compare/947d52c4...92c1404a

-------------

Changes: https://git.openjdk.java.net/jdk/pull/4839/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4839&range=09
  Stats: 539 lines in 6 files changed: 274 ins; 71 del; 194 mod
  Patch: https://git.openjdk.java.net/jdk/pull/4839.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/4839/head:pull/4839

PR: https://git.openjdk.java.net/jdk/pull/4839

From duke at openjdk.java.net Tue Oct 19 07:18:08 2021
From: duke at openjdk.java.net (SUN Guoyun)
Date: Tue, 19 Oct 2021 07:18:08 GMT
Subject: RFR: 8273317: crash in cmovP_cmpP_zero_zeroNode::bottom_type() [v2]
In-Reply-To:
References:
Message-ID:

> Hi all,
>
> When I implement a new instruct in adfile for match CMoveP with Cmp node, like this:
>
> match(Set dst (CMoveP (Binary cop (CmpP op1 zero)) (Binary dst zero)));
>
> this means right child of CmpP is immediate zero and right child of CmovP also is immediate zero, then an exception will occur:
>
> 

> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> # SIGSEGV (0xb) at pc=0x000000fff410fcc4, pid=11130, tid=11146
> #
> # JRE version: OpenJDK Runtime Environment (17.0) (build 17-internal+0-jenkins-slave-20210821140615-jdk-ls-a526852e137)
> # Java VM: OpenJDK 64-Bit Server VM (17-internal+0-jenkins-slave-20210821140615-jdk-ls-a526852e137, compiled mode, compressed oops, compressed class ptrs, g1 gc, linux-loongarch64)
> # Problematic frame:
> # V [libjvm.so+0x21fcc4] cmovP_cmpP_zero_zeroNode::bottom_type() const+0x44
> #
> # No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
> #
> # If you would like to submit a bug report, please visit:
> # https://bugreport.java.com/bugreport/crash.jsp
> #
> 

>
> In this case, cmovP_cmpP_zero_zeroNode only has three input nodes, so an exception is triggered. This is a patch to fix this problem. Please help review it
>
> Thanks,
> Sun Guoyun

SUN Guoyun has updated the pull request incrementally with one additional commit since the last revision:

  8273317: crash in cmovP_cmpP_zero_zeroNode::bottom_type()

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/5369/files
  - new: https://git.openjdk.java.net/jdk/pull/5369/files/17dcc991..dca58f25

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5369&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5369&range=00-01

  Stats: 21 lines in 1 file changed: 1 ins; 14 del; 6 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5369.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5369/head:pull/5369

PR: https://git.openjdk.java.net/jdk/pull/5369

From duke at openjdk.java.net Tue Oct 19 07:21:51 2021
From: duke at openjdk.java.net (SUN Guoyun)
Date: Tue, 19 Oct 2021 07:21:51 GMT
Subject: RFR: 8273317: crash in cmovP_cmpP_zero_zeroNode::bottom_type() [v2]
In-Reply-To:
References:
Message-ID:

On Tue, 19 Oct 2021 07:18:08 GMT, SUN Guoyun wrote:

>> Hi all,
>>
>> When I implement a new instruct in adfile for match CMoveP with Cmp node, like this:
>>
>> match(Set dst (CMoveP (Binary cop (CmpP op1 zero)) (Binary dst zero)));
>>
>> this means right child of CmpP is immediate zero and right child of CmovP also is immediate zero, then an exception will occur:
>>
>> 

>> #
>> # A fatal error has been detected by the Java Runtime Environment:
>> #
>> # SIGSEGV (0xb) at pc=0x000000fff410fcc4, pid=11130, tid=11146
>> #
>> # JRE version: OpenJDK Runtime Environment (17.0) (build 17-internal+0-jenkins-slave-20210821140615-jdk-ls-a526852e137)
>> # Java VM: OpenJDK 64-Bit Server VM (17-internal+0-jenkins-slave-20210821140615-jdk-ls-a526852e137, compiled mode, compressed oops, compressed class ptrs, g1 gc, linux-loongarch64)
>> # Problematic frame:
>> # V [libjvm.so+0x21fcc4] cmovP_cmpP_zero_zeroNode::bottom_type() const+0x44
>> #
>> # No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
>> #
>> # If you would like to submit a bug report, please visit:
>> # https://bugreport.java.com/bugreport/crash.jsp
>> #
>> 

>>
>> In this case, cmovP_cmpP_zero_zeroNode only has three input nodes, so an exception is triggered. This is a patch to fix this problem. Please help review it
>>
>> Thanks,
>> Sun Guoyun
>
> SUN Guoyun has updated the pull request incrementally with one additional commit since the last revision:
>
>   8273317: crash in cmovP_cmpP_zero_zeroNode::bottom_type()

Please review again, thank you.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5369

From duke at openjdk.java.net Tue Oct 19 07:21:53 2021
From: duke at openjdk.java.net (SUN Guoyun)
Date: Tue, 19 Oct 2021 07:21:53 GMT
Subject: RFR: 8273317: crash in cmovP_cmpP_zero_zeroNode::bottom_type() [v2]
In-Reply-To: <1VHepO4p3qvyDA9b_vjX8A04TvQ38Slq9Sf-O39KCHI=.63a8f738-3066-4a03-853c-2b2ede6d5e5f@github.com>
References: <1VHepO4p3qvyDA9b_vjX8A04TvQ38Slq9Sf-O39KCHI=.63a8f738-3066-4a03-853c-2b2ede6d5e5f@github.com>
Message-ID:

On Mon, 18 Oct 2021 15:18:57 GMT, Vladimir Kozlov wrote:

>> SUN Guoyun has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   8273317: crash in cmovP_cmpP_zero_zeroNode::bottom_type()
>
> src/hotspot/share/adlc/output_h.cpp line 1947:
>
>> 1945: // Special special hack to see if the Cmp? has been incorporated in the conditional move
>> 1946: MatchNode *rl = instr->_matrule->_rChild->_lChild;
>> 1947: if( rl && !strcmp(rl->_opType, "Binary") && rl->_rChild && strncmp(rl->_rChild->_opType, "Cmp", 3) == 0) {
>
> Code style. Use `if (rl &&`.

Done

> src/hotspot/share/adlc/output_h.cpp line 1958:
>
>> 1956: }
>> 1957: else if( instr->_matrule && instr->_matrule->_rChild && !strcmp(instr->_matrule->_rChild->_opType,"CMoveN") ) {
>> 1958: int offset = 1;
>
> The following code, AFAIS, matches `CMoveP` code. The only difference is comment: `// CMoveN` vs `// CMoveP` unless I am missing something.
> Consider factoring code into separate function and pass node's name as string.
Yes, you are right

-------------

PR: https://git.openjdk.java.net/jdk/pull/5369

From whuang at openjdk.java.net Tue Oct 19 07:29:15 2021
From: whuang at openjdk.java.net (Wang Huang)
Date: Tue, 19 Oct 2021 07:29:15 GMT
Subject: RFR: 8259948: Aarch64: Add cast nodes for Aarch64 Neon backend [v11]
In-Reply-To:
References:
Message-ID:

> * In this issue, we plan to complete all missing implementation for aarch64 neon backend. For example, cast from Byte to Long, cast from Long to Byte, and so on.
> * It may be a solver of JDK-8269866, or part of it.

Wang Huang has updated the pull request incrementally with one additional commit since the last revision:

  merge master

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/4839/files
  - new: https://git.openjdk.java.net/jdk/pull/4839/files/92c1404a..a7c562e5

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4839&range=10
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4839&range=09-10

  Stats: 84 lines in 1 file changed: 0 ins; 1 del; 83 mod
  Patch: https://git.openjdk.java.net/jdk/pull/4839.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/4839/head:pull/4839

PR: https://git.openjdk.java.net/jdk/pull/4839

From aoqi at openjdk.java.net Tue Oct 19 07:41:50 2021
From: aoqi at openjdk.java.net (Ao Qi)
Date: Tue, 19 Oct 2021 07:41:50 GMT
Subject: RFR: 8273317: crash in cmovP_cmpP_zero_zeroNode::bottom_type() [v2]
In-Reply-To:
References: <1VHepO4p3qvyDA9b_vjX8A04TvQ38Slq9Sf-O39KCHI=.63a8f738-3066-4a03-853c-2b2ede6d5e5f@github.com>
Message-ID: <1uXBpABDKLPizDEXbFk19bDeKRLmbwS16bvvoU-FRxs=.eee56e44-6d59-42f2-a932-b366af179e75@github.com>

On Tue, 19 Oct 2021 07:18:03 GMT, SUN Guoyun wrote:

>> src/hotspot/share/adlc/output_h.cpp line 1947:
>>
>>> 1945: // Special special hack to see if the Cmp? has been incorporated in the conditional move
>>> 1946: MatchNode *rl = instr->_matrule->_rChild->_lChild;
>>> 1947: if( rl && !strcmp(rl->_opType, "Binary") && rl->_rChild && strncmp(rl->_rChild->_opType, "Cmp", 3) == 0) {
>>
>> Code style. Use `if (rl &&`.
>
> Done

It's still `if( rl`, please update to `if (rl`. Copyright year also needs an update.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5369

From nils.eliasson at oracle.com Tue Oct 19 07:52:20 2021
From: nils.eliasson at oracle.com (Nils Eliasson)
Date: Tue, 19 Oct 2021 09:52:20 +0200
Subject: Re: Does UseFPUForSpilling intend to spill a GPR to XMM?
In-Reply-To: <84ffaa76-c3b4-423a-bd69-4ff2b62c6fc8@amazon.com>
References: <84ffaa76-c3b4-423a-bd69-4ff2b62c6fc8@amazon.com>
Message-ID: <290fa125-aa23-b925-9086-d0b7b3046a31@oracle.com>

Hi Liu,

You had a lot of questions - I'll try to answer a few of them:

Yes, UseFPUForSpilling uses XMM registers in the C2 compiler. On 64-bit x86, SSE2 is the minimum requirement. x87 has never been used for spilling.

C2 should work fine on 32-bit x86. Have a look at "os::is_server_class_machine()" - if the machine you are running on doesn't meet some criteria, a quick-only mode (C1) will be used. There is a flag - "NeverActAsServerClassMachine" - you can use to control this behavior.

C2 handles the spilling to XMM registers as a part of normal register allocation - so any clobbering should be handled. I don't recall the Windows 32-bit calling convention - I need to refresh my memory on that. Can you reproduce a failure?

Regards,
Nils Eliasson

On 2021-10-19 03:04, Liu, Xin wrote:
> Hello, Experts,
>
> We recently encountered an ABI issue with XMM0. Even though it only happens
> on jdk8u windows x86(32bits) so far, it raises my concern about
> 'UseFPUForSpilling' for both x86 and x86_64. Does UseFPUForSpilling
> intend to spill GPRs to XMM registers? I come from JDK-6978249, but I
> can't find the original webrev.
>
> I don't think XMM registers are saved across function calls in any ABIs.
> Only XMM6-XMM15 are saved by the callee on Microsoft platforms.
> https://docs.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-160#callercallee-saved-registers
>
> If nobody saves XMM0~3, how come C2 register allocation uses them as
> spilling destination? It seems possible on AMD64 as well, but it's rarer
> than x86 given the fact AMD64 has more GPRs.
> https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86_64.ad#L1388
>
> eg. This is what I have seen on Windows x86.
> 07c B10: # B26 B11 <- B9 Freq: 0.999992
> 07c movdl XMM0, EBX # spill (xliu: EBX store an OOP)
> 080 MOV [ESP + #20],EDI
> 084 MOV EBX,[ECX + #136] ! Field: java/awt/Component.componentOrientation
> 08a TEST EBX,EBX
> 08c Je B26 P=0.000001 C=-1.000000
>
> ----
>
> 170 B16: # B40 B17 <- B15 B24 Freq: 0.999991
> 170 movdl EBX, XMM0 # spill
> 174 MOV EBX,[EBX + #56] ! Field: javax/swing/plaf/basic/BasicSliderUI.thumbRect (xliu: segment fault here EBX=0)
> 177 MOV EDI,[EBX + #16] # int ! Field: java/awt/Rectangle.width
> 17a NullCheck EBX
>
> Between BB10 and BB16, the control goes to convD2I_reg_reg, which calls
> SharedRuntime::d2l in the slow path. XMM0 is used as return value, so it
> is clobbered.
>
> So the FPU of 'UseFPUForSpilling' doesn't just refer to intel x87 but
> also include SSE/AVX units, right? If it does intend to use X/Y/ZMM
> registers as Spilling destination, is there any mechanism to protect
> them from runtime calls on x86/x86_64? In both System V and Microsoft
> ABIs, XMM0~3 could be used for both argument passing and return value,
> right?
>
> I see other runtime code stubs such as arraycopy and crc32 use XMM
> registers.
>
> Btw, there's a hidden mechanism somewhere which prevents c2 from working
> on x86_32. Even though I have a server VM of jdk17, it only uses c1 to
> compile methods. Could you tell me what it is?
> > Thanks, > --lx > From duke at openjdk.java.net Tue Oct 19 07:56:14 2021 From: duke at openjdk.java.net (SUN Guoyun) Date: Tue, 19 Oct 2021 07:56:14 GMT Subject: RFR: 8273317: crash in cmovP_cmpP_zero_zeroNode::bottom_type() [v3] In-Reply-To: References: Message-ID: > Hi all, > > When I implemented a new instruct in the AD file to match CMoveP with a Cmp node, like this: > > match(Set dst (CMoveP (Binary cop (CmpP op1 zero)) (Binary dst zero))); > > meaning that the right child of CmpP is an immediate zero and the right child of CMoveP is also an immediate zero, the following crash occurred: > >

> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> # SIGSEGV (0xb) at pc=0x000000fff410fcc4, pid=11130, tid=11146
> #
> # JRE version: OpenJDK Runtime Environment (17.0) (build 17-internal+0-jenkins-slave-20210821140615-jdk-ls-a526852e137)
> # Java VM: OpenJDK 64-Bit Server VM (17-internal+0-jenkins-slave-20210821140615-jdk-ls-a526852e137, compiled mode, compressed oops, compressed class ptrs, g1 gc, linux-loongarch64)
> # Problematic frame:
> # V [libjvm.so+0x21fcc4] cmovP_cmpP_zero_zeroNode::bottom_type() const+0x44
> #
> # No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
> #
> # If you would like to submit a bug report, please visit:
> # https://bugreport.java.com/bugreport/crash.jsp
> #
> 
> > In this case, cmovP_cmpP_zero_zeroNode has only three input nodes, so an exception is triggered. This patch fixes the problem. Please help review it. > > Thanks, > Sun Guoyun SUN Guoyun has updated the pull request incrementally with one additional commit since the last revision: 8273317: crash in cmovP_cmpP_zero_zeroNode::bottom_type() ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5369/files - new: https://git.openjdk.java.net/jdk/pull/5369/files/dca58f25..2df965e5 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5369&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5369&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5369.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5369/head:pull/5369 PR: https://git.openjdk.java.net/jdk/pull/5369 From duke at openjdk.java.net Tue Oct 19 08:08:11 2021 From: duke at openjdk.java.net (SUN Guoyun) Date: Tue, 19 Oct 2021 08:08:11 GMT Subject: RFR: 8273317: crash in cmovP_cmpP_zero_zeroNode::bottom_type() [v4] In-Reply-To: References: Message-ID: > Hi all, > > When I implemented a new instruct in the AD file to match CMoveP with a Cmp node, like this: > > match(Set dst (CMoveP (Binary cop (CmpP op1 zero)) (Binary dst zero))); > > meaning that the right child of CmpP is an immediate zero and the right child of CMoveP is also an immediate zero, the following crash occurred: > >

> > In this case, cmovP_cmpP_zero_zeroNode has only three input nodes, so an exception is triggered. This patch fixes the problem. Please help review it. > > Thanks, > Sun Guoyun SUN Guoyun has updated the pull request incrementally with one additional commit since the last revision: 8273317: crash in cmovP_cmpP_zero_zeroNode::bottom_type() ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5369/files - new: https://git.openjdk.java.net/jdk/pull/5369/files/2df965e5..36928daa Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5369&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5369&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5369.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5369/head:pull/5369 PR: https://git.openjdk.java.net/jdk/pull/5369 From neliasso at openjdk.java.net Tue Oct 19 09:24:18 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Tue, 19 Oct 2021 09:24:18 GMT Subject: RFR: 8273277: C2: Move conditional negation into rc_predicate [v2] In-Reply-To: References: Message-ID: > Hi, > > I need some feedback on this patch. This was reported by Tencent and found in internal testing at about the same time. This patch is based on a patch provided by Tencent. > > In some very specific circumstances we need to negate the range checks that we create in PhaseIdealLoop::loop_predication_impl_helper. This is done in three places, but that method also calls insert_initial_skeleton_predicate, where this isn't taken into account. > > To simplify things I have moved the negation logic into rc_predicate. This should prevent us from missing this check again. > > I do have a concern about negating the condition of the range check in the skeleton predicate, since the skeleton predicate will be used as a clone template, and some range check optimizations seem to assume that range checks always have LT as the condition.
On the other hand, it is a really uncommon scenario, since we haven't failed here before. > > Feedback appreciated. > > Best regards, > Nils Nils Eliasson has updated the pull request incrementally with one additional commit since the last revision: Add test case ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5987/files - new: https://git.openjdk.java.net/jdk/pull/5987/files/947769d4..21697bac Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5987&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5987&range=00-01 Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5987.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5987/head:pull/5987 PR: https://git.openjdk.java.net/jdk/pull/5987 From njian at openjdk.java.net Tue Oct 19 10:14:59 2021 From: njian at openjdk.java.net (Ningsheng Jian) Date: Tue, 19 Oct 2021 10:14:59 GMT Subject: RFR: 8259948: Aarch64: Add cast nodes for Aarch64 Neon backend [v11] In-Reply-To: References: Message-ID: On Tue, 19 Oct 2021 07:29:15 GMT, Wang Huang wrote: >> * In this issue, we plan to complete all missing implementations for the AArch64 Neon backend, for example, casts from Byte to Long, from Long to Byte, and so on. >> * It may fix JDK-8269866, or part of it. > > Wang Huang has updated the pull request incrementally with one additional commit since the last revision: > > merge master Looks good to me. Just some style nits. src/hotspot/cpu/aarch64/aarch64_neon.ad line 330: > 328: match(Set dst (VectorCastI2X src)); > 329: format %{ "xtn $dst, T4H, $src, T4S\n\t" > 330: "xtn $dst, T8B, $dst, T8H\t# convert 4I to 4B vector" nit: remove one space between "T4H, $src". src/hotspot/cpu/aarch64/aarch64_neon.ad line 353: > 351: %} > 352: instruct vcvt2Lto2F(vecD dst, vecX src) > 353: %{ nit: add one blank line between generated rules.
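The `VectorCastI2X` rule quoted above narrows four int lanes to four byte lanes with two `xtn` steps. As a rough scalar model (the helper names are hypothetical, and it assumes the rule is meant to match Java's narrowing `(byte)` cast), keeping the low 16 bits and then the low 8 bits of each lane gives the same result as a direct int-to-byte cast:

```java
// Scalar model of the two xtn steps in the 4I -> 4B cast rule (helper names
// are hypothetical). Each xtn keeps the low half of every lane.
final class XtnNarrowingSketch {
    // Models "xtn T4H <- T4S": keep the low 16 bits of each int lane.
    static short xtnIntToShort(int lane) {
        return (short) lane;
    }

    // Models "xtn T8B <- T8H": keep the low 8 bits of each short lane.
    static byte xtnShortToByte(short lane) {
        return (byte) lane;
    }

    // Applies both narrowing steps lane by lane.
    static byte[] castIntToByteTwoStep(int[] src) {
        byte[] dst = new byte[src.length];
        for (int k = 0; k < src.length; k++) {
            dst[k] = xtnShortToByte(xtnIntToShort(src[k]));
        }
        return dst;
    }
}
```

For example, 0x12345678 becomes 0x5678 after the first step and 0x78 after the second, the same as `(byte) 0x12345678`.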
src/hotspot/cpu/aarch64/aarch64_neon.ad line 427: > 425: %} > 426: instruct vcvt4Bto4F(vecX dst, vecD src) > 427: %{ Add a blank line between L425 and L426. src/hotspot/cpu/aarch64/aarch64_neon.ad line 517: > 515: %} > 516: instruct vcvt4Fto4B(vecD dst, vecX src) > 517: %{ And here. ------------- Marked as reviewed by njian (Committer). PR: https://git.openjdk.java.net/jdk/pull/4839 From roland at openjdk.java.net Tue Oct 19 14:22:20 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Tue, 19 Oct 2021 14:22:20 GMT Subject: RFR: 8259609: C2: optimize long range checks in long counted loops [v8] In-Reply-To: References: Message-ID: <-s6i8qjWuNEyXemNFAw7EMvN42FOvoi6-KkRefxo96k=.3bff40dd-71bd-46fc-a034-951491887bc6@github.com> > JDK-8255150 makes it possible for Java code to explicitly perform a > range check on long values. JDK-8223051 provides a transformation of > long counted loops into loop nests with an inner int counted > loop. With this change I propose transforming long range checks that > operate on the iv of a long counted loop into range checks that > operate on the iv of the int inner loop once it has been > created. Existing range check eliminations can then kick in. > > Transformation of range checks is piggybacked on the loop nest > creation for 2 reasons: > > - pattern matching range checks is easier right before the loop nest > is created > > - the number of iterations of the inner loop is adjusted so scale * > inner_iv doesn't overflow > > C2 has logic to delay some split if transformations so they don't > break the scale * iv + offset pattern. I reused that logic for long > range checks and had to relax what's considered a range check because > initially a range check from Objects.checkIndex() may include a test > for range > 0 that needs a round of loop opts to be hoisted.
I realize > there's some code duplication, but I didn't see a way to share logic > between IdealLoopTree::may_have_range_check() > and IdealLoopTree::policy_range_check() that would feel right. > > I realize the comment in PhaseIdealLoop::transform_long_range_checks() > is scary. FWIW, it's not as complicated as it looks. I found that drawing > the range covered by the entire long loop and the range covered by the > inner loop helps to see how range checks can be transformed. Then the > comment helps make sure all cases are covered and verify that the generated > code actually covers all of them. > > One issue is overflow. I think the fact that inner_iv * scale doesn't > overflow helps simplify things. One possible overflow is that of scale > * upper + offset, which is handled by forcing all range checks in that > case to deoptimize. I don't think other cases of overflow need special > handling. > > This was tested with a Memory Segment micro benchmark (and patched > Memory Segment support to take advantage of the new checkIndex > intrinsic, both provided by Maurizio). Range checks in the micro > benchmark are properly optimized (and performance increases > significantly). Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 18 commits: - John reworked comment - Merge branch 'master' into JDK-8259609 - merge - Merge branch 'master' into JDK-8259609 - whitespace - rework - Merge branch 'master' into JDK-8259609 - John's review 1 - merge with master - Tobias' comments - ...
and 8 more: https://git.openjdk.java.net/jdk/compare/947d52c4...1f746a77 ------------- Changes: https://git.openjdk.java.net/jdk/pull/2045/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2045&range=07 Stats: 908 lines in 12 files changed: 733 ins; 67 del; 108 mod Patch: https://git.openjdk.java.net/jdk/pull/2045.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2045/head:pull/2045 PR: https://git.openjdk.java.net/jdk/pull/2045 From roland at openjdk.java.net Tue Oct 19 15:13:21 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Tue, 19 Oct 2021 15:13:21 GMT Subject: RFR: 8259609: C2: optimize long range checks in long counted loops [v9] In-Reply-To: References: Message-ID: > JDK-8255150 makes it possible for Java code to explicitly perform a > range check on long values. JDK-8223051 provides a transformation of > long counted loops into loop nests with an inner int counted > loop. With this change I propose transforming long range checks that > operate on the iv of a long counted loop into range checks that > operate on the iv of the int inner loop once it has been > created. Existing range check eliminations can then kick in. > > Transformation of range checks is piggybacked on the loop nest > creation for 2 reasons: > > - pattern matching range checks is easier right before the loop nest > is created > > - the number of iterations of the inner loop is adjusted so scale * > inner_iv doesn't overflow > > C2 has logic to delay some split if transformations so they don't > break the scale * iv + offset pattern. I reused that logic for long > range checks and had to relax what's considered a range check because > initially a range check from Objects.checkIndex() may include a test > for range > 0 that needs a round of loop opts to be hoisted. I realize > there's some code duplication, but I didn't see a way to share logic > between IdealLoopTree::may_have_range_check() > and IdealLoopTree::policy_range_check() that would feel right.
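A much-simplified sketch of the idea in the description above, not C2's actual transformation: writing the long iv as i = base + j, where j is the iv of the inner int loop, the long check 0 <= i < range can be replaced by an int check on j against a bound computed once per outer iteration. The sketch assumes scale 1, offset 0, base >= 0, range >= 0, no overflow of base + j, and j < Integer.MAX_VALUE (standing in for the inner loop's trip-count limit). All names are hypothetical; the real code also handles scale, offset, and overflow (by deoptimizing).

```java
// Hypothetical illustration: rewrite a long range check on i = base + j into
// an int range check on the inner loop iv j.
final class LongRangeCheckSketch {
    // Saturate a long to the int range.
    static int clampToInt(long v) {
        return (int) Math.max(Integer.MIN_VALUE, Math.min(Integer.MAX_VALUE, v));
    }

    // The condition enforced by Objects.checkIndex(long, long) for range >= 0.
    static boolean longCheck(long i, long range) {
        return i >= 0 && i < range;
    }

    // Computed once per outer iteration (invariant in the inner loop).
    // With base >= 0 and j >= 0, the lower-bound check is implied.
    static int innerBound(long base, long range) {
        return clampToInt(range - base);
    }

    // The int check performed inside the inner loop.
    static boolean intCheck(int j, int bound) {
        return j < bound;
    }
}
```

Clamping is what keeps the bound in int range: if range - base exceeds Integer.MAX_VALUE, every j the inner loop can reach passes, and if it is negative, none does, which matches the long check in both cases.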
> > I realize the comment in PhaseIdealLoop::transform_long_range_checks() > is scary. FWIW, it's not as complicated as it looks. I found that drawing > the range covered by the entire long loop and the range covered by the > inner loop helps to see how range checks can be transformed. Then the > comment helps make sure all cases are covered and verify that the generated > code actually covers all of them. > > One issue is overflow. I think the fact that inner_iv * scale doesn't > overflow helps simplify things. One possible overflow is that of scale > * upper + offset, which is handled by forcing all range checks in that > case to deoptimize. I don't think other cases of overflow need special > handling. > > This was tested with a Memory Segment micro benchmark (and patched > Memory Segment support to take advantage of the new checkIndex > intrinsic, both provided by Maurizio). Range checks in the micro > benchmark are properly optimized (and performance increases > significantly). Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: comment fix ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2045/files - new: https://git.openjdk.java.net/jdk/pull/2045/files/1f746a77..86b73147 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2045&range=08 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2045&range=07-08 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/2045.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2045/head:pull/2045 PR: https://git.openjdk.java.net/jdk/pull/2045 From kvn at openjdk.java.net Tue Oct 19 16:06:58 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 19 Oct 2021 16:06:58 GMT Subject: RFR: 8273317: crash in cmovP_cmpP_zero_zeroNode::bottom_type() [v4] In-Reply-To: References: Message-ID: On Tue, 19 Oct 2021 08:08:11 GMT, SUN Guoyun wrote: >> Hi all, >> >> When I implemented a new instruct in
the AD file to match CMoveP with a Cmp node, like this: >> >> match(Set dst (CMoveP (Binary cop (CmpP op1 zero)) (Binary dst zero))); >> >> meaning that the right child of CmpP is an immediate zero and the right child of CMoveP is also an immediate zero, the following crash occurred: >> >>

>> >> In this case, cmovP_cmpP_zero_zeroNode has only three input nodes, so an exception is triggered. This patch fixes the problem. Please help review it. >> >> Thanks, >> Sun Guoyun > > SUN Guoyun has updated the pull request incrementally with one additional commit since the last revision: > > 8273317: crash in cmovP_cmpP_zero_zeroNode::bottom_type() Looks good. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5369 From duke at openjdk.java.net Tue Oct 19 20:34:55 2021 From: duke at openjdk.java.net (Vamsi Parasa) Date: Tue, 19 Oct 2021 20:34:55 GMT Subject: RFR: 8275167: x86 intrinsic for unsignedMultiplyHigh [v2] In-Reply-To: <7IzrZdL0elgXbuisyLNYC2wkyOTe1RHUPuGRI7YsAQ4=.aed9dea3-4775-4592-b43e-c3e08e167f90@github.com> References: <7IzrZdL0elgXbuisyLNYC2wkyOTe1RHUPuGRI7YsAQ4=.aed9dea3-4775-4592-b43e-c3e08e167f90@github.com> Message-ID: > Optimize the new Math.unsignedMultiplyHigh using the x86 mul instruction. This change shows a 1.87X improvement on a micro benchmark.
Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: refactoring to remove code duplication by using a common routine for UMulHiLNode and MulHiLNode ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5933/files - new: https://git.openjdk.java.net/jdk/pull/5933/files/cb30b268..a10a9fbe Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5933&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5933&range=00-01 Stats: 25 lines in 2 files changed: 12 ins; 12 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5933.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5933/head:pull/5933 PR: https://git.openjdk.java.net/jdk/pull/5933 From duke at openjdk.java.net Tue Oct 19 20:35:03 2021 From: duke at openjdk.java.net (Vamsi Parasa) Date: Tue, 19 Oct 2021 20:35:03 GMT Subject: RFR: 8275167: x86 intrinsic for unsignedMultiplyHigh [v2] In-Reply-To: References: <7IzrZdL0elgXbuisyLNYC2wkyOTe1RHUPuGRI7YsAQ4=.aed9dea3-4775-4592-b43e-c3e08e167f90@github.com> <8MuiklM5Nt3VkzyVHbWqwMh_LkVvVY2Mf65_0zTx4Kw=.9351008f-b489-4103-be9d-87e6fc4a8f39@github.com> Message-ID: <5AqWGf-saY2vk4Z14pSF25waVdcU0j06jW3aSUH76dg=.bba122ee-cf23-44be-bdbc-8bf54b4a6b71@github.com> On Fri, 15 Oct 2021 21:04:12 GMT, Vamsi Parasa wrote: > > > How did you verify the correctness of the results? I suggest extending the `test/jdk//java/lang/Math/MultiplicationTests.java` test to cover the unsigned method. > > > > > > Tests for unsignedMultiplyHigh were already added in test/jdk//java/lang/Math/MultiplicationTests.java (in July 2021 by Brian Burkhalter). I used that test to verify the correctness of the results. > > Good. It seems I have an old version of the test. Did you run it with -Xcomp? How did you verify that the intrinsic is used? I have verified that the intrinsic is being used by inspecting the generated x86 assembly code with the perfasm profiler.
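For cross-checking results like the ones discussed above, the value the intrinsic has to produce can be written portably: the high 64 bits of the unsigned 128-bit product (what x86's one-operand 64-bit `mul` leaves in RDX) equal the signed `Math.multiplyHigh` result plus a correction for each operand whose sign bit is set. The sketch below assumes Java 9+ for `Math.multiplyHigh`; the class and method names are hypothetical, and a `BigInteger` computation serves as an independent oracle.

```java
import java.math.BigInteger;

// Portable reference for the unsigned high word of a 64x64-bit product
// (hypothetical names; a sketch, not the JDK's code).
final class UnsignedMulHighSketch {
    static long unsignedMultiplyHigh(long x, long y) {
        long result = Math.multiplyHigh(x, y); // signed high 64 bits
        result += (y & (x >> 63));             // adds y when x's sign bit is set
        result += (x & (y >> 63));             // adds x when y's sign bit is set
        return result;
    }

    // Slow oracle: treat both operands as unsigned, multiply exactly,
    // and keep bits 64..127 of the product.
    static long viaBigInteger(long x, long y) {
        BigInteger mask = BigInteger.ONE.shiftLeft(64).subtract(BigInteger.ONE);
        BigInteger ux = BigInteger.valueOf(x).and(mask);
        BigInteger uy = BigInteger.valueOf(y).and(mask);
        return ux.multiply(uy).shiftRight(64).longValue();
    }
}
```

For example, `unsignedMultiplyHigh(-1L, -1L)` is -2L (0xFFFFFFFFFFFFFFFE), since (2^64 - 1)^2 = 2^128 - 2^65 + 1.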
------------- PR: https://git.openjdk.java.net/jdk/pull/5933 From duke at openjdk.java.net Tue Oct 19 20:35:17 2021 From: duke at openjdk.java.net (Vamsi Parasa) Date: Tue, 19 Oct 2021 20:35:17 GMT Subject: RFR: 8275167: x86 intrinsic for unsignedMultiplyHigh [v2] In-Reply-To: References: <7IzrZdL0elgXbuisyLNYC2wkyOTe1RHUPuGRI7YsAQ4=.aed9dea3-4775-4592-b43e-c3e08e167f90@github.com> Message-ID: On Fri, 15 Oct 2021 19:31:24 GMT, Vamsi Parasa wrote: >> src/hotspot/share/opto/mulnode.cpp line 468: >> >>> 466: } >>> 467: >>> 468: //============================================================================= >> >> MulHiLNode::Value() and UMulHiLNode::Value() seem to be identical. Perhaps some refactoring would be in order, maybe make a common shared routine. > > Sure, will do the refactoring to use a shared routine. Pushed the refactored code to use a common routine for MulHiLNode::Value() and UMulHiLNode::Value(). Please review... ------------- PR: https://git.openjdk.java.net/jdk/pull/5933 From duke at openjdk.java.net Tue Oct 19 20:35:26 2021 From: duke at openjdk.java.net (Vamsi Parasa) Date: Tue, 19 Oct 2021 20:35:26 GMT Subject: Withdrawn: 8275167: x86 intrinsic for unsignedMultiplyHigh In-Reply-To: <7IzrZdL0elgXbuisyLNYC2wkyOTe1RHUPuGRI7YsAQ4=.aed9dea3-4775-4592-b43e-c3e08e167f90@github.com> References: <7IzrZdL0elgXbuisyLNYC2wkyOTe1RHUPuGRI7YsAQ4=.aed9dea3-4775-4592-b43e-c3e08e167f90@github.com> Message-ID: <0g7B5EoV8DdxjVj7jQpJJD7aAjvqbr_M0rl2GyA9ing=.0a6e9bdb-7eab-4fbf-911a-6db783b6882b@github.com> On Wed, 13 Oct 2021 18:55:10 GMT, Vamsi Parasa wrote: > Optimize the new Math.unsignedMultiplyHigh using the x86 mul instruction. This change show 1.87X improvement on a micro benchmark. This pull request has been closed without being integrated. 
------------- PR: https://git.openjdk.java.net/jdk/pull/5933 From duke at openjdk.java.net Wed Oct 20 01:01:05 2021 From: duke at openjdk.java.net (SUN Guoyun) Date: Wed, 20 Oct 2021 01:01:05 GMT Subject: RFR: 8273317: crash in cmovP_cmpP_zero_zeroNode::bottom_type() In-Reply-To: References: Message-ID: On Fri, 15 Oct 2021 18:55:09 GMT, Vladimir Kozlov wrote: >> Hi all, >> >> When I implemented a new instruct in the AD file to match CMoveP with a Cmp node, like this: >> >> match(Set dst (CMoveP (Binary cop (CmpP op1 zero)) (Binary dst zero))); >> >> meaning that the right child of CmpP is an immediate zero and the right child of CMoveP is also an immediate zero, the following crash occurred: >> >>

>> >> In this case, cmovP_cmpP_zero_zeroNode has only three input nodes, so an exception is triggered. This patch fixes the problem. Please help review it. >> >> Thanks, >> Sun Guoyun > > Someone at Oracle has to run this patch through testing to make sure it passes. Changes affect shared code. @vnkozlov @TobiHartmann Could you please sponsor it for me? ------------- PR: https://git.openjdk.java.net/jdk/pull/5369 From jrose at openjdk.java.net Wed Oct 20 03:23:07 2021 From: jrose at openjdk.java.net (John R Rose) Date: Wed, 20 Oct 2021 03:23:07 GMT Subject: RFR: 8259609: C2: optimize long range checks in long counted loops [v9] In-Reply-To: References: Message-ID: On Tue, 19 Oct 2021 15:13:21 GMT, Roland Westrelin wrote: >> JDK-8255150 makes it possible for Java code to explicitly perform a >> range check on long values. JDK-8223051 provides a transformation of >> long counted loops into loop nests with an inner int counted >> loop. With this change I propose transforming long range checks that >> operate on the iv of a long counted loop into range checks that >> operate on the iv of the int inner loop once it has been >> created. Existing range check eliminations can then kick in. >> >> Transformation of range checks is piggybacked on the loop nest >> creation for 2 reasons: >> >> - pattern matching range checks is easier right before the loop nest >> is created >> >> - the number of iterations of the inner loop is adjusted so scale * >> inner_iv doesn't overflow >> >> C2 has logic to delay some split if transformations so they don't >> break the scale * iv + offset pattern. I reused that logic for long >> range checks and had to relax what's considered a range check because >> initially a range check from Objects.checkIndex() may include a test >> for range > 0 that needs a round of loop opts to be hoisted.
I realize >> there's some code duplication, but I didn't see a way to share logic >> between IdealLoopTree::may_have_range_check() >> and IdealLoopTree::policy_range_check() that would feel right. >> >> I realize the comment in PhaseIdealLoop::transform_long_range_checks() >> is scary. FWIW, it's not as complicated as it looks. I found that drawing >> the range covered by the entire long loop and the range covered by the >> inner loop helps to see how range checks can be transformed. Then the >> comment helps make sure all cases are covered and verify that the generated >> code actually covers all of them. >> >> One issue is overflow. I think the fact that inner_iv * scale doesn't >> overflow helps simplify things. One possible overflow is that of scale >> * upper + offset, which is handled by forcing all range checks in that >> case to deoptimize. I don't think other cases of overflow need special >> handling. >> >> This was tested with a Memory Segment micro benchmark (and patched >> Memory Segment support to take advantage of the new checkIndex >> intrinsic, both provided by Maurizio). Range checks in the micro >> benchmark are properly optimized (and performance increases >> significantly). > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > comment fix No more comments from me. Ship it! ------------- Marked as reviewed by jrose (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2045 From xxinliu at amazon.com Wed Oct 20 03:38:59 2021 From: xxinliu at amazon.com (Liu, Xin) Date: Tue, 19 Oct 2021 20:38:59 -0700 Subject: [UNVERIFIED SENDER] Re: Does UseFPUForSpilling intend to spill a GPR to XMM? In-Reply-To: <290fa125-aa23-b925-9086-d0b7b3046a31@oracle.com> References: <84ffaa76-c3b4-423a-bd69-4ff2b62c6fc8@amazon.com> <290fa125-aa23-b925-9086-d0b7b3046a31@oracle.com> Message-ID: Hi Nils, Thanks for the explanation. Sorry, I have too many questions. Let me focus on XMM registers.
Yes, we have a reproducer from a customer, but it's a GUI application. I am still trying to reduce it to a single test. About the crash, I've filed a JBS issue, JDK-8275565, with a description of why XMM0 may be clobbered on x86_32. I took a closer look at push_FPU_state() today:

void MacroAssembler::push_FPU_state() {
  subptr(rsp, FPUStateSizeInWords * wordSize);
#ifndef _LP64
  fnsave(Address(rsp, 0));
  fwait();
#else
  fxsave(Address(rsp, 0));
#endif // LP64
}

On x86_32, it doesn't save XMM registers because it uses fnsave. I see that you save them in RegisterSaver::save_live_registers() using extra steps. However, generate_d2i_wrapper() only uses push/pop_FPU_state(). Am I right here? Thanks, --lx On 10/19/21 12:52 AM, Nils Eliasson wrote: > Hi Liu, > > You had a lot of questions - I'll try to answer a few of them: > > Yes, UseFPUForSpilling uses XMM registers in the C2 compiler. On 64-bit > x86, SSE2 is the minimum requirement. x87 has never been used for spilling. > > C2 should work fine on 32-bit x86. Have a look at > "os::is_server_class_machine()" - if the machine you are running on > doesn't meet some criteria, a quick-only mode (C1) will be used. There > is a flag, "NeverActAsServerClassMachine", that you can use to control > this behavior. > > C2 handles the spilling to XMM registers as part of normal register > allocation, so any clobbering should be handled. I don't recall the > Windows 32-bit calling convention - I need to refresh my memory on that. > Can you reproduce a failure? > > Regards, > Nils Eliasson > > > On 2021-10-19 03:04, Liu, Xin wrote: >> Hello, Experts, >> >> We recently encountered an ABI issue with XMM0. Even though it only happens >> on jdk8u Windows x86 (32-bit) so far, it raises my concern about >> 'UseFPUForSpilling' for both x86 and x86_64.
Does UseFPUForSpilling >> intend to spill GPRs to XMM registers? I came from JDK-6978249, but I >> can't find the original webrev. >> >> I don't think XMM registers are saved across function calls in any ABI. >> Only XMM6-XMM15 are saved by the callee on Microsoft platforms. >> https://docs.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-160#callercallee-saved-registers >> >> If nobody saves XMM0~3, how come C2 register allocation uses them as >> spilling destinations? It seems possible on AMD64 as well, but it's rarer >> than on x86 given that AMD64 has more GPRs. >> https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86_64.ad#L1388 >> >> E.g., this is what I have seen on Windows x86: >> 07c B10: # B26 B11 <- B9 Freq: 0.999992 >> 07c movdl XMM0, EBX # spill (xliu: EBX stores an OOP) >> 080 MOV [ESP + #20],EDI >> 084 MOV EBX,[ECX + #136] ! Field: >> java/awt/Component.componentOrientation >> 08a TEST EBX,EBX >> 08c Je B26 P=0.000001 C=-1.000000 >> >> ---- >> >> 170 B16: # B40 B17 <- B15 B24 Freq: 0.999991 >> 170 movdl EBX, XMM0 # spill >> 174 MOV EBX,[EBX + #56] ! Field: javax/swing/plaf/basic >> /BasicSliderUI.thumbRect (xliu: segmentation fault here, EBX=0) >> 177 MOV EDI,[EBX + #16] # int ! Field: java/awt/Rectangle.width >> 17a NullCheck EBX >> >> Between B10 and B16, control goes to convD2I_reg_reg, which calls >> SharedRuntime::d2l in the slow path. XMM0 is used as the return value, so it >> is clobbered. >> >> So the FPU in 'UseFPUForSpilling' doesn't just refer to Intel x87 but >> also includes SSE/AVX units, right? If it does intend to use X/Y/ZMM >> registers as spilling destinations, is there any mechanism to protect >> them from runtime calls on x86/x86_64? In both the System V and Microsoft >> ABIs, XMM0~3 could be used for both argument passing and return values, >> right? >> >> I see other runtime code stubs, such as arraycopy and crc32, use XMM >> registers.
>> >> Btw, there's a hidden mechanism somewhere which prevents C2 from working >> on x86_32. Even though I have a server VM of jdk17, it only uses C1 to >> compile methods. Could you tell me what it is? >> >> Thanks, >> --lx >> > From thartmann at openjdk.java.net Wed Oct 20 06:37:09 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Wed, 20 Oct 2021 06:37:09 GMT Subject: RFR: 8273317: crash in cmovP_cmpP_zero_zeroNode::bottom_type() [v4] In-Reply-To: References: Message-ID: On Tue, 19 Oct 2021 08:08:11 GMT, SUN Guoyun wrote: >> Hi all, >> >> When I implemented a new instruct in the AD file to match CMoveP with a Cmp node, like this: >> >> match(Set dst (CMoveP (Binary cop (CmpP op1 zero)) (Binary dst zero))); >> >> meaning that the right child of CmpP is an immediate zero and the right child of CMoveP is also an immediate zero, the following crash occurred: >> >>

>> >> In this case, cmovP_cmpP_zero_zeroNode has only three input nodes, so an exception is triggered. This patch fixes the problem. Please help review it. >> >> Thanks, >> Sun Guoyun > > SUN Guoyun has updated the pull request incrementally with one additional commit since the last revision: > > 8273317: crash in cmovP_cmpP_zero_zeroNode::bottom_type() Marked as reviewed by thartmann (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/5369 From duke at openjdk.java.net Wed Oct 20 06:41:11 2021 From: duke at openjdk.java.net (SUN Guoyun) Date: Wed, 20 Oct 2021 06:41:11 GMT Subject: Integrated: 8273317: crash in cmovP_cmpP_zero_zeroNode::bottom_type() In-Reply-To: References: Message-ID: <7ucZtySpUAHmdp048pN9FyCTAGfonDd6Zn-9VPPnDZc=.7982a266-dd04-436e-9e4c-e36d7be31781@github.com> On Sat, 4 Sep 2021 02:58:56 GMT, SUN Guoyun wrote: > Hi all, > > When I implemented a new instruct in the AD file to match CMoveP with a Cmp node, like this: > > match(Set dst (CMoveP (Binary cop (CmpP op1 zero)) (Binary dst zero))); > > meaning that the right child of CmpP is an immediate zero and the right child of CMoveP is also an immediate zero, the following crash occurred: > >

> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> # SIGSEGV (0xb) at pc=0x000000fff410fcc4, pid=11130, tid=11146
> #
> # JRE version: OpenJDK Runtime Environment (17.0) (build 17-internal+0-jenkins-slave-20210821140615-jdk-ls-a526852e137)
> # Java VM: OpenJDK 64-Bit Server VM (17-internal+0-jenkins-slave-20210821140615-jdk-ls-a526852e137, compiled mode, compressed oops, compressed class ptrs, g1 gc, linux-loongarch64)
> # Problematic frame:
> # V [libjvm.so+0x21fcc4] cmovP_cmpP_zero_zeroNode::bottom_type() const+0x44
> #
> # No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
> #
> # If you would like to submit a bug report, please visit:
> # https://bugreport.java.com/bugreport/crash.jsp
> #
> 
> > In this case, cmovP_cmpP_zero_zeroNode only has three input nodes, so an exception is triggered. This is a patch to fix this problem. Please help review it > > Thanks, > Sun Guoyun This pull request has now been integrated. Changeset: bd0bed71 Author: sunguoyun Committer: Tobias Hartmann URL: https://git.openjdk.java.net/jdk/commit/bd0bed71e55f0bb8b4619495c79184f94c0701fb Stats: 23 lines in 1 file changed: 1 ins; 12 del; 10 mod 8273317: crash in cmovP_cmpP_zero_zeroNode::bottom_type() Reviewed-by: thartmann, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/5369 From nils.eliasson at oracle.com Wed Oct 20 08:48:27 2021 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Wed, 20 Oct 2021 10:48:27 +0200 Subject: [EXTERNAL] [UNVERIFIED SENDER] Re: Does UseFPUForSpilling intend to spill a GPR to XMM? In-Reply-To: References: <84ffaa76-c3b4-423a-bd69-4ff2b62c6fc8@amazon.com> <290fa125-aa23-b925-9086-d0b7b3046a31@oracle.com> Message-ID: <678a8788-06db-cf6a-7a58-e161cdd6c081@oracle.com> Hi Xin, On 2021-10-20 05:38, Liu, Xin wrote: > hi, Nils, > > Thanks for explanation. Sorry I have too many questions. Let me focus on > XMM registers. Yes, we have a reproducer from a customer, but it's a GUI > application. I am still trying to reduce it to a single test. > > About the crash, I've filed a JBS issue JDK-8275565 with a > description why XMM0 may be clobbered on x86_32. I take a closer look at > push_FPU_state() today. > > void MacroAssembler::push_FPU_state() { > subptr(rsp, FPUStateSizeInWords * wordSize); > #ifndef _LP64 > fnsave(Address(rsp, 0)); > fwait(); > #else > fxsave(Address(rsp, 0)); > #endif // LP64 > } > > On x86 system, it doesn't save XMM registers because of fnsave. I see > that you save them in RegisterSaver::save_live_registers() using extra > steps. However, generate_d2i_wrapper() only uses push/pop_FPU_state(). > Am I right here? It looks like that's the case - yes.
32 bit x86 isn't an active platform from my perspective - so I don't know if this is an old or a new problem. Regards, Nils Eliasson > > > thanks, > --lx > > > > On 10/19/21 12:52 AM, Nils Eliasson wrote: >> Hi Liu, >> >> You had a lot of questions - I'll try to answer a few of them: >> >> Yes, UseFPUForSpilling uses XMM registers in the C2 compiler. On 64 bit >> x86, SSE2 is the minimum requirement. x87 has never been used for spilling. >> >> C2 should work fine on 32 bit x86. Have a look at >> "os::is_server_class_machine()" - if the machine you are running on >> doesn't meet some criteria - a quick-only-mode (c1) will be used. There >> is a flag - "NeverActAsServerClassMachine" - you can use to control >> this behavior. >> >> C2 handles the spilling to XMM register as a part of normal register >> allocation - so any clobbering should be handled. I don't recall the >> windows 32-bit calling convention - I need to refresh my memory on that. >> Can you reproduce a failure? >> >> Regards, >> Nils Eliasson >> >> >> On 2021-10-19 03:04, Liu, Xin wrote: >>> Hello, Experts, >>> >>> We recently encountered an ABI issue with XMM0. Even though it only happens >>> on jdk8u windows x86(32bits) so far, it raises my concern about >>> 'UseFPUForSpilling' for both x86 and x86_64. Does UseFPUForSpilling >>> intend to spill GPRs to XMM registers? I come from JDK-6978249, but I >>> can't find the original webrev. >>> >>> I don't think XMM registers are saved across function calls in any ABIs. >>> Only XMM6-XMM15 are saved by the callee on Microsoft platforms. >>> https://docs.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-160#callercallee-saved-registers >>> >>> If nobody saves XMM0~3, how come C2 register allocation uses them as >>> spilling destination?
It seems possible on AMD64 as well, but it's rarer >>> than x86 given the fact AMD64 has more GPRs. >>> https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86_64.ad#L1388 >>> >>> eg. This is what I have seen on Windows x86. >>> 07c B10: # B26 B11 <- B9 Freq: 0.999992 >>> 07c movdl XMM0, EBX # spill (xliu: EBX stores an OOP) >>> 080 MOV [ESP + #20],EDI >>> 084 MOV EBX,[ECX + #136] ! Field: >>> java/awt/Component.componentOrientation >>> 08a TEST EBX,EBX >>> 08c Je B26 P=0.000001 C=-1.000000 >>> >>> ---- >>> >>> 170 B16: # B40 B17 <- B15 B24 Freq: 0.999991 >>> 170 movdl EBX, XMM0 # spill >>> 174 MOV EBX,[EBX + #56] ! Field: javax/swing/plaf/basic >>> /BasicSliderUI.thumbRect (xliu: segment fault here EBX=0) >>> 177 MOV EDI,[EBX + #16] # int ! Field: java/awt/Rectangle.width >>> 17a NullCheck EBX >>> >>> Between BB10 and BB16, the control goes to convD2I_reg_reg, which calls >>> SharedRuntime::d2l in the slow path. XMM0 is used as return value, so it >>> is clobbered. >>> >>> So the FPU of 'UseFPUForSpilling' doesn't just refer to Intel x87 but >>> also includes SSE/AVX units, right? If it does intend to use X/Y/ZMM >>> registers as spilling destination, is there any mechanism to protect >>> them from runtime calls on x86/x86_64? In both System V and Microsoft >>> ABIs, XMM0~3 could be used for both argument passing and return value, >>> right? >>> >>> I see other runtime code stubs such as arraycopy and crc32 use XMM >>> registers. >>> >>> Btw, there's a hidden mechanism somewhere which prevents c2 from working >>> on x86_32. Even though I have a server VM of jdk17, it only uses c1 to >>> compile methods. Could you tell me what it is? >>> >>> Thanks, >>> --lx >>> From nils.eliasson at oracle.com Wed Oct 20 09:30:59 2021 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Wed, 20 Oct 2021 11:30:59 +0200 Subject: [EXTERNAL] [UNVERIFIED SENDER] Re: Does UseFPUForSpilling intend to spill a GPR to XMM?
In-Reply-To: <678a8788-06db-cf6a-7a58-e161cdd6c081@oracle.com> References: <84ffaa76-c3b4-423a-bd69-4ff2b62c6fc8@amazon.com> <290fa125-aa23-b925-9086-d0b7b3046a31@oracle.com> <678a8788-06db-cf6a-7a58-e161cdd6c081@oracle.com> Message-ID: On 2021-10-20 10:48, Nils Eliasson wrote: > Hi Xin, > > On 2021-10-20 05:38, Liu, Xin wrote: >> hi, Nils, >> >> Thanks for explanation. Sorry I have too many questions. Let me focus on >> XMM registers. Yes, we have a reproducer from a customer, but it's a GUI >> application. I am still trying to reduce it to a single test. >> >> About the crash, I've filed a JBS issue JDK-8275565 with a >> description why XMM0 may be clobbered on x86_32. I take a closer look at >> push_FPU_state() today. >> >> void MacroAssembler::push_FPU_state() { >> subptr(rsp, FPUStateSizeInWords * wordSize); >> #ifndef _LP64 >> fnsave(Address(rsp, 0)); >> fwait(); >> #else >> fxsave(Address(rsp, 0)); >> #endif // LP64 >> } >> >> On x86 system, it doesn't save XMM registers because of fnsave. I see >> that you save them in RegisterSaver::save_live_registers() using extra >> steps. However, generate_d2i_wrapper() only uses push/pop_FPU_state(). >> Am I right here? The generate_d2i_wrapper is only used on x86, and only for d2i and d2l. They are called as "leafs" - so there will be no safepoint. In that case xmm registers can only be clobbered if d2i or d2l is using xmm registers. They are native functions - so they might have been compiled with different compilers for each release. I suggest you disassemble them and look for xmm usage. Regards, Nils > It looks like that's the case - yes. > > 32 bit x86 isn't an active platform from my perspective - so I don't > know if this is an old or a new problem. > > Regards, > Nils Eliasson > > >> >> >> thanks, >> --lx >> >> >> >> On 10/19/21 12:52 AM, Nils Eliasson wrote:
>>> Hi Liu, >>> >>> You had a lot of questions - I'll try to answer a few of them: >>> >>> Yes, UseFPUForSpilling uses XMM registers in the C2 compiler. On 64 bit >>> x86, SSE2 is the minimum requirement. x87 has never been used for >>> spilling. >>> >>> C2 should work fine on 32 bit x86. Have a look at >>> "os::is_server_class_machine()" - if the machine you are running on >>> doesn't meet some criteria - a quick-only-mode (c1) will be used. There >>> is a flag - "NeverActAsServerClassMachine" - you can use to control >>> this behavior. >>> >>> C2 handles the spilling to XMM register as a part of normal register >>> allocation - so any clobbering should be handled. I don't recall the >>> windows 32-bit calling convention - I need to refresh my memory on >>> that. >>> Can you reproduce a failure? >>> >>> Regards, >>> Nils Eliasson >>> >>> >>> On 2021-10-19 03:04, Liu, Xin wrote: >>>> Hello, Experts, >>>> >>>> We recently encountered an ABI issue with XMM0. Even though it only >>>> happens >>>> on jdk8u windows x86(32bits) so far, it raises my concern about >>>> 'UseFPUForSpilling' for both x86 and x86_64. Does UseFPUForSpilling >>>> intend to spill GPRs to XMM registers? I come from JDK-6978249, but I >>>> can't find the original webrev. >>>> >>>> I don't think XMM registers are saved across function calls in any >>>> ABIs. >>>> Only XMM6-XMM15 are saved by the callee on Microsoft platforms. >>>> https://docs.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-160#callercallee-saved-registers >>>> >>>> >>>> If nobody saves XMM0~3, how come C2 register allocation uses them as >>>> spilling destination? It seems possible on AMD64 as well, but it's >>>> rarer >>>> than x86 given the fact AMD64 has more GPRs. >>>> https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86_64.ad#L1388 >>>> >>>> >>>> eg.
This is what I have seen on Windows x86. >>>> 07c B10: # B26 B11 <- B9 Freq: 0.999992 >>>> 07c movdl XMM0, EBX # spill (xliu: EBX stores an OOP) >>>> 080 MOV [ESP + #20],EDI >>>> 084 MOV EBX,[ECX + #136] ! Field: >>>> java/awt/Component.componentOrientation >>>> 08a TEST EBX,EBX >>>> 08c Je B26 P=0.000001 C=-1.000000 >>>> >>>> ---- >>>> >>>> 170 B16: # B40 B17 <- B15 B24 Freq: 0.999991 >>>> 170 movdl EBX, XMM0 # spill >>>> 174 MOV EBX,[EBX + #56] ! Field: javax/swing/plaf/basic >>>> /BasicSliderUI.thumbRect (xliu: segment fault here EBX=0) >>>> 177 MOV EDI,[EBX + #16] # int ! Field: >>>> java/awt/Rectangle.width >>>> 17a NullCheck EBX >>>> >>>> Between BB10 and BB16, the control goes to convD2I_reg_reg, which >>>> calls >>>> SharedRuntime::d2l in the slow path. XMM0 is used as return value, >>>> so it >>>> is clobbered. >>>> >>>> So the FPU of 'UseFPUForSpilling' doesn't just refer to Intel x87 but >>>> also includes SSE/AVX units, right? If it does intend to use X/Y/ZMM >>>> registers as spilling destination, is there any mechanism to protect >>>> them from runtime calls on x86/x86_64? In both System V and Microsoft >>>> ABIs, XMM0~3 could be used for both argument passing and return value, >>>> right? >>>> >>>> I see other runtime code stubs such as arraycopy and crc32 use XMM >>>> registers. >>>> >>>> Btw, there's a hidden mechanism somewhere which prevents c2 from >>>> working >>>> on x86_32. Even though I have a server VM of jdk17, it only uses c1 to >>>> compile methods. Could you tell me what it is?
>>>> >>>> Thanks, >>>> --lx >>>> > From ihse at openjdk.java.net Wed Oct 20 10:45:03 2021 From: ihse at openjdk.java.net (Magnus Ihse Bursie) Date: Wed, 20 Oct 2021 10:45:03 GMT Subject: RFR: 8253757: Add LLVM-based backend for hsdis In-Reply-To: References: Message-ID: On Tue, 19 Oct 2021 04:47:52 GMT, Nick Gasson wrote: >> This patch expands the newly added system for hsdis backends to include LLVM. >> >> The actual code in hsdis-llvm.cpp is based heavily on the work by @luhenry, as published in the never integrated PR https://github.com/openjdk/jdk/pull/392. (I have basically just ripped out the binutils-based part of it.) >> >> Unfortunately I have not been able to make this work properly on Windows. With some additional flags I made it compile without complaints, but it caused hotspot to segfault in `LoadLibrary` (!) in `os::dll_load` when I tried to load the library. This is somewhat ironic, since the initial implementation was created by Ludovic for the very purpose of using it on Windows. >> >> The lack of Windows support in this patch does not mean it is impossible to get it to work, just that I need to co-operate with someone who has more experience of compiling LLVM on Windows, and/or are more eager to get this combination to work. > > Rather than introduce a new dependency on all of LLVM you might like to take a look at Capstone - https://www.capstone-engine.org/ . AFAIK the disassemblers are generated from the same LLVM architecture description files so the instruction coverage should be the same but the library is much more lightweight. It's packaged in most Linux distributions and there's pre-built Windows binaries available. @nick-arm That'd introduce a new dependency to Capstone. ;-) But your suggestion is excellent -- in fact, I have a branch in my personal fork that builds hsdis with Capstone as backend. I just scheduled for myself to submit this PR first. (Which maybe was a mistake; it was obviously more tricky to get right than I anticipated.) 
I might reconsider that choice and let this PR wait until I've pushed the Capstone backend first, instead. ------------- PR: https://git.openjdk.java.net/jdk/pull/5920 From chagedorn at openjdk.java.net Wed Oct 20 11:27:26 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Wed, 20 Oct 2021 11:27:26 GMT Subject: RFR: 8275104: IR framework does not handle client VM builds correctly Message-ID: While the IR framework is primarily used for C2 IR verification, it should also work with client VM builds. There are currently some problems which are fixed with this patch: - The IR framework currently only bails out of IR matching if C2 is excluded by command line flags. However, when running an IR JTreg test with a client VM build, IR matching fails when not specifically adding `@requires vm.compiler2.enabled` to exclude the test. - `@Test` and `@ForceCompile` do not work correctly and throw an exception due to an incompatible compilation level selection without C2. - Some internal framework tests fail (the fix also improves `TestDIgnoreCompilerControls` in general). 
Testing: - Standard tier testing - Testing internal framework tests with standard build (tiered), client VM (without C2) and server VM build (without C1) Thanks, Christian ------------- Commit messages: - 8275104: IR framework does not handle client VM builds correctly Changes: https://git.openjdk.java.net/jdk/pull/6037/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6037&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8275104 Stats: 79 lines in 8 files changed: 30 ins; 18 del; 31 mod Patch: https://git.openjdk.java.net/jdk/pull/6037.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6037/head:pull/6037 PR: https://git.openjdk.java.net/jdk/pull/6037 From chagedorn at openjdk.java.net Wed Oct 20 12:04:19 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Wed, 20 Oct 2021 12:04:19 GMT Subject: RFR: 8274888: Dump "-DReproduce=true" to the test VM command line output Message-ID: When trying to rerun a failure of the test VM, one can copy the test VM command line printed to the output and directly use that one. But this requires to additionally set `-DReproduce=true` to mock the driver VM. This patch improves the manual work of adding `-DReproduce=true` and dumps it now automatically as part of the command line printed on failures. 
Thanks, Christian ------------- Commit messages: - 8274888: Dump "-DReproduce=true" to the test VM command line output Changes: https://git.openjdk.java.net/jdk/pull/6039/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6039&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8274888 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/6039.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6039/head:pull/6039 PR: https://git.openjdk.java.net/jdk/pull/6039 From thartmann at openjdk.java.net Wed Oct 20 12:32:09 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Wed, 20 Oct 2021 12:32:09 GMT Subject: RFR: 8274888: Dump "-DReproduce=true" to the test VM command line output In-Reply-To: References: Message-ID: On Wed, 20 Oct 2021 11:55:59 GMT, Christian Hagedorn wrote: > When trying to rerun a failure of the test VM, one can copy the test VM command line printed to the output and directly use that one. But this requires to additionally set `-DReproduce=true` to mock the driver VM. This patch improves the manual work of adding `-DReproduce=true` and dumps it now automatically as part of the command line printed on failures. > > Thanks, > Christian Looks good and trivial to me. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6039 From chagedorn at openjdk.java.net Wed Oct 20 12:38:06 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Wed, 20 Oct 2021 12:38:06 GMT Subject: RFR: 8274888: Dump "-DReproduce=true" to the test VM command line output In-Reply-To: References: Message-ID: On Wed, 20 Oct 2021 11:55:59 GMT, Christian Hagedorn wrote: > When trying to rerun a failure of the test VM, one can copy the test VM command line printed to the output and directly use that one. But this requires to additionally set `-DReproduce=true` to mock the driver VM. 
This patch improves the manual work of adding `-DReproduce=true` and dumps it now automatically as part of the command line printed on failures. > > Thanks, > Christian Thanks Tobias for your review! ------------- PR: https://git.openjdk.java.net/jdk/pull/6039 From roland at openjdk.java.net Wed Oct 20 15:18:46 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Wed, 20 Oct 2021 15:18:46 GMT Subject: RFR: 8259609: C2: optimize long range checks in long counted loops [v10] In-Reply-To: References: Message-ID: > JDK-8255150 makes it possible for java code to explicitly perform a > range check on long values. JDK-8223051 provides a transformation of > long counted loops into loop nests with an inner int counted > loop. With this change I propose transforming long range checks that > operate on the iv of a long counted loop into range checks that > operate on the iv of the int inner loop once it has been > created. Existing range check eliminations can then kick in. > > Transformation of range checks is piggy backed on the loop nest > creation for 2 reasons: > > - pattern matching range checks is easier right before the loop nest > is created > > - the number of iterations of the inner loop is adjusted so scale * > inner_iv doesn't overflow > > C2 has logic to delay some split if transformations so they don't > break the scale * iv + offset pattern. I reused that logic for long > range checks and had to relax what's considered a range check because > initially a range check from Object.checkIndex() may include a test > for range > 0 that needs a round of loop opts to be hoisted. I realize > there's some code duplication but I didn't see a way to share logic > between IdealLoopTree::may_have_range_check() > IdealLoopTree::policy_range_check() that would feel right. > > I realize the comment in PhaseIdealLoop::transform_long_range_checks() > is scary. FWIW, it's not as complicated as it looks. 
I found drawing > the range covered by the entire long loop and the range covered by the > inner loop help see how range checks can be transformed. Then the > comment helps make sure all cases are covered and verify the generated > code actually covers all of them. > > One issue is overflow. I think the fact that inner_iv * scale doesn't > overflow helps simplify thing. One possible overflow is that of scale > * upper + offset which is handled by forcing all range checks in that > case to deoptimize. I don't think other case of overflow needs special > handling. > > This was tested with a Memory Segment micro benchmark (and patched > Memory Segment support to take advantage of the new checkIndex > intrinsic, both provided by Maurizio). Range checks in the micro > benchmark are properly optimized (and performance increases > significantly). Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: - variable name change to fix the build on arm/ppc + small fix - IR matching test ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2045/files - new: https://git.openjdk.java.net/jdk/pull/2045/files/86b73147..0ab6b6b1 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2045&range=09 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2045&range=08-09 Stats: 101 lines in 2 files changed: 67 ins; 1 del; 33 mod Patch: https://git.openjdk.java.net/jdk/pull/2045.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2045/head:pull/2045 PR: https://git.openjdk.java.net/jdk/pull/2045 From kvn at openjdk.java.net Wed Oct 20 15:32:07 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 20 Oct 2021 15:32:07 GMT Subject: RFR: 8275104: IR framework does not handle client VM builds correctly In-Reply-To: References: Message-ID: On Wed, 20 Oct 2021 11:19:01 GMT, Christian Hagedorn wrote: > While the IR framework is primarily used for C2 IR verification, it should also work with client 
VM builds. There are currently some problems which are fixed with this patch: > > - The IR framework currently only bails out of IR matching if C2 is excluded by command line flags. However, when running an IR JTreg test with a client VM build, IR matching fails when not specifically adding `@requires vm.compiler2.enabled` to exclude the test. > - `@Test` and `@ForceCompile` do not work correctly and throw an exception due to an incompatible compilation level selection without C2. > - Some internal framework tests fail (the fix also improves `TestDIgnoreCompilerControls` in general). > > Testing: > > - Standard tier testing > - Testing internal framework tests with standard build (tiered), client VM (without C2) and server VM build (without C1) > > Thanks, > Christian Seems fine. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6037 From kvn at openjdk.java.net Wed Oct 20 15:34:07 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 20 Oct 2021 15:34:07 GMT Subject: RFR: 8274888: Dump "-DReproduce=true" to the test VM command line output In-Reply-To: References: Message-ID: On Wed, 20 Oct 2021 11:55:59 GMT, Christian Hagedorn wrote: > When trying to rerun a failure of the test VM, one can copy the test VM command line printed to the output and directly use that one. But this requires to additionally set `-DReproduce=true` to mock the driver VM. This patch improves the manual work of adding `-DReproduce=true` and dumps it now automatically as part of the command line printed on failures. > > Thanks, > Christian Marked as reviewed by kvn (Reviewer). 
------------- PR: https://git.openjdk.java.net/jdk/pull/6039 From roland at openjdk.java.net Wed Oct 20 15:38:33 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Wed, 20 Oct 2021 15:38:33 GMT Subject: RFR: 8259609: C2: optimize long range checks in long counted loops [v11] In-Reply-To: References: Message-ID: > JDK-8255150 makes it possible for java code to explicitly perform a > range check on long values. JDK-8223051 provides a transformation of > long counted loops into loop nests with an inner int counted > loop. With this change I propose transforming long range checks that > operate on the iv of a long counted loop into range checks that > operate on the iv of the int inner loop once it has been > created. Existing range check eliminations can then kick in. > > Transformation of range checks is piggy backed on the loop nest > creation for 2 reasons: > > - pattern matching range checks is easier right before the loop nest > is created > > - the number of iterations of the inner loop is adjusted so scale * > inner_iv doesn't overflow > > C2 has logic to delay some split if transformations so they don't > break the scale * iv + offset pattern. I reused that logic for long > range checks and had to relax what's considered a range check because > initially a range check from Object.checkIndex() may include a test > for range > 0 that needs a round of loop opts to be hoisted. I realize > there's some code duplication but I didn't see a way to share logic > between IdealLoopTree::may_have_range_check() > IdealLoopTree::policy_range_check() that would feel right. > > I realize the comment in PhaseIdealLoop::transform_long_range_checks() > is scary. FWIW, it's not as complicated as it looks. I found drawing > the range covered by the entire long loop and the range covered by the > inner loop help see how range checks can be transformed. Then the > comment helps make sure all cases are covered and verify the generated > code actually covers all of them. 
> > One issue is overflow. I think the fact that inner_iv * scale doesn't > overflow helps simplify thing. One possible overflow is that of scale > * upper + offset which is handled by forcing all range checks in that > case to deoptimize. I don't think other case of overflow needs special > handling. > > This was tested with a Memory Segment micro benchmark (and patched > Memory Segment support to take advantage of the new checkIndex > intrinsic, both provided by Maurizio). Range checks in the micro > benchmark are properly optimized (and performance increases > significantly). Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: windows build fix ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2045/files - new: https://git.openjdk.java.net/jdk/pull/2045/files/0ab6b6b1..b468b680 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2045&range=10 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2045&range=09-10 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/2045.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2045/head:pull/2045 PR: https://git.openjdk.java.net/jdk/pull/2045 From sviswanathan at openjdk.java.net Wed Oct 20 17:36:05 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Wed, 20 Oct 2021 17:36:05 GMT Subject: RFR: 8275167: x86 intrinsic for unsignedMultiplyHigh [v2] In-Reply-To: References: <7IzrZdL0elgXbuisyLNYC2wkyOTe1RHUPuGRI7YsAQ4=.aed9dea3-4775-4592-b43e-c3e08e167f90@github.com> Message-ID: On Tue, 19 Oct 2021 20:34:55 GMT, Vamsi Parasa wrote: >> Optimize the new Math.unsignedMultiplyHigh using the x86 mul instruction. This change show 1.87X improvement on a micro benchmark. 
> > Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > refactoring to remove code duplication by using a common routine for UMulHiLNode and MulHiLNode Marked as reviewed by sviswanathan (Reviewer). The patch looks good to me. ------------- PR: https://git.openjdk.java.net/jdk/pull/5933 From sviswanathan at openjdk.java.net Wed Oct 20 17:36:06 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Wed, 20 Oct 2021 17:36:06 GMT Subject: RFR: 8275167: x86 intrinsic for unsignedMultiplyHigh [v2] In-Reply-To: <8MuiklM5Nt3VkzyVHbWqwMh_LkVvVY2Mf65_0zTx4Kw=.9351008f-b489-4103-be9d-87e6fc4a8f39@github.com> References: <7IzrZdL0elgXbuisyLNYC2wkyOTe1RHUPuGRI7YsAQ4=.aed9dea3-4775-4592-b43e-c3e08e167f90@github.com> <8MuiklM5Nt3VkzyVHbWqwMh_LkVvVY2Mf65_0zTx4Kw=.9351008f-b489-4103-be9d-87e6fc4a8f39@github.com> Message-ID: On Fri, 15 Oct 2021 20:19:31 GMT, Vladimir Kozlov wrote: >>> How you verified correctness of results? I suggest to extend `test/jdk//java/lang/Math/MultiplicationTests.java` test to cover unsigned method. >> >> Tests for unsignedMultiplyHigh were already added in test/jdk//java/lang/Math/MultiplicationTests.java (in July 2021 by Brian Burkhalter). Used that test to verify the correctness of the results. > >> > How you verified correctness of results? I suggest to extend `test/jdk//java/lang/Math/MultiplicationTests.java` test to cover unsigned method. >> >> Tests for unsignedMultiplyHigh were already added in test/jdk//java/lang/Math/MultiplicationTests.java (in July 2021 by Brian Burkhalter). Used that test to verify the correctness of the results. > > Good. It seems I have old version of the test. > Did you run it with -Xcomp? How you verified that intrinsic is used? @vnkozlov if the patch looks ok to you, could you please run this through your testing? 
------------- PR: https://git.openjdk.java.net/jdk/pull/5933 From iveresov at openjdk.java.net Wed Oct 20 17:42:18 2021 From: iveresov at openjdk.java.net (Igor Veresov) Date: Wed, 20 Oct 2021 17:42:18 GMT Subject: RFR: 8273712: C2: Add mechanism for rejecting inlining of low frequency call sites and deprecate MinInliningThreshold. Message-ID: Currently the inlining heuristic uses absolute method invocation count to reject methods that are rarely executed (see MinInliningThreshold and its uses). This presents two problems: 1. Method can be rarely used in a particular caller, yet if its total execution count is high it may be still inlined. 2. The use of absolute counts is inherently problematic with the current compilation policy (adaptive threshold and background compilation). It leads to instabilities of inlining decisions. The proposed solution is to consider call site execution ratio in order to reject callees that are rarely executed. Set the old cutoff parameter (MinInliningThreshold) to 0 to essentially disable it and later deprecate it. 
Setting the introduced MinInlineFrequencyRatio = 0.0085 produces the following notable improvements: Renaissance-Dotty 1.23% Renaissance-Mnemonics 3.88% Renaissance-NaiveBayes 9.23% Renaissance-ScalaKmeans 1.36% SPECjvm2008-Derby 3.16% There are of course some regressions but those are few and on the order of 1.5% ------------- Commit messages: - Remove old option from a test - Make low frequency inling cutoff relative Changes: https://git.openjdk.java.net/jdk/pull/6046/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6046&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8273712 Stats: 38 lines in 6 files changed: 16 ins; 4 del; 18 mod Patch: https://git.openjdk.java.net/jdk/pull/6046.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6046/head:pull/6046 PR: https://git.openjdk.java.net/jdk/pull/6046 From kvn at openjdk.java.net Wed Oct 20 19:11:09 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 20 Oct 2021 19:11:09 GMT Subject: RFR: 8275167: x86 intrinsic for unsignedMultiplyHigh [v2] In-Reply-To: References: <7IzrZdL0elgXbuisyLNYC2wkyOTe1RHUPuGRI7YsAQ4=.aed9dea3-4775-4592-b43e-c3e08e167f90@github.com> Message-ID: On Tue, 19 Oct 2021 20:34:55 GMT, Vamsi Parasa wrote: >> Optimize the new Math.unsignedMultiplyHigh using the x86 mul instruction. This change show 1.87X improvement on a micro benchmark. > > Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > refactoring to remove code duplication by using a common routine for UMulHiLNode and MulHiLNode Looks good. And I submitted testing. ------------- Marked as reviewed by kvn (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/5933

From kvn at openjdk.java.net  Wed Oct 20 22:18:09 2021
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Wed, 20 Oct 2021 22:18:09 GMT
Subject: RFR: 8275167: x86 intrinsic for unsignedMultiplyHigh [v2]
In-Reply-To:
References: <7IzrZdL0elgXbuisyLNYC2wkyOTe1RHUPuGRI7YsAQ4=.aed9dea3-4775-4592-b43e-c3e08e167f90@github.com>
Message-ID:

On Tue, 19 Oct 2021 20:34:55 GMT, Vamsi Parasa wrote:

>> Optimize the new Math.unsignedMultiplyHigh using the x86 mul instruction. This change shows a 1.87X improvement on a microbenchmark.
>
> Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision:
>
>   refactoring to remove code duplication by using a common routine for UMulHiLNode and MulHiLNode

Tests passed. You can integrate changes.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5933

From duke at openjdk.java.net  Wed Oct 20 22:22:01 2021
From: duke at openjdk.java.net (Vamsi Parasa)
Date: Wed, 20 Oct 2021 22:22:01 GMT
Subject: RFR: 8275167: x86 intrinsic for unsignedMultiplyHigh [v2]
In-Reply-To:
References: <7IzrZdL0elgXbuisyLNYC2wkyOTe1RHUPuGRI7YsAQ4=.aed9dea3-4775-4592-b43e-c3e08e167f90@github.com>
Message-ID:

On Wed, 20 Oct 2021 22:14:33 GMT, Vladimir Kozlov wrote:

> Tests passed. You can integrate changes.

Thanks Vladimir! What are the next steps to integrate the change?

-------------

PR: https://git.openjdk.java.net/jdk/pull/5933

From kvn at openjdk.java.net  Wed Oct 20 22:29:08 2021
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Wed, 20 Oct 2021 22:29:08 GMT
Subject: RFR: 8273712: C2: Add mechanism for rejecting inlining of low frequency call sites and deprecate MinInliningThreshold.
In-Reply-To:
References:
Message-ID:

On Wed, 20 Oct 2021 17:29:41 GMT, Igor Veresov wrote:

> Currently the inlining heuristic uses the absolute method invocation count to reject methods that are rarely executed (see MinInliningThreshold and its uses).
> This presents two problems:
>
> 1. A method can be rarely used in a particular caller, yet if its total execution count is high, it may still be inlined.
> 2. The use of absolute counts is inherently problematic with the current compilation policy (adaptive thresholds and background compilation). It leads to unstable inlining decisions.
>
> The proposed solution is to consider the call site execution ratio in order to reject callees that are rarely executed. Set the old cutoff parameter (MinInliningThreshold) to 0 to essentially disable it, and deprecate it later.
>
> Setting the introduced MinInlineFrequencyRatio = 0.0085 produces the following notable improvements:
> Renaissance-Dotty        1.23%
> Renaissance-Mnemonics    3.88%
> Renaissance-NaiveBayes   9.23%
> Renaissance-ScalaKmeans  1.36%
> SPECjvm2008-Derby        3.16%
>
> There are, of course, some regressions, but they are few and on the order of 1.5%.
>
>
> This PR will require a CSR before it can be pushed. I'll file a CSR after this is reviewed.

Looks fine, but I don't see a CSR filed for MinInliningThreshold, which is a product flag.

-------------

Marked as reviewed by kvn (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6046

From rbackman at openjdk.java.net  Wed Oct 20 22:40:10 2021
From: rbackman at openjdk.java.net (Rickard Bäckman)
Date: Wed, 20 Oct 2021 22:40:10 GMT
Subject: RFR: 8273712: C2: Add mechanism for rejecting inlining of low frequency call sites and deprecate MinInliningThreshold.
In-Reply-To:
References:
Message-ID:

On Wed, 20 Oct 2021 17:29:41 GMT, Igor Veresov wrote:

> Currently the inlining heuristic uses the absolute method invocation count to reject methods that are rarely executed (see MinInliningThreshold and its uses).
> This presents two problems:
>
> 1. A method can be rarely used in a particular caller, yet if its total execution count is high, it may still be inlined.
> 2. The use of absolute counts is inherently problematic with the current compilation policy (adaptive thresholds and background compilation). It leads to unstable inlining decisions.
>
> The proposed solution is to consider the call site execution ratio in order to reject callees that are rarely executed. Set the old cutoff parameter (MinInliningThreshold) to 0 to essentially disable it, and deprecate it later.
>
> Setting the introduced MinInlineFrequencyRatio = 0.0085 produces the following notable improvements:
> Renaissance-Dotty        1.23%
> Renaissance-Mnemonics    3.88%
> Renaissance-NaiveBayes   9.23%
> Renaissance-ScalaKmeans  1.36%
> SPECjvm2008-Derby        3.16%
>
> There are, of course, some regressions, but they are few and on the order of 1.5%.
>
>
> This PR will require a CSR before it can be pushed. I'll file a CSR after this is reviewed.

Looks good.

-------------

Marked as reviewed by rbackman (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6046

From duke at openjdk.java.net  Wed Oct 20 22:44:06 2021
From: duke at openjdk.java.net (Vamsi Parasa)
Date: Wed, 20 Oct 2021 22:44:06 GMT
Subject: Integrated: 8275167: x86 intrinsic for unsignedMultiplyHigh
In-Reply-To: <7IzrZdL0elgXbuisyLNYC2wkyOTe1RHUPuGRI7YsAQ4=.aed9dea3-4775-4592-b43e-c3e08e167f90@github.com>
References: <7IzrZdL0elgXbuisyLNYC2wkyOTe1RHUPuGRI7YsAQ4=.aed9dea3-4775-4592-b43e-c3e08e167f90@github.com>
Message-ID:

On Wed, 13 Oct 2021 18:55:10 GMT, Vamsi Parasa wrote:

> Optimize the new Math.unsignedMultiplyHigh using the x86 mul instruction. This change shows a 1.87X improvement on a microbenchmark.

This pull request has now been integrated.

Changeset: af7c56b8
Author:    vamsi-parasa
Committer: Sandhya Viswanathan
URL:       https://git.openjdk.java.net/jdk/commit/af7c56b85bb2828a9d68f9e1c753a4adfa7ebb4f
Stats:     63 lines in 11 files changed: 61 ins; 2 del; 0 mod

8275167: x86 intrinsic for unsignedMultiplyHigh

Reviewed-by: kvn, sviswanathan

-------------

PR: https://git.openjdk.java.net/jdk/pull/5933

From iveresov at openjdk.java.net  Wed Oct 20 22:51:13 2021
From: iveresov at openjdk.java.net (Igor Veresov)
Date: Wed, 20 Oct 2021 22:51:13 GMT
Subject: RFR: 8273712: C2: Add mechanism for rejecting inlining of low frequency call sites and deprecate MinInliningThreshold.
In-Reply-To:
References:
Message-ID:

On Wed, 20 Oct 2021 22:25:51 GMT, Vladimir Kozlov wrote:

> Looks fine, but I don't see a CSR filed for MinInliningThreshold, which is a product flag.

I'm going to file it now.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6046

From iveresov at openjdk.java.net  Wed Oct 20 23:07:02 2021
From: iveresov at openjdk.java.net (Igor Veresov)
Date: Wed, 20 Oct 2021 23:07:02 GMT
Subject: RFR: 8273712: C2: Add mechanism for rejecting inlining of low frequency call sites and deprecate MinInliningThreshold.
In-Reply-To:
References:
Message-ID:

On Wed, 20 Oct 2021 17:29:41 GMT, Igor Veresov wrote:

> Currently the inlining heuristic uses the absolute method invocation count to reject methods that are rarely executed (see MinInliningThreshold and its uses).
> This presents two problems:
>
> 1. A method can be rarely used in a particular caller, yet if its total execution count is high, it may still be inlined.
> 2. The use of absolute counts is inherently problematic with the current compilation policy (adaptive thresholds and background compilation). It leads to unstable inlining decisions.
>
> The proposed solution is to consider the call site execution ratio in order to reject callees that are rarely executed. Set the old cutoff parameter (MinInliningThreshold) to 0 to essentially disable it, and deprecate it later.
> > Setting the introduced MinInlineFrequencyRatio = 0.0085 produces the following notable improvements: > Renaissance-Dotty 1.23% > Renaissance-Mnemonics 3.88% > Renaissance-NaiveBayes 9.23% > Renaissance-ScalaKmeans 1.36% > SPECjvm2008-Derby 3.16% > > There are of course some regressions but those are few and on the order of 1.5% > > > This PR will require a CSR before it can be pushed. I'll file a CSR after this is reviewed. I've filed a CSR (https://bugs.openjdk.java.net/browse/JDK-8275676). Could you guys please yourselves as reviewers to it? Thanks! ------------- PR: https://git.openjdk.java.net/jdk/pull/6046 From iveresov at openjdk.java.net Thu Oct 21 02:17:06 2021 From: iveresov at openjdk.java.net (Igor Veresov) Date: Thu, 21 Oct 2021 02:17:06 GMT Subject: RFR: 8275086: compiler/c2/irTests/TestPostParseCallDevirtualization.java fails when compiler1 is disabled In-Reply-To: References: Message-ID: <_Yvu8qyC5WGh50nUTJWsMpUxxEgT2P2ya3LjvAWVgLc=.fc0de578-edab-462c-bbaf-6d8a8604caf9@github.com> On Tue, 12 Oct 2021 02:49:53 GMT, SUN Guoyun wrote: > Hi all, > Jtreg test case compiler/c2/irTests/TestPostParseCallDevirtualization.java fails for fastdebug mode on x86/aarch64/mips architecture when "--with-jvm-features=-compiler1" be used. the failed info is: > >

> One or more @IR rules failed:
> 
> Failed IR Rules (1)
> ------------------
> - Method "public int compiler.c2.irTests.TestPostParseCallDevirtualization.testMethodHandleCallWithCCP() throws java.lang.Throwable":
>   * @IR rule 1: "@compiler.lib.ir_framework.IR(failOn={"#PRE#(\\\\d+(\\\\s){2}(CallStaticJava.*)+(\\\\s){2}===.*#IS_REPLACED#)", "invokeBasic"}, applyIf={}, applyIfAnd={}, applyIfOr={}, counts={"#PRE#(\\\\d+(\\\\s){2}(CallStaticJava.*)+(\\\\s){2}===.*#IS_REPLACED#)", "invokeStatic", "= 1"}, applyIfNot={})" 
>     - failOn: Graph contains forbidden nodes:
>         Regex 1: (\\d+(\\s){2}(CallStaticJava.*)+(\\s){2}===.*invokeBasic)
>         Matched forbidden node:
>           280  CallStaticJava  ===  5  6  7  8  1 ( 188  1  1  1  1  1  1 ) [[ 281  282  283  285 ]] # Static  java.lang.invoke.MethodHandle::invokeBasic
>     - counts: Graph contains wrong number of nodes:
>         Regex 1: (\\d+(\\s){2}(CallStaticJava.*)+(\\s){2}===.*invokeStatic)
>         Expected 1 but found 0 nodes.
> 
>>>> Check stdout for compilation output of the failed methods
> 
> > This is a patch to fix this problem. Please help review it. > > Thanks, > Sun Guoyun There is a more elegant way to tweak the thresholds - `CompileThresholdScaling`. It affects all the thresholds so it's more robust way to do it. Alternatively, perhaps the test could do more iterations? ------------- PR: https://git.openjdk.java.net/jdk/pull/5903 From dlong at openjdk.java.net Thu Oct 21 04:19:22 2021 From: dlong at openjdk.java.net (Dean Long) Date: Thu, 21 Oct 2021 04:19:22 GMT Subject: RFR: 8275347 ciReplay: staticfield lines not properly terminated Message-ID: Summary of changes: - fix staticfield line termination for empty String constants - tighten up checks during parse time to detect any un-parsed input at the end of a line - updated test to trigger the problem, so that it fails without the fix - cleanup: removed _buffer_pos field and related logic that isn't needed ------------- Commit messages: - fix replay line termination Changes: https://git.openjdk.java.net/jdk/pull/6057/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6057&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8275347 Stats: 40 lines in 3 files changed: 24 ins; 7 del; 9 mod Patch: https://git.openjdk.java.net/jdk/pull/6057.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6057/head:pull/6057 PR: https://git.openjdk.java.net/jdk/pull/6057 From roland at openjdk.java.net Thu Oct 21 06:50:41 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Thu, 21 Oct 2021 06:50:41 GMT Subject: RFR: 8259609: C2: optimize long range checks in long counted loops [v12] In-Reply-To: References: Message-ID: > JDK-8255150 makes it possible for java code to explicitly perform a > range check on long values. JDK-8223051 provides a transformation of > long counted loops into loop nests with an inner int counted > loop. 
> With this change I propose transforming long range checks that operate on the iv of a long counted loop into range checks that operate on the iv of the int inner loop once it has been created. Existing range check elimination can then kick in.
>
> Transformation of range checks is piggybacked on the loop nest creation for two reasons:
>
> - pattern matching range checks is easier right before the loop nest is created
>
> - the number of iterations of the inner loop is adjusted so scale * inner_iv doesn't overflow
>
> C2 has logic to delay some split-if transformations so they don't break the scale * iv + offset pattern. I reused that logic for long range checks and had to relax what's considered a range check, because initially a range check from Objects.checkIndex() may include a test for range > 0 that needs a round of loop opts to be hoisted. I realize there's some code duplication, but I didn't see a way to share logic between IdealLoopTree::may_have_range_check() and IdealLoopTree::policy_range_check() that would feel right.
>
> I realize the comment in PhaseIdealLoop::transform_long_range_checks() is scary. FWIW, it's not as complicated as it looks. I found that drawing the range covered by the entire long loop and the range covered by the inner loop helps to see how range checks can be transformed. The comment then helps make sure all cases are covered and verify that the generated code actually covers all of them.
>
> One issue is overflow. I think the fact that inner_iv * scale doesn't overflow helps simplify things. One possible overflow is that of scale * upper + offset, which is handled by forcing all range checks in that case to deoptimize. I don't think other cases of overflow need special handling.
>
> This was tested with a Memory Segment micro benchmark (and patched Memory Segment support to take advantage of the new checkIndex intrinsic, both provided by Maurizio).
Range checks in the micro > benchmark are properly optimized (and performance increases > significantly). Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: build fix ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2045/files - new: https://git.openjdk.java.net/jdk/pull/2045/files/b468b680..0409fb3e Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2045&range=11 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2045&range=10-11 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/2045.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2045/head:pull/2045 PR: https://git.openjdk.java.net/jdk/pull/2045 From chagedorn at openjdk.java.net Thu Oct 21 07:04:29 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Thu, 21 Oct 2021 07:04:29 GMT Subject: RFR: 8275104: IR framework does not handle client VM builds correctly [v2] In-Reply-To: References: Message-ID: > While the IR framework is primarily used for C2 IR verification, it should also work with client VM builds. There are currently some problems which are fixed with this patch: > > - The IR framework currently only bails out of IR matching if C2 is excluded by command line flags. However, when running an IR JTreg test with a client VM build, IR matching fails when not specifically adding `@requires vm.compiler2.enabled` to exclude the test. > - `@Test` and `@ForceCompile` do not work correctly and throw an exception due to an incompatible compilation level selection without C2. > - Some internal framework tests fail (the fix also improves `TestDIgnoreCompilerControls` in general). 
> > Testing: > > - Standard tier testing > - Testing internal framework tests with standard build (tiered), client VM (without C2) and server VM build (without C1) > > Thanks, > Christian Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: Change MinInliningThreshold to TypeProfileLevel in IRExample ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6037/files - new: https://git.openjdk.java.net/jdk/pull/6037/files/7e256e6d..1670bcea Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6037&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6037&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/6037.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6037/head:pull/6037 PR: https://git.openjdk.java.net/jdk/pull/6037 From chagedorn at openjdk.java.net Thu Oct 21 07:04:29 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Thu, 21 Oct 2021 07:04:29 GMT Subject: RFR: 8275104: IR framework does not handle client VM builds correctly In-Reply-To: References: Message-ID: On Wed, 20 Oct 2021 11:19:01 GMT, Christian Hagedorn wrote: > While the IR framework is primarily used for C2 IR verification, it should also work with client VM builds. There are currently some problems which are fixed with this patch: > > - The IR framework currently only bails out of IR matching if C2 is excluded by command line flags. However, when running an IR JTreg test with a client VM build, IR matching fails when not specifically adding `@requires vm.compiler2.enabled` to exclude the test. > - `@Test` and `@ForceCompile` do not work correctly and throw an exception due to an incompatible compilation level selection without C2. > - Some internal framework tests fail (the fix also improves `TestDIgnoreCompilerControls` in general). 
> > Testing: > > - Standard tier testing > - Testing internal framework tests with standard build (tiered), client VM (without C2) and server VM build (without C1) > > Thanks, > Christian Thanks Vladimir for your review! Given that 8273712 is going to deprecate `MinInliningThreshold`, I've changed the flag in `IRExample` to another int based flag: `TypeProfileLevel`. ------------- PR: https://git.openjdk.java.net/jdk/pull/6037 From neliasso at openjdk.java.net Thu Oct 21 07:07:59 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Thu, 21 Oct 2021 07:07:59 GMT Subject: RFR: 8275347 ciReplay: staticfield lines not properly terminated In-Reply-To: References: Message-ID: On Thu, 21 Oct 2021 04:11:32 GMT, Dean Long wrote: > Summary of changes: > - fix staticfield line termination for empty String constants > - tighten up checks during parse time to detect any un-parsed input at the end of a line > - updated test to trigger the problem, so that it fails without the fix > - cleanup: removed _buffer_pos field and related logic that isn't needed Looks good. ------------- Marked as reviewed by neliasso (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6057 From chagedorn at openjdk.java.net Thu Oct 21 07:57:04 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Thu, 21 Oct 2021 07:57:04 GMT Subject: RFR: 8275347 ciReplay: staticfield lines not properly terminated In-Reply-To: References: Message-ID: On Thu, 21 Oct 2021 04:11:32 GMT, Dean Long wrote: > Summary of changes: > - fix staticfield line termination for empty String constants > - tighten up checks during parse time to detect any un-parsed input at the end of a line > - updated test to trigger the problem, so that it fails without the fix > - cleanup: removed _buffer_pos field and related logic that isn't needed Looks good! ------------- Marked as reviewed by chagedorn (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/6057

From dnsimon at openjdk.java.net  Thu Oct 21 12:49:28 2021
From: dnsimon at openjdk.java.net (Doug Simon)
Date: Thu, 21 Oct 2021 12:49:28 GMT
Subject: RFR: 8275645: [JVMCI] avoid unaligned volatile reads on AArch64
Message-ID:

This PR updates c2v_readFieldValue to always perform the field read with volatile semantics, but without directly using a volatile read instruction, to avoid platform-specific issues with unaligned reads (e.g., an unaligned ldar on AArch64 causes a SIGBUS).

-------------

Commit messages:
 - avoid unaligned volatile reads in c2v_readFieldValue
 - expanded MemoryAccessProviderTest to test unaligned reads

Changes: https://git.openjdk.java.net/jdk/pull/6044/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6044&range=00
Issue: https://bugs.openjdk.java.net/browse/JDK-8275645
Stats: 63 lines in 6 files changed: 41 ins; 1 del; 21 mod
Patch: https://git.openjdk.java.net/jdk/pull/6044.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/6044/head:pull/6044

PR: https://git.openjdk.java.net/jdk/pull/6044

From chagedorn at openjdk.java.net  Thu Oct 21 14:06:08 2021
From: chagedorn at openjdk.java.net (Christian Hagedorn)
Date: Thu, 21 Oct 2021 14:06:08 GMT
Subject: RFR: 8274888: Dump "-DReproduce=true" to the test VM command line output
In-Reply-To:
References:
Message-ID:

On Wed, 20 Oct 2021 11:55:59 GMT, Christian Hagedorn wrote:

> When trying to rerun a failure of the test VM, one can copy the test VM command line printed to the output and use it directly. However, this additionally requires setting `-DReproduce=true` to mock the driver VM. This patch removes the manual step of adding `-DReproduce=true` by dumping it automatically as part of the command line printed on failures.
>
> Thanks,
> Christian

Thanks Vladimir for your review!
-------------

PR: https://git.openjdk.java.net/jdk/pull/6039

From chagedorn at openjdk.java.net  Thu Oct 21 14:09:11 2021
From: chagedorn at openjdk.java.net (Christian Hagedorn)
Date: Thu, 21 Oct 2021 14:09:11 GMT
Subject: Integrated: 8274888: Dump "-DReproduce=true" to the test VM command line output
In-Reply-To:
References:
Message-ID:

On Wed, 20 Oct 2021 11:55:59 GMT, Christian Hagedorn wrote:

> When trying to rerun a failure of the test VM, one can copy the test VM command line printed to the output and use it directly. However, this additionally requires setting `-DReproduce=true` to mock the driver VM. This patch removes the manual step of adding `-DReproduce=true` by dumping it automatically as part of the command line printed on failures.
>
> Thanks,
> Christian

This pull request has now been integrated.

Changeset: 3b0ce23b
Author:    Christian Hagedorn
URL:       https://git.openjdk.java.net/jdk/commit/3b0ce23bcd827d0998fe9b43e5b0220c915dab21
Stats:     2 lines in 1 file changed: 2 ins; 0 del; 0 mod

8274888: Dump "-DReproduce=true" to the test VM command line output

Reviewed-by: thartmann, kvn

-------------

PR: https://git.openjdk.java.net/jdk/pull/6039

From jbhateja at openjdk.java.net  Thu Oct 21 14:50:31 2021
From: jbhateja at openjdk.java.net (Jatin Bhateja)
Date: Thu, 21 Oct 2021 14:50:31 GMT
Subject: RFR: 8275047: Optimize existing fill stubs for AVX-512 target [v4]
In-Reply-To:
References:
Message-ID: <0WViJD0Uip3CjSkJq6T1wxmgGTk4vbo6IngCAyBuc34=.997a2434-e648-40ef-a8f0-fe1669387fb6@github.com>

> Hi All,
>
> This patch optimizes the macro assembly routines used by the fill stubs of various primitive types for the x86 AVX-512 target.
> The main changes are:
> 1) A specialized instruction sequence for the fill operation over various block sizes.
> 2) Control flow is sensitive to AVX3Threshold: the generated code operates on 32-byte vectors (YMM) if the block size is less than the threshold, and on 64-byte vectors (ZMM) otherwise.
> 3) The bulk fill operation is performed by a destination-aligned fill loop with an appropriate unroll factor; this avoids cache line split penalties and improves performance.
> 4) Currently, fill patterns are vectorized by the auto-vectorizer, and the generated code operates on vectors of MaxVectorSize. In addition, the auto-vectorizer is oblivious to AVX3Threshold, which may result in performance degradation on prior generations of x86 servers, where 64-byte vector stores using ZMM registers operate at reduced CPU frequency.
> The patch enables the JVM runtime flag -XX:+OptimizeFill by default for x86 targets supporting the AVX-512 feature.
> 5) The patch also optimizes the mask generation sequence of the fill* macro assembly routines using the BZHI instruction.
>
> Performance measurements of an existing JMH micro on an Icelake server show ~1.1-4.0X gains for the fill operation with varying block sizes.
>
> Following are the detailed results:
>
> System Configuration: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (40C 2S)
>
> Benchmark | Size | Baseline Auto-vectorized -XX:-OptimizeFill (ops/ms) | New Optimized Fill AVX3 Th=4096 (ops/ms) | Gain Factor (OptFill AVX3Th=4096/Baseline)
> -- | -- | -- | -- | --
> ArraysFill.testByteFill | 10 | 208480.451 | 399980.93 | 1.918553649
> ArraysFill.testByteFill | 16 | 193927.021 | 381156.448 | 1.965463328
> ArraysFill.testByteFill | 31 | 99175.805 | 399990.605 | 4.033147046
> ArraysFill.testByteFill | 59 | 141430.876 | 342233.497 | 2.419793377
> ArraysFill.testByteFill | 89 | 82091.504 | 342232.822 | 4.168918893
> ArraysFill.testByteFill | 126 | 72154.769 | 310536.053 | 4.303749528
> ArraysFill.testByteFill | 250 | 18986.775 | 158263.189 | 8.335443434
> ArraysFill.testByteFill | 266 | 30057.331 | 166819.658 | 5.550048938
> ArraysFill.testByteFill | 511 | 30094.92 | 116800.155 | 3.88105883
> ArraysFill.testByteFill | 1021 | 38467.507 | 89235.56 | 2.319764574
> ArraysFill.testByteFill | 2047 | 32267.535 | 70625.015 | 2.188732886
> ArraysFill.testByteFill | 2048 | 25503.489 | 64848.532 | 2.542731781
> ArraysFill.testByteFill | 4095 | 22432.636 | 42449.149 | 1.892294289
> ArraysFill.testByteFill | 8195 | 16468.923 | 24810.485 | 1.506503188
> ArraysFill.testCharFill | 10 | 221038.566 | 400005.661 | 1.809664568
> ArraysFill.testCharFill | 16 | 209138.43 | 381171.236 | 1.822578643
> ArraysFill.testCharFill | 31 | 93139.021 | 376441.98 | 4.041721461
> ArraysFill.testCharFill | 59 | 63575.554 | 310559.54 | 4.884889245
> ArraysFill.testCharFill | 89 | 61900.064 | 191445.936 | 3.092822909
> ArraysFill.testCharFill | 126 | 36854.615 | 164187.37 | 4.455001633
> ArraysFill.testCharFill | 250 | 37991.306 | 138797.511 | 3.653401939
> ArraysFill.testCharFill | 266 | 44459.522 | 170334.083 | 3.831217146
> ArraysFill.testCharFill | 511 | 52275.926 | 103012.53 | 1.970553903
> ArraysFill.testCharFill | 1021 | 51803.73 | 80187.107 | 1.547902188
> ArraysFill.testCharFill | 2047 | 35820.742 | 38973.828 | 1.088024028
> ArraysFill.testCharFill | 2048 | 35280.779 | 38209.361 | 1.083007861
> ArraysFill.testCharFill | 4095 | 21053.869 | 25006.99 | 1.187762211
> ArraysFill.testCharFill | 8195 | 11419.785 | 12662.777 | 1.108845482
> ArraysFill.testDoubleFill | 10 | 266086.021 | 220036.789 | 0.826938552
> ArraysFill.testDoubleFill | 16 | 216597.316 | 218875.135 | 1.010516377
> ArraysFill.testDoubleFill | 31 | 151868.92 | 174250.587 | 1.147374901
> ArraysFill.testDoubleFill | 59 | 196480.253 | 194467.527 | 0.98975609
> ArraysFill.testDoubleFill | 89 | 109787.976 | 102698.432 | 0.935425133
> ArraysFill.testDoubleFill | 126 | 93945.51 | 121697.956 | 1.295410031
> ArraysFill.testDoubleFill | 250 | 97830.626 | 81429.644 | 0.832353296
> ArraysFill.testDoubleFill | 266 | 83560.898 | 91313.593 | 1.092778981
> ArraysFill.testDoubleFill | 511 | 48710.087 | 48145.392 | 0.988407021
> ArraysFill.testDoubleFill | 1021 | 25145.002 | 25163.03 | 1.000716962
> ArraysFill.testDoubleFill | 2047 | 12665.468 | 12639.651 | 0.997961623
> ArraysFill.testDoubleFill | 2048 | 12202.183 | 12619.316 | 1.034185113
> ArraysFill.testDoubleFill | 4095 | 6319.101 | 6320.488 | 1.000219493
> ArraysFill.testDoubleFill | 8195 | 882.585 | 883.727 | 1.001293926
> ArraysFill.testFloatFill | 10 | 193690.976 | 370572.639 | 1.913215818
> ArraysFill.testFloatFill | 16 | 178498.07 | 342227.406 | 1.9172611
> ArraysFill.testFloatFill | 31 | 160406.649 | 323327.925 | 2.015676576
> ArraysFill.testFloatFill | 59 | 119643.034 | 177091.185 | 1.48016294
> ArraysFill.testFloatFill | 89 | 64783.111 | 168280.961 | 2.597605431
> ArraysFill.testFloatFill | 126 | 85291.623 | 152788.86 | 1.791370062
> ArraysFill.testFloatFill | 250 | 98864.197 | 115429.942 | 1.167560608
> ArraysFill.testFloatFill | 266 | 104361.908 | 106769.11 | 1.023065906
> ArraysFill.testFloatFill | 511 | 59063.325 | 73726.544 | 1.248262674
> ArraysFill.testFloatFill | 1021 | 46426.631 | 44255.239 | 0.953229602
> ArraysFill.testFloatFill | 2047 | 23853.72 | 24988.53 | 1.047573712
> ArraysFill.testFloatFill | 2048 | 23774.697 | 24723.921 | 1.039925809
> ArraysFill.testFloatFill | 4095 | 11879.115 | 12574.113 | 1.058505874
> ArraysFill.testFloatFill | 8195 | 6288.73 | 6309.257 | 1.003264093
> ArraysFill.testIntFill | 10 | 202623.377 | 370696.239 | 1.829484063
> ArraysFill.testIntFill | 16 | 187487.425 | 342203.932 | 1.825210048
> ArraysFill.testIntFill | 31 | 107876.62 | 323291.016 | 2.996858967
> ArraysFill.testIntFill | 59 | 76540.074 | 177755.374 | 2.322383096
> ArraysFill.testIntFill | 89 | 77088.258 | 168496.776 | 2.185764478
> ArraysFill.testIntFill | 126 | 92532.969 | 150986.404 | 1.631703874
> ArraysFill.testIntFill | 250 | 99993.079 | 106098.703 | 1.061060466
> ArraysFill.testIntFill | 266 | 105121.5 | 106809.473 | 1.016057353
> ArraysFill.testIntFill | 511 | 61711.338 | 84318.27 | 1.366333525
> ArraysFill.testIntFill | 1021 | 45725.648 | 44835.618 | 0.980535432
> ArraysFill.testIntFill | 2047 | 24130.633 | 25001.727 | 1.036099094
> ArraysFill.testIntFill | 2048 | 23873.255 | 24980.662 | 1.04638693
> ArraysFill.testIntFill | 4095 | 12459.376 | 12666.815 | 1.016649229
> ArraysFill.testIntFill | 8195 | 6303.873 | 6298.852 | 0.999203506
> ArraysFill.testLongFill | 10 | 221803.338 | 203110.868 | 0.915725028
> ArraysFill.testLongFill | 16 | 214013.975 | 230463.726 | 1.076862976
> ArraysFill.testLongFill | 31 | 153858.758 | 144465.921 | 0.938951561
> ArraysFill.testLongFill | 59 | 102187.914 | 112064.383 | 1.09665007
> ArraysFill.testLongFill | 89 | 111940.314 | 107757.211 | 0.962630952
> ArraysFill.testLongFill | 126 | 137992.49 | 110879.813 | 0.803520634
> ArraysFill.testLongFill | 250 | 96629.877 | 96195.678 | 0.995506576
> ArraysFill.testLongFill | 266 | 83984.403 | 86152.382 | 1.025814067
> ArraysFill.testLongFill | 511 | 48698.933 | 48534.404 | 0.996621507
> ArraysFill.testLongFill | 1021 | 25178.805 | 25162.502 | 0.999352511
> ArraysFill.testLongFill | 2047 | 12511.142 | 12652.489 | 1.01129769
> ArraysFill.testLongFill | 2048 | 12592.614 | 12622.317 | 1.002358764
> ArraysFill.testLongFill | 4095 | 6377.694 | 6378.312 | 1.0000969
> ArraysFill.testLongFill | 8195 | 885.065 | 884.811 | 0.999713015
> ArraysFill.testShortFill | 10 | 196799.048 | 399963.161 | 2.032342966
> ArraysFill.testShortFill | 16 | 191981.455 | 381173.675 | 1.985471331
> ArraysFill.testShortFill | 31 | 98647.156 | 370750.549 | 3.758350104
> ArraysFill.testShortFill | 59 | 79046.737 | 310586.902 | 3.929155254
> ArraysFill.testShortFill | 89 | 128874.522 | 186302.59 | 1.445612268
> ArraysFill.testShortFill | 126 | 47243.773 | 177947.204 | 3.766574782
> ArraysFill.testShortFill | 250 | 37506.377 | 152968.336 | 4.078462071
> ArraysFill.testShortFill | 266 | 41782.87 | 169466.305 | 4.055879958
> ArraysFill.testShortFill | 511 | 44061.823 | 109352.795 | 2.481803692
> ArraysFill.testShortFill | 1021 | 28799.157 | 81115.934 | 2.816607931
> ArraysFill.testShortFill | 2047 | 38667.85 | 38998.02 | 1.008538618
> ArraysFill.testShortFill | 2048 | 36626.321 | 38995.272 | 1.064678923
> ArraysFill.testShortFill | 4095 | 16606.53 | 24724.43 | 1.488837825
> ArraysFill.testShortFill | 8195 | 11679.891 | 12627.519 | 1.081133291
>
> Kindly review and share your feedback.
>
> Best Regards,
> Jatin

Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:

  8275047: Review comments resolution.

-------------

Changes:
 - all: https://git.openjdk.java.net/jdk/pull/5967/files
 - new: https://git.openjdk.java.net/jdk/pull/5967/files/d599ac2d..1e8d5434

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5967&range=03
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5967&range=02-03

Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
Patch: https://git.openjdk.java.net/jdk/pull/5967.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/5967/head:pull/5967

PR: https://git.openjdk.java.net/jdk/pull/5967

From jbhateja at openjdk.java.net  Thu Oct 21 14:55:04 2021
From: jbhateja at openjdk.java.net (Jatin Bhateja)
Date: Thu, 21 Oct 2021 14:55:04 GMT
Subject: Re: RFR: 8275047: Optimize existing fill stubs for AVX-512 target [v3]
In-Reply-To: <1GoVv9gDf2moQr5_TULCvRP3MT_mgre7xQ6EariqAFM=.8372f6b8-fd33-43fb-8a10-bb7c2a2fa314@github.com>
References: <1GoVv9gDf2moQr5_TULCvRP3MT_mgre7xQ6EariqAFM=.8372f6b8-fd33-43fb-8a10-bb7c2a2fa314@github.com>
Message-ID: <1W_eQoq2vTdmOJTrNx3MuiveRTjDgvYq1iDlaIpZ6tk=.acf8cac6-b319-4dbf-afe5-76b12c9dae1a@github.com>

On Mon, 18 Oct 2021 16:27:35 GMT, Vladimir Kozlov wrote:

> Why improvement for testLongFill (59, 89) is much worse (not existing) than for testDoubleFill? The element size is the same.

Hi Vladimir, fill stubs are currently not supported for long/double types, so the difference could be run-to-run variation. For small sizes, the call overhead may overpower any gain from the specialized blocks. I have updated the performance results.
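The control flow described in points 2), 3) and 5) of the PR (aligned prologue, aligned bulk loop, masked tail) can be modeled in scalar Java to make it easier to follow. This is a hypothetical sketch under stated assumptions, not the stub itself: vector stores are simulated with loops and `Arrays.fill`, `baseAddress` stands in for the destination address, the YMM/ZMM choice is modeled by an `avx3Threshold` parameter, and `lowBitsMask(n)` computes the value that `BZHI` produces from an all-ones source with index `n`. The real stub's unrolling, register use and instruction selection are not modeled.

```java
import java.util.Arrays;

/** Hypothetical scalar model of an aligned, masked fill; not the actual HotSpot stub. */
final class FillStubModel {

    /** Low n bits set, i.e. what BZHI(all-ones, n) computes for 0 <= n <= 64. */
    static long lowBitsMask(int n) {
        return n >= 64 ? -1L : (1L << n) - 1;
    }

    /** Simulated masked vector store: writes only the lanes whose mask bit is set. */
    static void maskedStore(byte[] a, int pos, int lanes, byte value, long mask) {
        for (int lane = 0; lane < lanes; lane++) {
            if (((mask >>> lane) & 1L) != 0) {
                a[pos + lane] = value;
            }
        }
    }

    static void fill(byte[] a, int from, int len, byte value, long baseAddress, int avx3Threshold) {
        // Below the threshold use 32-byte (YMM-sized) vectors, otherwise 64-byte (ZMM-sized) ones.
        int vec = len < avx3Threshold ? 32 : 64;
        int pos = from;
        int remaining = len;

        // Prologue: one masked store up to the next vec-aligned destination address,
        // so the main loop never splits a cache line.
        int misalignment = (int) ((baseAddress + pos) & (vec - 1));
        int head = Math.min(remaining, (vec - misalignment) & (vec - 1));
        if (head > 0) {
            maskedStore(a, pos, vec, value, lowBitsMask(head));
            pos += head;
            remaining -= head;
        }
        // Main loop: full, aligned vector stores (a real stub would also unroll this loop).
        while (remaining >= vec) {
            Arrays.fill(a, pos, pos + vec, value);
            pos += vec;
            remaining -= vec;
        }
        // Tail: a single masked store covering the last remaining (< vec) elements.
        if (remaining > 0) {
            maskedStore(a, pos, vec, value, lowBitsMask(remaining));
        }
    }

    public static void main(String[] args) {
        // Compare against Arrays.fill for many lengths and alignments, on both sides of the threshold.
        for (int len = 0; len <= 300; len++) {
            for (int offset = 0; offset < 8; offset++) {
                byte[] expected = new byte[offset + len + 8];
                byte[] actual = new byte[offset + len + 8];
                Arrays.fill(expected, offset, offset + len, (byte) 7);
                fill(actual, offset, len, (byte) 7, 16L, 128);
                if (!Arrays.equals(expected, actual)) {
                    throw new AssertionError("len=" + len + " offset=" + offset);
                }
            }
        }
        System.out.println("ok");
    }
}
```

Handling both the unaligned head and the short tail with one masked store each is what lets small fills avoid scalar loops entirely, which is consistent with the largest gains in the table appearing at small and medium sizes.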
------------- PR: https://git.openjdk.java.net/jdk/pull/5967 From jbhateja at openjdk.java.net Thu Oct 21 15:20:16 2021 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Thu, 21 Oct 2021 15:20:16 GMT Subject: RFR: 8275047: Optimize existing fill stubs for AVX-512 target [v4] In-Reply-To: <0WViJD0Uip3CjSkJq6T1wxmgGTk4vbo6IngCAyBuc34=.997a2434-e648-40ef-a8f0-fe1669387fb6@github.com> References: <0WViJD0Uip3CjSkJq6T1wxmgGTk4vbo6IngCAyBuc34=.997a2434-e648-40ef-a8f0-fe1669387fb6@github.com> Message-ID: On Thu, 21 Oct 2021 14:50:31 GMT, Jatin Bhateja wrote: >> Hi All, >> >> This patch optimizes the macro assembly routines used by the fill stubs of various primitive types for the X86 AVX-512 target. >> The main changes are: >> 1) Specialized instruction sequences for the fill operation over various block sizes. >> 2) Control flow is sensitive to AVX3Threshold: the generated code operates on 32-byte vectors (YMM) if the >> block size is less than the threshold; otherwise the instructions operate on 64-byte vectors (ZMM). >> 3) The bulk fill operation is performed by a destination-aligned fill loop with an appropriate unroll factor; this >> avoids any cache line split penalty and improves performance. >> 4) Currently, fill patterns are vectorized by the auto-vectorizer, and the generated code operates on vectors >> of MaxVectorSize. In addition, the auto-vectorizer is oblivious to AVX3Threshold, and this may result in >> performance degradation on prior generations of X86 servers, where 64-byte vector stores using ZMM >> registers operate at reduced CPU frequency. >> The patch enables the JVM runtime flag -XX:+OptimizeFill by default for X86 targets supporting the AVX-512 feature. >> 5) The patch also optimizes the mask generation sequence of the fill* macro assembly routines using the BZHI instruction. >> >> Performance measurements of an existing JMH micro on an Icelake server show ~1.1-4.0X gains for the fill operation with varying block sizes.
>> >> Following are detailed results: >> >> System Configuration: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (40C 2S) >> >> Benchmark | Size | Baseline Auto-vectorized -XX:-OptimizeFill (ops/ms) | New Optimized Fill AVX3 Th=4096 (ops/ms) | Gain Factor (OptFill AVX3Th=4096/Baseline) >> -- | -- | -- | -- | -- >> ArraysFill.testByteFill | 10 | 208480.451 | 399980.93 | 1.918553649 >> ArraysFill.testByteFill | 16 | 193927.021 | 381156.448 | 1.965463328 >> ArraysFill.testByteFill | 31 | 99175.805 | 399990.605 | 4.033147046 >> ArraysFill.testByteFill | 59 | 141430.876 | 342233.497 | 2.419793377 >> ArraysFill.testByteFill | 89 | 82091.504 | 342232.822 | 4.168918893 >> ArraysFill.testByteFill | 126 | 72154.769 | 310536.053 | 4.303749528 >> ArraysFill.testByteFill | 250 | 18986.775 | 158263.189 | 8.335443434 >> ArraysFill.testByteFill | 266 | 30057.331 | 166819.658 | 5.550048938 >> ArraysFill.testByteFill | 511 | 30094.92 | 116800.155 | 3.88105883 >> ArraysFill.testByteFill | 1021 | 38467.507 | 89235.56 | 2.319764574 >> ArraysFill.testByteFill | 2047 | 32267.535 | 70625.015 | 2.188732886 >> ArraysFill.testByteFill | 2048 | 25503.489 | 64848.532 | 2.542731781 >> ArraysFill.testByteFill | 4095 | 22432.636 | 42449.149 | 1.892294289 >> ArraysFill.testByteFill | 8195 | 16468.923 | 24810.485 | 1.506503188 >> ArraysFill.testCharFill | 10 | 221038.566 | 400005.661 | 1.809664568 >> ArraysFill.testCharFill | 16 | 209138.43 | 381171.236 | 1.822578643 >> ArraysFill.testCharFill | 31 | 93139.021 | 376441.98 | 4.041721461 >> ArraysFill.testCharFill | 59 | 63575.554 | 310559.54 | 4.884889245 >> ArraysFill.testCharFill | 89 | 61900.064 | 191445.936 | 3.092822909 >> ArraysFill.testCharFill | 126 | 36854.615 | 164187.37 | 4.455001633 >> ArraysFill.testCharFill | 250 | 37991.306 | 138797.511 | 3.653401939 >> ArraysFill.testCharFill | 266 | 44459.522 | 170334.083 | 3.831217146 >> ArraysFill.testCharFill | 511 | 52275.926 | 103012.53 | 1.970553903 >> ArraysFill.testCharFill | 1021 | 51803.73 | 
80187.107 | 1.547902188 >> ArraysFill.testCharFill | 2047 | 35820.742 | 38973.828 | 1.088024028 >> ArraysFill.testCharFill | 2048 | 35280.779 | 38209.361 | 1.083007861 >> ArraysFill.testCharFill | 4095 | 21053.869 | 25006.99 | 1.187762211 >> ArraysFill.testCharFill | 8195 | 11419.785 | 12662.777 | 1.108845482 >> ArraysFill.testDoubleFill | 10 | 266086.021 | 220036.789 | 0.826938552 >> ArraysFill.testDoubleFill | 16 | 216597.316 | 218875.135 | 1.010516377 >> ArraysFill.testDoubleFill | 31 | 151868.92 | 174250.587 | 1.147374901 >> ArraysFill.testDoubleFill | 59 | 196480.253 | 194467.527 | 0.98975609 >> ArraysFill.testDoubleFill | 89 | 109787.976 | 102698.432 | 0.935425133 >> ArraysFill.testDoubleFill | 126 | 93945.51 | 121697.956 | 1.295410031 >> ArraysFill.testDoubleFill | 250 | 97830.626 | 81429.644 | 0.832353296 >> ArraysFill.testDoubleFill | 266 | 83560.898 | 91313.593 | 1.092778981 >> ArraysFill.testDoubleFill | 511 | 48710.087 | 48145.392 | 0.988407021 >> ArraysFill.testDoubleFill | 1021 | 25145.002 | 25163.03 | 1.000716962 >> ArraysFill.testDoubleFill | 2047 | 12665.468 | 12639.651 | 0.997961623 >> ArraysFill.testDoubleFill | 2048 | 12202.183 | 12619.316 | 1.034185113 >> ArraysFill.testDoubleFill | 4095 | 6319.101 | 6320.488 | 1.000219493 >> ArraysFill.testDoubleFill | 8195 | 882.585 | 883.727 | 1.001293926 >> ArraysFill.testFloatFill | 10 | 193690.976 | 370572.639 | 1.913215818 >> ArraysFill.testFloatFill | 16 | 178498.07 | 342227.406 | 1.9172611 >> ArraysFill.testFloatFill | 31 | 160406.649 | 323327.925 | 2.015676576 >> ArraysFill.testFloatFill | 59 | 119643.034 | 177091.185 | 1.48016294 >> ArraysFill.testFloatFill | 89 | 64783.111 | 168280.961 | 2.597605431 >> ArraysFill.testFloatFill | 126 | 85291.623 | 152788.86 | 1.791370062 >> ArraysFill.testFloatFill | 250 | 98864.197 | 115429.942 | 1.167560608 >> ArraysFill.testFloatFill | 266 | 104361.908 | 106769.11 | 1.023065906 >> ArraysFill.testFloatFill | 511 | 59063.325 | 73726.544 | 1.248262674 >> 
ArraysFill.testFloatFill | 1021 | 46426.631 | 44255.239 | 0.953229602 >> ArraysFill.testFloatFill | 2047 | 23853.72 | 24988.53 | 1.047573712 >> ArraysFill.testFloatFill | 2048 | 23774.697 | 24723.921 | 1.039925809 >> ArraysFill.testFloatFill | 4095 | 11879.115 | 12574.113 | 1.058505874 >> ArraysFill.testFloatFill | 8195 | 6288.73 | 6309.257 | 1.003264093 >> ArraysFill.testIntFill | 10 | 202623.377 | 370696.239 | 1.829484063 >> ArraysFill.testIntFill | 16 | 187487.425 | 342203.932 | 1.825210048 >> ArraysFill.testIntFill | 31 | 107876.62 | 323291.016 | 2.996858967 >> ArraysFill.testIntFill | 59 | 76540.074 | 177755.374 | 2.322383096 >> ArraysFill.testIntFill | 89 | 77088.258 | 168496.776 | 2.185764478 >> ArraysFill.testIntFill | 126 | 92532.969 | 150986.404 | 1.631703874 >> ArraysFill.testIntFill | 250 | 99993.079 | 106098.703 | 1.061060466 >> ArraysFill.testIntFill | 266 | 105121.5 | 106809.473 | 1.016057353 >> ArraysFill.testIntFill | 511 | 61711.338 | 84318.27 | 1.366333525 >> ArraysFill.testIntFill | 1021 | 45725.648 | 44835.618 | 0.980535432 >> ArraysFill.testIntFill | 2047 | 24130.633 | 25001.727 | 1.036099094 >> ArraysFill.testIntFill | 2048 | 23873.255 | 24980.662 | 1.04638693 >> ArraysFill.testIntFill | 4095 | 12459.376 | 12666.815 | 1.016649229 >> ArraysFill.testIntFill | 8195 | 6303.873 | 6298.852 | 0.999203506 >> ArraysFill.testLongFill | 10 | 221803.338 | 203110.868 | 0.915725028 >> ArraysFill.testLongFill | 16 | 214013.975 | 230463.726 | 1.076862976 >> ArraysFill.testLongFill | 31 | 153858.758 | 144465.921 | 0.938951561 >> ArraysFill.testLongFill | 59 | 102187.914 | 112064.383 | 1.09665007 >> ArraysFill.testLongFill | 89 | 111940.314 | 107757.211 | 0.962630952 >> ArraysFill.testLongFill | 126 | 137992.49 | 110879.813 | 0.803520634 >> ArraysFill.testLongFill | 250 | 96629.877 | 96195.678 | 0.995506576 >> ArraysFill.testLongFill | 266 | 83984.403 | 86152.382 | 1.025814067 >> ArraysFill.testLongFill | 511 | 48698.933 | 48534.404 | 0.996621507 >> 
ArraysFill.testLongFill | 1021 | 25178.805 | 25162.502 | 0.999352511 >> ArraysFill.testLongFill | 2047 | 12511.142 | 12652.489 | 1.01129769 >> ArraysFill.testLongFill | 2048 | 12592.614 | 12622.317 | 1.002358764 >> ArraysFill.testLongFill | 4095 | 6377.694 | 6378.312 | 1.0000969 >> ArraysFill.testLongFill | 8195 | 885.065 | 884.811 | 0.999713015 >> ArraysFill.testShortFill | 10 | 196799.048 | 399963.161 | 2.032342966 >> ArraysFill.testShortFill | 16 | 191981.455 | 381173.675 | 1.985471331 >> ArraysFill.testShortFill | 31 | 98647.156 | 370750.549 | 3.758350104 >> ArraysFill.testShortFill | 59 | 79046.737 | 310586.902 | 3.929155254 >> ArraysFill.testShortFill | 89 | 128874.522 | 186302.59 | 1.445612268 >> ArraysFill.testShortFill | 126 | 47243.773 | 177947.204 | 3.766574782 >> ArraysFill.testShortFill | 250 | 37506.377 | 152968.336 | 4.078462071 >> ArraysFill.testShortFill | 266 | 41782.87 | 169466.305 | 4.055879958 >> ArraysFill.testShortFill | 511 | 44061.823 | 109352.795 | 2.481803692 >> ArraysFill.testShortFill | 1021 | 28799.157 | 81115.934 | 2.816607931 >> ArraysFill.testShortFill | 2047 | 38667.85 | 38998.02 | 1.008538618 >> ArraysFill.testShortFill | 2048 | 36626.321 | 38995.272 | 1.064678923 >> ArraysFill.testShortFill | 4095 | 16606.53 | 24724.43 | 1.488837825 >> ArraysFill.testShortFill | 8195 | 11679.891 | 12627.519 | 1.081133291 >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > 8275047: Review comments resolution. This patch optimizes the fill stubs for AVX-512. To avoid the call overhead penalty, we need to partially inline small fills that fit in one full or partial vector. I will send a separate patch for that.
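A fill that fits in one partial vector, like the small fills mentioned above, and point 5 of the PR description both depend on an AVX-512 opmask whose low `length` bits are set. The `macroAssembler_x86.cpp` sequence quoted later in this thread (`mov64(temp, -1L)`, `bzhiq(temp, temp, length)`, `kmov(mask, temp)`) builds that mask with BZHI, which copies its source and zeroes every bit from index `n` upward, where `n` is taken from the low 8 bits of the index operand (the source is left unchanged when `n` is 64 or more). The following is a minimal Java emulation, assuming byte elements and one 64-lane (ZMM) store; `bzhi`, `tailMask` and `maskedFill` are hypothetical helpers, not HotSpot functions.

```java
import java.util.Arrays;

// Scalar emulation of the opmask built by mov64(temp, -1); bzhiq(temp, temp, length);
// kmov(mask, temp). Illustrative only: helper names are hypothetical.
public final class BzhiMaskSketch {

    // 64-bit BZHI: copy src and clear bits n..63, where n = index[7:0].
    // If n >= 64, src is returned unchanged.
    static long bzhi(long src, long index) {
        int n = (int) (index & 0xFF);
        if (n >= 64) {
            return src;
        }
        return src & ((1L << n) - 1); // n == 0 yields 0
    }

    // Opmask enabling the first `length` lanes of a vector.
    static long tailMask(int length) {
        return bzhi(-1L, length);
    }

    // Emulated masked store of one 64-byte vector: only lanes whose mask bit is
    // set are written; masked-off lanes are never touched.
    static void maskedFill(byte[] dst, int offset, byte value, long mask) {
        for (int lane = 0; lane < 64; lane++) {
            if (((mask >>> lane) & 1L) != 0) {
                dst[offset + lane] = value;
            }
        }
    }

    public static void main(String[] args) {
        byte[] buf = new byte[10];
        maskedFill(buf, 0, (byte) 7, tailMask(buf.length));
        System.out.println(Long.toHexString(tailMask(10)) + " " + Arrays.toString(buf));
    }
}
```

With these helpers, a 10-byte fill needs only `tailMask(10)` (`0x3ff`) and a single masked store, which is the kind of one-vector fill that could be inlined to avoid the stub call.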
------------- PR: https://git.openjdk.java.net/jdk/pull/5967 From dlong at openjdk.java.net Thu Oct 21 19:06:10 2021 From: dlong at openjdk.java.net (Dean Long) Date: Thu, 21 Oct 2021 19:06:10 GMT Subject: RFR: 8275347 ciReplay: staticfield lines not properly terminated In-Reply-To: References: Message-ID: On Thu, 21 Oct 2021 04:11:32 GMT, Dean Long wrote: > Summary of changes: > - fix staticfield line termination for empty String constants > - tighten up checks during parse time to detect any un-parsed input at the end of a line > - updated test to trigger the problem, so that it fails without the fix > - cleanup: removed _buffer_pos field and related logic that isn't needed Thanks Nils and Christian! ------------- PR: https://git.openjdk.java.net/jdk/pull/6057 From dlong at openjdk.java.net Thu Oct 21 19:06:10 2021 From: dlong at openjdk.java.net (Dean Long) Date: Thu, 21 Oct 2021 19:06:10 GMT Subject: Integrated: 8275347 ciReplay: staticfield lines not properly terminated In-Reply-To: References: Message-ID: <795HGr_sNrGJGJUYZuhZgmrPS4x7GwHxvs0KNhTS7ho=.47d9da2a-4961-43f7-b9a0-c5f19437e1a6@github.com> On Thu, 21 Oct 2021 04:11:32 GMT, Dean Long wrote: > Summary of changes: > - fix staticfield line termination for empty String constants > - tighten up checks during parse time to detect any un-parsed input at the end of a line > - updated test to trigger the problem, so that it fails without the fix > - cleanup: removed _buffer_pos field and related logic that isn't needed This pull request has now been integrated. 
Changeset: 0961de47 Author: Dean Long URL: https://git.openjdk.java.net/jdk/commit/0961de47de1bf4379089e010978bcb4708fde767 Stats: 40 lines in 3 files changed: 24 ins; 7 del; 9 mod 8275347: ciReplay: staticfield lines not properly terminated Reviewed-by: neliasso, chagedorn ------------- PR: https://git.openjdk.java.net/jdk/pull/6057 From kvn at openjdk.java.net Thu Oct 21 20:13:06 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 21 Oct 2021 20:13:06 GMT Subject: RFR: 8275645: [JVMCI] avoid unaligned volatile reads on AArch64 In-Reply-To: References: Message-ID: On Wed, 20 Oct 2021 14:23:19 GMT, Doug Simon wrote: > This PR updates c2v-readFieldValue to always do the field read with volatile semantics, but without using a volatile read instruction directly, to avoid platform-specific issues with unaligned reads (e.g., an unaligned ldar on AArch64 causes a SIGBUS). Marked as reviewed by kvn (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/6044 From kvn at openjdk.java.net Thu Oct 21 20:27:01 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 21 Oct 2021 20:27:01 GMT Subject: RFR: 8275047: Optimize existing fill stubs for AVX-512 target [v4] In-Reply-To: References: <0WViJD0Uip3CjSkJq6T1wxmgGTk4vbo6IngCAyBuc34=.997a2434-e648-40ef-a8f0-fe1669387fb6@github.com> Message-ID: On Thu, 21 Oct 2021 15:17:04 GMT, Jatin Bhateja wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> 8275047: Review comments resolution. > > This patch optimizes the fill stubs for AVX-512. To avoid the call overhead penalty, we need to partially inline small fills that fit in one full or partial vector. I will send a separate patch for that. @jatin-bhateja You need to explicitly state in the RFE and in this PR that `Long` and `Double` are not covered, to avoid confusion.
------------- PR: https://git.openjdk.java.net/jdk/pull/5967 From kvn at openjdk.java.net Thu Oct 21 20:54:09 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 21 Oct 2021 20:54:09 GMT Subject: RFR: 8275047: Optimize existing fill stubs for AVX-512 target [v4] In-Reply-To: <0WViJD0Uip3CjSkJq6T1wxmgGTk4vbo6IngCAyBuc34=.997a2434-e648-40ef-a8f0-fe1669387fb6@github.com> References: <0WViJD0Uip3CjSkJq6T1wxmgGTk4vbo6IngCAyBuc34=.997a2434-e648-40ef-a8f0-fe1669387fb6@github.com> Message-ID: On Thu, 21 Oct 2021 14:50:31 GMT, Jatin Bhateja wrote: >> Hi All, >> >> This patch optimizes the macro assembly routines used by the fill stubs of various primitive types for the X86 AVX-512 target. >> The main changes are: >> 1) Specialized instruction sequences for the fill operation over various block sizes. >> 2) Control flow is sensitive to AVX3Threshold: the generated code operates on 32-byte vectors (YMM) if the >> block size is less than the threshold; otherwise the instructions operate on 64-byte vectors (ZMM). >> 3) The bulk fill operation is performed by a destination-aligned fill loop with an appropriate unroll factor; this >> avoids any cache line split penalty and improves performance. >> 4) Currently, fill patterns are vectorized by the auto-vectorizer, and the generated code operates on vectors >> of MaxVectorSize. In addition, the auto-vectorizer is oblivious to AVX3Threshold, and this may result in >> performance degradation on prior generations of X86 servers, where 64-byte vector stores using ZMM >> registers operate at reduced CPU frequency. >> The patch enables the JVM runtime flag -XX:+OptimizeFill by default for X86 targets supporting the AVX-512 feature. >> 5) The patch also optimizes the mask generation sequence of the fill* macro assembly routines using the BZHI instruction. >> >> Performance measurements of an existing JMH micro on an Icelake server show ~1.1-4.0X gains for the fill operation with varying block sizes.
>> >> Following are detailed results: >> >> System Configuration: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (40C 2S) >> >> Benchmark | Size | Baseline Auto-vectorized -XX:-OptimizeFill (ops/ms) | New Optimized Fill AVX3 Th=4096 (ops/ms) | Gain Factor (OptFill AVX3Th=4096/Baseline) >> -- | -- | -- | -- | -- >> ArraysFill.testByteFill | 10 | 208480.451 | 399980.93 | 1.918553649 >> ArraysFill.testByteFill | 16 | 193927.021 | 381156.448 | 1.965463328 >> ArraysFill.testByteFill | 31 | 99175.805 | 399990.605 | 4.033147046 >> ArraysFill.testByteFill | 59 | 141430.876 | 342233.497 | 2.419793377 >> ArraysFill.testByteFill | 89 | 82091.504 | 342232.822 | 4.168918893 >> ArraysFill.testByteFill | 126 | 72154.769 | 310536.053 | 4.303749528 >> ArraysFill.testByteFill | 250 | 18986.775 | 158263.189 | 8.335443434 >> ArraysFill.testByteFill | 266 | 30057.331 | 166819.658 | 5.550048938 >> ArraysFill.testByteFill | 511 | 30094.92 | 116800.155 | 3.88105883 >> ArraysFill.testByteFill | 1021 | 38467.507 | 89235.56 | 2.319764574 >> ArraysFill.testByteFill | 2047 | 32267.535 | 70625.015 | 2.188732886 >> ArraysFill.testByteFill | 2048 | 25503.489 | 64848.532 | 2.542731781 >> ArraysFill.testByteFill | 4095 | 22432.636 | 42449.149 | 1.892294289 >> ArraysFill.testByteFill | 8195 | 16468.923 | 24810.485 | 1.506503188 >> ArraysFill.testCharFill | 10 | 221038.566 | 400005.661 | 1.809664568 >> ArraysFill.testCharFill | 16 | 209138.43 | 381171.236 | 1.822578643 >> ArraysFill.testCharFill | 31 | 93139.021 | 376441.98 | 4.041721461 >> ArraysFill.testCharFill | 59 | 63575.554 | 310559.54 | 4.884889245 >> ArraysFill.testCharFill | 89 | 61900.064 | 191445.936 | 3.092822909 >> ArraysFill.testCharFill | 126 | 36854.615 | 164187.37 | 4.455001633 >> ArraysFill.testCharFill | 250 | 37991.306 | 138797.511 | 3.653401939 >> ArraysFill.testCharFill | 266 | 44459.522 | 170334.083 | 3.831217146 >> ArraysFill.testCharFill | 511 | 52275.926 | 103012.53 | 1.970553903 >> ArraysFill.testCharFill | 1021 | 51803.73 | 
80187.107 | 1.547902188 >> ArraysFill.testCharFill | 2047 | 35820.742 | 38973.828 | 1.088024028 >> ArraysFill.testCharFill | 2048 | 35280.779 | 38209.361 | 1.083007861 >> ArraysFill.testCharFill | 4095 | 21053.869 | 25006.99 | 1.187762211 >> ArraysFill.testCharFill | 8195 | 11419.785 | 12662.777 | 1.108845482 >> ArraysFill.testDoubleFill | 10 | 266086.021 | 220036.789 | 0.826938552 >> ArraysFill.testDoubleFill | 16 | 216597.316 | 218875.135 | 1.010516377 >> ArraysFill.testDoubleFill | 31 | 151868.92 | 174250.587 | 1.147374901 >> ArraysFill.testDoubleFill | 59 | 196480.253 | 194467.527 | 0.98975609 >> ArraysFill.testDoubleFill | 89 | 109787.976 | 102698.432 | 0.935425133 >> ArraysFill.testDoubleFill | 126 | 93945.51 | 121697.956 | 1.295410031 >> ArraysFill.testDoubleFill | 250 | 97830.626 | 81429.644 | 0.832353296 >> ArraysFill.testDoubleFill | 266 | 83560.898 | 91313.593 | 1.092778981 >> ArraysFill.testDoubleFill | 511 | 48710.087 | 48145.392 | 0.988407021 >> ArraysFill.testDoubleFill | 1021 | 25145.002 | 25163.03 | 1.000716962 >> ArraysFill.testDoubleFill | 2047 | 12665.468 | 12639.651 | 0.997961623 >> ArraysFill.testDoubleFill | 2048 | 12202.183 | 12619.316 | 1.034185113 >> ArraysFill.testDoubleFill | 4095 | 6319.101 | 6320.488 | 1.000219493 >> ArraysFill.testDoubleFill | 8195 | 882.585 | 883.727 | 1.001293926 >> ArraysFill.testFloatFill | 10 | 193690.976 | 370572.639 | 1.913215818 >> ArraysFill.testFloatFill | 16 | 178498.07 | 342227.406 | 1.9172611 >> ArraysFill.testFloatFill | 31 | 160406.649 | 323327.925 | 2.015676576 >> ArraysFill.testFloatFill | 59 | 119643.034 | 177091.185 | 1.48016294 >> ArraysFill.testFloatFill | 89 | 64783.111 | 168280.961 | 2.597605431 >> ArraysFill.testFloatFill | 126 | 85291.623 | 152788.86 | 1.791370062 >> ArraysFill.testFloatFill | 250 | 98864.197 | 115429.942 | 1.167560608 >> ArraysFill.testFloatFill | 266 | 104361.908 | 106769.11 | 1.023065906 >> ArraysFill.testFloatFill | 511 | 59063.325 | 73726.544 | 1.248262674 >> 
ArraysFill.testFloatFill | 1021 | 46426.631 | 44255.239 | 0.953229602 >> ArraysFill.testFloatFill | 2047 | 23853.72 | 24988.53 | 1.047573712 >> ArraysFill.testFloatFill | 2048 | 23774.697 | 24723.921 | 1.039925809 >> ArraysFill.testFloatFill | 4095 | 11879.115 | 12574.113 | 1.058505874 >> ArraysFill.testFloatFill | 8195 | 6288.73 | 6309.257 | 1.003264093 >> ArraysFill.testIntFill | 10 | 202623.377 | 370696.239 | 1.829484063 >> ArraysFill.testIntFill | 16 | 187487.425 | 342203.932 | 1.825210048 >> ArraysFill.testIntFill | 31 | 107876.62 | 323291.016 | 2.996858967 >> ArraysFill.testIntFill | 59 | 76540.074 | 177755.374 | 2.322383096 >> ArraysFill.testIntFill | 89 | 77088.258 | 168496.776 | 2.185764478 >> ArraysFill.testIntFill | 126 | 92532.969 | 150986.404 | 1.631703874 >> ArraysFill.testIntFill | 250 | 99993.079 | 106098.703 | 1.061060466 >> ArraysFill.testIntFill | 266 | 105121.5 | 106809.473 | 1.016057353 >> ArraysFill.testIntFill | 511 | 61711.338 | 84318.27 | 1.366333525 >> ArraysFill.testIntFill | 1021 | 45725.648 | 44835.618 | 0.980535432 >> ArraysFill.testIntFill | 2047 | 24130.633 | 25001.727 | 1.036099094 >> ArraysFill.testIntFill | 2048 | 23873.255 | 24980.662 | 1.04638693 >> ArraysFill.testIntFill | 4095 | 12459.376 | 12666.815 | 1.016649229 >> ArraysFill.testIntFill | 8195 | 6303.873 | 6298.852 | 0.999203506 >> ArraysFill.testLongFill | 10 | 221803.338 | 203110.868 | 0.915725028 >> ArraysFill.testLongFill | 16 | 214013.975 | 230463.726 | 1.076862976 >> ArraysFill.testLongFill | 31 | 153858.758 | 144465.921 | 0.938951561 >> ArraysFill.testLongFill | 59 | 102187.914 | 112064.383 | 1.09665007 >> ArraysFill.testLongFill | 89 | 111940.314 | 107757.211 | 0.962630952 >> ArraysFill.testLongFill | 126 | 137992.49 | 110879.813 | 0.803520634 >> ArraysFill.testLongFill | 250 | 96629.877 | 96195.678 | 0.995506576 >> ArraysFill.testLongFill | 266 | 83984.403 | 86152.382 | 1.025814067 >> ArraysFill.testLongFill | 511 | 48698.933 | 48534.404 | 0.996621507 >> 
ArraysFill.testLongFill | 1021 | 25178.805 | 25162.502 | 0.999352511 >> ArraysFill.testLongFill | 2047 | 12511.142 | 12652.489 | 1.01129769 >> ArraysFill.testLongFill | 2048 | 12592.614 | 12622.317 | 1.002358764 >> ArraysFill.testLongFill | 4095 | 6377.694 | 6378.312 | 1.0000969 >> ArraysFill.testLongFill | 8195 | 885.065 | 884.811 | 0.999713015 >> ArraysFill.testShortFill | 10 | 196799.048 | 399963.161 | 2.032342966 >> ArraysFill.testShortFill | 16 | 191981.455 | 381173.675 | 1.985471331 >> ArraysFill.testShortFill | 31 | 98647.156 | 370750.549 | 3.758350104 >> ArraysFill.testShortFill | 59 | 79046.737 | 310586.902 | 3.929155254 >> ArraysFill.testShortFill | 89 | 128874.522 | 186302.59 | 1.445612268 >> ArraysFill.testShortFill | 126 | 47243.773 | 177947.204 | 3.766574782 >> ArraysFill.testShortFill | 250 | 37506.377 | 152968.336 | 4.078462071 >> ArraysFill.testShortFill | 266 | 41782.87 | 169466.305 | 4.055879958 >> ArraysFill.testShortFill | 511 | 44061.823 | 109352.795 | 2.481803692 >> ArraysFill.testShortFill | 1021 | 28799.157 | 81115.934 | 2.816607931 >> ArraysFill.testShortFill | 2047 | 38667.85 | 38998.02 | 1.008538618 >> ArraysFill.testShortFill | 2048 | 36626.321 | 38995.272 | 1.064678923 >> ArraysFill.testShortFill | 4095 | 16606.53 | 24724.43 | 1.488837825 >> ArraysFill.testShortFill | 8195 | 11679.891 | 12627.519 | 1.081133291 >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > 8275047: Review comments resolution. In general, looks good. I have a few comments.
src/hotspot/cpu/x86/macroAssembler_x86.cpp line 5211: > 5209: > 5210: #ifdef COMPILER2 > 5211: #ifdef _LP64 You can combine these lines: `#if defined(_LP64) && defined(COMPILER2)` src/hotspot/cpu/x86/macroAssembler_x86.cpp line 5214: > 5212: if(UseAVX > 2 && > 5213: MaxVectorSize >=32 && > 5214: VM_Version::supports_avx512vlbw() && You don't need to check `UseAVX > 2` and `supports_avx512vlbw()` at the same time; the feature will be turned off otherwise: https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/vm_version_x86.cpp#L757 I think it is fine to check just `supports_avx512vlbw()`. src/hotspot/cpu/x86/macroAssembler_x86.cpp line 8290: > 8288: LP64_ONLY(mov64(temp, -1L)) NOT_LP64(movl(temp, -1)); > 8289: bzhiq(temp, temp, length); > 8290: kmov(mask, temp); This sequence needs a comment explaining what it does. Why does the instruction used to assign `-1` depend on bitness? Also, we already have such a method: `MacroAssembler::movptr(Register dst, intptr_t src)`. Also, this part of the code matches the code in `fill32_masked_avx()`. The only difference is `AVX_256bit` vs `AVX_512bit`, which you can pass as a parameter. Can you refactor it? ------------- PR: https://git.openjdk.java.net/jdk/pull/5967 From duke at openjdk.java.net Fri Oct 22 00:42:22 2021 From: duke at openjdk.java.net (TatWai Chong) Date: Fri, 22 Oct 2021 00:42:22 GMT Subject: RFR: 8275448: [REDO] AArch64: Implement string_compare intrinsic in SVE Message-ID: <3lLU_Cnsh439U5TjNx9ADxXkvMDnfot4TxHbh7paUQw=.5c7ec35a-8456-4fe6-8b4f-1d7627026d8d@github.com> After JDK-8269559 was integrated, there were failures in tier1 testing across Mac OS X 11.4 (aarch64) machines. Please see JDK-8275263. This patch is NOT functional; rather, it is intended to verify potential toolchain issues, as the original patch passed testing on other platforms. In this patch, we remove the new SVE-related matching rules and register class introduced in the original patch to minimally affect the non-SVE part.
------------- Commit messages: - 8275448: [REDO] AArch64: Implement string_compare intrinsic in SVE Changes: https://git.openjdk.java.net/jdk/pull/6072/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6072&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8275448 Stats: 104 lines in 6 files changed: 97 ins; 0 del; 7 mod Patch: https://git.openjdk.java.net/jdk/pull/6072.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6072/head:pull/6072 PR: https://git.openjdk.java.net/jdk/pull/6072 From thartmann at openjdk.java.net Fri Oct 22 07:07:08 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Fri, 22 Oct 2021 07:07:08 GMT Subject: RFR: 8259609: C2: optimize long range checks in long counted loops [v12] In-Reply-To: References: Message-ID: On Thu, 21 Oct 2021 06:50:41 GMT, Roland Westrelin wrote: >> JDK-8255150 makes it possible for Java code to explicitly perform a >> range check on long values. JDK-8223051 provides a transformation of >> long counted loops into loop nests with an inner int counted >> loop. With this change I propose transforming long range checks that >> operate on the iv of a long counted loop into range checks that >> operate on the iv of the int inner loop once it has been >> created. Existing range check eliminations can then kick in. >> >> Transformation of range checks is piggybacked on the loop nest >> creation for 2 reasons: >> >> - pattern matching range checks is easier right before the loop nest >> is created >> >> - the number of iterations of the inner loop is adjusted so scale * >> inner_iv doesn't overflow >> >> C2 has logic to delay some split if transformations so they don't >> break the scale * iv + offset pattern. I reused that logic for long >> range checks and had to relax what's considered a range check because >> initially a range check from Objects.checkIndex() may include a test >> for range > 0 that needs a round of loop opts to be hoisted.
I realize >> there's some code duplication, but I didn't see a way to share logic >> between IdealLoopTree::may_have_range_check() and >> IdealLoopTree::policy_range_check() that would feel right. >> >> I realize the comment in PhaseIdealLoop::transform_long_range_checks() >> is scary. FWIW, it's not as complicated as it looks. I found that drawing >> the range covered by the entire long loop and the range covered by the >> inner loop helps to see how range checks can be transformed. Then the >> comment helps make sure all cases are covered and verify that the generated >> code actually covers all of them. >> >> One issue is overflow. I think the fact that inner_iv * scale doesn't >> overflow helps simplify things. One possible overflow is that of scale >> * upper + offset, which is handled by forcing all range checks in that >> case to deoptimize. I don't think other cases of overflow need special >> handling. >> >> This was tested with a Memory Segment micro benchmark (and patched >> Memory Segment support to take advantage of the new checkIndex >> intrinsic, both provided by Maurizio). Range checks in the micro >> benchmark are properly optimized (and performance increases >> significantly). > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > build fix Looks good to me overall, but I did not verify the `transform_long_range_checks` logic in detail. I gave this a good amount of testing in our infra (tier 1-6). All green. src/hotspot/share/opto/loopnode.hpp line 1657: > 1655: void try_sink_out_of_loop(Node* n); > 1656: > 1657: Node* clamp(Node* pNode, Node* pNode1, Node* pNode2); Argument naming is not consistent with the implementation. ------------- Marked as reviewed by thartmann (Reviewer).
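Roland's description of turning long range checks on a long loop's iv into checks on an inner int loop can be pictured at the source level with the Java sketch below. It is only an analogy, not what C2 does to its IR: `sumChecked`, `sumTransformed` and the `chunk` parameter are hypothetical, the per-chunk endpoint check stands in for the transformed range checks, and the checked fallback loop stands in for deoptimization. Because `scale * i + offset` is linear in `i`, if both endpoints of a chunk are computed without overflow and lie in `[0, length)`, every index in the chunk does too. It needs JDK 16 or later for the `long` overload of `Objects.checkIndex` (JDK-8255150).

```java
import java.util.Objects;

// Source-level analogy of replacing per-iteration long range checks with one
// check per inner-loop chunk. Hypothetical names; not C2's implementation.
public final class LongRangeCheckSketch {

    // Reference: long loop with a long range check on every iteration.
    static long sumChecked(int[] data, long start, long stop, long scale, long offset) {
        long sum = 0;
        for (long i = start; i < stop; i++) {
            long idx = Objects.checkIndex(scale * i + offset, (long) data.length);
            sum += data[(int) idx];
        }
        return sum;
    }

    // Loop nest: an outer long loop over chunks and an inner int loop.
    // Assumes chunk > 0 and that stop - start does not overflow.
    static long sumTransformed(int[] data, long start, long stop,
                               long scale, long offset, int chunk) {
        long length = data.length;
        long sum = 0;
        long base = start;
        while (base < stop) {
            int n = (int) Math.min(stop - base, (long) chunk); // inner trip count fits in an int
            boolean inRange;
            try {
                // Exact endpoint values; any overflow keeps the per-iteration checks.
                long first = Math.addExact(Math.multiplyExact(scale, base), offset);
                long last = Math.addExact(Math.multiplyExact(scale, base + n - 1), offset);
                inRange = Math.min(first, last) >= 0 && Math.max(first, last) < length;
            } catch (ArithmeticException overflow) {
                inRange = false;
            }
            if (inRange) {
                for (int j = 0; j < n; j++) { // no explicit long range check
                    sum += data[(int) (scale * (base + j) + offset)];
                }
            } else {
                for (int j = 0; j < n; j++) { // stand-in for deoptimization
                    long idx = Objects.checkIndex(scale * (base + j) + offset, length);
                    sum += data[(int) idx];
                }
            }
            base += n;
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] data = new int[100];
        for (int k = 0; k < data.length; k++) {
            data[k] = k;
        }
        System.out.println(sumChecked(data, 0, 50, 2, 0) + " "
                + sumTransformed(data, 0, 50, 2, 0, 7));
    }
}
```

Chunks that fail the endpoint test run the checked loop, so an out-of-range index throws the same `IndexOutOfBoundsException` at the same iteration as the reference loop.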
PR: https://git.openjdk.java.net/jdk/pull/2045 From thartmann at openjdk.java.net Fri Oct 22 07:07:09 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Fri, 22 Oct 2021 07:07:09 GMT Subject: RFR: 8259609: C2: optimize long range checks in long counted loops [v3] In-Reply-To: References: Message-ID: On Mon, 25 Jan 2021 09:55:50 GMT, Tobias Hartmann wrote: >> Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: >> >> - min_jint overflow fix >> - Revert "assert(static_cast(result) == thing) fix" >> >> This reverts commit e234477df097475d503ea6f94ab6a258132d165e. > > src/hotspot/share/opto/loopTransform.cpp line 2584: > >> 2582: if (p_offset != NULL) { >> 2583: if (*p_scale == min_signed_integer(bt)) { >> 2584: return false; > > I find it suspicious that this edge case needs to be handled here. Could you explain why and add a corresponding comment? Looks like you forgot to address this comment? ------------- PR: https://git.openjdk.java.net/jdk/pull/2045 From njian at openjdk.java.net Fri Oct 22 07:36:02 2021 From: njian at openjdk.java.net (Ningsheng Jian) Date: Fri, 22 Oct 2021 07:36:02 GMT Subject: RFR: 8275448: [REDO] AArch64: Implement string_compare intrinsic in SVE In-Reply-To: <3lLU_Cnsh439U5TjNx9ADxXkvMDnfot4TxHbh7paUQw=.5c7ec35a-8456-4fe6-8b4f-1d7627026d8d@github.com> References: <3lLU_Cnsh439U5TjNx9ADxXkvMDnfot4TxHbh7paUQw=.5c7ec35a-8456-4fe6-8b4f-1d7627026d8d@github.com> Message-ID: On Fri, 22 Oct 2021 00:34:03 GMT, TatWai Chong wrote: > After JDK-8269559 was integrated, there were failures in tier1 testing > across Mac OS X 11.4 (aarch64) machines. Please see JDK-8275263. > > This patch is NOT functional; rather, it is intended to verify potential > toolchain issues, as the original patch passed testing on other > platforms. > > In this patch, we remove the new SVE-related matching rules and register > class introduced in the original patch to minimally affect the > non-SVE part.
Hi @dholmes-ora , could you please help to run this patch in Oracle test system to see whether there's still the same issue as JDK-8275263 ------------- PR: https://git.openjdk.java.net/jdk/pull/6072 From duke at openjdk.java.net Fri Oct 22 08:12:27 2021 From: duke at openjdk.java.net (SUN Guoyun) Date: Fri, 22 Oct 2021 08:12:27 GMT Subject: RFR: 8275086: compiler/c2/irTests/TestPostParseCallDevirtualization.java fails when compiler1 is disabled [v2] In-Reply-To: References: Message-ID: > Hi all, > Jtreg test case compiler/c2/irTests/TestPostParseCallDevirtualization.java fails for fastdebug mode on x86/aarch64/mips architecture when "--with-jvm-features=-compiler1" be used. the failed info is: > >

> One or more @IR rules failed:
> 
> Failed IR Rules (1)
> ------------------
> - Method "public int compiler.c2.irTests.TestPostParseCallDevirtualization.testMethodHandleCallWithCCP() throws java.lang.Throwable":
>   * @IR rule 1: "@compiler.lib.ir_framework.IR(failOn={"#PRE#(\\\\d+(\\\\s){2}(CallStaticJava.*)+(\\\\s){2}===.*#IS_REPLACED#)", "invokeBasic"}, applyIf={}, applyIfAnd={}, applyIfOr={}, counts={"#PRE#(\\\\d+(\\\\s){2}(CallStaticJava.*)+(\\\\s){2}===.*#IS_REPLACED#)", "invokeStatic", "= 1"}, applyIfNot={})" 
>     - failOn: Graph contains forbidden nodes:
>         Regex 1: (\\d+(\\s){2}(CallStaticJava.*)+(\\s){2}===.*invokeBasic)
>         Matched forbidden node:
>           280  CallStaticJava  ===  5  6  7  8  1 ( 188  1  1  1  1  1  1 ) [[ 281  282  283  285 ]] # Static  java.lang.invoke.MethodHandle::invokeBasic
>     - counts: Graph contains wrong number of nodes:
>         Regex 1: (\\d+(\\s){2}(CallStaticJava.*)+(\\s){2}===.*invokeStatic)
>         Expected 1 but found 0 nodes.
> 
>>>> Check stdout for compilation output of the failed methods
> 
> > This patch fixes the problem. Please review it. > > Thanks, > Sun Guoyun SUN Guoyun has updated the pull request incrementally with one additional commit since the last revision: 8275086: compiler/c2/irTests/TestPostParseCallDevirtualization.java fails when compiler1 is disabled ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5903/files - new: https://git.openjdk.java.net/jdk/pull/5903/files/ef103774..9ffdbeaf Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5903&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5903&range=00-01 Stats: 6 lines in 1 file changed: 6 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/5903.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5903/head:pull/5903 PR: https://git.openjdk.java.net/jdk/pull/5903 From duke at openjdk.java.net Fri Oct 22 08:24:06 2021 From: duke at openjdk.java.net (SUN Guoyun) Date: Fri, 22 Oct 2021 08:24:06 GMT Subject: RFR: 8275086: compiler/c2/irTests/TestPostParseCallDevirtualization.java fails when compiler1 is disabled [v2] In-Reply-To: References: Message-ID: On Fri, 22 Oct 2021 08:12:27 GMT, SUN Guoyun wrote: >> Hi all, >> The jtreg test compiler/c2/irTests/TestPostParseCallDevirtualization.java fails in fastdebug mode on x86/aarch64/mips when "--with-jvm-features=-compiler1" is used. The failure output is: >> >>

>> One or more @IR rules failed:
>> 
>> Failed IR Rules (1)
>> ------------------
>> - Method "public int compiler.c2.irTests.TestPostParseCallDevirtualization.testMethodHandleCallWithCCP() throws java.lang.Throwable":
>>   * @IR rule 1: "@compiler.lib.ir_framework.IR(failOn={"#PRE#(\\\\d+(\\\\s){2}(CallStaticJava.*)+(\\\\s){2}===.*#IS_REPLACED#)", "invokeBasic"}, applyIf={}, applyIfAnd={}, applyIfOr={}, counts={"#PRE#(\\\\d+(\\\\s){2}(CallStaticJava.*)+(\\\\s){2}===.*#IS_REPLACED#)", "invokeStatic", "= 1"}, applyIfNot={})" 
>>     - failOn: Graph contains forbidden nodes:
>>         Regex 1: (\\d+(\\s){2}(CallStaticJava.*)+(\\s){2}===.*invokeBasic)
>>         Matched forbidden node:
>>           280  CallStaticJava  ===  5  6  7  8  1 ( 188  1  1  1  1  1  1 ) [[ 281  282  283  285 ]] # Static  java.lang.invoke.MethodHandle::invokeBasic
>>     - counts: Graph contains wrong number of nodes:
>>         Regex 1: (\\d+(\\s){2}(CallStaticJava.*)+(\\s){2}===.*invokeStatic)
>>         Expected 1 but found 0 nodes.
>> 
>>>>> Check stdout for compilation output of the failed methods
>> 
>> >> This patch fixes the problem. Please review it. >> >> Thanks, >> Sun Guoyun > > SUN Guoyun has updated the pull request incrementally with one additional commit since the last revision: > > 8275086: compiler/c2/irTests/TestPostParseCallDevirtualization.java fails when compiler1 is disabled Hi all, please review the new patch. Thanks. I set `TieredCompilation=true` or `TieredCompilation=false && CompileThresholdScaling<0.14` so that the test passes. ------------- PR: https://git.openjdk.java.net/jdk/pull/5903 From thartmann at openjdk.java.net Fri Oct 22 08:52:09 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Fri, 22 Oct 2021 08:52:09 GMT Subject: RFR: 8275104: IR framework does not handle client VM builds correctly [v2] In-Reply-To: References: Message-ID: On Thu, 21 Oct 2021 07:04:29 GMT, Christian Hagedorn wrote: >> While the IR framework is primarily used for C2 IR verification, it should also work with client VM builds. There are currently some problems which are fixed with this patch: >> >> - The IR framework currently only bails out of IR matching if C2 is excluded by command line flags. However, when running an IR JTreg test with a client VM build, IR matching fails when not specifically adding `@requires vm.compiler2.enabled` to exclude the test. >> - `@Test` and `@ForceCompile` do not work correctly and throw an exception due to an incompatible compilation level selection without C2. >> - Some internal framework tests fail (the fix also improves `TestDIgnoreCompilerControls` in general). >> >> Testing: >> >> - Standard tier testing >> - Testing internal framework tests with standard build (tiered), client VM (without C2) and server VM build (without C1) >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Change MinInliningThreshold to TypeProfileLevel in IRExample Looks good to me.
------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6037 From whuang at openjdk.java.net Fri Oct 22 09:59:29 2021 From: whuang at openjdk.java.net (Wang Huang) Date: Fri, 22 Oct 2021 09:59:29 GMT Subject: RFR: 8259948: Aarch64: Add cast nodes for Aarch64 Neon backend [v12] In-Reply-To: References: Message-ID: > * In this issue, we plan to add all the missing cast implementations for the AArch64 NEON backend, for example casts from Byte to Long, from Long to Byte, and so on. > * It may fix JDK-8269866, or part of it. Wang Huang has updated the pull request incrementally with one additional commit since the last revision: fix styles ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/4839/files - new: https://git.openjdk.java.net/jdk/pull/4839/files/a7c562e5..f317a0be Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4839&range=11 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4839&range=10-11 Stats: 9 lines in 2 files changed: 6 ins; 0 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/4839.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4839/head:pull/4839 PR: https://git.openjdk.java.net/jdk/pull/4839 From thartmann at openjdk.java.net Fri Oct 22 11:51:11 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Fri, 22 Oct 2021 11:51:11 GMT Subject: RFR: 8275448: [REDO] AArch64: Implement string_compare intrinsic in SVE In-Reply-To: <3lLU_Cnsh439U5TjNx9ADxXkvMDnfot4TxHbh7paUQw=.5c7ec35a-8456-4fe6-8b4f-1d7627026d8d@github.com> References: <3lLU_Cnsh439U5TjNx9ADxXkvMDnfot4TxHbh7paUQw=.5c7ec35a-8456-4fe6-8b4f-1d7627026d8d@github.com> Message-ID: On Fri, 22 Oct 2021 00:34:03 GMT, TatWai Chong wrote: > After JDK-8269559 was integrated, there were failures in tier1 testing > on Mac OS X 11.4 (aarch64) machines. Please see JDK-8275263.
> > This patch is NOT functional; rather, it is intended to check for potential > toolchain issues, since the original patch passes testing on other > platforms. > > In this patch, we remove the new SVE-related matching rules and register > class introduced in the original patch to minimally affect the > non-SVE part. I submitted testing and will report back once it has finished. ------------- PR: https://git.openjdk.java.net/jdk/pull/6072 From chagedorn at openjdk.java.net Fri Oct 22 12:13:04 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Fri, 22 Oct 2021 12:13:04 GMT Subject: RFR: 8275104: IR framework does not handle client VM builds correctly [v2] In-Reply-To: References: Message-ID: On Thu, 21 Oct 2021 07:04:29 GMT, Christian Hagedorn wrote: >> While the IR framework is primarily used for C2 IR verification, it should also work with client VM builds. There are currently some problems which are fixed with this patch: >> >> - The IR framework currently only bails out of IR matching if C2 is excluded by command line flags. However, when running an IR JTreg test with a client VM build, IR matching fails when not specifically adding `@requires vm.compiler2.enabled` to exclude the test. >> - `@Test` and `@ForceCompile` do not work correctly and throw an exception due to an incompatible compilation level selection without C2. >> - Some internal framework tests fail (the fix also improves `TestDIgnoreCompilerControls` in general). >> >> Testing: >> >> - Standard tier testing >> - Testing internal framework tests with standard build (tiered), client VM (without C2) and server VM build (without C1) >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Change MinInliningThreshold to TypeProfileLevel in IRExample Thanks Tobias for your review!
------------- PR: https://git.openjdk.java.net/jdk/pull/6037 From never at openjdk.java.net Fri Oct 22 16:19:07 2021 From: never at openjdk.java.net (Tom Rodriguez) Date: Fri, 22 Oct 2021 16:19:07 GMT Subject: RFR: 8275645: [JVMCI] avoid unaligned volatile reads on AArch64 In-Reply-To: References: Message-ID: On Wed, 20 Oct 2021 14:23:19 GMT, Doug Simon wrote: > This PR updates c2v_readFieldValue to always do the field read with volatile semantics, but without directly using a volatile read instruction, to avoid platform-specific issues with unaligned reads (e.g., an unaligned ldar on AArch64 causes a SIGBUS). Marked as reviewed by never (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/6044 From dnsimon at openjdk.java.net Fri Oct 22 16:24:07 2021 From: dnsimon at openjdk.java.net (Doug Simon) Date: Fri, 22 Oct 2021 16:24:07 GMT Subject: RFR: 8275645: [JVMCI] avoid unaligned volatile reads on AArch64 In-Reply-To: References: Message-ID: On Thu, 21 Oct 2021 20:10:17 GMT, Vladimir Kozlov wrote: >> This PR updates c2v_readFieldValue to always do the field read with volatile semantics, but without directly using a volatile read instruction, to avoid platform-specific issues with unaligned reads (e.g., an unaligned ldar on AArch64 causes a SIGBUS). > > Marked as reviewed by kvn (Reviewer). Thanks for the reviews @vnkozlov and @tkrodriguez. ------------- PR: https://git.openjdk.java.net/jdk/pull/6044 From dnsimon at openjdk.java.net Fri Oct 22 16:24:08 2021 From: dnsimon at openjdk.java.net (Doug Simon) Date: Fri, 22 Oct 2021 16:24:08 GMT Subject: Integrated: 8275645: [JVMCI] avoid unaligned volatile reads on AArch64 In-Reply-To: References: Message-ID: On Wed, 20 Oct 2021 14:23:19 GMT, Doug Simon wrote: > This PR updates c2v_readFieldValue to always do the field read with volatile semantics, but without directly using a volatile read instruction, to avoid platform-specific issues with unaligned reads (e.g., an unaligned ldar on AArch64 causes a SIGBUS).
This pull request has now been integrated. Changeset: 4dec8fc4 Author: Doug Simon URL: https://git.openjdk.java.net/jdk/commit/4dec8fc4cc2b1762fba554d0401da8be0d6d1166 Stats: 63 lines in 6 files changed: 41 ins; 1 del; 21 mod 8275645: [JVMCI] avoid unaligned volatile reads on AArch64 Reviewed-by: kvn, never ------------- PR: https://git.openjdk.java.net/jdk/pull/6044 From kvn at openjdk.java.net Fri Oct 22 17:18:09 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 22 Oct 2021 17:18:09 GMT Subject: RFR: 8275104: IR framework does not handle client VM builds correctly In-Reply-To: References: Message-ID: On Thu, 21 Oct 2021 07:01:06 GMT, Christian Hagedorn wrote: > Thanks Vladimir for your review! > > Given that 8273712 is going to deprecate `MinInliningThreshold`, I've changed the flag in `IRExample` to another int-based flag: `TypeProfileLevel`. Okay. ------------- PR: https://git.openjdk.java.net/jdk/pull/6037 From eliu at openjdk.java.net Sun Oct 24 09:07:13 2021 From: eliu at openjdk.java.net (Eric Liu) Date: Sun, 24 Oct 2021 09:07:13 GMT Subject: RFR: 8259948: Aarch64: Add cast nodes for Aarch64 Neon backend [v12] In-Reply-To: References: Message-ID: On Fri, 22 Oct 2021 09:59:29 GMT, Wang Huang wrote: >> * In this issue, we plan to add all the missing cast implementations for the AArch64 NEON backend, for example casts from Byte to Long, from Long to Byte, and so on. >> * It may fix JDK-8269866, or part of it. > > Wang Huang has updated the pull request incrementally with one additional commit since the last revision: > > fix styles LGTM ------------- Marked as reviewed by eliu (Author).
PR: https://git.openjdk.java.net/jdk/pull/4839 From jbhateja at openjdk.java.net Sun Oct 24 19:20:42 2021 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Sun, 24 Oct 2021 19:20:42 GMT Subject: RFR: 8275047: Optimize existing fill stubs for AVX-512 target [v5] In-Reply-To: References: Message-ID: > Hi All, > > This patch optimizes the macro assembly routines used by the fill stubs for various primitive types on X86 AVX-512 targets. > The main changes are: > 1) Specialized instruction sequences for fill operations over various block sizes. > 2) Control flow is sensitive to AVX3Threshold: the generated code operates on 32-byte vectors (YMM) if the > block size is less than the threshold, and on 64-byte vectors (ZMM) otherwise. > 3) Bulk fill is performed by a destination-aligned fill loop with an appropriate unroll factor; this > avoids cache line split penalties and improves performance. > 4) Currently, fill patterns are vectorized by the auto-vectorizer, and the generated code operates on vectors > of MaxVectorSize. In addition, the auto-vectorizer is oblivious to AVX3Threshold, which may result in > performance degradation on prior generations of X86 servers, where 64-byte vector stores using ZMM > registers operate at a reduced CPU frequency. > The patch enables the JVM flag -XX:+OptimizeFill by default on X86 targets supporting AVX-512. > 5) The patch also optimizes the mask generation sequence of the fill* macro assembly routines using the BZHI instruction. > > Performance measurements of an existing JMH micro on an Icelake server show ~1.1-4.0X gains for fill operations with varying block sizes.
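The BZHI-based mask generation mentioned in item 5 can be modeled in a few lines of Java. This is a hedged sketch of the instruction's semantics as Intel documents them for BMI2 `BZHI` (copy the source, clear every bit at position n and above), not the stub code from the patch; the class `TailMask` and its methods are hypothetical names. A value with the low `remaining` bits set is the kind of per-element opmask an AVX-512 masked store needs for the tail of a fill.

```java
// Hypothetical model of the x86 BMI2 BZHI instruction (64-bit form),
// written only to illustrate tail-mask generation; this is not HotSpot code.
final class TailMask {
    private TailMask() {}

    // BZHI semantics: n comes from the low 8 bits of the index operand;
    // the result is src with every bit at position >= n cleared.
    // If n >= 64, src is returned unchanged.
    static long bzhi(long src, int index) {
        int n = index & 0xff;
        if (n >= 64) {
            return src;
        }
        // Java masks long shift counts to 6 bits, so n == 64 is handled above.
        return src & ((1L << n) - 1);
    }

    // Opmask with one bit per element, enabling the low `remaining` lanes
    // of a masked vector store (remaining is assumed to be in [0, 64]).
    static long tailMask(int remaining) {
        return bzhi(-1L, remaining);
    }
}
```

For example, a byte fill on a 64-byte ZMM register with 5 bytes left would use `tailMask(5)`, i.e. `0b11111`, so only the first five lanes are written.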
> > Following are detailed results: > > System Configuration: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (40C 2S) > > Benchmark | Size | Baseline Auto-vectorized -XX:-OptimizeFill (ops/ms) | New Optimized Fill AVX3 Th=4096 (ops/ms) | Gain Factor (OptFill AVX3Th=4096/Baseline) > -- | -- | -- | -- | -- > ArraysFill.testByteFill | 10 | 208480.451 | 399980.93 | 1.918553649 > ArraysFill.testByteFill | 16 | 193927.021 | 381156.448 | 1.965463328 > ArraysFill.testByteFill | 31 | 99175.805 | 399990.605 | 4.033147046 > ArraysFill.testByteFill | 59 | 141430.876 | 342233.497 | 2.419793377 > ArraysFill.testByteFill | 89 | 82091.504 | 342232.822 | 4.168918893 > ArraysFill.testByteFill | 126 | 72154.769 | 310536.053 | 4.303749528 > ArraysFill.testByteFill | 250 | 18986.775 | 158263.189 | 8.335443434 > ArraysFill.testByteFill | 266 | 30057.331 | 166819.658 | 5.550048938 > ArraysFill.testByteFill | 511 | 30094.92 | 116800.155 | 3.88105883 > ArraysFill.testByteFill | 1021 | 38467.507 | 89235.56 | 2.319764574 > ArraysFill.testByteFill | 2047 | 32267.535 | 70625.015 | 2.188732886 > ArraysFill.testByteFill | 2048 | 25503.489 | 64848.532 | 2.542731781 > ArraysFill.testByteFill | 4095 | 22432.636 | 42449.149 | 1.892294289 > ArraysFill.testByteFill | 8195 | 16468.923 | 24810.485 | 1.506503188 > ArraysFill.testCharFill | 10 | 221038.566 | 400005.661 | 1.809664568 > ArraysFill.testCharFill | 16 | 209138.43 | 381171.236 | 1.822578643 > ArraysFill.testCharFill | 31 | 93139.021 | 376441.98 | 4.041721461 > ArraysFill.testCharFill | 59 | 63575.554 | 310559.54 | 4.884889245 > ArraysFill.testCharFill | 89 | 61900.064 | 191445.936 | 3.092822909 > ArraysFill.testCharFill | 126 | 36854.615 | 164187.37 | 4.455001633 > ArraysFill.testCharFill | 250 | 37991.306 | 138797.511 | 3.653401939 > ArraysFill.testCharFill | 266 | 44459.522 | 170334.083 | 3.831217146 > ArraysFill.testCharFill | 511 | 52275.926 | 103012.53 | 1.970553903 > ArraysFill.testCharFill | 1021 | 51803.73 | 80187.107 | 1.547902188 > 
ArraysFill.testCharFill | 2047 | 35820.742 | 38973.828 | 1.088024028 > ArraysFill.testCharFill | 2048 | 35280.779 | 38209.361 | 1.083007861 > ArraysFill.testCharFill | 4095 | 21053.869 | 25006.99 | 1.187762211 > ArraysFill.testCharFill | 8195 | 11419.785 | 12662.777 | 1.108845482 > ArraysFill.testDoubleFill | 10 | 266086.021 | 220036.789 | 0.826938552 > ArraysFill.testDoubleFill | 16 | 216597.316 | 218875.135 | 1.010516377 > ArraysFill.testDoubleFill | 31 | 151868.92 | 174250.587 | 1.147374901 > ArraysFill.testDoubleFill | 59 | 196480.253 | 194467.527 | 0.98975609 > ArraysFill.testDoubleFill | 89 | 109787.976 | 102698.432 | 0.935425133 > ArraysFill.testDoubleFill | 126 | 93945.51 | 121697.956 | 1.295410031 > ArraysFill.testDoubleFill | 250 | 97830.626 | 81429.644 | 0.832353296 > ArraysFill.testDoubleFill | 266 | 83560.898 | 91313.593 | 1.092778981 > ArraysFill.testDoubleFill | 511 | 48710.087 | 48145.392 | 0.988407021 > ArraysFill.testDoubleFill | 1021 | 25145.002 | 25163.03 | 1.000716962 > ArraysFill.testDoubleFill | 2047 | 12665.468 | 12639.651 | 0.997961623 > ArraysFill.testDoubleFill | 2048 | 12202.183 | 12619.316 | 1.034185113 > ArraysFill.testDoubleFill | 4095 | 6319.101 | 6320.488 | 1.000219493 > ArraysFill.testDoubleFill | 8195 | 882.585 | 883.727 | 1.001293926 > ArraysFill.testFloatFill | 10 | 193690.976 | 370572.639 | 1.913215818 > ArraysFill.testFloatFill | 16 | 178498.07 | 342227.406 | 1.9172611 > ArraysFill.testFloatFill | 31 | 160406.649 | 323327.925 | 2.015676576 > ArraysFill.testFloatFill | 59 | 119643.034 | 177091.185 | 1.48016294 > ArraysFill.testFloatFill | 89 | 64783.111 | 168280.961 | 2.597605431 > ArraysFill.testFloatFill | 126 | 85291.623 | 152788.86 | 1.791370062 > ArraysFill.testFloatFill | 250 | 98864.197 | 115429.942 | 1.167560608 > ArraysFill.testFloatFill | 266 | 104361.908 | 106769.11 | 1.023065906 > ArraysFill.testFloatFill | 511 | 59063.325 | 73726.544 | 1.248262674 > ArraysFill.testFloatFill | 1021 | 46426.631 | 44255.239 | 
0.953229602 > ArraysFill.testFloatFill | 2047 | 23853.72 | 24988.53 | 1.047573712 > ArraysFill.testFloatFill | 2048 | 23774.697 | 24723.921 | 1.039925809 > ArraysFill.testFloatFill | 4095 | 11879.115 | 12574.113 | 1.058505874 > ArraysFill.testFloatFill | 8195 | 6288.73 | 6309.257 | 1.003264093 > ArraysFill.testIntFill | 10 | 202623.377 | 370696.239 | 1.829484063 > ArraysFill.testIntFill | 16 | 187487.425 | 342203.932 | 1.825210048 > ArraysFill.testIntFill | 31 | 107876.62 | 323291.016 | 2.996858967 > ArraysFill.testIntFill | 59 | 76540.074 | 177755.374 | 2.322383096 > ArraysFill.testIntFill | 89 | 77088.258 | 168496.776 | 2.185764478 > ArraysFill.testIntFill | 126 | 92532.969 | 150986.404 | 1.631703874 > ArraysFill.testIntFill | 250 | 99993.079 | 106098.703 | 1.061060466 > ArraysFill.testIntFill | 266 | 105121.5 | 106809.473 | 1.016057353 > ArraysFill.testIntFill | 511 | 61711.338 | 84318.27 | 1.366333525 > ArraysFill.testIntFill | 1021 | 45725.648 | 44835.618 | 0.980535432 > ArraysFill.testIntFill | 2047 | 24130.633 | 25001.727 | 1.036099094 > ArraysFill.testIntFill | 2048 | 23873.255 | 24980.662 | 1.04638693 > ArraysFill.testIntFill | 4095 | 12459.376 | 12666.815 | 1.016649229 > ArraysFill.testIntFill | 8195 | 6303.873 | 6298.852 | 0.999203506 > ArraysFill.testLongFill | 10 | 221803.338 | 203110.868 | 0.915725028 > ArraysFill.testLongFill | 16 | 214013.975 | 230463.726 | 1.076862976 > ArraysFill.testLongFill | 31 | 153858.758 | 144465.921 | 0.938951561 > ArraysFill.testLongFill | 59 | 102187.914 | 112064.383 | 1.09665007 > ArraysFill.testLongFill | 89 | 111940.314 | 107757.211 | 0.962630952 > ArraysFill.testLongFill | 126 | 137992.49 | 110879.813 | 0.803520634 > ArraysFill.testLongFill | 250 | 96629.877 | 96195.678 | 0.995506576 > ArraysFill.testLongFill | 266 | 83984.403 | 86152.382 | 1.025814067 > ArraysFill.testLongFill | 511 | 48698.933 | 48534.404 | 0.996621507 > ArraysFill.testLongFill | 1021 | 25178.805 | 25162.502 | 0.999352511 > ArraysFill.testLongFill | 
2047 | 12511.142 | 12652.489 | 1.01129769 > ArraysFill.testLongFill | 2048 | 12592.614 | 12622.317 | 1.002358764 > ArraysFill.testLongFill | 4095 | 6377.694 | 6378.312 | 1.0000969 > ArraysFill.testLongFill | 8195 | 885.065 | 884.811 | 0.999713015 > ArraysFill.testShortFill | 10 | 196799.048 | 399963.161 | 2.032342966 > ArraysFill.testShortFill | 16 | 191981.455 | 381173.675 | 1.985471331 > ArraysFill.testShortFill | 31 | 98647.156 | 370750.549 | 3.758350104 > ArraysFill.testShortFill | 59 | 79046.737 | 310586.902 | 3.929155254 > ArraysFill.testShortFill | 89 | 128874.522 | 186302.59 | 1.445612268 > ArraysFill.testShortFill | 126 | 47243.773 | 177947.204 | 3.766574782 > ArraysFill.testShortFill | 250 | 37506.377 | 152968.336 | 4.078462071 > ArraysFill.testShortFill | 266 | 41782.87 | 169466.305 | 4.055879958 > ArraysFill.testShortFill | 511 | 44061.823 | 109352.795 | 2.481803692 > ArraysFill.testShortFill | 1021 | 28799.157 | 81115.934 | 2.816607931 > ArraysFill.testShortFill | 2047 | 38667.85 | 38998.02 | 1.008538618 > ArraysFill.testShortFill | 2048 | 36626.321 | 38995.272 | 1.064678923 > ArraysFill.testShortFill | 4095 | 16606.53 | 24724.43 | 1.488837825 > ArraysFill.testShortFill | 8195 | 11679.891 | 12627.519 | 1.081133291 > > Kindly review and share your feedback. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: - 8275047: Review comments resolution. - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8275047 - 8275047: Review comments resolution. - 8275047: Aligning the main fill loops and some synthetic changes. - 8275047: Review comments resolved. 
- 8275047: Optimize existing fill stubs for AVX-512 target ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5967/files - new: https://git.openjdk.java.net/jdk/pull/5967/files/1e8d5434..b5f51ecb Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5967&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5967&range=03-04 Stats: 22584 lines in 547 files changed: 17529 ins; 3396 del; 1659 mod Patch: https://git.openjdk.java.net/jdk/pull/5967.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5967/head:pull/5967 PR: https://git.openjdk.java.net/jdk/pull/5967 From jbhateja at openjdk.java.net Sun Oct 24 19:20:44 2021 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Sun, 24 Oct 2021 19:20:44 GMT Subject: RFR: 8275047: Optimize existing fill stubs for AVX-512 target [v4] In-Reply-To: References: <0WViJD0Uip3CjSkJq6T1wxmgGTk4vbo6IngCAyBuc34=.997a2434-e648-40ef-a8f0-fe1669387fb6@github.com> Message-ID: On Thu, 21 Oct 2021 20:23:56 GMT, Vladimir Kozlov wrote: >> This patch optimizes the fill stubs for AVX-512; to avoid the call overhead penalty, we need to partially inline small fills that fit in one full or partial vector. I will send a separate patch for that. > > @jatin-bhateja You need to explicitly state in the RFE and this PR that `Long` and `Double` are not covered, to avoid confusion. @vnkozlov, comments addressed. ------------- PR: https://git.openjdk.java.net/jdk/pull/5967 From yyang at openjdk.java.net Mon Oct 25 05:59:17 2021 From: yyang at openjdk.java.net (Yi Yang) Date: Mon, 25 Oct 2021 05:59:17 GMT Subject: RFR: 8273585: String.charAt performance degrades due to JDK-8268698 Message-ID: String.charAt shows a significant performance regression due to [JDK-8268698](https://bugs.openjdk.java.net/browse/JDK-8268698), which replaces the index bounds check with the Preconditions.checkIndex intrinsic.
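To make the shape of the affected code concrete, here is a minimal, hypothetical Java sketch; `CheckedChars` and `sumChars` are illustrative names, not JDK code and not the author's `Test`. It bounds-checks through the public `java.util.Objects.checkIndex(int, int)`, which in recent JDKs delegates to `jdk.internal.util.Preconditions.checkIndex`, and it uses the returned (range-checked) index for the array access. A counted loop such as `sumChars` feeds its loop variable into that check on every iteration.

```java
import java.util.Objects;

// Hypothetical Latin-1 string wrapper; names are illustrative, not JDK code.
final class CheckedChars {
    private final byte[] latin1;

    CheckedChars(byte[] latin1) {
        this.latin1 = latin1.clone();
    }

    int length() {
        return latin1.length;
    }

    // Throws IndexOutOfBoundsException unless 0 <= index < length();
    // Objects.checkIndex returns the index once it has been range-checked.
    char charAt(int index) {
        int i = Objects.checkIndex(index, latin1.length);
        return (char) (latin1[i] & 0xff);
    }

    // A counted loop whose index flows into the bounds check on every iteration.
    static int sumChars(CheckedChars s) {
        int sum = 0;
        for (int i = 0; i < s.length(); i++) {
            sum += s.charAt(i);
        }
        return sum;
    }
}
```

Functionally, the check is identical before and after JDK-8268698; the regression reported in this thread concerns how C2 optimizes the loop once the check is intrinsified.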
The result of "time linux-x86_64-server-release/images/jdk/bin/java Test": Before JDK-8268698: real 0m8.369s user 0m8.386s sys 0m0.019s After JDK-8268698: real 0m19.722s user 0m19.748s sys 0m0.013s The reason is that Preconditions.checkIndex generates a CastII for the index node, since the index is now known to be >= 0 and < length: https://github.com/openjdk/jdk/blob/5dab76b939e381312ce5c89b9aebca628238a387/src/hotspot/share/opto/library_call.cpp#L1077-L1083 The CastII cannot be recognized as a parallel induction variable because the AddNode's input must be the PhiNode: https://github.com/openjdk/jdk/blob/5dab76b939e381312ce5c89b9aebca628238a387/src/hotspot/share/opto/loopnode.cpp#L3177-L3184 It seems this prevents further loop unrolling. I think we can relax this constraint, i.e., the CastII can be the input of the AddNode if its input is the PhiNode. After applying this patch, the performance regression disappears: $time ./test.sh real 0m9.514s user 0m10.310s sys 0m0.155s This is likely the reason for [JDK-8272493](https://bugs.openjdk.java.net/browse/JDK-8272493). Please review it. Thanks! ------------- Commit messages: - 8273585: String.charAt performance degrades due to JDK-8268698 Changes: https://git.openjdk.java.net/jdk/pull/6096/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6096&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8273585 Stats: 11 lines in 1 file changed: 9 ins; 1 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/6096.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6096/head:pull/6096 PR: https://git.openjdk.java.net/jdk/pull/6096 From yyang at openjdk.java.net Mon Oct 25 06:02:07 2021 From: yyang at openjdk.java.net (Yi Yang) Date: Mon, 25 Oct 2021 06:02:07 GMT Subject: RFR: 8274328: C2: Redundant CFG edges fixup in block ordering In-Reply-To: References: Message-ID: On Sun, 26 Sep 2021 10:40:43 GMT, Yi Yang wrote: > I think Trace::fixup_blocks is redundant because PhaseCFG::fixup_flow will nevertheless fix up the CFG flow (i.e.,
flip the successor blocks of the IfNode) right after the PhaseBlockLayout pass, so we can remove this step from the PhaseBlockLayout pass. (Testing: jtreg/compiler/c2, presubmit test) > > https://github.com/openjdk/jdk/blob/5ec1cdcaf39229a7d2457313600b0dc2bf8c6453/src/hotspot/share/opto/compile.cpp#L2765 > > https://github.com/openjdk/jdk/blob/5ec1cdcaf39229a7d2457313600b0dc2bf8c6453/src/hotspot/share/opto/block.cpp#L1679 > > https://github.com/openjdk/jdk/blob/5ec1cdcaf39229a7d2457313600b0dc2bf8c6453/src/hotspot/share/opto/block.cpp#L908-L916 PING :- ------------- PR: https://git.openjdk.java.net/jdk/pull/5705 From chagedorn at openjdk.java.net Mon Oct 25 07:26:03 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Mon, 25 Oct 2021 07:26:03 GMT Subject: RFR: 8275104: IR framework does not handle client VM builds correctly [v2] In-Reply-To: References: Message-ID: On Thu, 21 Oct 2021 07:04:29 GMT, Christian Hagedorn wrote: >> While the IR framework is primarily used for C2 IR verification, it should also work with client VM builds. There are currently some problems which are fixed with this patch: >> >> - The IR framework currently only bails out of IR matching if C2 is excluded by command line flags. However, when running an IR JTreg test with a client VM build, IR matching fails when not specifically adding `@requires vm.compiler2.enabled` to exclude the test. >> - `@Test` and `@ForceCompile` do not work correctly and throw an exception due to an incompatible compilation level selection without C2. >> - Some internal framework tests fail (the fix also improves `TestDIgnoreCompilerControls` in general).
>> >> Testing: >> >> - Standard tier testing >> - Testing internal framework tests with standard build (tiered), client VM (without C2) and server VM build (without C1) >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Change MinInliningThreshold to TypeProfileLevel in IRExample Thanks for reviewing it again Vladimir! ------------- PR: https://git.openjdk.java.net/jdk/pull/6037 From chagedorn at openjdk.java.net Mon Oct 25 07:30:10 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Mon, 25 Oct 2021 07:30:10 GMT Subject: Integrated: 8275104: IR framework does not handle client VM builds correctly In-Reply-To: References: Message-ID: On Wed, 20 Oct 2021 11:19:01 GMT, Christian Hagedorn wrote: > While the IR framework is primarily used for C2 IR verification, it should also work with client VM builds. There are currently some problems which are fixed with this patch: > > - The IR framework currently only bails out of IR matching if C2 is excluded by command line flags. However, when running an IR JTreg test with a client VM build, IR matching fails when not specifically adding `@requires vm.compiler2.enabled` to exclude the test. > - `@Test` and `@ForceCompile` do not work correctly and throw an exception due to an incompatible compilation level selection without C2. > - Some internal framework tests fail (the fix also improves `TestDIgnoreCompilerControls` in general). > > Testing: > > - Standard tier testing > - Testing internal framework tests with standard build (tiered), client VM (without C2) and server VM build (without C1) > > Thanks, > Christian This pull request has now been integrated. 
Changeset: 1da5e6b0 Author: Christian Hagedorn URL: https://git.openjdk.java.net/jdk/commit/1da5e6b0e2c284c5dd295a0d48cc1c6c2fecf5b5 Stats: 79 lines in 8 files changed: 30 ins; 18 del; 31 mod 8275104: IR framework does not handle client VM builds correctly Reviewed-by: kvn, thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/6037 From duke at openjdk.java.net Mon Oct 25 08:16:13 2021 From: duke at openjdk.java.net (王超) Date: Mon, 25 Oct 2021 08:16:13 GMT Subject: RFR: JDK-8275854: C2: assert(stride_con != 0) failed: missed some peephole opt Message-ID: The `If` subsume optimization mistakenly eliminates the `LongCountedLoopEndNode`, which leads to a crash in the `PhaseIdealLoop` optimization. For example, the tests of node 538 and node 553 become the same after the first `PhaseIdealLoop` optimization. Node 555 is the back edge of the loop, and node 553 is replaced by a `LongCountedLoopEndNode`. In the next `PhaseIdealLoop` optimization, node 538 finds that node 553 is redundant and subsumes it. The `PhaseIdealLoop` optimization then crashes because there is no loop end node. There are two ways to fix the crash. The first, used in this PR, is to bail out of the `IfNode` subsume optimization when the node is a `LongCountedLoopEndNode`. The second possible fix is to swap the dominating `If` node with the `LongCountedLoopEndNode`: diff --git a/src/hotspot/share/opto/ifnode.cpp b/src/hotspot/share/opto/ifnode.cpp index 38b40a6..31ff172 100644 --- a/src/hotspot/share/opto/ifnode.cpp +++ b/src/hotspot/share/opto/ifnode.cpp @@ -1674,6 +1674,21 @@ Node* IfNode::simple_subsuming(PhaseIterGVN* igvn) { } } + if (is_LongCountedLoopEnd()) { + set_req(0, dom->in(0)); + set_req(1, dom->in(1)); + dom->set_req(0, pre); + dom->set_req(1, igvn->intcon(is_always_true ?
1 : 0)); + Node* proj0 = raw_out(0); + Node* proj1 = raw_out(1); + Node* dom_proj0 = dom->raw_out(0); + Node* dom_proj1 = dom->raw_out(1); + dom_proj0->set_req(0, this); + dom_proj1->set_req(0, this); + proj0->set_req(0, dom); + proj1->set_req(0, dom); + } + if (bol->outcnt() == 0) { igvn->remove_dead_node(bol); // Kill the BoolNode. } diff --git a/src/hotspot/share/opto/loopnode.cpp b/src/hotspot/share/opto/loopnode.cpp index 6f7e34d..7955722 100644 --- a/src/hotspot/share/opto/loopnode.cpp +++ b/src/hotspot/share/opto/loopnode.cpp @@ -802,7 +802,7 @@ bool PhaseIdealLoop::transform_long_counted_loop(IdealLoopTree* loop, Node_List Node* back_control = head->in(LoopNode::LoopBackControl); // data nodes on back branch not supported - if (back_control->outcnt() > 1) { + if (back_control->outcnt() > 1 || back_control->Opcode() != Op_IfTrue) { return false; } ------------- Commit messages: - Fix Changes: https://git.openjdk.java.net/jdk/pull/6099/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6099&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8275854 Stats: 3 lines in 1 file changed: 3 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/6099.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6099/head:pull/6099 PR: https://git.openjdk.java.net/jdk/pull/6099 From roland at openjdk.java.net Mon Oct 25 12:31:05 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Mon, 25 Oct 2021 12:31:05 GMT Subject: RFR: 8273585: String.charAt performance degrades due to JDK-8268698 In-Reply-To: References: Message-ID: On Mon, 25 Oct 2021 05:52:02 GMT, Yi Yang wrote: > String.charAt shows significant performance regression due to [JDK-8268698](https://bugs.openjdk.java.net/browse/JDK-8268698), which replaces index bound checking with Preconditions.checkIndex intrinsic method. 
> > The result of "time linux-x86_64-server-release/images/jdk/bin/java Test": > > > Before JDK-8268698 > real 0m8.369s > user 0m8.386s > sys 0m0.019s > > After JDK-8268698 > real 0m19.722s > user 0m19.748s > sys 0m0.013s > > > The reason is that Preconditions.checkIndex generates a CastII for the index node, because the index is now known to be >= 0 and < length: > > https://github.com/openjdk/jdk/blob/5dab76b939e381312ce5c89b9aebca628238a387/src/hotspot/share/opto/library_call.cpp#L1077-L1083 > > CastII cannot be recognized as a parallel induction variable because the AddNode's input must be the PhiNode: > > https://github.com/openjdk/jdk/blob/5dab76b939e381312ce5c89b9aebca628238a387/src/hotspot/share/opto/loopnode.cpp#L3177-L3184 > > It seems this prevents further loop unrolling. I think we can relax this constraint, i.e., CastII can be the input of the AddNode if its input is a PhiNode. After applying this patch, the performance regression disappears: > > > $time ./test.sh > > real 0m9.514s > user 0m10.310s > sys 0m0.155s > > This is likely the reason for [JDK-8272493](https://bugs.openjdk.java.net/browse/JDK-8272493). Please help review it. Thanks! Changes requested by roland (Reviewer). src/hotspot/share/opto/loopnode.cpp line 3184: > 3182: !incr2->in(2)->is_Con()) > 3183: continue; > 3184: if (incr2->in(1) != phi2) { You could use incr2->in(1)->uncast() != phi2. You don't need to add an extra if then. ------------- PR: https://git.openjdk.java.net/jdk/pull/6096 From roland at openjdk.java.net Mon Oct 25 12:32:01 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Mon, 25 Oct 2021 12:32:01 GMT Subject: RFR: JDK-8275854: C2: assert(stride_con != 0) failed: missed some peephole opt In-Reply-To: References: Message-ID: <8trnFPk40QDb9aonjunHfFpUhfqK6Hy8atV5E6rU-ho=.87aa2acd-9b40-49c7-86b0-630bd1523e81@github.com> On Mon, 25 Oct 2021 08:08:48 GMT, 王超
wrote: > `If subsume` optimization will eliminate `LongCountedLoopEndNode` node by mistake, which will lead to `PhaseIdealLoop` optimization crash. Is it by mistake (that is, it's erroneous to eliminate the LongCountedLoopEndNode)? Do you have a test case for this? Ideally, the PR should include one. ------------- PR: https://git.openjdk.java.net/jdk/pull/6099 From roland at openjdk.java.net Mon Oct 25 12:36:06 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Mon, 25 Oct 2021 12:36:06 GMT Subject: RFR: 8259609: C2: optimize long range checks in long counted loops [v12] In-Reply-To: References: Message-ID: On Fri, 22 Oct 2021 06:41:23 GMT, Tobias Hartmann wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> build fix > > src/hotspot/share/opto/loopnode.hpp line 1657: > >> 1655: void try_sink_out_of_loop(Node* n); >> 1656: >> 1657: Node* clamp(Node* pNode, Node* pNode1, Node* pNode2); > > Argument naming is not consistent with the implementation. I'll fix the argument names. -min_jint is min_jint. So there's no way to handle a min_jint (or min_jlong) scale above. How else would you handle it? ------------- PR: https://git.openjdk.java.net/jdk/pull/2045 From dnsimon at openjdk.java.net Mon Oct 25 14:42:14 2021 From: dnsimon at openjdk.java.net (Doug Simon) Date: Mon, 25 Oct 2021 14:42:14 GMT Subject: RFR: 8275874: [JVMCI] use volatile accessors for all unaligned reads in c2v_readFieldValue Message-ID: [JDK-8275645](https://bugs.openjdk.java.net/browse/JDK-8275645) resulted in the loss of single-copy atomicity for reads in c2v_readFieldValue. This PR fixes that by using the `_field_acquire` accessors for all aligned reads in c2v_readFieldValue and only using the `_field` accessors for unaligned reads.
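The read policy described above (acquire reads where the access is aligned, plain reads otherwise) has a rough Java-level analogue in a byte-buffer view `VarHandle`: its atomic access modes such as `getAcquire` are only permitted at aligned indices, while misaligned indices support only plain `get`/`set`. This is a hypothetical sketch, not the JVMCI C++ code in `c2v_readFieldValue`; the class name and the use of a direct buffer are assumptions.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical analogue of "acquire when aligned, plain when unaligned"; not JVMCI code. */
final class AlignedReads {
    private AlignedReads() {}

    // View of a ByteBuffer as longs in native byte order.
    private static final VarHandle LONG_VIEW =
            MethodHandles.byteBufferViewVarHandle(long[].class, ByteOrder.nativeOrder());

    /** Reads 8 bytes at index, using an acquire read only when the access is aligned. */
    static long readLong(ByteBuffer buf, int index) {
        if (buf.alignmentOffset(index, Long.BYTES) == 0) {
            // Aligned: getAcquire is permitted and the read is atomic.
            return (long) LONG_VIEW.getAcquire(buf, index);
        }
        // Misaligned: only plain get is permitted, with no atomicity guarantee.
        return (long) LONG_VIEW.get(buf, index);
    }
}
```

For a direct buffer, `alignmentOffset` reports the misalignment of the underlying memory address, which should be the same property the `VarHandle` checks before allowing `getAcquire`.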
------------- Commit messages: - use _field_acquire for aligned reads in c2v_readFieldValue Changes: https://git.openjdk.java.net/jdk/pull/6109/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6109&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8275874 Stats: 30 lines in 1 file changed: 2 ins; 18 del; 10 mod Patch: https://git.openjdk.java.net/jdk/pull/6109.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6109/head:pull/6109 PR: https://git.openjdk.java.net/jdk/pull/6109 From kvn at openjdk.java.net Mon Oct 25 16:13:10 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 25 Oct 2021 16:13:10 GMT Subject: RFR: 8275047: Optimize existing fill stubs for AVX-512 target [v5] In-Reply-To: References: Message-ID: On Sun, 24 Oct 2021 19:20:42 GMT, Jatin Bhateja wrote: >> Hi All, >> >> This patch optimizes the macro assembly routines used by the fill stubs of various primitive types for the X86 AVX-512 target. >> The main changes are: >> 1) A specialized instruction sequence for fill operations over various block sizes. >> 2) Control flow is sensitive to AVX3Threshold: the generated code operates on 32-byte vectors (YMM) if the >> block size is less than the threshold; otherwise, instructions operate on 64-byte vectors (ZMM). >> 3) The bulk fill operation is performed by a destination-aligned fill loop with an appropriate unroll factor; this >> avoids any cache-line split penalty and improves performance. >> 4) Currently, fill patterns are vectorized by the auto-vectorizer, and the generated code operates on vectors >> of MaxVectorSize. In addition, the auto-vectorizer is oblivious to AVX3Threshold, which may result in >> performance degradation on prior generations of X86 servers, where 64-byte vector stores using ZMM >> registers operate at reduced CPU frequency. >> The patch enables the JVM runtime flag -XX:+OptimizeFill by default for X86 targets supporting the AVX-512 feature.
>> 5) Patch also optimizes the mask generation sequence of fill* macro assembly routines using BZHI instruction. >> >> Performance measurements of an existing JMH micro over Icelake server shows ~1.1-4.0X gains for fill operation with varying block sizes. >> >> Following are detailed results: >> >> System Configuration: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (40C 2S) >> >> Benchmark | Size | Baseline Auto-vectorized -XX:-OptimizeFill (ops/ms) | New Optimized Fill AVX3 Th=4096 (ops/ms) | Gain Factor (OptFill AVX3Th=4096/Baseline) >> -- | -- | -- | -- | -- >> ArraysFill.testByteFill | 10 | 208480.451 | 399980.93 | 1.918553649 >> ArraysFill.testByteFill | 16 | 193927.021 | 381156.448 | 1.965463328 >> ArraysFill.testByteFill | 31 | 99175.805 | 399990.605 | 4.033147046 >> ArraysFill.testByteFill | 59 | 141430.876 | 342233.497 | 2.419793377 >> ArraysFill.testByteFill | 89 | 82091.504 | 342232.822 | 4.168918893 >> ArraysFill.testByteFill | 126 | 72154.769 | 310536.053 | 4.303749528 >> ArraysFill.testByteFill | 250 | 18986.775 | 158263.189 | 8.335443434 >> ArraysFill.testByteFill | 266 | 30057.331 | 166819.658 | 5.550048938 >> ArraysFill.testByteFill | 511 | 30094.92 | 116800.155 | 3.88105883 >> ArraysFill.testByteFill | 1021 | 38467.507 | 89235.56 | 2.319764574 >> ArraysFill.testByteFill | 2047 | 32267.535 | 70625.015 | 2.188732886 >> ArraysFill.testByteFill | 2048 | 25503.489 | 64848.532 | 2.542731781 >> ArraysFill.testByteFill | 4095 | 22432.636 | 42449.149 | 1.892294289 >> ArraysFill.testByteFill | 8195 | 16468.923 | 24810.485 | 1.506503188 >> ArraysFill.testCharFill | 10 | 221038.566 | 400005.661 | 1.809664568 >> ArraysFill.testCharFill | 16 | 209138.43 | 381171.236 | 1.822578643 >> ArraysFill.testCharFill | 31 | 93139.021 | 376441.98 | 4.041721461 >> ArraysFill.testCharFill | 59 | 63575.554 | 310559.54 | 4.884889245 >> ArraysFill.testCharFill | 89 | 61900.064 | 191445.936 | 3.092822909 >> ArraysFill.testCharFill | 126 | 36854.615 | 164187.37 | 4.455001633 >> 
ArraysFill.testCharFill | 250 | 37991.306 | 138797.511 | 3.653401939 >> ArraysFill.testCharFill | 266 | 44459.522 | 170334.083 | 3.831217146 >> ArraysFill.testCharFill | 511 | 52275.926 | 103012.53 | 1.970553903 >> ArraysFill.testCharFill | 1021 | 51803.73 | 80187.107 | 1.547902188 >> ArraysFill.testCharFill | 2047 | 35820.742 | 38973.828 | 1.088024028 >> ArraysFill.testCharFill | 2048 | 35280.779 | 38209.361 | 1.083007861 >> ArraysFill.testCharFill | 4095 | 21053.869 | 25006.99 | 1.187762211 >> ArraysFill.testCharFill | 8195 | 11419.785 | 12662.777 | 1.108845482 >> ArraysFill.testDoubleFill | 10 | 266086.021 | 220036.789 | 0.826938552 >> ArraysFill.testDoubleFill | 16 | 216597.316 | 218875.135 | 1.010516377 >> ArraysFill.testDoubleFill | 31 | 151868.92 | 174250.587 | 1.147374901 >> ArraysFill.testDoubleFill | 59 | 196480.253 | 194467.527 | 0.98975609 >> ArraysFill.testDoubleFill | 89 | 109787.976 | 102698.432 | 0.935425133 >> ArraysFill.testDoubleFill | 126 | 93945.51 | 121697.956 | 1.295410031 >> ArraysFill.testDoubleFill | 250 | 97830.626 | 81429.644 | 0.832353296 >> ArraysFill.testDoubleFill | 266 | 83560.898 | 91313.593 | 1.092778981 >> ArraysFill.testDoubleFill | 511 | 48710.087 | 48145.392 | 0.988407021 >> ArraysFill.testDoubleFill | 1021 | 25145.002 | 25163.03 | 1.000716962 >> ArraysFill.testDoubleFill | 2047 | 12665.468 | 12639.651 | 0.997961623 >> ArraysFill.testDoubleFill | 2048 | 12202.183 | 12619.316 | 1.034185113 >> ArraysFill.testDoubleFill | 4095 | 6319.101 | 6320.488 | 1.000219493 >> ArraysFill.testDoubleFill | 8195 | 882.585 | 883.727 | 1.001293926 >> ArraysFill.testFloatFill | 10 | 193690.976 | 370572.639 | 1.913215818 >> ArraysFill.testFloatFill | 16 | 178498.07 | 342227.406 | 1.9172611 >> ArraysFill.testFloatFill | 31 | 160406.649 | 323327.925 | 2.015676576 >> ArraysFill.testFloatFill | 59 | 119643.034 | 177091.185 | 1.48016294 >> ArraysFill.testFloatFill | 89 | 64783.111 | 168280.961 | 2.597605431 >> ArraysFill.testFloatFill | 126 | 85291.623 
| 152788.86 | 1.791370062 >> ArraysFill.testFloatFill | 250 | 98864.197 | 115429.942 | 1.167560608 >> ArraysFill.testFloatFill | 266 | 104361.908 | 106769.11 | 1.023065906 >> ArraysFill.testFloatFill | 511 | 59063.325 | 73726.544 | 1.248262674 >> ArraysFill.testFloatFill | 1021 | 46426.631 | 44255.239 | 0.953229602 >> ArraysFill.testFloatFill | 2047 | 23853.72 | 24988.53 | 1.047573712 >> ArraysFill.testFloatFill | 2048 | 23774.697 | 24723.921 | 1.039925809 >> ArraysFill.testFloatFill | 4095 | 11879.115 | 12574.113 | 1.058505874 >> ArraysFill.testFloatFill | 8195 | 6288.73 | 6309.257 | 1.003264093 >> ArraysFill.testIntFill | 10 | 202623.377 | 370696.239 | 1.829484063 >> ArraysFill.testIntFill | 16 | 187487.425 | 342203.932 | 1.825210048 >> ArraysFill.testIntFill | 31 | 107876.62 | 323291.016 | 2.996858967 >> ArraysFill.testIntFill | 59 | 76540.074 | 177755.374 | 2.322383096 >> ArraysFill.testIntFill | 89 | 77088.258 | 168496.776 | 2.185764478 >> ArraysFill.testIntFill | 126 | 92532.969 | 150986.404 | 1.631703874 >> ArraysFill.testIntFill | 250 | 99993.079 | 106098.703 | 1.061060466 >> ArraysFill.testIntFill | 266 | 105121.5 | 106809.473 | 1.016057353 >> ArraysFill.testIntFill | 511 | 61711.338 | 84318.27 | 1.366333525 >> ArraysFill.testIntFill | 1021 | 45725.648 | 44835.618 | 0.980535432 >> ArraysFill.testIntFill | 2047 | 24130.633 | 25001.727 | 1.036099094 >> ArraysFill.testIntFill | 2048 | 23873.255 | 24980.662 | 1.04638693 >> ArraysFill.testIntFill | 4095 | 12459.376 | 12666.815 | 1.016649229 >> ArraysFill.testIntFill | 8195 | 6303.873 | 6298.852 | 0.999203506 >> ArraysFill.testLongFill | 10 | 221803.338 | 203110.868 | 0.915725028 >> ArraysFill.testLongFill | 16 | 214013.975 | 230463.726 | 1.076862976 >> ArraysFill.testLongFill | 31 | 153858.758 | 144465.921 | 0.938951561 >> ArraysFill.testLongFill | 59 | 102187.914 | 112064.383 | 1.09665007 >> ArraysFill.testLongFill | 89 | 111940.314 | 107757.211 | 0.962630952 >> ArraysFill.testLongFill | 126 | 137992.49 | 
110879.813 | 0.803520634 >> ArraysFill.testLongFill | 250 | 96629.877 | 96195.678 | 0.995506576 >> ArraysFill.testLongFill | 266 | 83984.403 | 86152.382 | 1.025814067 >> ArraysFill.testLongFill | 511 | 48698.933 | 48534.404 | 0.996621507 >> ArraysFill.testLongFill | 1021 | 25178.805 | 25162.502 | 0.999352511 >> ArraysFill.testLongFill | 2047 | 12511.142 | 12652.489 | 1.01129769 >> ArraysFill.testLongFill | 2048 | 12592.614 | 12622.317 | 1.002358764 >> ArraysFill.testLongFill | 4095 | 6377.694 | 6378.312 | 1.0000969 >> ArraysFill.testLongFill | 8195 | 885.065 | 884.811 | 0.999713015 >> ArraysFill.testShortFill | 10 | 196799.048 | 399963.161 | 2.032342966 >> ArraysFill.testShortFill | 16 | 191981.455 | 381173.675 | 1.985471331 >> ArraysFill.testShortFill | 31 | 98647.156 | 370750.549 | 3.758350104 >> ArraysFill.testShortFill | 59 | 79046.737 | 310586.902 | 3.929155254 >> ArraysFill.testShortFill | 89 | 128874.522 | 186302.59 | 1.445612268 >> ArraysFill.testShortFill | 126 | 47243.773 | 177947.204 | 3.766574782 >> ArraysFill.testShortFill | 250 | 37506.377 | 152968.336 | 4.078462071 >> ArraysFill.testShortFill | 266 | 41782.87 | 169466.305 | 4.055879958 >> ArraysFill.testShortFill | 511 | 44061.823 | 109352.795 | 2.481803692 >> ArraysFill.testShortFill | 1021 | 28799.157 | 81115.934 | 2.816607931 >> ArraysFill.testShortFill | 2047 | 38667.85 | 38998.02 | 1.008538618 >> ArraysFill.testShortFill | 2048 | 36626.321 | 38995.272 | 1.064678923 >> ArraysFill.testShortFill | 4095 | 16606.53 | 24724.43 | 1.488837825 >> ArraysFill.testShortFill | 8195 | 11679.891 | 12627.519 | 1.081133291 >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - 8275047: Review comments resolution. 
> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8275047 > - 8275047: Review comments resolution. > - 8275047: Aligning the main fill loops and some synthetic changes. > - 8275047: Review comments resolved. > - 8275047: Optimize existing fill stubs for AVX-512 target Looks good. I submitted testing. Will approve after it finishes. ------------- PR: https://git.openjdk.java.net/jdk/pull/5967 From iveresov at openjdk.java.net Mon Oct 25 17:07:10 2021 From: iveresov at openjdk.java.net (Igor Veresov) Date: Mon, 25 Oct 2021 17:07:10 GMT Subject: Integrated: 8273712: C2: Add mechanism for rejecting inlining of low frequency call sites and deprecate MinInliningThreshold. In-Reply-To: References: Message-ID: On Wed, 20 Oct 2021 17:29:41 GMT, Igor Veresov wrote: > Currently the inlining heuristic uses the absolute method invocation count to reject methods that are rarely executed (see MinInliningThreshold and its uses). > This presents two problems: > > 1. A method can be rarely used in a particular caller, yet if its total execution count is high it may still be inlined. > 2. The use of absolute counts is inherently problematic with the current compilation policy (adaptive threshold and background compilation). It leads to unstable inlining decisions. > > The proposed solution is to consider the call site execution ratio in order to reject callees that are rarely executed. Set the old cutoff parameter (MinInliningThreshold) to 0 to essentially disable it, and deprecate it later. > > Setting the introduced MinInlineFrequencyRatio = 0.0085 produces the following notable improvements: > Renaissance-Dotty 1.23% > Renaissance-Mnemonics 3.88% > Renaissance-NaiveBayes 9.23% > Renaissance-ScalaKmeans 1.36% > SPECjvm2008-Derby 3.16% > > There are, of course, some regressions, but those are few and on the order of 1.5%. > > > This PR will require a CSR before it can be pushed. I'll file a CSR after this is reviewed. This pull request has now been integrated.
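The idea of rejecting cold call sites by ratio rather than by absolute invocation count can be sketched as follows. This is an illustrative model only: the thread does not give C2's exact formula, so the ratio used here (call-site count divided by caller invocation count), the treatment of a missing profile, and all names are assumptions. Only the `MinInlineFrequencyRatio` value of 0.0085 comes from the message above.

```java
/** Hypothetical model of frequency-ratio-based inlining rejection; not C2's actual code. */
final class InlineFrequencyFilter {
    private final double minInlineFrequencyRatio;

    InlineFrequencyFilter(double minInlineFrequencyRatio) {
        this.minInlineFrequencyRatio = minInlineFrequencyRatio;
    }

    /**
     * Returns true if this call site runs often enough relative to its caller.
     * A callee with a large global invocation count is still rejected here when
     * this particular call site is cold, unlike an absolute-count cutoff.
     */
    boolean shouldConsiderInlining(long callSiteCount, long callerInvocationCount) {
        if (callerInvocationCount <= 0) {
            return false; // assumption: no caller profile means the site is treated as cold
        }
        double ratio = (double) callSiteCount / callerInvocationCount;
        return ratio >= minInlineFrequencyRatio;
    }
}
```

Because the decision depends on a ratio of two counters from the same profile, it is less sensitive to when the compilation happens than a fixed absolute threshold, which is the instability the message describes.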
Changeset: 89671aa1 Author: Igor Veresov URL: https://git.openjdk.java.net/jdk/commit/89671aa164ea500954b0d5caa5ce6190dfbc0d4e Stats: 38 lines in 6 files changed: 16 ins; 4 del; 18 mod 8273712: C2: Add mechanism for rejecting inlining of low frequency call sites and deprecate MinInliningThreshold. Reviewed-by: kvn, rbackman ------------- PR: https://git.openjdk.java.net/jdk/pull/6046 From kvn at openjdk.java.net Mon Oct 25 17:26:11 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 25 Oct 2021 17:26:11 GMT Subject: RFR: 8275047: Optimize existing fill stubs for AVX-512 target [v5] In-Reply-To: References: Message-ID: <6MhZ5BnurMzjt3-uoCduqdviWVfjqNjGb-v9CdZaDfk=.d76563dc-e083-4742-bc4a-46e636298160@github.com> On Sun, 24 Oct 2021 19:20:42 GMT, Jatin Bhateja wrote: >> Hi All, >> >> This patch optimizes the macro assembly routines used by the fill stubs of various primitive types for the X86 AVX-512 target. >> The main changes are: >> 1) A specialized instruction sequence for fill operations over various block sizes. >> 2) Control flow is sensitive to AVX3Threshold: the generated code operates on 32-byte vectors (YMM) if the >> block size is less than the threshold; otherwise, instructions operate on 64-byte vectors (ZMM). >> 3) The bulk fill operation is performed by a destination-aligned fill loop with an appropriate unroll factor; this >> avoids any cache-line split penalty and improves performance. >> 4) Currently, fill patterns are vectorized by the auto-vectorizer, and the generated code operates on vectors >> of MaxVectorSize. In addition, the auto-vectorizer is oblivious to AVX3Threshold, which may result in >> performance degradation on prior generations of X86 servers, where 64-byte vector stores using ZMM >> registers operate at reduced CPU frequency. >> The patch enables the JVM runtime flag -XX:+OptimizeFill by default for X86 targets supporting the AVX-512 feature. >> 5) Patch also optimizes the mask generation sequence of fill* macro assembly routines using BZHI instruction.
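Two of the listed ideas can be modeled in plain Java: the BZHI-based mask generation (item 5) and the destination-aligned bulk loop with masked head and tail (item 3), plus the YMM/ZMM choice driven by a threshold (item 2). This is an illustrative sketch, not the stub code. The comparison of the block size in bytes against the threshold, the head/body/tail split, and all names are assumptions. `bzhi` emulates the instruction's documented semantics: the index comes from the low 8 bits of the second operand, bits at positions >= index are cleared, and the source is returned unchanged when the index is 64 or more.

```java
/** Hypothetical Java model of pieces of a vectorized fill stub; not HotSpot code. */
final class FillModel {
    private FillModel() {}

    /** Emulates 64-bit BZHI: keep bits [0, index), clear the rest. */
    static long bzhi(long src, int index) {
        int n = index & 0xFF;                  // BZHI reads only the low 8 bits of the index
        return n >= 64 ? src : src & ((1L << n) - 1);
    }

    /** Opmask selecting the first k lanes, as a masked head/tail store would use. */
    static long laneMask(int k) {
        return bzhi(-1L, k);
    }

    /** Vector width in bytes: 32 (YMM) below the threshold, 64 (ZMM) otherwise. */
    static int vectorBytes(int blockBytes, int avx3Threshold) {
        return blockBytes < avx3Threshold ? 32 : 64;
    }

    /**
     * Splits a fill of len bytes starting at address addr into a masked head that
     * reaches the next vecBytes boundary, a run of full aligned vectors, and a
     * masked tail. vecBytes must be a power of two. Returns {head, body, tail}.
     */
    static int[] split(long addr, int len, int vecBytes) {
        int head = (int) ((-addr) & (vecBytes - 1)); // bytes until the next boundary
        head = Math.min(head, len);
        int rest = len - head;
        int body = rest - (rest % vecBytes);         // whole aligned vectors
        int tail = rest - body;
        return new int[] {head, body, tail};
    }
}
```

For example, filling 200 bytes starting at address `0x1005` with 64-byte vectors splits into a 59-byte masked head, 128 bytes (two aligned stores) of body, and a 13-byte masked tail, so no full-width store in the body crosses a 64-byte line.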
>> >> Performance measurements of an existing JMH micro over Icelake server shows ~1.1-4.0X gains for fill operation with varying block sizes. >> >> Following are detailed results: >> >> System Configuration: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (40C 2S) >> >> Benchmark | Size | Baseline Auto-vectorized -XX:-OptimizeFill (ops/ms) | New Optimized Fill AVX3 Th=4096 (ops/ms) | Gain Factor (OptFill AVX3Th=4096/Baseline) >> -- | -- | -- | -- | -- >> ArraysFill.testByteFill | 10 | 208480.451 | 399980.93 | 1.918553649 >> ArraysFill.testByteFill | 16 | 193927.021 | 381156.448 | 1.965463328 >> ArraysFill.testByteFill | 31 | 99175.805 | 399990.605 | 4.033147046 >> ArraysFill.testByteFill | 59 | 141430.876 | 342233.497 | 2.419793377 >> ArraysFill.testByteFill | 89 | 82091.504 | 342232.822 | 4.168918893 >> ArraysFill.testByteFill | 126 | 72154.769 | 310536.053 | 4.303749528 >> ArraysFill.testByteFill | 250 | 18986.775 | 158263.189 | 8.335443434 >> ArraysFill.testByteFill | 266 | 30057.331 | 166819.658 | 5.550048938 >> ArraysFill.testByteFill | 511 | 30094.92 | 116800.155 | 3.88105883 >> ArraysFill.testByteFill | 1021 | 38467.507 | 89235.56 | 2.319764574 >> ArraysFill.testByteFill | 2047 | 32267.535 | 70625.015 | 2.188732886 >> ArraysFill.testByteFill | 2048 | 25503.489 | 64848.532 | 2.542731781 >> ArraysFill.testByteFill | 4095 | 22432.636 | 42449.149 | 1.892294289 >> ArraysFill.testByteFill | 8195 | 16468.923 | 24810.485 | 1.506503188 >> ArraysFill.testCharFill | 10 | 221038.566 | 400005.661 | 1.809664568 >> ArraysFill.testCharFill | 16 | 209138.43 | 381171.236 | 1.822578643 >> ArraysFill.testCharFill | 31 | 93139.021 | 376441.98 | 4.041721461 >> ArraysFill.testCharFill | 59 | 63575.554 | 310559.54 | 4.884889245 >> ArraysFill.testCharFill | 89 | 61900.064 | 191445.936 | 3.092822909 >> ArraysFill.testCharFill | 126 | 36854.615 | 164187.37 | 4.455001633 >> ArraysFill.testCharFill | 250 | 37991.306 | 138797.511 | 3.653401939 >> ArraysFill.testCharFill | 266 | 44459.522 | 
170334.083 | 3.831217146 >> ArraysFill.testCharFill | 511 | 52275.926 | 103012.53 | 1.970553903 >> ArraysFill.testCharFill | 1021 | 51803.73 | 80187.107 | 1.547902188 >> ArraysFill.testCharFill | 2047 | 35820.742 | 38973.828 | 1.088024028 >> ArraysFill.testCharFill | 2048 | 35280.779 | 38209.361 | 1.083007861 >> ArraysFill.testCharFill | 4095 | 21053.869 | 25006.99 | 1.187762211 >> ArraysFill.testCharFill | 8195 | 11419.785 | 12662.777 | 1.108845482 >> ArraysFill.testDoubleFill | 10 | 266086.021 | 220036.789 | 0.826938552 >> ArraysFill.testDoubleFill | 16 | 216597.316 | 218875.135 | 1.010516377 >> ArraysFill.testDoubleFill | 31 | 151868.92 | 174250.587 | 1.147374901 >> ArraysFill.testDoubleFill | 59 | 196480.253 | 194467.527 | 0.98975609 >> ArraysFill.testDoubleFill | 89 | 109787.976 | 102698.432 | 0.935425133 >> ArraysFill.testDoubleFill | 126 | 93945.51 | 121697.956 | 1.295410031 >> ArraysFill.testDoubleFill | 250 | 97830.626 | 81429.644 | 0.832353296 >> ArraysFill.testDoubleFill | 266 | 83560.898 | 91313.593 | 1.092778981 >> ArraysFill.testDoubleFill | 511 | 48710.087 | 48145.392 | 0.988407021 >> ArraysFill.testDoubleFill | 1021 | 25145.002 | 25163.03 | 1.000716962 >> ArraysFill.testDoubleFill | 2047 | 12665.468 | 12639.651 | 0.997961623 >> ArraysFill.testDoubleFill | 2048 | 12202.183 | 12619.316 | 1.034185113 >> ArraysFill.testDoubleFill | 4095 | 6319.101 | 6320.488 | 1.000219493 >> ArraysFill.testDoubleFill | 8195 | 882.585 | 883.727 | 1.001293926 >> ArraysFill.testFloatFill | 10 | 193690.976 | 370572.639 | 1.913215818 >> ArraysFill.testFloatFill | 16 | 178498.07 | 342227.406 | 1.9172611 >> ArraysFill.testFloatFill | 31 | 160406.649 | 323327.925 | 2.015676576 >> ArraysFill.testFloatFill | 59 | 119643.034 | 177091.185 | 1.48016294 >> ArraysFill.testFloatFill | 89 | 64783.111 | 168280.961 | 2.597605431 >> ArraysFill.testFloatFill | 126 | 85291.623 | 152788.86 | 1.791370062 >> ArraysFill.testFloatFill | 250 | 98864.197 | 115429.942 | 1.167560608 >> 
ArraysFill.testFloatFill | 266 | 104361.908 | 106769.11 | 1.023065906 >> ArraysFill.testFloatFill | 511 | 59063.325 | 73726.544 | 1.248262674 >> ArraysFill.testFloatFill | 1021 | 46426.631 | 44255.239 | 0.953229602 >> ArraysFill.testFloatFill | 2047 | 23853.72 | 24988.53 | 1.047573712 >> ArraysFill.testFloatFill | 2048 | 23774.697 | 24723.921 | 1.039925809 >> ArraysFill.testFloatFill | 4095 | 11879.115 | 12574.113 | 1.058505874 >> ArraysFill.testFloatFill | 8195 | 6288.73 | 6309.257 | 1.003264093 >> ArraysFill.testIntFill | 10 | 202623.377 | 370696.239 | 1.829484063 >> ArraysFill.testIntFill | 16 | 187487.425 | 342203.932 | 1.825210048 >> ArraysFill.testIntFill | 31 | 107876.62 | 323291.016 | 2.996858967 >> ArraysFill.testIntFill | 59 | 76540.074 | 177755.374 | 2.322383096 >> ArraysFill.testIntFill | 89 | 77088.258 | 168496.776 | 2.185764478 >> ArraysFill.testIntFill | 126 | 92532.969 | 150986.404 | 1.631703874 >> ArraysFill.testIntFill | 250 | 99993.079 | 106098.703 | 1.061060466 >> ArraysFill.testIntFill | 266 | 105121.5 | 106809.473 | 1.016057353 >> ArraysFill.testIntFill | 511 | 61711.338 | 84318.27 | 1.366333525 >> ArraysFill.testIntFill | 1021 | 45725.648 | 44835.618 | 0.980535432 >> ArraysFill.testIntFill | 2047 | 24130.633 | 25001.727 | 1.036099094 >> ArraysFill.testIntFill | 2048 | 23873.255 | 24980.662 | 1.04638693 >> ArraysFill.testIntFill | 4095 | 12459.376 | 12666.815 | 1.016649229 >> ArraysFill.testIntFill | 8195 | 6303.873 | 6298.852 | 0.999203506 >> ArraysFill.testLongFill | 10 | 221803.338 | 203110.868 | 0.915725028 >> ArraysFill.testLongFill | 16 | 214013.975 | 230463.726 | 1.076862976 >> ArraysFill.testLongFill | 31 | 153858.758 | 144465.921 | 0.938951561 >> ArraysFill.testLongFill | 59 | 102187.914 | 112064.383 | 1.09665007 >> ArraysFill.testLongFill | 89 | 111940.314 | 107757.211 | 0.962630952 >> ArraysFill.testLongFill | 126 | 137992.49 | 110879.813 | 0.803520634 >> ArraysFill.testLongFill | 250 | 96629.877 | 96195.678 | 0.995506576 >> 
ArraysFill.testLongFill | 266 | 83984.403 | 86152.382 | 1.025814067 >> ArraysFill.testLongFill | 511 | 48698.933 | 48534.404 | 0.996621507 >> ArraysFill.testLongFill | 1021 | 25178.805 | 25162.502 | 0.999352511 >> ArraysFill.testLongFill | 2047 | 12511.142 | 12652.489 | 1.01129769 >> ArraysFill.testLongFill | 2048 | 12592.614 | 12622.317 | 1.002358764 >> ArraysFill.testLongFill | 4095 | 6377.694 | 6378.312 | 1.0000969 >> ArraysFill.testLongFill | 8195 | 885.065 | 884.811 | 0.999713015 >> ArraysFill.testShortFill | 10 | 196799.048 | 399963.161 | 2.032342966 >> ArraysFill.testShortFill | 16 | 191981.455 | 381173.675 | 1.985471331 >> ArraysFill.testShortFill | 31 | 98647.156 | 370750.549 | 3.758350104 >> ArraysFill.testShortFill | 59 | 79046.737 | 310586.902 | 3.929155254 >> ArraysFill.testShortFill | 89 | 128874.522 | 186302.59 | 1.445612268 >> ArraysFill.testShortFill | 126 | 47243.773 | 177947.204 | 3.766574782 >> ArraysFill.testShortFill | 250 | 37506.377 | 152968.336 | 4.078462071 >> ArraysFill.testShortFill | 266 | 41782.87 | 169466.305 | 4.055879958 >> ArraysFill.testShortFill | 511 | 44061.823 | 109352.795 | 2.481803692 >> ArraysFill.testShortFill | 1021 | 28799.157 | 81115.934 | 2.816607931 >> ArraysFill.testShortFill | 2047 | 38667.85 | 38998.02 | 1.008538618 >> ArraysFill.testShortFill | 2048 | 36626.321 | 38995.272 | 1.064678923 >> ArraysFill.testShortFill | 4095 | 16606.53 | 24724.43 | 1.488837825 >> ArraysFill.testShortFill | 8195 | 11679.891 | 12627.519 | 1.081133291 >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - 8275047: Review comments resolution. > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8275047 > - 8275047: Review comments resolution. 
> - 8275047: Aligning the main fill loops and some synthetic changes. > - 8275047: Review comments resolved. > - 8275047: Optimize existing fill stubs for AVX-512 target Testing passed. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5967 From never at openjdk.java.net Mon Oct 25 19:09:03 2021 From: never at openjdk.java.net (Tom Rodriguez) Date: Mon, 25 Oct 2021 19:09:03 GMT Subject: RFR: 8275874: [JVMCI] use volatile accessors for all unaligned reads in c2v_readFieldValue In-Reply-To: References: Message-ID: On Mon, 25 Oct 2021 14:33:27 GMT, Doug Simon wrote: > [JDK-8275645](https://bugs.openjdk.java.net/browse/JDK-8275645) resulted in losing single-copy atomicity for reads in `c2v_readFieldValue`. This PR fixes that by using `_field_acquire` accessors for all aligned reads and only using `_field` accessors for unaligned reads. Marked as reviewed by never (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/6109 From dnsimon at openjdk.java.net Mon Oct 25 19:15:02 2021 From: dnsimon at openjdk.java.net (Doug Simon) Date: Mon, 25 Oct 2021 19:15:02 GMT Subject: RFR: 8275874: [JVMCI] use volatile accessors for all unaligned reads in c2v_readFieldValue In-Reply-To: References: Message-ID: On Mon, 25 Oct 2021 14:33:27 GMT, Doug Simon wrote: > [JDK-8275645](https://bugs.openjdk.java.net/browse/JDK-8275645) resulted in losing single-copy atomicity for reads in `c2v_readFieldValue`. This PR fixes that by using `_field_acquire` accessors for all aligned reads and only using `_field` accessors for unaligned reads. @shipilev, it would be great if you could review this.
------------- PR: https://git.openjdk.java.net/jdk/pull/6109 From dholmes at openjdk.java.net Tue Oct 26 00:21:12 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Tue, 26 Oct 2021 00:21:12 GMT Subject: RFR: JDK-8275865: Print deoptimization statistics in product builds In-Reply-To: <4XYU4uU8bqDF6dlLyJrYqpVIUXIiIOYnEC6sCZUPaB8=.40fc30be-43d6-45d0-a3cd-2e3d3ca6378e@github.com> References: <4XYU4uU8bqDF6dlLyJrYqpVIUXIiIOYnEC6sCZUPaB8=.40fc30be-43d6-45d0-a3cd-2e3d3ca6378e@github.com> Message-ID: <-40Zj9e4C-J_fFn42AMIT1BsiE8QtLxWKDzUId0dQjY=.523e9d95-5377-43b1-8e8d-b0a086364e48@github.com> On Mon, 25 Oct 2021 11:46:06 GMT, Volker Simonis wrote: > Deoptimization statistics are already gathered in product builds but for some (probably historical) reasons aren't printed to the VM/Compiler log. These statistics can be useful when analyzing the reasons for deoptimization and frequent recompilations. > > Because the statistics are already collected anyway, printing them at VM exit if either `-XX:+LogCompilation` or `-XX:+LogVMOutput` is set won't introduce any runtime overhead. I think the compiler folk need to be the ones to assess this change. ------------- PR: https://git.openjdk.java.net/jdk/pull/6103 From duke at openjdk.java.net Tue Oct 26 01:33:53 2021 From: duke at openjdk.java.net (=?UTF-8?B?546L6LaF?=) Date: Tue, 26 Oct 2021 01:33:53 GMT Subject: RFR: JDK-8275854: C2: assert(stride_con != 0) failed: missed some peephole opt [v2] In-Reply-To: References: Message-ID: > `If subsume` optimization will eliminate `LongCountedLoopEndNode` node by mistake, which will lead to `PhaseIdealLoop` optimization crash. > > For example, the tests of node 538 and node 553 become the same after the first `PhaseIdealLoop` optimization. Node 555 is the back edge to the loop, and node 553 will be replaced by a `LongCountedLoopEndNode` node. > > In the next `PhaseIdealLoop` optimization, node 538 finds that node 553 is redundant and subsumes it.
The `PhaseIdealLoop` optimization then crashes because there is no loop end node. > > There are two ways to fix the crash. The first, used in this PR, is to simply exit the `IfNode` subsume optimization when the node is a `LongCountedLoopEndNode`. The second possible fix is to exchange the dominating `If` node with the `LongCountedLoopEndNode` node:
>
> diff --git a/src/hotspot/share/opto/ifnode.cpp b/src/hotspot/share/opto/ifnode.cpp
> index 38b40a6..31ff172 100644
> --- a/src/hotspot/share/opto/ifnode.cpp
> +++ b/src/hotspot/share/opto/ifnode.cpp
> @@ -1674,6 +1674,21 @@ Node* IfNode::simple_subsuming(PhaseIterGVN* igvn) {
>      }
>    }
>  
> +  if (is_LongCountedLoopEnd()) {
> +    set_req(0, dom->in(0));
> +    set_req(1, dom->in(1));
> +    dom->set_req(0, pre);
> +    dom->set_req(1, igvn->intcon(is_always_true ? 1 : 0));
> +    Node* proj0 = raw_out(0);
> +    Node* proj1 = raw_out(1);
> +    Node* dom_proj0 = dom->raw_out(0);
> +    Node* dom_proj1 = dom->raw_out(1);
> +    dom_proj0->set_req(0, this);
> +    dom_proj1->set_req(0, this);
> +    proj0->set_req(0, dom);
> +    proj1->set_req(0, dom);
> +  }
> +
>    if (bol->outcnt() == 0) {
>      igvn->remove_dead_node(bol); // Kill the BoolNode.
>    }
> diff --git a/src/hotspot/share/opto/loopnode.cpp b/src/hotspot/share/opto/loopnode.cpp
> index 6f7e34d..7955722 100644
> --- a/src/hotspot/share/opto/loopnode.cpp
> +++ b/src/hotspot/share/opto/loopnode.cpp
> @@ -802,7 +802,7 @@ bool PhaseIdealLoop::transform_long_counted_loop(IdealLoopTree* loop, Node_List
>    Node* back_control = head->in(LoopNode::LoopBackControl);
>  
>    // data nodes on back branch not supported
> -  if (back_control->outcnt() > 1) {
> +  if (back_control->outcnt() > 1 || back_control->Opcode() != Op_IfTrue) {
>      return false;
>    }

王超 has updated the pull request incrementally with one additional commit since the last revision: Add a java fuzz test ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6099/files - new: https://git.openjdk.java.net/jdk/pull/6099/files/73313138..a2444447 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6099&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6099&range=00-01 Stats: 232 lines in 1 file changed: 232 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/6099.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6099/head:pull/6099 PR: https://git.openjdk.java.net/jdk/pull/6099 From duke at openjdk.java.net Tue Oct 26 01:37:11 2021 From: duke at openjdk.java.net (=?UTF-8?B?546L6LaF?=) Date: Tue, 26 Oct 2021 01:37:11 GMT Subject: RFR: JDK-8275854: C2: assert(stride_con != 0) failed: missed some peephole opt In-Reply-To: <8trnFPk40QDb9aonjunHfFpUhfqK6Hy8atV5E6rU-ho=.87aa2acd-9b40-49c7-86b0-630bd1523e81@github.com> References: <8trnFPk40QDb9aonjunHfFpUhfqK6Hy8atV5E6rU-ho=.87aa2acd-9b40-49c7-86b0-630bd1523e81@github.com> Message-ID: On Mon, 25 Oct 2021 12:29:30 GMT, Roland Westrelin wrote: > Is it by mistake (that is, it's erroneous to eliminate the LongCountedLoopEndNode)? Do you have a test case for this? Ideally, the PR should include one. Thank you for your review. The crash was reported by our fuzz-testing cluster and is caused by an OSR compilation. I have added the complete fuzz test as the test case. ------------- PR: https://git.openjdk.java.net/jdk/pull/6099 From duke at openjdk.java.net Tue Oct 26 01:42:41 2021 From: duke at openjdk.java.net (=?UTF-8?B?546L6LaF?=) Date: Tue, 26 Oct 2021 01:42:41 GMT Subject: RFR: JDK-8275854: C2: assert(stride_con != 0) failed: missed some peephole opt [v3] In-Reply-To: References: Message-ID: > `If subsume` optimization will eliminate `LongCountedLoopEndNode` node by mistake, which will lead to `PhaseIdealLoop` optimization crash.
> > For example, the tests of node 538 and node 553 become identical after the first `PhaseIdealLoop` optimization. Node 555 is the back edge of the loop, and node 553 will be replaced by a `LongCountedLoopEndNode` node. > > In the next `PhaseIdealLoop` optimization, node 538 finds that node 553 is redundant and subsumes node 553. Then the `PhaseIdealLoop` optimization will crash, because there is no loop end node. > > There are two ways to fix the crash. The first, as in this PR, is to simply exit the `IfNode` subsume optimization when the node is a `LongCountedLoopEndNode`. The second possible fix is to exchange the dominating `If` node with the `LongCountedLoopEndNode` node: > > diff --git a/src/hotspot/share/opto/ifnode.cpp b/src/hotspot/share/opto/ifnode.cpp > index 38b40a6..31ff172 100644 > --- a/src/hotspot/share/opto/ifnode.cpp > +++ b/src/hotspot/share/opto/ifnode.cpp > @@ -1674,6 +1674,21 @@ Node* IfNode::simple_subsuming(PhaseIterGVN* igvn) { > } > } > > + if (is_LongCountedLoopEnd()) { > + set_req(0, dom->in(0)); > + set_req(1, dom->in(1)); > + dom->set_req(0, pre); > + dom->set_req(1, igvn->intcon(is_always_true ? 1 : 0)); > + Node* proj0 = raw_out(0); > + Node* proj1 = raw_out(1); > + Node* dom_proj0 = dom->raw_out(0); > + Node* dom_proj1 = dom->raw_out(1); > + dom_proj0->set_req(0, this); > + dom_proj1->set_req(0, this); > + proj0->set_req(0, dom); > + proj1->set_req(0, dom); > + } > + > if (bol->outcnt() == 0) { > igvn->remove_dead_node(bol); // Kill the BoolNode.
> } > diff --git a/src/hotspot/share/opto/loopnode.cpp b/src/hotspot/share/opto/loopnode.cpp > index 6f7e34d..7955722 100644 > --- a/src/hotspot/share/opto/loopnode.cpp > +++ b/src/hotspot/share/opto/loopnode.cpp > @@ -802,7 +802,7 @@ bool PhaseIdealLoop::transform_long_counted_loop(IdealLoopTree* loop, Node_List > Node* back_control = head->in(LoopNode::LoopBackControl); > > // data nodes on back branch not supported > - if (back_control->outcnt() > 1) { > + if (back_control->outcnt() > 1 || back_control->Opcode() != Op_IfTrue) { > return false; > } 王超 has updated the pull request incrementally with one additional commit since the last revision: Adjust the code style ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6099/files - new: https://git.openjdk.java.net/jdk/pull/6099/files/a2444447..606fb7e2 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6099&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6099&range=01-02 Stats: 5 lines in 1 file changed: 4 ins; 1 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/6099.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6099/head:pull/6099 PR: https://git.openjdk.java.net/jdk/pull/6099 From yyang at openjdk.java.net Tue Oct 26 02:14:42 2021 From: yyang at openjdk.java.net (Yi Yang) Date: Tue, 26 Oct 2021 02:14:42 GMT Subject: RFR: 8273585: String.charAt performance degrades due to JDK-8268698 [v2] In-Reply-To: References: Message-ID: > String.charAt shows a significant performance regression due to [JDK-8268698](https://bugs.openjdk.java.net/browse/JDK-8268698), which replaced the index bounds check with the Preconditions.checkIndex intrinsic method.
> > The result of "time linux-x86_64-server-release/images/jdk/bin/java Test": > > > Before JDK-8268698 > real 0m8.369s > user 0m8.386s > sys 0m0.019s > > After JDK-8268698, > real 0m19.722s > user 0m19.748s > sys 0m0.013s > > > The reason is that Preconditions.checkIndex generates a CastII for the index node, since the index is then known to be >= 0 and < length: > > https://github.com/openjdk/jdk/blob/5dab76b939e381312ce5c89b9aebca628238a387/src/hotspot/share/opto/library_call.cpp#L1077-L1083 > > An increment whose input is a CastII cannot be recognized as a parallel induction variable, because the AddNode's input must be the PhiNode itself: > > https://github.com/openjdk/jdk/blob/5dab76b939e381312ce5c89b9aebca628238a387/src/hotspot/share/opto/loopnode.cpp#L3177-L3184 > > It seems this prevents further loop unrolling. I think we can relax this constraint, i.e. a CastII can be the input of the AddNode if the CastII's input is the PhiNode. After applying this patch, the performance regression disappears: > > > $time ./test.sh > > real 0m9.514s > user 0m10.310s > sys 0m0.155s > > This is likely the reason for [JDK-8272493](https://bugs.openjdk.java.net/browse/JDK-8272493). Please help review it. Thanks!
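The relaxed check can be illustrated with a deliberately tiny Java model. This is a sketch under stated assumptions, not HotSpot code: `IvSketch`, its `Op` kinds and `isIvIncrement` are hypothetical stand-ins for C2's `PhiNode`, `CastIINode`, `AddINode` and the parallel-induction-variable test in loopnode.cpp. It shows why an increment of the form `AddI(CastII(Phi), Con)` fails a strict `in(1) == phi` comparison but passes once casts are stripped first, which is the idea behind using `uncast()` in that check.

```java
// Hypothetical, heavily simplified model of C2 ideal-graph nodes (not HotSpot code).
final class IvSketch {
    enum Op { PHI, CAST_II, ADD_I, CON_I }

    static final class Node {
        final Op op;
        final Node[] inputs; // in(1) is inputs[0], in(2) is inputs[1], ...

        Node(Op op, Node... inputs) {
            this.op = op;
            this.inputs = inputs;
        }

        Node in(int i) {
            return inputs[i - 1];
        }

        // Strip a chain of casts, similar in spirit to C2's Node::uncast().
        Node uncast() {
            Node n = this;
            while (n.op == Op.CAST_II) {
                n = n.in(1);
            }
            return n;
        }
    }

    // Is `incr` of the form phi + constant? Optionally look through casts on in(1).
    static boolean isIvIncrement(Node incr, Node phi, boolean lookThroughCasts) {
        if (incr.op != Op.ADD_I || incr.in(2).op != Op.CON_I) {
            return false;
        }
        Node base = lookThroughCasts ? incr.in(1).uncast() : incr.in(1);
        return base == phi;
    }

    public static void main(String[] args) {
        Node phi = new Node(Op.PHI);
        Node cast = new Node(Op.CAST_II, phi); // e.g. a cast introduced by a bounds-check intrinsic
        Node incr = new Node(Op.ADD_I, cast, new Node(Op.CON_I));
        System.out.println(isIvIncrement(incr, phi, false)); // false: strict comparison
        System.out.println(isIvIncrement(incr, phi, true));  // true: cast stripped first
    }
}
```

The model ignores everything else the real check verifies (loop membership, stride sign, types); it isolates only the cast-stripping step.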
Yi Yang has updated the pull request incrementally with one additional commit since the last revision: use uncast ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6096/files - new: https://git.openjdk.java.net/jdk/pull/6096/files/8029f673..992906df Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6096&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6096&range=00-01 Stats: 11 lines in 1 file changed: 1 ins; 9 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/6096.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6096/head:pull/6096 PR: https://git.openjdk.java.net/jdk/pull/6096 From yyang at openjdk.java.net Tue Oct 26 02:14:47 2021 From: yyang at openjdk.java.net (Yi Yang) Date: Tue, 26 Oct 2021 02:14:47 GMT Subject: RFR: 8273585: String.charAt performance degrades due to JDK-8268698 [v2] In-Reply-To: References: Message-ID: On Mon, 25 Oct 2021 12:27:42 GMT, Roland Westrelin wrote: >> Yi Yang has updated the pull request incrementally with one additional commit since the last revision: >> >> use uncast > > src/hotspot/share/opto/loopnode.cpp line 3184: > >> 3182: !incr2->in(2)->is_Con()) >> 3183: continue; >> 3184: if (incr2->in(1) != phi2) { > > You could use incr2->in(1)->uncast() != phi2. Then you don't need to add an extra if. Since the semantics of uncast are clear enough, I also removed the comment. ------------- PR: https://git.openjdk.java.net/jdk/pull/6096 From njian at openjdk.java.net Tue Oct 26 03:11:17 2021 From: njian at openjdk.java.net (Ningsheng Jian) Date: Tue, 26 Oct 2021 03:11:17 GMT Subject: RFR: 8259948: Aarch64: Add cast nodes for Aarch64 Neon backend [v12] In-Reply-To: References: Message-ID: <8_JtcEt9aa_-LV5_os2OPFaa9y9gHenUaZI3_q-541k=.d82aec27-5e0d-4530-97fa-526efd52be33@github.com> On Fri, 22 Oct 2021 09:59:29 GMT, Wang Huang wrote: >> * In this issue, we plan to complete all the missing implementations for the aarch64 neon backend.
For example, cast from Byte to Long, cast from Long to Byte, and so on. >> * It may be a solution to JDK-8269866, or part of it. > > Wang Huang has updated the pull request incrementally with one additional commit since the last revision: > > fix styles Marked as reviewed by njian (Committer). ------------- PR: https://git.openjdk.java.net/jdk/pull/4839 From wuyan at openjdk.java.net Tue Oct 26 03:52:17 2021 From: wuyan at openjdk.java.net (Wu Yan) Date: Tue, 26 Oct 2021 03:52:17 GMT Subject: RFR: 8259948: Aarch64: Add cast nodes for Aarch64 Neon backend In-Reply-To: References: Message-ID: On Wed, 21 Jul 2021 03:35:01 GMT, Ningsheng Jian wrote: >> * In this issue, we plan to complete all the missing implementations for the aarch64 neon backend. For example, cast from Byte to Long, cast from Long to Byte, and so on. >> * It may be a solution to JDK-8269866, or part of it. > > Thanks for the work! Some general comments: > >> It may be a solution to JDK-8269866, or part of it. > > I would suggest not doing a partial fix of JDK-8269866. I think you can still keep 8259948 as a duplicate while targeting this to JDK-8269866 and have a fully proper fix. @theRealELiu may have some thoughts on how to have a clean fix: e.g. there may be some dependency on the mid-end part, like JDK-8265244? > > @theRealELiu has marked the opcodes of those missing rules (with specific types/sizes) as unsupported in JDK-8268966, but I don't see that you have unmarked them in your patch, so your newly added rules cannot be tested. And there are some test cases included in JDK-8268966; could you please merge your test cases into the existing tests if the existing tests do not cover some cases? > > P.S. could you please fix the jcheck error? Thanks @nsjian @theRealELiu for reviewing this.
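The casts discussed in this PR are lane-wise conversions. As a scalar reference model, assuming each lane follows Java's primitive conversion rules, Byte to Long sign-extends every lane and Long to Byte keeps only the low 8 bits of every lane. `LaneCastSketch` and its methods are hypothetical helpers for illustration, not Vector API or backend code.

```java
// Scalar reference model of lane-wise casts (hypothetical helper, not Vector API code).
final class LaneCastSketch {
    // Byte -> Long: widening conversion, so each lane is sign-extended to 64 bits.
    static long[] byteToLong(byte[] lanes) {
        long[] out = new long[lanes.length];
        for (int i = 0; i < lanes.length; i++) {
            out[i] = lanes[i];
        }
        return out;
    }

    // Long -> Byte: narrowing conversion, so each lane keeps only its low 8 bits.
    static byte[] longToByte(long[] lanes) {
        byte[] out = new byte[lanes.length];
        for (int i = 0; i < lanes.length; i++) {
            out[i] = (byte) lanes[i];
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(byteToLong(new byte[] {-1, 127, -128})));
        // [-1, 127, -128]
        System.out.println(java.util.Arrays.toString(longToByte(new long[] {0x1FFL, 0x80L, 256L})));
        // [-1, -128, 0]
    }
}
```

A model like this can serve as the expected-value oracle when checking that a new vector rule agrees with scalar Java semantics on edge values such as -1, 0x80 and 256.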
------------- PR: https://git.openjdk.java.net/jdk/pull/4839 From wuyan at openjdk.java.net Tue Oct 26 03:52:17 2021 From: wuyan at openjdk.java.net (Wu Yan) Date: Tue, 26 Oct 2021 03:52:17 GMT Subject: RFR: 8259948: Aarch64: Add cast nodes for Aarch64 Neon backend [v12] In-Reply-To: <_IKgIBUexpX3jnx8NyLS10-x3HKFMX6n2LftCPOeZSQ=.5e684393-6e9a-41ec-ab80-851f302b0904@github.com> References: <_IKgIBUexpX3jnx8NyLS10-x3HKFMX6n2LftCPOeZSQ=.5e684393-6e9a-41ec-ab80-851f302b0904@github.com> Message-ID: On Tue, 10 Aug 2021 09:16:51 GMT, Andrew Haley wrote: >>> Does your testing cover all that is added in this patch? If so, how did you ascertain it? >> >> Yes, these two test cases cover the code in the patch. First, I unmarked the opcodes that JDK-8268966 had marked as unsupported. Then I ran the cases in both of the above test files one by one, implementing the missing rules whenever a test case failed, until all the test cases passed. > >> > Does your testing cover all that is added in this patch? If so, how did you ascertain it? >> >> Yes, these two test cases cover the code in the patch. First, I unmarked the opcodes that JDK-8268966 had marked as unsupported. Then I ran the cases in both of the above test files one by one, implementing the missing rules whenever a test case failed, until all the test cases passed. > > OK. Hi @theRealAph, do I need another reviewer to review this? ------------- PR: https://git.openjdk.java.net/jdk/pull/4839 From jbhateja at openjdk.java.net Tue Oct 26 04:43:11 2021 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Tue, 26 Oct 2021 04:43:11 GMT Subject: RFR: 8275047: Optimize existing fill stubs for AVX-512 target [v4] In-Reply-To: References: <0WViJD0Uip3CjSkJq6T1wxmgGTk4vbo6IngCAyBuc34=.997a2434-e648-40ef-a8f0-fe1669387fb6@github.com> Message-ID: On Thu, 21 Oct 2021 20:23:56 GMT, Vladimir Kozlov wrote: >> This patch optimizes the fill stubs for AVX-512. To avoid the call-overhead penalty, we need to partially inline small fills that fit in one full or partial vector.
I will be sending a separate patch for it. > > @jatin-bhateja You need to explicitly state in the RFE and in this PR that `Long` and `Double` are not covered, to avoid confusion. Thanks @vnkozlov. @cl4es, can you kindly approve as the second reviewer? ------------- PR: https://git.openjdk.java.net/jdk/pull/5967 From dholmes at openjdk.java.net Tue Oct 26 05:23:07 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Tue, 26 Oct 2021 05:23:07 GMT Subject: RFR: 8275874: [JVMCI] use volatile accessors for all unaligned reads in c2v_readFieldValue In-Reply-To: References: Message-ID: On Mon, 25 Oct 2021 14:33:27 GMT, Doug Simon wrote: > [JDK-8275645](https://bugs.openjdk.java.net/browse/JDK-8275645) resulted in losing single-copy atomicity for reads in `c2v_readFieldValue`.
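This PR fixes that by using `_field_acquire` accessors for all aligned reads and only using `_field` accessors for unaligned reads. Isn't the title of this issue expressed incorrectly? ------------- PR: https://git.openjdk.java.net/jdk/pull/6109 From dnsimon at openjdk.java.net Tue Oct 26 06:59:13 2021 From: dnsimon at openjdk.java.net (Doug Simon) Date: Tue, 26 Oct 2021 06:59:13 GMT Subject: RFR: 8275874: [JVMCI] use volatile accessors for aligned reads in c2v_readFieldValue In-Reply-To: References: Message-ID: On Tue, 26 Oct 2021 05:20:22 GMT, David Holmes wrote: > Isn't the title of this issue expressed incorrectly? Yes - thanks for pointing that out. ------------- PR: https://git.openjdk.java.net/jdk/pull/6109 From shade at openjdk.java.net Tue Oct 26 08:48:14 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 26 Oct 2021 08:48:14 GMT Subject: RFR: 8275874: [JVMCI] use volatile accessors for aligned reads in c2v_readFieldValue In-Reply-To: References: Message-ID: On Mon, 25 Oct 2021 14:33:27 GMT, Doug Simon wrote: > [JDK-8275645](https://bugs.openjdk.java.net/browse/JDK-8275645) resulted in losing single-copy atomicity for reads in `c2v_readFieldValue`. This PR fixes that by using `_field_acquire` accessors for all aligned reads and only using `_field` accessors for unaligned reads. It definitely looks better than the current code, but I think my original concern still stands. If there is caller code that does a non-aligned volatile access, that caller code is in error, and we should not silently downgrade it to non-aligned non-volatile access, like this patch does. If I read your previous comments correctly, you mention [here](https://bugs.openjdk.java.net/browse/JDK-8275645?focusedCommentId=14454416&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14454416) that Graal produces such accesses? If so, then Graal is in error, and this code should fail. 
Instead of waiting for `SIGBUS`, though, we should probably reinstate the `isVolatile` argument to this method, and add alignment checks in the [pre-check block](https://github.com/openjdk/jdk/blob/3ff085e2967508ad312c9d32fa908807aefe69ee/src/hotspot/share/jvmci/jvmciCompilerToVM.cpp#L1952-L1953), verifying that volatile accesses are always aligned. ------------- Changes requested by shade (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6109 From dnsimon at openjdk.java.net Tue Oct 26 09:17:12 2021 From: dnsimon at openjdk.java.net (Doug Simon) Date: Tue, 26 Oct 2021 09:17:12 GMT Subject: RFR: 8275874: [JVMCI] use volatile accessors for aligned reads in c2v_readFieldValue In-Reply-To: References: Message-ID: On Tue, 26 Oct 2021 08:44:50 GMT, Aleksey Shipilev wrote: > we should not silently downgrade it to non-aligned non-volatile access I'm not so sure. The [javadoc for `Unsafe.getLongUnaligned`](https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/jdk/internal/misc/Unsafe.java#L3500) includes: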
The read will be atomic with respect to the largest power of two that divides the GCD of the offset and the storage size. For example, getLongUnaligned will make atomic reads of 2-, 4-, or 8-byte storage units if the offset is zero mod 2, 4, or 8, respectively. There are no other guarantees of atomicity. This implies there's no way someone can ask (in Java) for an unaligned volatile access. How about I clarify the javadoc for the `CompilerToVM.readFieldValue` methods as follows: diff --git a/src/jdk.internal.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/CompilerToVM.java b/src/jdk.internal.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/CompilerToVM.java index 032d21ca235..06c78b37fd8 100644 --- a/src/jdk.internal.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/CompilerToVM.java +++ b/src/jdk.internal.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/CompilerToVM.java @@ -783,15 +783,23 @@ final class CompilerToVM { /** * Reads the current value of a static field. If {@code expectedType} is non-null, then the - * object is exptected to be a subtype of {@code expectedType} and extra sanity checking is + * object is expected to be a subtype of {@code expectedType} and extra sanity checking is * performed on the offset and kind of the read being performed. + * + * The read is performed with volatile load semantics if it is aligned (i.e., + * {@code offset % kind.getByteCount() == 0}). For unaligned reads, non-volatile semantics are + * used. */ native JavaConstant readFieldValue(HotSpotResolvedObjectTypeImpl object, HotSpotResolvedObjectTypeImpl expectedType, long offset, JavaKind kind); /** * Reads the current value of an instance field.
If {@code expectedType} is non-null, then the - * object is exptected to be a subtype of {@code expectedType} and extra sanity checking is + * object is expected to be a subtype of {@code expectedType} and extra sanity checking is * performed on the offset and kind of the read being performed. + * + * The read is performed with volatile load semantics if it is aligned (i.e., + * {@code offset % kind.getByteCount() == 0}). For unaligned reads, non-volatile semantics are + * used. */ native JavaConstant readFieldValue(HotSpotObjectConstantImpl object, HotSpotResolvedObjectTypeImpl expectedType, long offset, JavaKind kind); ------------- PR: https://git.openjdk.java.net/jdk/pull/6109 From aph at openjdk.java.net Tue Oct 26 09:28:17 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Tue, 26 Oct 2021 09:28:17 GMT Subject: RFR: 8259948: Aarch64: Add cast nodes for Aarch64 Neon backend [v12] In-Reply-To: References: Message-ID: On Fri, 22 Oct 2021 09:59:29 GMT, Wang Huang wrote: >> * In this issue, we plan to complete all the missing implementations for the aarch64 neon backend. For example, cast from Byte to Long, cast from Long to Byte, and so on. >> * It may be a solution to JDK-8269866, or part of it. > > Wang Huang has updated the pull request incrementally with one additional commit since the last revision: > > fix styles Thanks, everyone, for all your hard work. ------------- Marked as reviewed by aph (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/4839 From shade at openjdk.java.net Tue Oct 26 09:38:12 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 26 Oct 2021 09:38:12 GMT Subject: RFR: 8275874: [JVMCI] use volatile accessors for aligned reads in c2v_readFieldValue In-Reply-To: <5HK0nvus_llRu6rSQqTGbzV300rEt0NxrjlsPeJqS5Y=.0f7c59dc-d4a4-4186-a0b7-356dd9b0b8d6@github.com> References: Message-ID: <5HK0nvus_llRu6rSQqTGbzV300rEt0NxrjlsPeJqS5Y=.0f7c59dc-d4a4-4186-a0b7-356dd9b0b8d6@github.com> On Mon, 25 Oct 2021 14:33:27 GMT, Doug Simon wrote: > [JDK-8275645](https://bugs.openjdk.java.net/browse/JDK-8275645) resulted in losing single-copy atomicity for reads in `c2v_readFieldValue`. This PR fixes that by using `_field_acquire` accessors for all aligned reads and only using `_field` accessors for unaligned reads. > > we should not silently downgrade it to non-aligned non-volatile access > > I'm not so sure. The [javadoc for `Unsafe.getLongUnaligned`](https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/jdk/internal/misc/Unsafe.java#L3500) includes: [...] This implies there's no way someone can ask (in Java) for an unaligned volatile access. No, I don't think it implies that. Note that `getLongUnaligned` is the unaligned counterpart of `getLong`, not of `getLongVolatile`. "Volatility" is not optional; it is explicit, which is why there are both `getLong` and `getLongVolatile`. The `getLong`/`getLongUnaligned` access is already non-volatile, and so it can be non-aligned. The job of `getLongUnaligned` is then to figure out whether the hardware can withstand the coarse unaligned load or, if not, to split it into individual non-atomic accesses. The `getLongVolatile` access is volatile, full stop. If it is called with an unaligned offset, receiving `SIGBUS` or another kind of error is the expected behavior. 
In other words, if the caller _asks for volatile access_, it should either get the volatile (atomic) access, or get an error if such access is not possible, or do some sort of atomic recovery (which probably involves heavy locking, so it is seldom an option).
The access handling code _should not be allowed_ to decay bad volatile loads into "good" non-volatile loads to avoid the reasonable hardware exception. Instead, it should explicitly fail on bad accesses, thus communicating to the caller that the requested "volatility" property cannot be satisfied for non-aligned accesses. ------------- PR: https://git.openjdk.java.net/jdk/pull/6109 From shade at openjdk.java.net Tue Oct 26 09:53:13 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 26 Oct 2021 09:53:13 GMT Subject: RFR: 8275874: [JVMCI] use volatile accessors for aligned reads in c2v_readFieldValue In-Reply-To: References: Message-ID: On Mon, 25 Oct 2021 14:33:27 GMT, Doug Simon wrote: > [JDK-8275645](https://bugs.openjdk.java.net/browse/JDK-8275645) resulted in losing single-copy atomicity for reads in `c2v_readFieldValue`. This PR fixes that by using `_field_acquire` accessors for all aligned reads and only using `_field` accessors for unaligned reads. Expanding some more on this. I think expressing "volatility" in terms of "alignment" is a dubious API choice, mostly because it loses the opportunity to check for explicitly-`volatile`-but-accidentally-misaligned accesses. The code that was removed in the original change (https://github.com/openjdk/jdk/commit/4dec8fc4cc2b1762fba554d0401da8be0d6d1166) looked reasonable: volatile fields are accessed with explicit volatility. If it turns out their offsets are unaligned, this code should throw an error. But with this patch, such a malformed access would just be silently downgraded. That is the part I dislike.
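The two designs debated in this thread can be contrasted in a small Java sketch. It is illustrative only: `FieldReadCheckSketch`, `chooseByAlignment` and `checkExplicit` are hypothetical names, not JVMCI or HotSpot API (the real pre-check is C++ in jvmciCompilerToVM.cpp). The first method derives the access semantics from alignment alone, as the patch under review does; the second takes an explicit volatility flag and rejects an unaligned volatile read instead of silently downgrading it, along the lines Aleksey suggests.

```java
// Hypothetical sketch contrasting the two designs; not JVMCI or HotSpot API.
final class FieldReadCheckSketch {
    enum Semantics { VOLATILE, PLAIN }

    // A read is naturally aligned when its offset is a multiple of its size.
    static boolean isAligned(long offset, int byteCount) {
        return offset % byteCount == 0;
    }

    // Design under review: volatility is inferred from alignment alone.
    static Semantics chooseByAlignment(long offset, int byteCount) {
        return isAligned(offset, byteCount) ? Semantics.VOLATILE : Semantics.PLAIN;
    }

    // Suggested alternative: the caller states volatility explicitly, and an
    // unaligned volatile request is rejected instead of being downgraded.
    static Semantics checkExplicit(long offset, int byteCount, boolean isVolatile) {
        if (isVolatile && !isAligned(offset, byteCount)) {
            throw new IllegalArgumentException(
                "volatile read of " + byteCount + " bytes at unaligned offset " + offset);
        }
        return isVolatile ? Semantics.VOLATILE : Semantics.PLAIN;
    }
}
```

With an 8-byte read at offset 12, the first design quietly returns `PLAIN`, while the second throws, which surfaces the caller's mistake instead of hiding it.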
------------- PR: https://git.openjdk.java.net/jdk/pull/6109 From redestad at openjdk.java.net Tue Oct 26 09:56:17 2021 From: redestad at openjdk.java.net (Claes Redestad) Date: Tue, 26 Oct 2021 09:56:17 GMT Subject: RFR: 8275047: Optimize existing fill stubs for AVX-512 target [v5] In-Reply-To: References: Message-ID: On Sun, 24 Oct 2021 19:20:42 GMT, Jatin Bhateja wrote: >> Hi All, >> >> This patch optimizes the macro assembly routines used by the fill stubs of various primitive types for the X86 AVX-512 target. >> The main changes are as follows: >> 1) Specialized instruction sequences for fill operations over various block sizes. >> 2) Control flow is sensitive to AVX3Threshold: the generated code operates on 32-byte vectors (YMM) if >> the block size is less than the threshold; otherwise the instructions operate on 64-byte vectors (ZMM). >> 3) The bulk fill operation is performed by a destination-aligned fill loop with an appropriate unroll factor; this >> avoids any cache-line split penalty and improves performance. >> 4) Currently, fill patterns are vectorized by the auto-vectorizer, and the generated code operates on vectors >> of MaxVectorSize; in addition, the auto-vectorizer is oblivious to AVX3Threshold, and this may result in >> performance degradation on prior generations of X86 servers, where 64-byte vector stores using ZMM >> registers operate at a reduced CPU frequency. >> The patch turns the JVM flag -XX:+OptimizeFill on by default for X86 targets supporting the AVX-512 feature. >> 5) The patch also optimizes the mask-generation sequence of the fill* macro assembly routines using the BZHI instruction. >> >> Performance measurements of an existing JMH micro on an Icelake server show ~1.1-4.0X gains for the fill operation with varying block sizes.
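Point 5 refers to building the tail mask with BZHI. A minimal Java model of that idea, assuming a 64-bit operand and that the remaining element count is turned into a lane mask this way (the stub's actual instruction sequence is not shown in this thread; `BzhiMaskSketch` and `tailMask` are hypothetical names): BZHI copies its source and clears every bit at or above the given index, so applied to an all-ones value it yields exactly `n` low set bits, i.e. an opmask enabling the first `n` lanes of a masked store.

```java
// Java model of x86 BZHI on a 64-bit operand (hypothetical helper, not stub code).
final class BzhiMaskSketch {
    // Copy src, then clear every bit at position >= index, where index is the low 8 bits of n.
    // If index >= 64, nothing is cleared and src is returned unchanged.
    static long bzhi(long src, int n) {
        int index = n & 0xFF;
        return index >= 64 ? src : src & ((1L << index) - 1);
    }

    // Opmask for a tail of `remaining` elements: bit i set means lane i is stored.
    static long tailMask(int remaining) {
        return bzhi(-1L, remaining);
    }

    public static void main(String[] args) {
        System.out.println(Long.toBinaryString(tailMask(5))); // 11111
    }
}
```

A single BZHI replaces a shift-and-subtract sequence and needs no special case for a full 64-lane mask, which is presumably part of why the PR prefers it.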
>> >> Following are detailed results: >> >> System Configuration: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (40C 2S) >> >> Benchmark | Size | Baseline Auto-vectorized -XX:-OptimizeFill (ops/ms) | New Optimized Fill AVX3 Th=4096 (ops/ms) | Gain Factor (OptFill AVX3Th=4096/Baseline) >> -- | -- | -- | -- | -- >> ArraysFill.testByteFill | 10 | 208480.451 | 399980.93 | 1.918553649 >> ArraysFill.testByteFill | 16 | 193927.021 | 381156.448 | 1.965463328 >> ArraysFill.testByteFill | 31 | 99175.805 | 399990.605 | 4.033147046 >> ArraysFill.testByteFill | 59 | 141430.876 | 342233.497 | 2.419793377 >> ArraysFill.testByteFill | 89 | 82091.504 | 342232.822 | 4.168918893 >> ArraysFill.testByteFill | 126 | 72154.769 | 310536.053 | 4.303749528 >> ArraysFill.testByteFill | 250 | 18986.775 | 158263.189 | 8.335443434 >> ArraysFill.testByteFill | 266 | 30057.331 | 166819.658 | 5.550048938 >> ArraysFill.testByteFill | 511 | 30094.92 | 116800.155 | 3.88105883 >> ArraysFill.testByteFill | 1021 | 38467.507 | 89235.56 | 2.319764574 >> ArraysFill.testByteFill | 2047 | 32267.535 | 70625.015 | 2.188732886 >> ArraysFill.testByteFill | 2048 | 25503.489 | 64848.532 | 2.542731781 >> ArraysFill.testByteFill | 4095 | 22432.636 | 42449.149 | 1.892294289 >> ArraysFill.testByteFill | 8195 | 16468.923 | 24810.485 | 1.506503188 >> ArraysFill.testCharFill | 10 | 221038.566 | 400005.661 | 1.809664568 >> ArraysFill.testCharFill | 16 | 209138.43 | 381171.236 | 1.822578643 >> ArraysFill.testCharFill | 31 | 93139.021 | 376441.98 | 4.041721461 >> ArraysFill.testCharFill | 59 | 63575.554 | 310559.54 | 4.884889245 >> ArraysFill.testCharFill | 89 | 61900.064 | 191445.936 | 3.092822909 >> ArraysFill.testCharFill | 126 | 36854.615 | 164187.37 | 4.455001633 >> ArraysFill.testCharFill | 250 | 37991.306 | 138797.511 | 3.653401939 >> ArraysFill.testCharFill | 266 | 44459.522 | 170334.083 | 3.831217146 >> ArraysFill.testCharFill | 511 | 52275.926 | 103012.53 | 1.970553903 >> ArraysFill.testCharFill | 1021 | 51803.73 | 
80187.107 | 1.547902188 >> ArraysFill.testCharFill | 2047 | 35820.742 | 38973.828 | 1.088024028 >> ArraysFill.testCharFill | 2048 | 35280.779 | 38209.361 | 1.083007861 >> ArraysFill.testCharFill | 4095 | 21053.869 | 25006.99 | 1.187762211 >> ArraysFill.testCharFill | 8195 | 11419.785 | 12662.777 | 1.108845482 >> ArraysFill.testDoubleFill | 10 | 266086.021 | 220036.789 | 0.826938552 >> ArraysFill.testDoubleFill | 16 | 216597.316 | 218875.135 | 1.010516377 >> ArraysFill.testDoubleFill | 31 | 151868.92 | 174250.587 | 1.147374901 >> ArraysFill.testDoubleFill | 59 | 196480.253 | 194467.527 | 0.98975609 >> ArraysFill.testDoubleFill | 89 | 109787.976 | 102698.432 | 0.935425133 >> ArraysFill.testDoubleFill | 126 | 93945.51 | 121697.956 | 1.295410031 >> ArraysFill.testDoubleFill | 250 | 97830.626 | 81429.644 | 0.832353296 >> ArraysFill.testDoubleFill | 266 | 83560.898 | 91313.593 | 1.092778981 >> ArraysFill.testDoubleFill | 511 | 48710.087 | 48145.392 | 0.988407021 >> ArraysFill.testDoubleFill | 1021 | 25145.002 | 25163.03 | 1.000716962 >> ArraysFill.testDoubleFill | 2047 | 12665.468 | 12639.651 | 0.997961623 >> ArraysFill.testDoubleFill | 2048 | 12202.183 | 12619.316 | 1.034185113 >> ArraysFill.testDoubleFill | 4095 | 6319.101 | 6320.488 | 1.000219493 >> ArraysFill.testDoubleFill | 8195 | 882.585 | 883.727 | 1.001293926 >> ArraysFill.testFloatFill | 10 | 193690.976 | 370572.639 | 1.913215818 >> ArraysFill.testFloatFill | 16 | 178498.07 | 342227.406 | 1.9172611 >> ArraysFill.testFloatFill | 31 | 160406.649 | 323327.925 | 2.015676576 >> ArraysFill.testFloatFill | 59 | 119643.034 | 177091.185 | 1.48016294 >> ArraysFill.testFloatFill | 89 | 64783.111 | 168280.961 | 2.597605431 >> ArraysFill.testFloatFill | 126 | 85291.623 | 152788.86 | 1.791370062 >> ArraysFill.testFloatFill | 250 | 98864.197 | 115429.942 | 1.167560608 >> ArraysFill.testFloatFill | 266 | 104361.908 | 106769.11 | 1.023065906 >> ArraysFill.testFloatFill | 511 | 59063.325 | 73726.544 | 1.248262674 >> 
ArraysFill.testFloatFill | 1021 | 46426.631 | 44255.239 | 0.953229602 >> ArraysFill.testFloatFill | 2047 | 23853.72 | 24988.53 | 1.047573712 >> ArraysFill.testFloatFill | 2048 | 23774.697 | 24723.921 | 1.039925809 >> ArraysFill.testFloatFill | 4095 | 11879.115 | 12574.113 | 1.058505874 >> ArraysFill.testFloatFill | 8195 | 6288.73 | 6309.257 | 1.003264093 >> ArraysFill.testIntFill | 10 | 202623.377 | 370696.239 | 1.829484063 >> ArraysFill.testIntFill | 16 | 187487.425 | 342203.932 | 1.825210048 >> ArraysFill.testIntFill | 31 | 107876.62 | 323291.016 | 2.996858967 >> ArraysFill.testIntFill | 59 | 76540.074 | 177755.374 | 2.322383096 >> ArraysFill.testIntFill | 89 | 77088.258 | 168496.776 | 2.185764478 >> ArraysFill.testIntFill | 126 | 92532.969 | 150986.404 | 1.631703874 >> ArraysFill.testIntFill | 250 | 99993.079 | 106098.703 | 1.061060466 >> ArraysFill.testIntFill | 266 | 105121.5 | 106809.473 | 1.016057353 >> ArraysFill.testIntFill | 511 | 61711.338 | 84318.27 | 1.366333525 >> ArraysFill.testIntFill | 1021 | 45725.648 | 44835.618 | 0.980535432 >> ArraysFill.testIntFill | 2047 | 24130.633 | 25001.727 | 1.036099094 >> ArraysFill.testIntFill | 2048 | 23873.255 | 24980.662 | 1.04638693 >> ArraysFill.testIntFill | 4095 | 12459.376 | 12666.815 | 1.016649229 >> ArraysFill.testIntFill | 8195 | 6303.873 | 6298.852 | 0.999203506 >> ArraysFill.testLongFill | 10 | 221803.338 | 203110.868 | 0.915725028 >> ArraysFill.testLongFill | 16 | 214013.975 | 230463.726 | 1.076862976 >> ArraysFill.testLongFill | 31 | 153858.758 | 144465.921 | 0.938951561 >> ArraysFill.testLongFill | 59 | 102187.914 | 112064.383 | 1.09665007 >> ArraysFill.testLongFill | 89 | 111940.314 | 107757.211 | 0.962630952 >> ArraysFill.testLongFill | 126 | 137992.49 | 110879.813 | 0.803520634 >> ArraysFill.testLongFill | 250 | 96629.877 | 96195.678 | 0.995506576 >> ArraysFill.testLongFill | 266 | 83984.403 | 86152.382 | 1.025814067 >> ArraysFill.testLongFill | 511 | 48698.933 | 48534.404 | 0.996621507 >> 
ArraysFill.testLongFill | 1021 | 25178.805 | 25162.502 | 0.999352511 >> ArraysFill.testLongFill | 2047 | 12511.142 | 12652.489 | 1.01129769 >> ArraysFill.testLongFill | 2048 | 12592.614 | 12622.317 | 1.002358764 >> ArraysFill.testLongFill | 4095 | 6377.694 | 6378.312 | 1.0000969 >> ArraysFill.testLongFill | 8195 | 885.065 | 884.811 | 0.999713015 >> ArraysFill.testShortFill | 10 | 196799.048 | 399963.161 | 2.032342966 >> ArraysFill.testShortFill | 16 | 191981.455 | 381173.675 | 1.985471331 >> ArraysFill.testShortFill | 31 | 98647.156 | 370750.549 | 3.758350104 >> ArraysFill.testShortFill | 59 | 79046.737 | 310586.902 | 3.929155254 >> ArraysFill.testShortFill | 89 | 128874.522 | 186302.59 | 1.445612268 >> ArraysFill.testShortFill | 126 | 47243.773 | 177947.204 | 3.766574782 >> ArraysFill.testShortFill | 250 | 37506.377 | 152968.336 | 4.078462071 >> ArraysFill.testShortFill | 266 | 41782.87 | 169466.305 | 4.055879958 >> ArraysFill.testShortFill | 511 | 44061.823 | 109352.795 | 2.481803692 >> ArraysFill.testShortFill | 1021 | 28799.157 | 81115.934 | 2.816607931 >> ArraysFill.testShortFill | 2047 | 38667.85 | 38998.02 | 1.008538618 >> ArraysFill.testShortFill | 2048 | 36626.321 | 38995.272 | 1.064678923 >> ArraysFill.testShortFill | 4095 | 16606.53 | 24724.43 | 1.488837825 >> ArraysFill.testShortFill | 8195 | 11679.891 | 12627.519 | 1.081133291 >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - 8275047: Review comments resolution. > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8275047 > - 8275047: Review comments resolution. > - 8275047: Aligning the main fill loops and some synthetic changes. > - 8275047: Review comments resolved. 
> - 8275047: Optimize existing fill stubs for AVX-512 target FWIW this looks good to me. ------------- Marked as reviewed by redestad (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5967 From shade at openjdk.java.net Tue Oct 26 10:05:09 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 26 Oct 2021 10:05:09 GMT Subject: RFR: 8275874: [JVMCI] use volatile accessors for aligned reads in c2v_readFieldValue In-Reply-To: <4NjWJ0SBtEWDsEF_GcXam3vzjVWAC4MOtI-sg7fEUGM=.491b6119-2f84-40d7-b8e9-97f7127780d3@github.com> References: Message-ID: <4NjWJ0SBtEWDsEF_GcXam3vzjVWAC4MOtI-sg7fEUGM=.491b6119-2f84-40d7-b8e9-97f7127780d3@github.com> On Mon, 25 Oct 2021 14:33:27 GMT, Doug Simon wrote: > [JDK-8275645](https://bugs.openjdk.java.net/browse/JDK-8275645) resulted in losing single-copy atomicity for reads in `c2v_readFieldValue`. This PR fixes that by using `_field_acquire` accessors for all aligned reads and only using `_field` accessors for unaligned reads. The more I read of the original change (https://github.com/openjdk/jdk/commit/4dec8fc4cc2b1762fba554d0401da8be0d6d1166), the more puzzled I am. Apart from fields that carry their own `isVolatile` property, some other things, like constants, were accessed as volatiles unconditionally. Assuming the volatility is indeed needed there, this patch breaks that property for constants that reside at unfortunate (unaligned) offsets, right? That does not seem correct.
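The atomicity rule quoted earlier from the `Unsafe.getLongUnaligned` javadoc can be worked through directly. This small Java sketch (the class and method names are hypothetical, and offsets are assumed to be measured from a suitably aligned base) computes the guaranteed atomic unit as the largest power of two dividing gcd(offset, size):

```java
// Worked example of the getLongUnaligned atomicity rule (hypothetical helper names).
final class UnalignedAtomicitySketch {
    static long gcd(long a, long b) {
        while (b != 0) {
            long t = a % b;
            a = b;
            b = t;
        }
        return a;
    }

    // Largest power of two dividing gcd(offset, size): the unit that is read atomically.
    static long atomicUnit(long offset, int size) {
        return Long.lowestOneBit(gcd(offset, size));
    }

    public static void main(String[] args) {
        for (long offset : new long[] {0, 12, 6, 3}) {
            System.out.println("offset " + offset + " -> " + atomicUnit(offset, 8) + "-byte units");
        }
        // offset 0 -> 8-byte units, offset 12 -> 4-byte units,
        // offset 6 -> 2-byte units, offset 3 -> 1-byte units
    }
}
```

So an 8-byte read at offset 12 is only guaranteed to be performed as atomic 4-byte halves; no single-copy-atomic 8-byte read, volatile or not, is available at such an offset, which is the crux of the objection to downgrading these reads.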
This PR fixes that by using `_field_acquire` accessors for all aligned reads and only using `_field` accessors for unaligned reads. Indeed, the problem here is not that the hardware is disallowing a volatile read at an unaligned address but that the compiler is being *asked* to generate a volatile read at an unaligned address. That is a symptom of the calling code either using an invalid layout, or otherwise computing an invalid offset. It's no good trying to fix that symptom because, as Aleksey says, there is no fix that will preserve both volatility and single-copy atomicity. The fix needs to be made to the root problem, i.e. correct whatever code determined that a volatile field could legitimately reside at an unaligned address. ------------- PR: https://git.openjdk.java.net/jdk/pull/6109 From roland at openjdk.java.net Tue Oct 26 11:47:17 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Tue, 26 Oct 2021 11:47:17 GMT Subject: RFR: 8273585: String.charAt performance degrades due to JDK-8268698 [v2] In-Reply-To: References: Message-ID: On Tue, 26 Oct 2021 02:14:42 GMT, Yi Yang wrote: >> String.charAt shows a significant performance regression due to [JDK-8268698](https://bugs.openjdk.java.net/browse/JDK-8268698), which replaces index bounds checking with the Preconditions.checkIndex intrinsic method.
>> >> The result of "time linux-x86_64-server-release/images/jdk/bin/java Test": >> >> >> Before JDK-8268698 >> real 0m8.369s >> user 0m8.386s >> sys 0m0.019s >> >> After JDK-8268698, >> real 0m19.722s >> user 0m19.748s >> sys 0m0.013s >> >> >> The reason is that Preconditions.checkIndex generates a CastII for the index node, as the index is now known to be >= 0 and < length: >> >> https://github.com/openjdk/jdk/blob/5dab76b939e381312ce5c89b9aebca628238a387/src/hotspot/share/opto/library_call.cpp#L1077-L1083 >> >> CastII cannot be recognized as a parallel induction variable because the AddNode's input must be the PhiNode: >> >> https://github.com/openjdk/jdk/blob/5dab76b939e381312ce5c89b9aebca628238a387/src/hotspot/share/opto/loopnode.cpp#L3177-L3184 >> >> It seems this prevents further loop unrolling. I think we can relax this constraint, i.e. CastII can be the input of the AddNode if its own input is the PhiNode. After applying this patch, the performance regression disappears: >> >> >> $time ./test.sh >> >> real 0m9.514s >> user 0m10.310s >> sys 0m0.155s >> >> This is likely the reason for [JDK-8272493](https://bugs.openjdk.java.net/browse/JDK-8272493). Please help review it. Thanks! > > Yi Yang has updated the pull request incrementally with one additional commit since the last revision: > > use uncast Looks good to me. ------------- Marked as reviewed by roland (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6096 From jbhateja at openjdk.java.net Tue Oct 26 12:38:15 2021 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Tue, 26 Oct 2021 12:38:15 GMT Subject: Integrated: 8275047: Optimize existing fill stubs for AVX-512 target In-Reply-To: References: Message-ID: On Fri, 15 Oct 2021 12:43:33 GMT, Jatin Bhateja wrote: > Hi All, > > This patch optimizes the macro assembly routines used by the fill stubs of various primitive types for the X86 AVX-512 target. > The main changes are the following: > 1) A specialized instruction sequence for fill operations over various block sizes.
> 2) Control flow is sensitive to AVX3Threshold: generated code operates on 32-byte vectors (YMM) if the > block size is less than the threshold; otherwise instructions operate on 64-byte vectors (ZMM). > 3) The bulk fill operation is performed by a destination-aligned fill loop with an appropriate unroll factor; this > avoids any cache-line split penalty and improves performance. > 4) Currently, fill patterns are vectorized by the auto-vectorizer and the generated code operates on vectors > of MaxVectorSize. In addition, the auto-vectorizer is oblivious to AVX3Threshold, which may result in > performance degradation on prior generations of X86 servers, where 64-byte vector stores using ZMM > registers operate at reduced CPU frequency. > The patch enables the JVM runtime flag -XX:+OptimizeFill by default for X86 targets supporting the AVX-512 feature. > 5) The patch also optimizes the mask generation sequence of the fill* macro assembly routines using the BZHI instruction. > > Performance measurements of an existing JMH micro on an Icelake server show ~1.1-4.0X gains for fill operations with varying block sizes.
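Points 3) and 5) can be modeled roughly in Java: the sketch below splits a fill into an unaligned head, full vector-sized blocks, and a tail, and builds a per-element store mask the way BZHI does (keep the bits below position n, clear the rest). It is a hypothetical illustration, not the stub's code: the class and method names, the raw address parameter, and the assumption of one mask bit per byte element are invented for the example.

```java
// Hypothetical model of a destination-aligned fill with masked head/tail
// stores; it only computes the bookkeeping and does not perform the fill.
final class FillSplitSketch {

    // Returns {headElems, fullBlocks, tailElems} for a fill of `count`
    // byte-sized elements starting at byte address `addr`, assuming
    // `vecBytes` is a power of two (e.g. 32 for YMM, 64 for ZMM).
    static long[] split(long addr, long count, int vecBytes) {
        long misalign = addr & (vecBytes - 1);
        // Elements needed to reach the next vecBytes boundary (0 if already aligned).
        long head = (misalign == 0) ? 0 : Math.min(count, vecBytes - misalign);
        long rest = count - head;
        return new long[] {head, rest / vecBytes, rest % vecBytes};
    }

    // Java model of x86 BZHI: copy src and zero every bit at position >= n,
    // where n is the low 8 bits of index; nothing is cleared when n >= 64.
    static long bzhi(long src, int index) {
        int n = index & 0xFF;
        if (n >= 64) {
            return src;
        }
        return src & ((1L << n) - 1);
    }

    // Shift-based mask for comparison: wrong for n == 64, because shift
    // distances are taken modulo 64 (in Java, as in x86 SHL on 64-bit operands).
    static long naiveMask(int n) {
        return (1L << n) - 1;
    }

    public static void main(String[] args) {
        long[] parts = split(0x1003, 200, 64);
        long tailMask = bzhi(-1L, (int) parts[2]);
        System.out.printf("head=%d blocks=%d tail=%d tailMask=0x%x%n",
                parts[0], parts[1], parts[2], tailMask);
    }
}
```

For `split(0x1003, 200, 64)` the model gives a 61-element head, 2 full 64-byte blocks, and an 11-element tail with mask `0x7ff`. BZHI produces such a mask in one instruction from a register count and, unlike the shift-based `naiveMask`, also handles n == 64; whether that was the motivation in the patch is an inference, not something the PR states.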
> > Following are detailed results: > > System Configuration: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (40C 2S) > > Benchmark | Size | Baseline Auto-vectorized -XX:-OptimizeFill (ops/ms) | New Optimized Fill AVX3 Th=4096 (ops/ms) | Gain Factor (OptFill AVX3Th=4096/Baseline) > -- | -- | -- | -- | -- > ArraysFill.testByteFill | 10 | 208480.451 | 399980.93 | 1.918553649 > ArraysFill.testByteFill | 16 | 193927.021 | 381156.448 | 1.965463328 > ArraysFill.testByteFill | 31 | 99175.805 | 399990.605 | 4.033147046 > ArraysFill.testByteFill | 59 | 141430.876 | 342233.497 | 2.419793377 > ArraysFill.testByteFill | 89 | 82091.504 | 342232.822 | 4.168918893 > ArraysFill.testByteFill | 126 | 72154.769 | 310536.053 | 4.303749528 > ArraysFill.testByteFill | 250 | 18986.775 | 158263.189 | 8.335443434 > ArraysFill.testByteFill | 266 | 30057.331 | 166819.658 | 5.550048938 > ArraysFill.testByteFill | 511 | 30094.92 | 116800.155 | 3.88105883 > ArraysFill.testByteFill | 1021 | 38467.507 | 89235.56 | 2.319764574 > ArraysFill.testByteFill | 2047 | 32267.535 | 70625.015 | 2.188732886 > ArraysFill.testByteFill | 2048 | 25503.489 | 64848.532 | 2.542731781 > ArraysFill.testByteFill | 4095 | 22432.636 | 42449.149 | 1.892294289 > ArraysFill.testByteFill | 8195 | 16468.923 | 24810.485 | 1.506503188 > ArraysFill.testCharFill | 10 | 221038.566 | 400005.661 | 1.809664568 > ArraysFill.testCharFill | 16 | 209138.43 | 381171.236 | 1.822578643 > ArraysFill.testCharFill | 31 | 93139.021 | 376441.98 | 4.041721461 > ArraysFill.testCharFill | 59 | 63575.554 | 310559.54 | 4.884889245 > ArraysFill.testCharFill | 89 | 61900.064 | 191445.936 | 3.092822909 > ArraysFill.testCharFill | 126 | 36854.615 | 164187.37 | 4.455001633 > ArraysFill.testCharFill | 250 | 37991.306 | 138797.511 | 3.653401939 > ArraysFill.testCharFill | 266 | 44459.522 | 170334.083 | 3.831217146 > ArraysFill.testCharFill | 511 | 52275.926 | 103012.53 | 1.970553903 > ArraysFill.testCharFill | 1021 | 51803.73 | 80187.107 | 1.547902188 > 
ArraysFill.testCharFill | 2047 | 35820.742 | 38973.828 | 1.088024028 > ArraysFill.testCharFill | 2048 | 35280.779 | 38209.361 | 1.083007861 > ArraysFill.testCharFill | 4095 | 21053.869 | 25006.99 | 1.187762211 > ArraysFill.testCharFill | 8195 | 11419.785 | 12662.777 | 1.108845482 > ArraysFill.testDoubleFill | 10 | 266086.021 | 220036.789 | 0.826938552 > ArraysFill.testDoubleFill | 16 | 216597.316 | 218875.135 | 1.010516377 > ArraysFill.testDoubleFill | 31 | 151868.92 | 174250.587 | 1.147374901 > ArraysFill.testDoubleFill | 59 | 196480.253 | 194467.527 | 0.98975609 > ArraysFill.testDoubleFill | 89 | 109787.976 | 102698.432 | 0.935425133 > ArraysFill.testDoubleFill | 126 | 93945.51 | 121697.956 | 1.295410031 > ArraysFill.testDoubleFill | 250 | 97830.626 | 81429.644 | 0.832353296 > ArraysFill.testDoubleFill | 266 | 83560.898 | 91313.593 | 1.092778981 > ArraysFill.testDoubleFill | 511 | 48710.087 | 48145.392 | 0.988407021 > ArraysFill.testDoubleFill | 1021 | 25145.002 | 25163.03 | 1.000716962 > ArraysFill.testDoubleFill | 2047 | 12665.468 | 12639.651 | 0.997961623 > ArraysFill.testDoubleFill | 2048 | 12202.183 | 12619.316 | 1.034185113 > ArraysFill.testDoubleFill | 4095 | 6319.101 | 6320.488 | 1.000219493 > ArraysFill.testDoubleFill | 8195 | 882.585 | 883.727 | 1.001293926 > ArraysFill.testFloatFill | 10 | 193690.976 | 370572.639 | 1.913215818 > ArraysFill.testFloatFill | 16 | 178498.07 | 342227.406 | 1.9172611 > ArraysFill.testFloatFill | 31 | 160406.649 | 323327.925 | 2.015676576 > ArraysFill.testFloatFill | 59 | 119643.034 | 177091.185 | 1.48016294 > ArraysFill.testFloatFill | 89 | 64783.111 | 168280.961 | 2.597605431 > ArraysFill.testFloatFill | 126 | 85291.623 | 152788.86 | 1.791370062 > ArraysFill.testFloatFill | 250 | 98864.197 | 115429.942 | 1.167560608 > ArraysFill.testFloatFill | 266 | 104361.908 | 106769.11 | 1.023065906 > ArraysFill.testFloatFill | 511 | 59063.325 | 73726.544 | 1.248262674 > ArraysFill.testFloatFill | 1021 | 46426.631 | 44255.239 | 
0.953229602 > ArraysFill.testFloatFill | 2047 | 23853.72 | 24988.53 | 1.047573712 > ArraysFill.testFloatFill | 2048 | 23774.697 | 24723.921 | 1.039925809 > ArraysFill.testFloatFill | 4095 | 11879.115 | 12574.113 | 1.058505874 > ArraysFill.testFloatFill | 8195 | 6288.73 | 6309.257 | 1.003264093 > ArraysFill.testIntFill | 10 | 202623.377 | 370696.239 | 1.829484063 > ArraysFill.testIntFill | 16 | 187487.425 | 342203.932 | 1.825210048 > ArraysFill.testIntFill | 31 | 107876.62 | 323291.016 | 2.996858967 > ArraysFill.testIntFill | 59 | 76540.074 | 177755.374 | 2.322383096 > ArraysFill.testIntFill | 89 | 77088.258 | 168496.776 | 2.185764478 > ArraysFill.testIntFill | 126 | 92532.969 | 150986.404 | 1.631703874 > ArraysFill.testIntFill | 250 | 99993.079 | 106098.703 | 1.061060466 > ArraysFill.testIntFill | 266 | 105121.5 | 106809.473 | 1.016057353 > ArraysFill.testIntFill | 511 | 61711.338 | 84318.27 | 1.366333525 > ArraysFill.testIntFill | 1021 | 45725.648 | 44835.618 | 0.980535432 > ArraysFill.testIntFill | 2047 | 24130.633 | 25001.727 | 1.036099094 > ArraysFill.testIntFill | 2048 | 23873.255 | 24980.662 | 1.04638693 > ArraysFill.testIntFill | 4095 | 12459.376 | 12666.815 | 1.016649229 > ArraysFill.testIntFill | 8195 | 6303.873 | 6298.852 | 0.999203506 > ArraysFill.testLongFill | 10 | 221803.338 | 203110.868 | 0.915725028 > ArraysFill.testLongFill | 16 | 214013.975 | 230463.726 | 1.076862976 > ArraysFill.testLongFill | 31 | 153858.758 | 144465.921 | 0.938951561 > ArraysFill.testLongFill | 59 | 102187.914 | 112064.383 | 1.09665007 > ArraysFill.testLongFill | 89 | 111940.314 | 107757.211 | 0.962630952 > ArraysFill.testLongFill | 126 | 137992.49 | 110879.813 | 0.803520634 > ArraysFill.testLongFill | 250 | 96629.877 | 96195.678 | 0.995506576 > ArraysFill.testLongFill | 266 | 83984.403 | 86152.382 | 1.025814067 > ArraysFill.testLongFill | 511 | 48698.933 | 48534.404 | 0.996621507 > ArraysFill.testLongFill | 1021 | 25178.805 | 25162.502 | 0.999352511 > ArraysFill.testLongFill | 
2047 | 12511.142 | 12652.489 | 1.01129769 > ArraysFill.testLongFill | 2048 | 12592.614 | 12622.317 | 1.002358764 > ArraysFill.testLongFill | 4095 | 6377.694 | 6378.312 | 1.0000969 > ArraysFill.testLongFill | 8195 | 885.065 | 884.811 | 0.999713015 > ArraysFill.testShortFill | 10 | 196799.048 | 399963.161 | 2.032342966 > ArraysFill.testShortFill | 16 | 191981.455 | 381173.675 | 1.985471331 > ArraysFill.testShortFill | 31 | 98647.156 | 370750.549 | 3.758350104 > ArraysFill.testShortFill | 59 | 79046.737 | 310586.902 | 3.929155254 > ArraysFill.testShortFill | 89 | 128874.522 | 186302.59 | 1.445612268 > ArraysFill.testShortFill | 126 | 47243.773 | 177947.204 | 3.766574782 > ArraysFill.testShortFill | 250 | 37506.377 | 152968.336 | 4.078462071 > ArraysFill.testShortFill | 266 | 41782.87 | 169466.305 | 4.055879958 > ArraysFill.testShortFill | 511 | 44061.823 | 109352.795 | 2.481803692 > ArraysFill.testShortFill | 1021 | 28799.157 | 81115.934 | 2.816607931 > ArraysFill.testShortFill | 2047 | 38667.85 | 38998.02 | 1.008538618 > ArraysFill.testShortFill | 2048 | 36626.321 | 38995.272 | 1.064678923 > ArraysFill.testShortFill | 4095 | 16606.53 | 24724.43 | 1.488837825 > ArraysFill.testShortFill | 8195 | 11679.891 | 12627.519 | 1.081133291 > > Kindly review and share your feedback. > > Best Regards, > Jatin This pull request has now been integrated. 
Changeset: 4be88d54 Author: Jatin Bhateja URL: https://git.openjdk.java.net/jdk/commit/4be88d5482d45e22eb756a6e2ad19ebd7110639a Stats: 278 lines in 6 files changed: 235 ins; 16 del; 27 mod 8275047: Optimize existing fill stubs for AVX-512 target Reviewed-by: kvn, redestad ------------- PR: https://git.openjdk.java.net/jdk/pull/5967 From roland at openjdk.java.net Tue Oct 26 13:26:36 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Tue, 26 Oct 2021 13:26:36 GMT Subject: RFR: 8259609: C2: optimize long range checks in long counted loops [v13] In-Reply-To: References: Message-ID: > JDK-8255150 makes it possible for Java code to explicitly perform a > range check on long values. JDK-8223051 provides a transformation of > long counted loops into loop nests with an inner int counted > loop. With this change I propose transforming long range checks that > operate on the iv of a long counted loop into range checks that > operate on the iv of the int inner loop once it has been > created. Existing range check eliminations can then kick in. > > Transformation of range checks is piggybacked on the loop nest > creation for 2 reasons: > > - pattern matching range checks is easier right before the loop nest > is created > > - the number of iterations of the inner loop is adjusted so scale * > inner_iv doesn't overflow > > C2 has logic to delay some split-if transformations so they don't > break the scale * iv + offset pattern. I reused that logic for long > range checks and had to relax what's considered a range check because > initially a range check from Objects.checkIndex() may include a test > for range > 0 that needs a round of loop opts to be hoisted. I realize > there's some code duplication but I didn't see a way to share logic > between IdealLoopTree::may_have_range_check() and > IdealLoopTree::policy_range_check() that would feel right. > > I realize the comment in PhaseIdealLoop::transform_long_range_checks() > is scary.
FWIW, it's not as complicated as it looks. I found that drawing > the range covered by the entire long loop and the range covered by the > inner loop helps to see how range checks can be transformed. Then the > comment helps make sure all cases are covered and verify the generated > code actually covers all of them. > > One issue is overflow. I think the fact that inner_iv * scale doesn't > overflow helps simplify things. One possible overflow is that of scale > * upper + offset, which is handled by forcing all range checks in that > case to deoptimize. I don't think other cases of overflow need special > handling. > > This was tested with a Memory Segment micro benchmark (and patched > Memory Segment support to take advantage of the new checkIndex > intrinsic, both provided by Maurizio). Range checks in the micro > benchmark are properly optimized (and performance increases > significantly). Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2045/files - new: https://git.openjdk.java.net/jdk/pull/2045/files/0409fb3e..72a1d7e8 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2045&range=12 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2045&range=11-12 Stats: 2 lines in 2 files changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/2045.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2045/head:pull/2045 PR: https://git.openjdk.java.net/jdk/pull/2045 From roland at openjdk.java.net Tue Oct 26 13:33:19 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Tue, 26 Oct 2021 13:33:19 GMT Subject: RFR: 8259609: C2: optimize long range checks in long counted loops [v12] In-Reply-To: References: Message-ID: On Mon, 25 Oct 2021 12:32:44 GMT, Roland Westrelin wrote: >> src/hotspot/share/opto/loopnode.hpp line 1657: >> >>> 1655: void try_sink_out_of_loop(Node* n); >>> 1656: >>> 1657: Node*
clamp(Node* pNode, Node* pNode1, Node* pNode2); >> >> Argument naming is not consistent with the implementation. > > I'll fix the argument names. > -min_jint is min_jint. So there's no way to handle a min_jint (or min_jlong) scale above. How else would you handle it? I pushed a commit that addresses both comments. ------------- PR: https://git.openjdk.java.net/jdk/pull/2045 From thartmann at openjdk.java.net Tue Oct 26 13:46:12 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 26 Oct 2021 13:46:12 GMT Subject: RFR: 8259609: C2: optimize long range checks in long counted loops [v13] In-Reply-To: References: Message-ID: On Tue, 26 Oct 2021 13:26:36 GMT, Roland Westrelin wrote: >> JDK-8255150 makes it possible for Java code to explicitly perform a >> range check on long values. JDK-8223051 provides a transformation of >> long counted loops into loop nests with an inner int counted >> loop. With this change I propose transforming long range checks that >> operate on the iv of a long counted loop into range checks that >> operate on the iv of the int inner loop once it has been >> created. Existing range check eliminations can then kick in. >> >> Transformation of range checks is piggybacked on the loop nest >> creation for 2 reasons: >> >> - pattern matching range checks is easier right before the loop nest >> is created >> >> - the number of iterations of the inner loop is adjusted so scale * >> inner_iv doesn't overflow >> >> C2 has logic to delay some split-if transformations so they don't >> break the scale * iv + offset pattern. I reused that logic for long >> range checks and had to relax what's considered a range check because >> initially a range check from Objects.checkIndex() may include a test >> for range > 0 that needs a round of loop opts to be hoisted.
I realize >> there's some code duplication but I didn't see a way to share logic >> between IdealLoopTree::may_have_range_check() and >> IdealLoopTree::policy_range_check() that would feel right. >> >> I realize the comment in PhaseIdealLoop::transform_long_range_checks() >> is scary. FWIW, it's not as complicated as it looks. I found that drawing >> the range covered by the entire long loop and the range covered by the >> inner loop helps to see how range checks can be transformed. Then the >> comment helps make sure all cases are covered and verify the generated >> code actually covers all of them. >> >> One issue is overflow. I think the fact that inner_iv * scale doesn't >> overflow helps simplify things. One possible overflow is that of scale >> * upper + offset, which is handled by forcing all range checks in that >> case to deoptimize. I don't think other cases of overflow need special >> handling. >> >> This was tested with a Memory Segment micro benchmark (and patched >> Memory Segment support to take advantage of the new checkIndex >> intrinsic, both provided by Maurizio). Range checks in the micro >> benchmark are properly optimized (and performance increases >> significantly). > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review Thanks, that looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2045 From dnsimon at openjdk.java.net Tue Oct 26 13:55:43 2021 From: dnsimon at openjdk.java.net (Doug Simon) Date: Tue, 26 Oct 2021 13:55:43 GMT Subject: RFR: 8275874: [JVMCI] use volatile accessors for aligned reads in c2v_readFieldValue [v2] In-Reply-To: References: Message-ID: > [JDK-8275645](https://bugs.openjdk.java.net/browse/JDK-8275645) resulted in losing single-copy atomicity for reads in `c2v_readFieldValue`.
This PR fixes that by using `_field_acquire` accessors for all aligned reads and only using `_field` accessors for unaligned reads. Doug Simon has updated the pull request incrementally with one additional commit since the last revision: restrict c2v_readFieldValue to only perform aligned reads ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6109/files - new: https://git.openjdk.java.net/jdk/pull/6109/files/5ea7fedb..4ca42e72 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6109&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6109&range=00-01 Stats: 33 lines in 5 files changed: 20 ins; 2 del; 11 mod Patch: https://git.openjdk.java.net/jdk/pull/6109.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6109/head:pull/6109 PR: https://git.openjdk.java.net/jdk/pull/6109 From dnsimon at openjdk.java.net Tue Oct 26 13:58:11 2021 From: dnsimon at openjdk.java.net (Doug Simon) Date: Tue, 26 Oct 2021 13:58:11 GMT Subject: RFR: 8275874: [JVMCI] use volatile accessors for aligned reads in c2v_readFieldValue In-Reply-To: <4NjWJ0SBtEWDsEF_GcXam3vzjVWAC4MOtI-sg7fEUGM=.491b6119-2f84-40d7-b8e9-97f7127780d3@github.com> References: <4NjWJ0SBtEWDsEF_GcXam3vzjVWAC4MOtI-sg7fEUGM=.491b6119-2f84-40d7-b8e9-97f7127780d3@github.com> Message-ID: On Tue, 26 Oct 2021 10:01:48 GMT, Aleksey Shipilev wrote: >> [JDK-8275645](https://bugs.openjdk.java.net/browse/JDK-8275645) resulted in losing single-copy atomicity for reads in `c2v_readFieldValue`. This PR fixes that by using `_field_acquire` accessors for all aligned reads and only using `_field` accessors for unaligned reads. > > The more I read the original change (https://github.com/openjdk/jdk/commit/4dec8fc4cc2b1762fba554d0401da8be0d6d1166), the more puzzled I am. Apart from fields that carry their `isVolatile` properties, some other things, like constants, were accessed as volatiles unconditionally.
Assuming the volatility is indeed needed there, this patch breaks that property for constants that reside at unfortunate (unaligned) offsets, right? That does not seem correct. Thanks for the clarifications, @shipilev and @adinn. I think the best thing to do is to constrain `CompilerToVM.readFieldValue` to only support aligned reads, the only use case that really matters. I've just pushed a change that implements and tests this. Graal already handles attempts to read that violate the sanity checks done by JVMCI. ------------- PR: https://git.openjdk.java.net/jdk/pull/6109 From shade at openjdk.java.net Tue Oct 26 14:20:11 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 26 Oct 2021 14:20:11 GMT Subject: RFR: 8275874: [JVMCI] use volatile accessors for aligned reads in c2v_readFieldValue [v2] In-Reply-To: References: Message-ID: On Tue, 26 Oct 2021 13:55:43 GMT, Doug Simon wrote: >> [JDK-8275645](https://bugs.openjdk.java.net/browse/JDK-8275645) resulted in losing single-copy atomicity for reads in `c2v_readFieldValue`. This PR fixes that by using `_field_acquire` accessors for all aligned reads and only using `_field` accessors for unaligned reads. > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > restrict c2v_readFieldValue to only perform aligned reads Thank you, this looks much safer to me. A few minor nits below. Also, the synopsis has once again diverged from the direction this PR is going.
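The restriction Doug describes can be modeled as an up-front validation that follows the contract in the `MemoryAccessProvider` javadoc reviewed in this thread: `bits` must be 8, 16, 32 or 64, and the read must be in bounds and aligned. This is a hypothetical Java sketch; the real check lives in HotSpot's `c2v_readFieldValue`, and the class and method names here are invented. The idea is that a naturally aligned access of one of these sizes can be issued as a single load (volatile if required), so rejecting unaligned offsets means the accessor never has to trade volatility for single-copy atomicity.

```java
// Hypothetical Java model of "only support aligned reads": validate a raw
// read of `bits` bits at byte `offset` within an object of `objectSize` bytes.
final class AlignedReadCheck {

    static void checkRead(long objectSize, long offset, int bits) {
        if (bits != 8 && bits != 16 && bits != 32 && bits != 64) {
            throw new IllegalArgumentException("unsupported read size: " + bits);
        }
        int bytes = bits / 8;
        if (offset < 0 || offset > objectSize - bytes) {
            throw new IllegalArgumentException("read out of bounds: offset " + offset);
        }
        // bytes is a power of two and offset >= 0 here, so this tests offset % bytes != 0.
        if ((offset & (bytes - 1)) != 0) {
            throw new IllegalArgumentException("unaligned read: offset " + offset + ", size " + bytes);
        }
    }

    static boolean isValid(long objectSize, long offset, int bits) {
        try {
            checkRead(objectSize, offset, bits);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(16, 8, 64)); // aligned 8-byte read
        System.out.println(isValid(16, 4, 64)); // 8-byte read at offset 4: unaligned
    }
}
```

Checking bounds before alignment keeps the error message specific; either way the caller sees an `IllegalArgumentException`, which matches the exception type named in the javadoc.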
src/jdk.internal.vm.ci/share/classes/jdk.vm.ci.meta/src/jdk/vm/ci/meta/MemoryAccessProvider.java line 40: > 38: * @throws IllegalArgumentException if the read is out of bounds of the object or {@code kind} > 39: * is {@link JavaKind#Void} or not {@linkplain JavaKind#isPrimitive() primitive} > 40: * kind or {@code bits} is not 8, 16, 32 or 64 or the read is unaligned Suggestion: * kind or {@code bits} is not 8, 16, 32 or 64, or the read is unaligned (not sure about this, but it feels better with the additional comma) test/hotspot/jtreg/compiler/jvmci/jdk.vm.ci.hotspot.test/src/jdk/vm/ci/hotspot/test/MemoryAccessProviderData.java line 84: > 82: } > 83: @DataProvider(name = "unalignedPrimitive") > 84: public static Object[][] getUnalingedPrimitiveJavaKinds() { Suggestion: public static Object[][] getUnalignedPrimitiveJavaKinds() { ------------- Marked as reviewed by shade (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6109 From dnsimon at openjdk.java.net Tue Oct 26 14:41:43 2021 From: dnsimon at openjdk.java.net (Doug Simon) Date: Tue, 26 Oct 2021 14:41:43 GMT Subject: RFR: 8275874: [JVMCI] only support aligned reads in c2v_readFieldValue [v3] In-Reply-To: References: Message-ID: <_tTTNDHfyjurT_hjFEYvnOKHOtLTeEtlCEl6YWyhR4g=.2abb849e-6143-4466-82cf-e2410c3c7c1f@github.com> > [JDK-8275645](https://bugs.openjdk.java.net/browse/JDK-8275645) resulted in losing single-copy atomicity for reads in `c2v_readFieldValue`. This PR fixes that by only doing aligned reads in `c2v_readFieldValue`.
Doug Simon has updated the pull request incrementally with one additional commit since the last revision: fixed spelling and grammar ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6109/files - new: https://git.openjdk.java.net/jdk/pull/6109/files/4ca42e72..722e2d20 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6109&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6109&range=01-02 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/6109.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6109/head:pull/6109 PR: https://git.openjdk.java.net/jdk/pull/6109 From dnsimon at openjdk.java.net Tue Oct 26 14:41:45 2021 From: dnsimon at openjdk.java.net (Doug Simon) Date: Tue, 26 Oct 2021 14:41:45 GMT Subject: RFR: 8275874: [JVMCI] only support aligned reads in c2v_readFieldValue [v2] In-Reply-To: References: Message-ID: <6ur7aOegWZ9njVNLY8mRBi7OdascB8aQsWxiDeaR4RQ=.a1471850-d9bd-4885-b47a-3fae0d5ca1ef@github.com> On Tue, 26 Oct 2021 14:16:54 GMT, Aleksey Shipilev wrote: > Also, synopsis had once again diverged from the direction this PR is going. I've updated the bug and PR description to match the current (final?) direction. I've also fixed the nits - thanks for your detailed eye! ------------- PR: https://git.openjdk.java.net/jdk/pull/6109 From thartmann at openjdk.java.net Tue Oct 26 15:55:31 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 26 Oct 2021 15:55:31 GMT Subject: RFR: 8275975: Remove dead code in ciInstanceKlass Message-ID: <2ct2xPwc-eeigFf3aH-0Ilq6tpVaE-CZkZNFE8JxWs0=.5145d628-fc60-4a90-b46a-eda86b69be1c@github.com> I've noticed some dead code in `ciInstanceKlass` (and also in `ciArrayKlass`). Mostly leftovers from [JDK-8237767](https://bugs.openjdk.java.net/browse/JDK-8237767). 
Thanks, Tobias ------------- Commit messages: - 8275975: Remove dead code in ciInstanceKlass Changes: https://git.openjdk.java.net/jdk/pull/6118/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6118&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8275975 Stats: 15 lines in 3 files changed: 0 ins; 15 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/6118.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6118/head:pull/6118 PR: https://git.openjdk.java.net/jdk/pull/6118 From roland at openjdk.java.net Tue Oct 26 15:57:17 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Tue, 26 Oct 2021 15:57:17 GMT Subject: Integrated: 8259609: C2: optimize long range checks in long counted loops In-Reply-To: References: Message-ID: <8EK8xQfJxcUaAG1tuK8QBtvtTx4iDD8xiLeUq8Q9-9I=.e5026c04-6e9f-4884-a3a6-40795b36b392@github.com> On Tue, 12 Jan 2021 10:15:01 GMT, Roland Westrelin wrote: > JDK-8255150 makes it possible for Java code to explicitly perform a > range check on long values. JDK-8223051 provides a transformation of > long counted loops into loop nests with an inner int counted > loop. With this change I propose transforming long range checks that > operate on the iv of a long counted loop into range checks that > operate on the iv of the int inner loop once it has been > created. Existing range check eliminations can then kick in. > > Transformation of range checks is piggybacked on the loop nest > creation for 2 reasons: > > - pattern matching range checks is easier right before the loop nest > is created > > - the number of iterations of the inner loop is adjusted so scale * > inner_iv doesn't overflow > > C2 has logic to delay some split-if transformations so they don't > break the scale * iv + offset pattern.
I reused that logic for long > range checks and had to relax what's considered a range check because > initially a range check from Objects.checkIndex() may include a test > for range > 0 that needs a round of loop opts to be hoisted. I realize > there's some code duplication but I didn't see a way to share logic > between IdealLoopTree::may_have_range_check() and > IdealLoopTree::policy_range_check() that would feel right. > > I realize the comment in PhaseIdealLoop::transform_long_range_checks() > is scary. FWIW, it's not as complicated as it looks. I found that drawing > the range covered by the entire long loop and the range covered by the > inner loop helps to see how range checks can be transformed. Then the > comment helps make sure all cases are covered and verify the generated > code actually covers all of them. > > One issue is overflow. I think the fact that inner_iv * scale doesn't > overflow helps simplify things. One possible overflow is that of scale > * upper + offset, which is handled by forcing all range checks in that > case to deoptimize. I don't think other cases of overflow need special > handling. > > This was tested with a Memory Segment micro benchmark (and patched > Memory Segment support to take advantage of the new checkIndex > intrinsic, both provided by Maurizio). Range checks in the micro > benchmark are properly optimized (and performance increases > significantly). This pull request has now been integrated.
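As a source-level analogy of the description above (not what C2 actually emits), the sketch contrasts a long counted loop whose body performs a long range check on `scale * i + offset` with a hand-written loop nest: an outer long loop that advances in chunks and an inner int loop whose check is expressed in terms of the int index `j` plus a base that is invariant for the whole inner loop. The class name, method names and chunk size are hypothetical; `Objects.checkIndex(long, long)` is the long overload that JDK-8255150 made available. In C2 the inner check is further rewritten into an int range check (the `clamp` helper discussed in the review suggests clamped bounds), which is what lets the existing range check elimination apply.

```java
import java.util.Objects;

// Hypothetical source-level analogy of long range checks in a long counted
// loop versus the loop-nest shape; both methods compute the same sum.
final class LongRangeCheckSketch {

    // Original shape: long induction variable i, long range check on scale * i + offset.
    static long sumLong(int[] data, long start, long end, long scale, long offset) {
        long sum = 0;
        for (long i = start; i < end; i++) {
            long idx = Objects.checkIndex(scale * i + offset, (long) data.length);
            sum += data[(int) idx];
        }
        return sum;
    }

    // Loop-nest shape: the outer long loop steps by `chunk`, the inner loop has an
    // int induction variable j, and the check uses a base that is invariant within
    // the inner loop. Bounding the inner trip count mirrors how C2 limits it so
    // that scale * inner_iv cannot overflow.
    static long sumNested(int[] data, long start, long end, long scale, long offset, int chunk) {
        long sum = 0;
        for (long outer = start; outer < end; outer += chunk) {
            int trip = (int) Math.min(chunk, end - outer);
            long base = scale * outer + offset;
            for (int j = 0; j < trip; j++) {
                long idx = Objects.checkIndex(base + scale * j, (long) data.length);
                sum += data[(int) idx];
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] data = new int[100];
        for (int k = 0; k < data.length; k++) {
            data[k] = k;
        }
        System.out.println(sumLong(data, 0, 40, 2, 5));
        System.out.println(sumNested(data, 0, 40, 2, 5, 7));
    }
}
```

With the array filled with 0..99, both calls print 1760; extending the loop so that an index reaches 100 makes both versions throw `IndexOutOfBoundsException`.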
Changeset: 82f4aacb Author: Roland Westrelin URL: https://git.openjdk.java.net/jdk/commit/82f4aacb42e60e9cd00e199703a869e7ad4465ff Stats: 975 lines in 13 files changed: 800 ins; 67 del; 108 mod 8259609: C2: optimize long range checks in long counted loops Co-authored-by: John R Rose Reviewed-by: thartmann, jrose ------------- PR: https://git.openjdk.java.net/jdk/pull/2045 From kvn at openjdk.java.net Tue Oct 26 16:28:18 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 26 Oct 2021 16:28:18 GMT Subject: RFR: JDK-8275865: Print deoptimization statistics in product builds In-Reply-To: <4XYU4uU8bqDF6dlLyJrYqpVIUXIiIOYnEC6sCZUPaB8=.40fc30be-43d6-45d0-a3cd-2e3d3ca6378e@github.com> References: <4XYU4uU8bqDF6dlLyJrYqpVIUXIiIOYnEC6sCZUPaB8=.40fc30be-43d6-45d0-a3cd-2e3d3ca6378e@github.com> Message-ID: On Mon, 25 Oct 2021 11:46:06 GMT, Volker Simonis wrote: > Deoptimization statistics are already gathered in product builds but for some (probably historical) reasons aren't printed to the VM/Compiler log. These statistics can be useful when analyzing the reasons for deoptimization and frequent recompilations. > > Because the statistics are already collected anyway, printing them at VM exit if either `-XX:+LogCompilation` or `-XX:+LogVMOutput` is set won't introduce any runtime overhead. src/hotspot/share/runtime/java.cpp line 355: > 353: } > 354: > 355: #if defined(COMPILER1) || defined(COMPILER2) || defined(INCLUDE_JVMCI) Deoptimization statistics are collected and printed only for C2 and JVMCI; otherwise they are empty: https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/deoptimization.cpp#L2664 There are calls to `Deoptimization::print_statistics()` at lines #234 and #244: https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/java.cpp#L229 In a tiered VM, the call at line #234 will be executed when `LogVMOutput || LogCompilation` is true, but not when only `PrintOptoStatistics` is true, which is a bug.
My suggestion is to remove the call at line #234 and move the call at line #244 out from under the `#ifndef COMPILER1` guard. ------------- PR: https://git.openjdk.java.net/jdk/pull/6103 From chagedorn at openjdk.java.net Tue Oct 26 17:08:14 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Tue, 26 Oct 2021 17:08:14 GMT Subject: RFR: 8275975: Remove dead code in ciInstanceKlass In-Reply-To: <2ct2xPwc-eeigFf3aH-0Ilq6tpVaE-CZkZNFE8JxWs0=.5145d628-fc60-4a90-b46a-eda86b69be1c@github.com> References: <2ct2xPwc-eeigFf3aH-0Ilq6tpVaE-CZkZNFE8JxWs0=.5145d628-fc60-4a90-b46a-eda86b69be1c@github.com> Message-ID: On Tue, 26 Oct 2021 15:46:41 GMT, Tobias Hartmann wrote: > I've noticed some dead code in `ciInstanceKlass` (and also in `ciArrayKlass`). Mostly leftovers from [JDK-8237767](https://bugs.openjdk.java.net/browse/JDK-8237767). > > Thanks, > Tobias Looks good! ------------- Marked as reviewed by chagedorn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6118 From kvn at openjdk.java.net Tue Oct 26 17:32:10 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 26 Oct 2021 17:32:10 GMT Subject: RFR: 8275975: Remove dead code in ciInstanceKlass In-Reply-To: <2ct2xPwc-eeigFf3aH-0Ilq6tpVaE-CZkZNFE8JxWs0=.5145d628-fc60-4a90-b46a-eda86b69be1c@github.com> References: <2ct2xPwc-eeigFf3aH-0Ilq6tpVaE-CZkZNFE8JxWs0=.5145d628-fc60-4a90-b46a-eda86b69be1c@github.com> Message-ID: On Tue, 26 Oct 2021 15:46:41 GMT, Tobias Hartmann wrote: > I've noticed some dead code in `ciInstanceKlass` (and also in `ciArrayKlass`). Mostly leftovers from [JDK-8237767](https://bugs.openjdk.java.net/browse/JDK-8237767). > > Thanks, > Tobias Good. ------------- Marked as reviewed by kvn (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/6118 From kvn at openjdk.java.net Tue Oct 26 17:35:11 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 26 Oct 2021 17:35:11 GMT Subject: RFR: 8273585: String.charAt performance degrades due to JDK-8268698 [v2] In-Reply-To: References: Message-ID: On Tue, 26 Oct 2021 02:14:42 GMT, Yi Yang wrote: >> String.charAt shows significant performance regression due to [JDK-8268698](https://bugs.openjdk.java.net/browse/JDK-8268698), which replaces index bound checking with Preconditions.checkIndex intrinsic method. >> >> The result of "time linux-x86_64-server-release/images/jdk/bin/java Test": >> >> >> Before JDK-8268698 >> real 0m8.369s >> user 0m8.386s >> sys 0m0.019s >> >> After JDK-8268698, >> real 0m19.722s >> user 0m19.748s >> sys 0m0.013s >> >> >> The reason is Preconditions.checkIndex generates a CastII for index node as index is now known to be >= 0 and < length.: >> >> https://github.com/openjdk/jdk/blob/5dab76b939e381312ce5c89b9aebca628238a387/src/hotspot/share/opto/library_call.cpp#L1077-L1083 >> >> CastII can not be recognized as a parallel induction variable because AddNode's input must be the PhiNode: >> >> https://github.com/openjdk/jdk/blob/5dab76b939e381312ce5c89b9aebca628238a387/src/hotspot/share/opto/loopnode.cpp#L3177-L3184 >> >> It seems this prevents further loop unrolling. I think we can relax this constraint, i.e CastII can be the input of AddNode if its input is PhiNode. After applying this patch, performance regression disappears: >> >> >> $time ./test.sh >> >> real 0m9.514s >> user 0m10.310s >> sys 0m0.155s >> >> This is likely the reason for [JDK-8272493](https://bugs.openjdk.java.net/browse/JDK-8272493). Please help review it. Thanks! > > Yi Yang has updated the pull request incrementally with one additional commit since the last revision: > > use uncast Good. ------------- Marked as reviewed by kvn (Reviewer). 
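The `Test` program timed in the description above is not included in the thread. As a rough, hypothetical stand-in, the class below runs the kind of hot loop the description is about: a counted loop that reads a `String` through `charAt`, whose bounds check (per the description) now goes through the `Preconditions.checkIndex` intrinsic and so gives C2 a CastII of the index. The class name, string and iteration counts are invented, and whether this exact loop shape reaches the parallel-induction-variable code in `loopnode.cpp` is an assumption; a JMH harness would be needed for trustworthy numbers.

```java
// Hypothetical stand-in for the unpublished "Test" program: a hot counted
// loop that indexes a String with charAt(). Names and sizes are illustrative.
public class CharAtLoop {

    // Sums the character codes of s. Each charAt(i) call bounds-checks i
    // before reading the backing array.
    static long sumChars(String s) {
        long sum = 0;
        for (int i = 0; i < s.length(); i++) {
            sum += s.charAt(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        String s = "abcdefghijklmnopqrstuvwxyz".repeat(1000);
        long result = 0;
        long start = System.nanoTime();
        // Repeat often enough for sumChars to be compiled by C2.
        for (int iter = 0; iter < 5_000; iter++) {
            result += sumChars(s);
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("result=" + result + " elapsed=" + elapsedMs + " ms");
    }
}
```

Running `time java CharAtLoop.java` on builds with and without the fix is one informal way to look for the regression described above.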
PR: https://git.openjdk.java.net/jdk/pull/6096 From dnsimon at openjdk.java.net Tue Oct 26 18:53:19 2021 From: dnsimon at openjdk.java.net (Doug Simon) Date: Tue, 26 Oct 2021 18:53:19 GMT Subject: Integrated: 8275874: [JVMCI] only support aligned reads in c2v_readFieldValue In-Reply-To: References: Message-ID: On Mon, 25 Oct 2021 14:33:27 GMT, Doug Simon wrote: > [JDK-8275645](https://bugs.openjdk.java.net/browse/JDK-8275645) resulted in losing single-copy atomicity for reads in `c2v_readFieldValue`. This PR fixes that by only doing aligned reads in `c2v_readFieldValue`. This pull request has now been integrated. Changeset: 2448b3f5 Author: Doug Simon URL: https://git.openjdk.java.net/jdk/commit/2448b3f5f96ec4d9ea8fe9dae32a0aab725fb4ad Stats: 52 lines in 5 files changed: 21 ins; 19 del; 12 mod 8275874: [JVMCI] only support aligned reads in c2v_readFieldValue Reviewed-by: never, shade ------------- PR: https://git.openjdk.java.net/jdk/pull/6109 From redestad at openjdk.java.net Wed Oct 27 00:36:11 2021 From: redestad at openjdk.java.net (Claes Redestad) Date: Wed, 27 Oct 2021 00:36:11 GMT Subject: RFR: 8273585: String.charAt performance degrades due to JDK-8268698 [v2] In-Reply-To: References: Message-ID: On Tue, 26 Oct 2021 02:14:42 GMT, Yi Yang wrote: >> String.charAt shows significant performance regression due to [JDK-8268698](https://bugs.openjdk.java.net/browse/JDK-8268698), which replaces index bound checking with Preconditions.checkIndex intrinsic method.
>> >> The result of "time linux-x86_64-server-release/images/jdk/bin/java Test": >> >> >> Before JDK-8268698 >> real 0m8.369s >> user 0m8.386s >> sys 0m0.019s >> >> After JDK-8268698, >> real 0m19.722s >> user 0m19.748s >> sys 0m0.013s >> >> >> The reason is Preconditions.checkIndex generates a CastII for index node as index is now known to be >= 0 and < length.: >> >> https://github.com/openjdk/jdk/blob/5dab76b939e381312ce5c89b9aebca628238a387/src/hotspot/share/opto/library_call.cpp#L1077-L1083 >> >> CastII can not be recognized as a parallel induction variable because AddNode's input must be the PhiNode: >> >> https://github.com/openjdk/jdk/blob/5dab76b939e381312ce5c89b9aebca628238a387/src/hotspot/share/opto/loopnode.cpp#L3177-L3184 >> >> It seems this prevents further loop unrolling. I think we can relax this constraint, i.e CastII can be the input of AddNode if its input is PhiNode. After applying this patch, performance regression disappears: >> >> >> $time ./test.sh >> >> real 0m9.514s >> user 0m10.310s >> sys 0m0.155s >> >> This is likely the reason for [JDK-8272493](https://bugs.openjdk.java.net/browse/JDK-8272493). Please help review it. Thanks! > > Yi Yang has updated the pull request incrementally with one additional commit since the last revision: > > use uncast > This is likely the reason for JDK-8272493. 
While a good fix to an issue that seems more likely to be of real concern, this does not seem to remedy the comparatively minor performance issue reported by JDK-8272493 ------------- PR: https://git.openjdk.java.net/jdk/pull/6096 From yyang at openjdk.java.net Wed Oct 27 01:27:12 2021 From: yyang at openjdk.java.net (Yi Yang) Date: Wed, 27 Oct 2021 01:27:12 GMT Subject: RFR: 8273585: String.charAt performance degrades due to JDK-8268698 [v2] In-Reply-To: References: Message-ID: On Wed, 27 Oct 2021 00:33:37 GMT, Claes Redestad wrote: >> Yi Yang has updated the pull request incrementally with one additional commit since the last revision: >> >> use uncast > >> This is likely the reason for JDK-8272493. > > While a good fix to an issue that seems more likely to be of real concern, this does not seem to remedy the comparatively minor performance issue reported by JDK-8272493 @cl4es Thanks for confirmation. I'd like to investigate JDK-8272493 later to see what's the real cause for that one. ------------- PR: https://git.openjdk.java.net/jdk/pull/6096 From yyang at openjdk.java.net Wed Oct 27 01:27:12 2021 From: yyang at openjdk.java.net (Yi Yang) Date: Wed, 27 Oct 2021 01:27:12 GMT Subject: Integrated: 8273585: String.charAt performance degrades due to JDK-8268698 In-Reply-To: References: Message-ID: On Mon, 25 Oct 2021 05:52:02 GMT, Yi Yang wrote: > String.charAt shows significant performance regression due to [JDK-8268698](https://bugs.openjdk.java.net/browse/JDK-8268698), which replaces index bound checking with Preconditions.checkIndex intrinsic method. 
> > The result of "time linux-x86_64-server-release/images/jdk/bin/java Test": > > > Before JDK-8268698 > real 0m8.369s > user 0m8.386s > sys 0m0.019s > > After JDK-8268698, > real 0m19.722s > user 0m19.748s > sys 0m0.013s > > > The reason is Preconditions.checkIndex generates a CastII for index node as index is now known to be >= 0 and < length.: > > https://github.com/openjdk/jdk/blob/5dab76b939e381312ce5c89b9aebca628238a387/src/hotspot/share/opto/library_call.cpp#L1077-L1083 > > CastII can not be recognized as a parallel induction variable because AddNode's input must be the PhiNode: > > https://github.com/openjdk/jdk/blob/5dab76b939e381312ce5c89b9aebca628238a387/src/hotspot/share/opto/loopnode.cpp#L3177-L3184 > > It seems this prevents further loop unrolling. I think we can relax this constraint, i.e CastII can be the input of AddNode if its input is PhiNode. After applying this patch, performance regression disappears: > > > $time ./test.sh > > real 0m9.514s > user 0m10.310s > sys 0m0.155s > > This is likely the reason for [JDK-8272493](https://bugs.openjdk.java.net/browse/JDK-8272493). Please help review it. Thanks! This pull request has now been integrated. Changeset: b0d1e4ff Author: Yi Yang URL: https://git.openjdk.java.net/jdk/commit/b0d1e4ff4d3806851fe998717822e8e52987357c Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8273585: String.charAt performance degrades due to JDK-8268698 Reviewed-by: roland, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/6096 From whuang at openjdk.java.net Wed Oct 27 05:36:19 2021 From: whuang at openjdk.java.net (Wang Huang) Date: Wed, 27 Oct 2021 05:36:19 GMT Subject: Integrated: 8259948: Aarch64: Add cast nodes for Aarch64 Neon backend In-Reply-To: References: Message-ID: On Tue, 20 Jul 2021 14:55:42 GMT, Wang Huang wrote: > * In this issue, we plan to complete all missing implementation for aarch64 neon backend. For example, cast from Byte to Long, cast from Long to Byte, and so on. 
> * It may be a fix for JDK-8269866, or part of one. This pull request has now been integrated. Changeset: 9f75d5ce Author: Wang Huang Committer: Ningsheng Jian URL: https://git.openjdk.java.net/jdk/commit/9f75d5ce500886b32175cc541939b7f0eee190ca Stats: 539 lines in 6 files changed: 276 ins; 68 del; 195 mod 8259948: Aarch64: Add cast nodes for Aarch64 Neon backend Co-authored-by: Wang Huang Co-authored-by: Wu Yan Co-authored-by: Miao Zhuojun Reviewed-by: aph, eliu, njian ------------- PR: https://git.openjdk.java.net/jdk/pull/4839 From thartmann at openjdk.java.net Wed Oct 27 06:21:13 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Wed, 27 Oct 2021 06:21:13 GMT Subject: RFR: 8275975: Remove dead code in ciInstanceKlass In-Reply-To: <2ct2xPwc-eeigFf3aH-0Ilq6tpVaE-CZkZNFE8JxWs0=.5145d628-fc60-4a90-b46a-eda86b69be1c@github.com> References: <2ct2xPwc-eeigFf3aH-0Ilq6tpVaE-CZkZNFE8JxWs0=.5145d628-fc60-4a90-b46a-eda86b69be1c@github.com> Message-ID: <__nMG07M7hZxagJOb0B0fu3u8W5rfDU0pZPvySfEDbo=.907d7576-b559-427c-8a83-3280f5c49db1@github.com> On Tue, 26 Oct 2021 15:46:41 GMT, Tobias Hartmann wrote: > I've noticed some dead code in `ciInstanceKlass` (and also in `ciArrayKlass`). Mostly leftovers from [JDK-8237767](https://bugs.openjdk.java.net/browse/JDK-8237767). > > Thanks, > Tobias Christian, Vladimir, thanks for the review! ------------- PR: https://git.openjdk.java.net/jdk/pull/6118 From ngasson at openjdk.java.net Wed Oct 27 06:33:30 2021 From: ngasson at openjdk.java.net (Nick Gasson) Date: Wed, 27 Oct 2021 06:33:30 GMT Subject: RFR: 8275847: Scheduling fails with "too many D-U pinch points" on small method Message-ID: <5z-HFwTvcqo_tge6dIMU4VZo-0UkInXAIKuh5D-fkxI=.b5f7a31b-717f-47b5-bfcc-90dbb223075e@github.com> Since around JDK 16 the following method cannot be compiled by C2 on AArch64:

    public double mergeSync() {
        return Math.log(Math.sin(value));
    }

(Reduced from a slightly larger benchmark.)

    811 416 ! 3 Test::mergeSync (61 bytes)
    813 417 ! 4 Test::mergeSync (61 bytes)
    816 417 ! 4 Test::mergeSync (61 bytes) COMPILE SKIPPED: too many D-U pinch points (retry at different tier)
    816 418 ! 1 Test::mergeSync (61 bytes)

Scheduling::anti_do_def() will create temporary Nodes for each OptoReg killed by the MachProjs from the two runtime leaf calls. After SVE support was added these runtime calls kill more registers, and the number of new Nodes added by anti_do_def exceeds an internal limit (which is based on the LRG map size and roughly proportional to the method size). X86 has the same problem if OptoScheduling is enabled because of the wide AVX registers. The fix here is to ignore OptoRegs which correspond to the high slots of wide vectors (i.e. slots above 64 bits). The scheduler doesn't run on methods where C->max_vector_size() > 8, so we know these kills can't affect the scheduling result. The added test fails on the current JDK with: compiler.lib.ir_framework.shared.TestRunException: Could not compile public double compiler.c2.irTests.TestScheduleSmallMethod.testSmallMethodTwoRuntimeCalls(double) at level C2 after 10s.
Last compilation level: 3 ------------- Commit messages: - 8275847: Scheduling fails with "too many D-U pinch points" on small method Changes: https://git.openjdk.java.net/jdk/pull/6131/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6131&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8275847 Stats: 75 lines in 3 files changed: 74 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/6131.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6131/head:pull/6131 PR: https://git.openjdk.java.net/jdk/pull/6131 From stuefe at openjdk.java.net Wed Oct 27 08:44:22 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Wed, 27 Oct 2021 08:44:22 GMT Subject: RFR: JDK-8276046: codestrings.validate_vm gtest fails on ppc64, s390 Message-ID: <09p-YM_5Hm9neizGLdPVyl5syiH5e23g5z7s85ayw3w=.74ce5acd-541f-49fe-b82b-df7e15b88183@github.com> Trivial patch to switch off the associated gtest on these platforms. PPC64 and s390 compilers don't use code strings. ------------- Commit messages: - disable codestrings gtest on s390, ppc Changes: https://git.openjdk.java.net/jdk/pull/6133/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6133&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8276046 Stats: 5 lines in 1 file changed: 5 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/6133.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6133/head:pull/6133 PR: https://git.openjdk.java.net/jdk/pull/6133 From shade at openjdk.java.net Wed Oct 27 08:52:20 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 27 Oct 2021 08:52:20 GMT Subject: RFR: JDK-8276046: codestrings.validate_vm gtest fails on ppc64, s390 In-Reply-To: <09p-YM_5Hm9neizGLdPVyl5syiH5e23g5z7s85ayw3w=.74ce5acd-541f-49fe-b82b-df7e15b88183@github.com> References: <09p-YM_5Hm9neizGLdPVyl5syiH5e23g5z7s85ayw3w=.74ce5acd-541f-49fe-b82b-df7e15b88183@github.com> Message-ID: On Wed, 27 Oct 2021 08:27:12 GMT, Thomas Stuefe wrote: > Trivial patch 
to switch off the associated gtest on these platforms. PPC64 and s390 compilers don't use code strings. It looks okay to me, but do you want to join the family of `#ifdefs` at the beginning of the file? ------------- Marked as reviewed by shade (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6133 From stuefe at openjdk.java.net Wed Oct 27 09:06:32 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Wed, 27 Oct 2021 09:06:32 GMT Subject: RFR: JDK-8276046: codestrings.validate_vm gtest fails on ppc64, s390 [v2] In-Reply-To: <09p-YM_5Hm9neizGLdPVyl5syiH5e23g5z7s85ayw3w=.74ce5acd-541f-49fe-b82b-df7e15b88183@github.com> References: <09p-YM_5Hm9neizGLdPVyl5syiH5e23g5z7s85ayw3w=.74ce5acd-541f-49fe-b82b-df7e15b88183@github.com> Message-ID: > Trivial patch to switch off the associated gtest on these platforms. PPC64 and s390 compilers don't use code strings. Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: request aleksey ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6133/files - new: https://git.openjdk.java.net/jdk/pull/6133/files/0f3e2cf3..f651b6ae Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6133&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6133&range=00-01 Stats: 10 lines in 1 file changed: 5 ins; 5 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/6133.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6133/head:pull/6133 PR: https://git.openjdk.java.net/jdk/pull/6133 From mdoerr at openjdk.java.net Wed Oct 27 09:06:33 2021 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Wed, 27 Oct 2021 09:06:33 GMT Subject: RFR: JDK-8276046: codestrings.validate_vm gtest fails on ppc64, s390 [v2] In-Reply-To: References: <09p-YM_5Hm9neizGLdPVyl5syiH5e23g5z7s85ayw3w=.74ce5acd-541f-49fe-b82b-df7e15b88183@github.com> Message-ID: On Wed, 27 Oct 2021 09:03:03 GMT, Thomas Stuefe wrote: >> Trivial patch to switch off the associated gtest 
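on these platforms. PPC64 and s390 compilers don't use code strings. Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: request aleksey ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6133/files - new: https://git.openjdk.java.net/jdk/pull/6133/files/0f3e2cf3..f651b6ae Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6133&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6133&range=00-01 Stats: 10 lines in 1 file changed: 5 ins; 5 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/6133.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6133/head:pull/6133 PR: https://git.openjdk.java.net/jdk/pull/6133 From mdoerr at openjdk.java.net Wed Oct 27 09:06:33 2021 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Wed, 27 Oct 2021 09:06:33 GMT Subject: RFR: JDK-8276046: codestrings.validate_vm gtest fails on ppc64, s390 [v2] In-Reply-To: References: <09p-YM_5Hm9neizGLdPVyl5syiH5e23g5z7s85ayw3w=.74ce5acd-541f-49fe-b82b-df7e15b88183@github.com> Message-ID: On Wed, 27 Oct 2021 09:03:03 GMT, Thomas Stuefe wrote: >> Trivial patch to switch off the associated gtest on these platforms. PPC64 and s390 compilers don't use code strings. > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > request aleksey Thanks for fixing the issue! Looks ok to me, too. Aleksey's suggestion to have the platform preprocessor stuff at the beginning sounds good. Even though I don't know how this should be done. In addition, wouldn't it be better to use positive instead of negative tests for platforms which do use code strings? ------------- Marked as reviewed by mdoerr (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6133 From stuefe at openjdk.java.net Wed Oct 27 09:06:33 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Wed, 27 Oct 2021 09:06:33 GMT Subject: RFR: JDK-8276046: codestrings.validate_vm gtest fails on ppc64, s390 In-Reply-To: <09p-YM_5Hm9neizGLdPVyl5syiH5e23g5z7s85ayw3w=.74ce5acd-541f-49fe-b82b-df7e15b88183@github.com> References: <09p-YM_5Hm9neizGLdPVyl5syiH5e23g5z7s85ayw3w=.74ce5acd-541f-49fe-b82b-df7e15b88183@github.com> Message-ID: On Wed, 27 Oct 2021 08:27:12 GMT, Thomas Stuefe wrote: > Trivial patch to switch off the associated gtest on these platforms. PPC64 and s390 compilers don't use code strings. Okay, I followed Aleksey's advice. I'd rather keep the negatives explicit, though. The "DISABLED_" prefix seems to be the standard way to disable gtests, but I really don't care.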
Thanks, Thomas ------------- PR: https://git.openjdk.java.net/jdk/pull/6133 From simonis at openjdk.java.net Wed Oct 27 11:43:47 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Wed, 27 Oct 2021 11:43:47 GMT Subject: RFR: JDK-8275865: Print deoptimization statistics in product builds [v2] In-Reply-To: <4XYU4uU8bqDF6dlLyJrYqpVIUXIiIOYnEC6sCZUPaB8=.40fc30be-43d6-45d0-a3cd-2e3d3ca6378e@github.com> References: <4XYU4uU8bqDF6dlLyJrYqpVIUXIiIOYnEC6sCZUPaB8=.40fc30be-43d6-45d0-a3cd-2e3d3ca6378e@github.com> Message-ID: > Deoptimization statistics are already gathered in product builds but for some (probably historical) reasons aren't printed to the VM/Compiler log. These statistics can be useful when analyzing the reasons for deoptimization and frequent recompilations. > > Because the statistics are already collected anyway, printing them at VM-exit if either `-XX:+LogCompilation` or `-XX:+LogVMOutput` are set won't introduce any runtime overhead. Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: Fix C1 case which doesn't print deoptimization statistics ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6103/files - new: https://git.openjdk.java.net/jdk/pull/6103/files/9b6bc6fe..42f01aed Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6103&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6103&range=00-01 Stats: 5 lines in 1 file changed: 1 ins; 2 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/6103.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6103/head:pull/6103 PR: https://git.openjdk.java.net/jdk/pull/6103 From simonis at openjdk.java.net Wed Oct 27 11:43:50 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Wed, 27 Oct 2021 11:43:50 GMT Subject: RFR: JDK-8275865: Print deoptimization statistics in product builds [v2] In-Reply-To: References:
<4XYU4uU8bqDF6dlLyJrYqpVIUXIiIOYnEC6sCZUPaB8=.40fc30be-43d6-45d0-a3cd-2e3d3ca6378e@github.com> Message-ID: <2tQ_wETcI4l_xx5P6qRzIFzOdmbeWrWHpSav-CWZq6Q=.20b40447-ea63-46e1-ac19-b655f7b1f7ba@github.com> On Tue, 26 Oct 2021 16:24:41 GMT, Vladimir Kozlov wrote: >> Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix C1 case which doesn't print deoptimization statistics > > src/hotspot/share/runtime/java.cpp line 355: > >> 353: } >> 354: >> 355: #if defined(COMPILER1) || defined(COMPILER2) || defined(INCLUDE_JVMCI) > > Deoptimization statistics are collected and printed only for C2 and JVMCI; otherwise they are empty: > https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/deoptimization.cpp#L2664 > > There are calls to `Deoptimization::print_statistics()` at lines #234 and #244: > https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/java.cpp#L229 > > In a tiered VM the call at line #234 will be executed when `LogVMOutput || LogCompilation` is true, but not when only `PrintOptoStatistics` is true, which is a bug. > > My suggestion is to remove the call at line #234 and move the call at #244 out from under the `#ifndef COMPILER1` guard. Hi Vladimir, thanks for looking into this PR. My initial condition was based on the corresponding condition in the debug branch. But you are completely right with your observation, so I've changed the condition and the code in the debug branch as you suggested.
Thank you and best regards, Volker ------------- PR: https://git.openjdk.java.net/jdk/pull/6103 From stuefe at openjdk.java.net Wed Oct 27 11:58:08 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Wed, 27 Oct 2021 11:58:08 GMT Subject: RFR: JDK-8276046: codestrings.validate_vm gtest fails on ppc64, s390 [v2] In-Reply-To: References: <09p-YM_5Hm9neizGLdPVyl5syiH5e23g5z7s85ayw3w=.74ce5acd-541f-49fe-b82b-df7e15b88183@github.com> Message-ID: On Wed, 27 Oct 2021 08:59:38 GMT, Martin Doerr wrote: >> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: >> >> request aleksey > > Thanks for fixing the issue! Looks ok to me, too. Aleksey's suggestion to have the platform preprocessor stuff at the beginning sounds good. Even though I don't know how this should be done. In addition, wouldn't it be better to use positive instead of negative tests for platforms which do use code strings? @TheRealMDoerr Martin, are you okay with this? ------------- PR: https://git.openjdk.java.net/jdk/pull/6133 From thartmann at openjdk.java.net Wed Oct 27 12:31:19 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Wed, 27 Oct 2021 12:31:19 GMT Subject: Integrated: 8275975: Remove dead code in ciInstanceKlass In-Reply-To: <2ct2xPwc-eeigFf3aH-0Ilq6tpVaE-CZkZNFE8JxWs0=.5145d628-fc60-4a90-b46a-eda86b69be1c@github.com> References: <2ct2xPwc-eeigFf3aH-0Ilq6tpVaE-CZkZNFE8JxWs0=.5145d628-fc60-4a90-b46a-eda86b69be1c@github.com> Message-ID: On Tue, 26 Oct 2021 15:46:41 GMT, Tobias Hartmann wrote: > I've noticed some dead code in `ciInstanceKlass` (and also in `ciArrayKlass`). Mostly leftovers from [JDK-8237767](https://bugs.openjdk.java.net/browse/JDK-8237767). > > Thanks, > Tobias This pull request has now been integrated. 
Changeset: a2927333 Author: Tobias Hartmann URL: https://git.openjdk.java.net/jdk/commit/a29273336bae75e8d185fa7f7c789acbec50a619 Stats: 15 lines in 3 files changed: 0 ins; 15 del; 0 mod 8275975: Remove dead code in ciInstanceKlass Reviewed-by: chagedorn, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/6118 From mdoerr at openjdk.java.net Wed Oct 27 12:42:14 2021 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Wed, 27 Oct 2021 12:42:14 GMT Subject: RFR: JDK-8276046: codestrings.validate_vm gtest fails on ppc64, s390 [v2] In-Reply-To: References: <09p-YM_5Hm9neizGLdPVyl5syiH5e23g5z7s85ayw3w=.74ce5acd-541f-49fe-b82b-df7e15b88183@github.com> Message-ID: On Wed, 27 Oct 2021 09:06:32 GMT, Thomas Stuefe wrote: >> Trivial patch to switch off the associated gtest on these platforms. PPC64 and s390 compilers don't use code strings. > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > request aleksey I'm ok with it, but I'd prefer fewer `ifndef`s. ZERO, PPC and S390 could be covered by one line. Or even better something like `#if defined(X86) || defined(ARM)`. ------------- Marked as reviewed by mdoerr (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6133 From thartmann at openjdk.java.net Wed Oct 27 12:50:13 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Wed, 27 Oct 2021 12:50:13 GMT Subject: RFR: 8275448: [REDO] AArch64: Implement string_compare intrinsic in SVE In-Reply-To: <3lLU_Cnsh439U5TjNx9ADxXkvMDnfot4TxHbh7paUQw=.5c7ec35a-8456-4fe6-8b4f-1d7627026d8d@github.com> References: <3lLU_Cnsh439U5TjNx9ADxXkvMDnfot4TxHbh7paUQw=.5c7ec35a-8456-4fe6-8b4f-1d7627026d8d@github.com> Message-ID: On Fri, 22 Oct 2021 00:34:03 GMT, TatWai Chong wrote: > After JDK-8269559 was integrated there are failures in tier1 testing > across Mac OS X 11.4 (aarch64) machines. Please see JDK-8275263.
> > This patch is NOT functional; rather, it is intended to verify potential > toolchain issues, as the original patch passes testing on other > platforms. > > In this patch, we remove new SVE-related matching rules and register > class introduced in the original patch to minimally affect the > non-SVE part. All green. ------------- PR: https://git.openjdk.java.net/jdk/pull/6072 From stuefe at openjdk.java.net Wed Oct 27 13:18:08 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Wed, 27 Oct 2021 13:18:08 GMT Subject: RFR: JDK-8276046: codestrings.validate_vm gtest fails on ppc64, s390 [v2] In-Reply-To: References: <09p-YM_5Hm9neizGLdPVyl5syiH5e23g5z7s85ayw3w=.74ce5acd-541f-49fe-b82b-df7e15b88183@github.com> Message-ID: On Wed, 27 Oct 2021 12:38:40 GMT, Martin Doerr wrote: > I'm ok with it, but I'd prefer fewer `ifndef`s. ZERO, PPC and S390 could be covered by one line. Or even better something like `#if defined(X86) || defined(ARM)`. ZERO cannot follow the same logic since it is orthogonal to the CPU architecture. E.g. Zero on x86 builds with `-DAMD64 -DZERO`, and `AMD64` causes `X86` to be defined: https://github.com/openjdk/jdk/blob/168081efc8af1f5d1d7524246eb4a0675bd49ae0/src/hotspot/share/utilities/macros.hpp#L447-L455 so you need to exclude ZERO independently of handling CPU architectures. ARM is only 32-bit ARM: https://github.com/openjdk/jdk/blob/168081efc8af1f5d1d7524246eb4a0675bd49ae0/src/hotspot/share/utilities/macros.hpp#L529-L531 So it would have to be at least `#if defined(X86) || defined(ARM) || defined(AARCH64)`. And downstream porters of other platforms (mips, riscv, sparc) would then have to opt in and extend that construct with their own macros, which probably nobody would do, since you would need to know the construct exists. So I prefer to keep the negative here. It also makes more sense to have explicit exclusions, since there are concrete reasons for the exclusions rather than concrete reasons for running this test on particular platforms.
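Thomas's argument comes down to how sets of defined macros combine. The hypothetical Java model below (not HotSpot code) reduces each build to the macros it defines, using the facts stated above: a Zero build on x86 defines `AMD64` and `ZERO`, `AMD64` implies `X86`, and `ARM` means 32-bit ARM only. The build names, the `PPC64`/`PPC` pair, and the `RISCV64` entry standing in for a downstream port are illustrative assumptions.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model (not HotSpot code) of the two guard styles discussed
// for the codestrings gtest. Each build is reduced to the set of macros it
// defines; the sets are simplified for illustration.
public class GuardModel {

    // Negative list: #if !defined(ZERO) && !defined(PPC) && !defined(S390)
    static boolean negativeGuard(Set<String> defs) {
        return !defs.contains("ZERO") && !defs.contains("PPC") && !defs.contains("S390");
    }

    // Positive list: #if defined(X86) || defined(ARM) || defined(AARCH64)
    static boolean positiveGuard(Set<String> defs) {
        return defs.contains("X86") || defs.contains("ARM") || defs.contains("AARCH64");
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Set<String>>> builds = List.of(
            Map.entry("x86_64 server", Set.of("AMD64", "X86")),
            Map.entry("x86_64 Zero", Set.of("AMD64", "X86", "ZERO")), // AMD64 implies X86
            Map.entry("arm32", Set.of("ARM")),
            Map.entry("aarch64", Set.of("AARCH64")),
            Map.entry("ppc64le", Set.of("PPC64", "PPC")),
            Map.entry("s390x", Set.of("S390")),
            Map.entry("riscv64 port", Set.of("RISCV64"))); // hypothetical downstream port
        for (Map.Entry<String, Set<String>> build : builds) {
            System.out.printf("%-14s negative=%-5b positive=%b%n",
                build.getKey(), negativeGuard(build.getValue()), positiveGuard(build.getValue()));
        }
    }
}
```

In this model the positive list enables the test for the x86 Zero build unless `ZERO` is excluded separately, and leaves it disabled for the new port until someone extends the list; the negative list avoids both problems.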
So, are we good with:

    #ifndef PRODUCT
    #ifndef ZERO
    // Neither ppc nor s390 compilers use code strings.
    #if !defined(PPC) && !defined(S390)

? ------------- PR: https://git.openjdk.java.net/jdk/pull/6133 From mdoerr at openjdk.java.net Wed Oct 27 13:23:15 2021 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Wed, 27 Oct 2021 13:23:15 GMT Subject: RFR: JDK-8276046: codestrings.validate_vm gtest fails on ppc64, s390 [v2] In-Reply-To: References: <09p-YM_5Hm9neizGLdPVyl5syiH5e23g5z7s85ayw3w=.74ce5acd-541f-49fe-b82b-df7e15b88183@github.com> Message-ID: <48AJjcnKdtbDEs3P9jfwclbKn4qOE95n-JTretkFl4Y=.b233f535-6f4f-4c11-b516-c6989d4b61a9@github.com> On Wed, 27 Oct 2021 09:06:32 GMT, Thomas Stuefe wrote: >> Trivial patch to switch off the associated gtest on these platforms. PPC64 and s390 compilers don't use code strings. > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > request aleksey `#if !defined(ZERO) && !defined(PPC) && !defined(S390)` would be equivalent, but your version is fine, too. ------------- PR: https://git.openjdk.java.net/jdk/pull/6133 From stuefe at openjdk.java.net Wed Oct 27 13:44:40 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Wed, 27 Oct 2021 13:44:40 GMT Subject: RFR: JDK-8276046: codestrings.validate_vm gtest fails on ppc64, s390 [v3] In-Reply-To: <09p-YM_5Hm9neizGLdPVyl5syiH5e23g5z7s85ayw3w=.74ce5acd-541f-49fe-b82b-df7e15b88183@github.com> References: <09p-YM_5Hm9neizGLdPVyl5syiH5e23g5z7s85ayw3w=.74ce5acd-541f-49fe-b82b-df7e15b88183@github.com> Message-ID: > Trivial patch to switch off the associated gtest on these platforms. PPC64 and s390 compilers don't use code strings.
Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: request martin ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6133/files - new: https://git.openjdk.java.net/jdk/pull/6133/files/f651b6ae..0d4108b8 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6133&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6133&range=01-02 Stats: 4 lines in 1 file changed: 0 ins; 2 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/6133.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6133/head:pull/6133 PR: https://git.openjdk.java.net/jdk/pull/6133 From mdoerr at openjdk.java.net Wed Oct 27 13:44:42 2021 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Wed, 27 Oct 2021 13:44:42 GMT Subject: RFR: JDK-8276046: codestrings.validate_vm gtest fails on ppc64, s390 [v3] In-Reply-To: References: <09p-YM_5Hm9neizGLdPVyl5syiH5e23g5z7s85ayw3w=.74ce5acd-541f-49fe-b82b-df7e15b88183@github.com> Message-ID: On Wed, 27 Oct 2021 13:41:16 GMT, Thomas Stuefe wrote: >> Trivial patch to switch off the associated gtest on these platforms. PPC64 and s390 compilers don't use code strings. > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > request martin Thanks for the update! ------------- Marked as reviewed by mdoerr (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6133 From stuefe at openjdk.java.net Wed Oct 27 13:44:45 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Wed, 27 Oct 2021 13:44:45 GMT Subject: RFR: JDK-8276046: codestrings.validate_vm gtest fails on ppc64, s390 [v2] In-Reply-To: References: <09p-YM_5Hm9neizGLdPVyl5syiH5e23g5z7s85ayw3w=.74ce5acd-541f-49fe-b82b-df7e15b88183@github.com> Message-ID: On Wed, 27 Oct 2021 09:06:32 GMT, Thomas Stuefe wrote: >> Trivial patch to switch off the associated gtest on these platforms. PPC64 and s390 compilers don't use code strings. 
> > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > request aleksey Thank you! ------------- PR: https://git.openjdk.java.net/jdk/pull/6133 From stuefe at openjdk.java.net Wed Oct 27 13:44:47 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Wed, 27 Oct 2021 13:44:47 GMT Subject: Integrated: JDK-8276046: codestrings.validate_vm gtest fails on ppc64, s390 In-Reply-To: <09p-YM_5Hm9neizGLdPVyl5syiH5e23g5z7s85ayw3w=.74ce5acd-541f-49fe-b82b-df7e15b88183@github.com> References: <09p-YM_5Hm9neizGLdPVyl5syiH5e23g5z7s85ayw3w=.74ce5acd-541f-49fe-b82b-df7e15b88183@github.com> Message-ID: On Wed, 27 Oct 2021 08:27:12 GMT, Thomas Stuefe wrote: > Trivial patch to switch off the associated gtest on these platforms. PPC64 and s390 compilers don't use code strings. This pull request has now been integrated. Changeset: 809488bf Author: Thomas Stuefe URL: https://git.openjdk.java.net/jdk/commit/809488bf38c250db3c263f200e5eb1a269059c3d Stats: 3 lines in 1 file changed: 3 ins; 0 del; 0 mod 8276046: codestrings.validate_vm gtest fails on ppc64, s390 Reviewed-by: shade, mdoerr ------------- PR: https://git.openjdk.java.net/jdk/pull/6133 From jiefu at openjdk.java.net Wed Oct 27 14:36:41 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Wed, 27 Oct 2021 14:36:41 GMT Subject: RFR: 8276066: Reset LoopPercentProfileLimit for x86 due to suboptimal performance Message-ID: Hi all, I'd like to reset the value of `LoopPercentProfileLimit` (from 30 to the original 10) for x86 to fix a performance degradation. We had observed that, for the same Java app, x86 performs worse than aarch64. But according to some SPEC benchmark results, x86's performance should not be so much worse than aarch64's. After some investigation, it seems that the slowness on x86 is caused by the different default settings of `LoopPercentProfileLimit` (30 for x86, but 10 for other platforms). If we change `LoopPercentProfileLimit` from 30 to 10, x86 would run faster.
If we change `LoopPercentProfileLimit` from 30 to 10, x86 would run faster. In JDK-8149421, `LoopPercentProfileLimit` [1] was first added and set to be 30 for x86 and 10 for other platforms. Logically, the default value of `LoopPercentProfileLimit` is 10 for all platforms even before JDK-8149421. This is because when `LoopPercentProfileLimit=10`, `10.0` [2] equals `100.0 / LoopPercentProfileLimit` [3]. So if we set `LoopPercentProfileLimit=10`, this unrolling rule [3] would be the same as the original design before JDK-8149421. One most important fact is that from the very beginning of OpenJDK source code, the default value of `LoopPercentProfileLimit` (logically) is 10 for all platforms. So I suggest resetting `LoopPercentProfileLimit` to the original value (10) for x86, just as other platforms. I've noted that the review thread mentioned that JDK-8149421 would be beneficial for some SPECjvm2008 benchmarks [4]. Then I run SPECjvm2008 with `LoopPercentProfileLimit=10` finding that there is no performance drop on x86. So it won't revert JDK-8149421's opts for SPECjvm2008. To show the potential improvement of this change, I've made a jmh test in the patch. Performance can be improved by 1.25x ~ 2.0x according to this micro benchmark. Any comments? Thanks. 
Best regards, Jie [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/loopTransform.cpp#L908 [2] https://github.com/openjdk/jdk8u/blob/master/hotspot/src/share/vm/opto/loopTransform.cpp#L673 [3] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/loopTransform.cpp#L903 [4] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2016-February/021205.html ------------- Commit messages: - 8276066: Reset LoopPercentProfileLimit for x86 due to suboptimal performance Changes: https://git.openjdk.java.net/jdk/pull/6142/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6142&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8276066 Stats: 89 lines in 2 files changed: 88 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/6142.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6142/head:pull/6142 PR: https://git.openjdk.java.net/jdk/pull/6142 From kvn at openjdk.java.net Wed Oct 27 16:14:17 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 27 Oct 2021 16:14:17 GMT Subject: RFR: JDK-8275865: Print deoptimization statistics in product builds [v2] In-Reply-To: References: <4XYU4uU8bqDF6dlLyJrYqpVIUXIiIOYnEC6sCZUPaB8=.40fc30be-43d6-45d0-a3cd-2e3d3ca6378e@github.com> Message-ID: On Wed, 27 Oct 2021 11:43:47 GMT, Volker Simonis wrote: >> Deoptimization statistics are already gathered in product builds but for some (probably historical) reasons aren't printed to the VM/Compiler log. These statistics can be useful when analyzing the reasons for deoptimization and frequent recompilations. >> >> Because the statistics are already collected anyway, printing them at VM-exit if either `-XX:+LogCompilation` or `-XX:+LogVMOutput` are set won't introduce any runtime overhead.
> > Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: > > Fix C1 case which doesn't print deoptimization statistics src/hotspot/share/runtime/java.cpp line 354: > 352: } > 353: > 354: #if defined(COMPILER2) || defined(INCLUDE_JVMCI) There is a shorter version: `#if COMPILER2_OR_JVMCI` Otherwise changes are good. ------------- PR: https://git.openjdk.java.net/jdk/pull/6103 From iveresov at openjdk.java.net Wed Oct 27 19:30:11 2021 From: iveresov at openjdk.java.net (Igor Veresov) Date: Wed, 27 Oct 2021 19:30:11 GMT Subject: RFR: 8275086: compiler/c2/irTests/TestPostParseCallDevirtualization.java fails when compiler1 is disabled [v2] In-Reply-To: References: Message-ID: <_I3yL3LY1DEJ__lP7j-pAHBJxPsH6QGjLKTfDDsVBaI=.bd4b5989-b540-4dd6-8cf4-35e07638db08@github.com> On Fri, 22 Oct 2021 08:12:27 GMT, SUN Guoyun wrote: >> Hi all, >> The jtreg test compiler/c2/irTests/TestPostParseCallDevirtualization.java fails in fastdebug mode on x86/aarch64/mips when "--with-jvm-features=-compiler1" is used. The failure output is: >> >>


>> One or more @IR rules failed:
>> 
>> Failed IR Rules (1)
>> ------------------
>> - Method "public int compiler.c2.irTests.TestPostParseCallDevirtualization.testMethodHandleCallWithCCP() throws java.lang.Throwable":
>>   * @IR rule 1: "@compiler.lib.ir_framework.IR(failOn={"#PRE#(\\\\d+(\\\\s){2}(CallStaticJava.*)+(\\\\s){2}===.*#IS_REPLACED#)", "invokeBasic"}, applyIf={}, applyIfAnd={}, applyIfOr={}, counts={"#PRE#(\\\\d+(\\\\s){2}(CallStaticJava.*)+(\\\\s){2}===.*#IS_REPLACED#)", "invokeStatic", "= 1"}, applyIfNot={})" 
>>     - failOn: Graph contains forbidden nodes:
>>         Regex 1: (\\d+(\\s){2}(CallStaticJava.*)+(\\s){2}===.*invokeBasic)
>>         Matched forbidden node:
>>           280  CallStaticJava  ===  5  6  7  8  1 ( 188  1  1  1  1  1  1 ) [[ 281  282  283  285 ]] # Static  java.lang.invoke.MethodHandle::invokeBasic
>>     - counts: Graph contains wrong number of nodes:
>>         Regex 1: (\\d+(\\s){2}(CallStaticJava.*)+(\\s){2}===.*invokeStatic)
>>         Expected 1 but found 0 nodes.
>> 
>>>>> Check stdout for compilation output of the failed methods
>> 
>> >> This is a patch to fix this problem. Please help review it. >> >> Thanks, >> Sun Guoyun > > SUN Guoyun has updated the pull request incrementally with one additional commit since the last revision: > > 8275086: compiler/c2/irTests/TestPostParseCallDevirtualization.java fails when compiler1 is disabled 0.14 is oddly low. I think we need to get to the root of this and figure out why it doesn't create the MDO when it should. Try running the test with -XX:+PrintTieredEvents and grep for testMethodHandleCallWithCCP. See what's happening there. Look at the MDO counters that it prints. What are the total counter values when it starts profiling (i.e. when the MDO counters start increasing)? ------------- PR: https://git.openjdk.java.net/jdk/pull/5903 From duke at openjdk.java.net Thu Oct 28 00:35:33 2021 From: duke at openjdk.java.net (TatWai Chong) Date: Thu, 28 Oct 2021 00:35:33 GMT Subject: RFR: 8275448: [REDO] AArch64: Implement string_compare intrinsic in SVE [v2] In-Reply-To: <3lLU_Cnsh439U5TjNx9ADxXkvMDnfot4TxHbh7paUQw=.5c7ec35a-8456-4fe6-8b4f-1d7627026d8d@github.com> References: <3lLU_Cnsh439U5TjNx9ADxXkvMDnfot4TxHbh7paUQw=.5c7ec35a-8456-4fe6-8b4f-1d7627026d8d@github.com> Message-ID: > After JDK-8269559 was integrated there are failures in tier1 testing > across Mac OS X 11.4 (aarch64) machines. Please see JDK-8275263. > > This patch is NOT functional; rather, this tends to verify potential > toolchain issues as the original patch pass testing on other > platforms. > > In this patch, we remove new SVE-related matching rules and register > class introduced in the original patch to minimally affect the > non-SVE part. TatWai Chong has updated the pull request incrementally with one additional commit since the last revision: Add the register class and description for this SVE intrinsic.
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6072/files - new: https://git.openjdk.java.net/jdk/pull/6072/files/4ad06d94..c173d9c4 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6072&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6072&range=00-01 Stats: 156 lines in 3 files changed: 156 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/6072.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6072/head:pull/6072 PR: https://git.openjdk.java.net/jdk/pull/6072 From duke at openjdk.java.net Thu Oct 28 03:47:29 2021 From: duke at openjdk.java.net (Fei Gao) Date: Thu, 28 Oct 2021 03:47:29 GMT Subject: RFR: 8275317: AArch64: Support some type conversion vectorization in SLP Message-ID: The current SLP vectorizer in C2 doesn't support type conversion operations, but AArch64 has vector type conversion instructions in both NEON and SVE. Type conversion involves two kinds of scenarios: conversion between types of the same data size and conversion between types of different data sizes. To support casts between different data sizes, we need to amend the code that identifies adjacent memory references and the code that decides whether the combination is profitable. I think it would be easier to review if we split the task of supporting type conversion into two separate patches, one for the same data size and the other for different data sizes. The goal of this patch is only to support conversions within the same data size, including:
int -> float
float -> int
long -> double
double -> long
A typical test case:
for (int i = start; i < limit; i++) {
    b[i] = (float) a[i];
}
To implement it, the patch adds the necessary instructions and matching rules in the backend and the corresponding SLP support in the middle end.
The percentage of performance uplift on an aarch64 system:

Benchmark              Mode  Cnt  Change [(After-Before)/Before]  Metric
VectorLoop.convertD2L  avgt   15  -48.46%                         ns/op
VectorLoop.convertF2I  avgt   15  -55.67%                         ns/op
VectorLoop.convertI2F  avgt   15  -55.27%                         ns/op
VectorLoop.convertL2D  avgt   15  -48.75%                         ns/op

------------- Commit messages: - 8275317: AArch64: Support some type conversion vectorization in SLP Changes: https://git.openjdk.java.net/jdk/pull/6145/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6145&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8275317 Stats: 229 lines in 5 files changed: 224 ins; 0 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/6145.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6145/head:pull/6145 PR: https://git.openjdk.java.net/jdk/pull/6145 From duke at openjdk.java.net Thu Oct 28 04:54:07 2021 From: duke at openjdk.java.net (TatWai Chong) Date: Thu, 28 Oct 2021 04:54:07 GMT Subject: RFR: 8275448: [REDO] AArch64: Implement string_compare intrinsic in SVE In-Reply-To: References: <3lLU_Cnsh439U5TjNx9ADxXkvMDnfot4TxHbh7paUQw=.5c7ec35a-8456-4fe6-8b4f-1d7627026d8d@github.com> Message-ID: On Wed, 27 Oct 2021 12:47:38 GMT, Tobias Hartmann wrote: >> After JDK-8269559 was integrated there are failures in tier1 testing >> across Mac OS X 11.4 (aarch64) machines. Please see JDK-8275263. >> >> This patch is NOT functional; rather, this tends to verify potential >> toolchain issues as the original patch pass testing on other >> platforms. >> >> In this patch, we remove new SVE-related matching rules and register >> class introduced in the original patch to minimally affect the >> non-SVE part. > > All green. @TobiHartmann Thanks for your help. I've just pushed a new patch that adds more code from the original patch. Could you test this second patch as you did previously, please? We are completing this chunk by chunk so that we can spot which part causes the failure.
I think we need only a couple more patches to do so (including this one). ------------- PR: https://git.openjdk.java.net/jdk/pull/6072 From thartmann at openjdk.java.net Thu Oct 28 06:36:09 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 28 Oct 2021 06:36:09 GMT Subject: RFR: 8275448: [REDO] AArch64: Implement string_compare intrinsic in SVE [v2] In-Reply-To: References: <3lLU_Cnsh439U5TjNx9ADxXkvMDnfot4TxHbh7paUQw=.5c7ec35a-8456-4fe6-8b4f-1d7627026d8d@github.com> Message-ID: On Thu, 28 Oct 2021 00:35:33 GMT, TatWai Chong wrote: >> After JDK-8269559 was integrated there are failures in tier1 testing >> across Mac OS X 11.4 (aarch64) machines. Please see JDK-8275263. >> >> This patch is NOT functional; rather, this tends to verify potential >> toolchain issues as the original patch pass testing on other >> platforms. >> >> In this patch, we remove new SVE-related matching rules and register >> class introduced in the original patch to minimally affect the >> non-SVE part. > > TatWai Chong has updated the pull request incrementally with one additional commit since the last revision: > > Add the register class and description for this SVE intrinsic. Sure, I'll re-run testing. ------------- PR: https://git.openjdk.java.net/jdk/pull/6072 From thartmann at openjdk.java.net Thu Oct 28 06:42:20 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 28 Oct 2021 06:42:20 GMT Subject: RFR: JDK-8275865: Print deoptimization statistics in product builds [v2] In-Reply-To: References: <4XYU4uU8bqDF6dlLyJrYqpVIUXIiIOYnEC6sCZUPaB8=.40fc30be-43d6-45d0-a3cd-2e3d3ca6378e@github.com> Message-ID: On Wed, 27 Oct 2021 11:43:47 GMT, Volker Simonis wrote: >> Deoptimization statistics are already gathered in product builds but for some (probably historical) reasons aren't printed to the VM/Compiler log. These statistics can be useful when analyzing the reasons for deoptimization and frequent recompilations.
>> >> Because the statistics are already collected anyway, printing them at VM-exit if either `-XX:+LogCompilation` or `-XX:+LogVMOutput` are set won't introduce any runtime overhead. > > Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: > > Fix C1 case which doesn't print deoptimization statistics Changes requested by thartmann (Reviewer). test/hotspot/jtreg/runtime/logging/DeoptStats.java line 46: > 44: public class DeoptStats { > 45: > 46: static class Value { We use +4 space indentation for Java files. test/hotspot/jtreg/runtime/logging/DeoptStats.java line 67: > 65: verify(args); > 66: } > 67: else { Line break should be removed. ------------- PR: https://git.openjdk.java.net/jdk/pull/6103 From thartmann at openjdk.java.net Thu Oct 28 06:51:11 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 28 Oct 2021 06:51:11 GMT Subject: RFR: JDK-8275854: C2: assert(stride_con != 0) failed: missed some peephole opt [v3] In-Reply-To: References: Message-ID: On Tue, 26 Oct 2021 01:42:41 GMT, ?? wrote: >> `If subsume` optimization will eliminate `LongCountedLoopEndNode` node by mistake, which will lead to `PhaseIdealLoop` optimization crash. >> >> For example, the test of node 538 and node 553 will become the same after the first `PhaseIdealLoop` optimization. Node 555 is the back edge to the loop, and node 553 will be replaced by a `LongCountedLoopEndNode` node. >> image >> >> >> In the next `PhaseIdealLoop` optimization, node 538 find node 553 is redundant, and will subsume node 553. Then the `PhaseIdealLoop` optimization will crash, because there is no loop end node. >> image >> >> There are two way to fix the crash, the first is like the way in this pr, just exit `IFNode subsume` optimization when it's a `LongCountedLoopEndNode` node. 
The second possible fix is that exchange the dominating `IF` node with the `LongCountedLoopEndNode` node: >> >> diff --git a/src/hotspot/share/opto/ifnode.cpp b/src/hotspot/share/opto/ifnode.cpp >> index 38b40a6..31ff172 100644 >> --- a/src/hotspot/share/opto/ifnode.cpp >> +++ b/src/hotspot/share/opto/ifnode.cpp >> @@ -1674,6 +1674,21 @@ Node* IfNode::simple_subsuming(PhaseIterGVN* igvn) { >> } >> } >> >> + if (is_LongCountedLoopEnd()) { >> + set_req(0, dom->in(0)); >> + set_req(1, dom->in(1)); >> + dom->set_req(0, pre); >> + dom->set_req(1, igvn->intcon(is_always_true ? 1 : 0)); >> + Node* proj0 = raw_out(0); >> + Node* proj1 = raw_out(1); >> + Node* dom_proj0 = dom->raw_out(0); >> + Node* dom_proj1 = dom->raw_out(1); >> + dom_proj0->set_req(0, this); >> + dom_proj1->set_req(0, this); >> + proj0->set_req(0, dom); >> + proj1->set_req(0, dom); >> + } >> + >> if (bol->outcnt() == 0) { >> igvn->remove_dead_node(bol); // Kill the BoolNode. >> } >> diff --git a/src/hotspot/share/opto/loopnode.cpp b/src/hotspot/share/opto/loopnode.cpp >> index 6f7e34d..7955722 100644 >> --- a/src/hotspot/share/opto/loopnode.cpp >> +++ b/src/hotspot/share/opto/loopnode.cpp >> @@ -802,7 +802,7 @@ bool PhaseIdealLoop::transform_long_counted_loop(IdealLoopTree* loop, Node_List >> Node* back_control = head->in(LoopNode::LoopBackControl); >> >> // data nodes on back branch not supported >> - if (back_control->outcnt() > 1) { >> + if (back_control->outcnt() > 1 || back_control->Opcode() != Op_IfTrue) { >> return false; >> } > > ?? has updated the pull request incrementally with one additional commit since the last revision: > > Adjust the code style The problem with such large and non-targeted regression tests is that they won't work for long. Other changes to C2 and/or HotSpot will change timing, profile information, IR shape, optimization sequence or other factors such that the issue will not reproduce anymore with that test. 
Often, the test also does not reproduce the issue in older JDK versions that are affected as well. We therefore usually run `creduce --not-c` on our generated tests to simplify them (see [creduce](https://embed.cs.utah.edu/creduce/)). You might want to increase the number of loop iterations in the main method first and also add `-Xbatch`. ------------- PR: https://git.openjdk.java.net/jdk/pull/6099 From thartmann at openjdk.java.net Thu Oct 28 07:25:14 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 28 Oct 2021 07:25:14 GMT Subject: RFR: 8273277: C2: Move conditional negation into rc_predicate [v2] In-Reply-To: References: Message-ID: On Tue, 19 Oct 2021 09:24:18 GMT, Nils Eliasson wrote: >> Hi, >> >> I need some feedback on this patch. This was reported from Tencent and found in internal testing about the same time. This patch is based on a a patch provided by Tencent. >> >> In some very specific circumstances we need to negate the range checks that we create in PhaseIdealLoop::loop_predication_impl_helper. This is done in three places, but that method also calls insert_initial_skeleton_predicate where this isn't taken into account. >> >> To simplify things I have moved the negation logic into rc_predicate. This should prevent us from missing this check again. >> >> I do have a concern that negating the condition of the rangecheck in the skeleton predicate, since the skeleton predicate will be used as a clone template, and some rangechecks optimizations seems to assume that range checks always have LT as the condidtion. On the other hand - it is a really uncommon scenario since we haven't failed here before. >> >> Feedback appreciated. 
>> >> Best regards, >> Nils > > Nils Eliasson has updated the pull request incrementally with one additional commit since the last revision: > > Add test case > I do have a concern that negating the condition of the rangecheck in the skeleton predicate, since the skeleton predicate will be used as a clone template, and some rangechecks optimizations seems to assume that range checks always have LT as the condidtion. On the other hand - it is a really uncommon scenario since we haven't failed here before. Is there any specific code that you worry about? I think it should be fine because the purpose of copying and instantiating skeleton range check predicates is to guarantee that control/data paths die consistently when the main loop induction variable falls outside of the allowed range of an array access. But @rwestrel and @chhagedorn looked more into this recently. Can we add the test that Tencent found as well? Please update the affects versions in the bug report. ------------- PR: https://git.openjdk.java.net/jdk/pull/5987 From thartmann at openjdk.java.net Thu Oct 28 07:47:10 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 28 Oct 2021 07:47:10 GMT Subject: RFR: 8274328: C2: Redundant CFG edges fixup in block ordering In-Reply-To: References: Message-ID: On Sun, 26 Sep 2021 10:40:43 GMT, Yi Yang wrote: > I think Trace::fixup_blocks is redundant because PhaseCFG::fixup_flow will nevertheless fix up the CFG flow(i.e. 
flip successor blocks of IfNode) right after PhaseBlockLayout pass, we can remove this step when doing PhaseBlockLayout pass.(Testing: jtreg/compiler/c2, presubmit test) > > https://github.com/openjdk/jdk/blob/5ec1cdcaf39229a7d2457313600b0dc2bf8c6453/src/hotspot/share/opto/compile.cpp#L2765 > > https://github.com/openjdk/jdk/blob/5ec1cdcaf39229a7d2457313600b0dc2bf8c6453/src/hotspot/share/opto/block.cpp#L1679 > > https://github.com/openjdk/jdk/blob/5ec1cdcaf39229a7d2457313600b0dc2bf8c6453/src/hotspot/share/opto/block.cpp#L908-L916 That looks good to me but I'm not an expert in that code. I submitted some testing and it all passed. src/hotspot/share/opto/block.cpp line 916: > 914: ProjNode* tmp = proj0; > 915: proj0 = proj1; > 916: proj1 = tmp; `swap(proj0, proj1)` can be used here. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5705 From thartmann at openjdk.java.net Thu Oct 28 08:24:13 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 28 Oct 2021 08:24:13 GMT Subject: RFR: 8275847: Scheduling fails with "too many D-U pinch points" on small method In-Reply-To: <5z-HFwTvcqo_tge6dIMU4VZo-0UkInXAIKuh5D-fkxI=.b5f7a31b-717f-47b5-bfcc-90dbb223075e@github.com> References: <5z-HFwTvcqo_tge6dIMU4VZo-0UkInXAIKuh5D-fkxI=.b5f7a31b-717f-47b5-bfcc-90dbb223075e@github.com> Message-ID: On Wed, 27 Oct 2021 06:25:22 GMT, Nick Gasson wrote: > Since around JDK 16 the following method cannot be compiled by C2 on AArch64: > > > public double mergeSync() { return Math.log(Math.sin(value)); } > > > (Reduced from a slightly larger benchmark.) > > > 811 416 ! 3 Test::mergeSync (61 bytes) > 813 417 ! 4 Test::mergeSync (61 bytes) > 816 417 ! 4 Test::mergeSync (61 bytes) COMPILE SKIPPED: too many D-U pinch points (retry at different tier) > 816 418 ! 1 Test::mergeSync (61 bytes) > > > Scheduling::anti_do_def() will create temporary Nodes for each OptoReg killed by the MachProjs from the two runtime leaf calls. 
After SVE support was added these runtime calls kill more registers, and the number of new Nodes added by anti_do_def exceeds an internal limit (which is based on the LRG map size and roughly proportional to the method size). > > X86 has the same problem if OptoScheduling is enabled because of the wide AVX registers. > > The fix here is to ignore OptoRegs which correspond to the high slots of wide vectors (i.e. slots above 64 bits). The scheduler doesn't run on methods where C->max_vector_size() > 8, so we know these kills can't affect the scheduling result. > > The added test fails on the current JDK with: > > > compiler.lib.ir_framework.shared.TestRunException: Could not compile public double > compiler.c2.irTests.TestScheduleSmallMethod.testSmallMethodTwoRuntimeCalls(double) at level C2 > after 10s. Last compilation level: 3 While looking at the usages of `is_concrete`, I found that all current usages outside of asserts are dead: https://github.com/openjdk/jdk/blob/1750a6e2c06960b734f646018fc99b336bd966a5/src/hotspot/share/opto/buildOopMap.cpp#L234 https://github.com/openjdk/jdk/blob/1750a6e2c06960b734f646018fc99b336bd966a5/src/hotspot/share/opto/buildOopMap.cpp#L315 https://github.com/openjdk/jdk/blob/1750a6e2c06960b734f646018fc99b336bd966a5/src/hotspot/share/opto/buildOopMap.cpp#L320 https://github.com/openjdk/jdk/blob/1750a6e2c06960b734f646018fc99b336bd966a5/src/hotspot/share/opto/buildOopMap.cpp#L350 I think we should clean that up. src/hotspot/cpu/x86/vmreg_x86.hpp line 93: > 91: if (is_Register()) return true; > 92: #endif // AMD64 > 93: // Do not use is_XMMRegister() here as it depends on the UseAVX settting. 
Typo `settting` -> `setting` ------------- PR: https://git.openjdk.java.net/jdk/pull/6131 From thartmann at openjdk.java.net Thu Oct 28 08:39:11 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 28 Oct 2021 08:39:11 GMT Subject: RFR: 8276066: Reset LoopPercentProfileLimit for x86 due to suboptimal performance In-Reply-To: References: Message-ID: On Wed, 27 Oct 2021 14:26:49 GMT, Jie Fu wrote: > Hi all, > > I'd like to reset the value of `LoopPercentProfileLimit` (from 30 to the original 10) for x86 to fix performance degradation. > > We had observed that for the same Java App, the performance of x86 is slower than that of aarch64. > But the x86's performance should not be so worse than the aarch64 according to some SPEC benchmark results. > > After some investigation, it seems that the slowness of x86 is caused by the different default settings of `LoopPercentProfileLimit` (30 for x86, but 10 for other platforms). > If we change `LoopPercentProfileLimit` from 30 to 10, x86 would run faster. > > In JDK-8149421, `LoopPercentProfileLimit` [1] was first added and set to be 30 for x86 and 10 for other platforms. > Logically, the default value of `LoopPercentProfileLimit` is 10 for all platforms even before JDK-8149421. > This is because when `LoopPercentProfileLimit=10`, `10.0` [2] equals `100.0 / LoopPercentProfileLimit` [3]. > So if we set `LoopPercentProfileLimit=10`, this unrolling rule [3] would be the same as the original design before JDK-8149421. > > One most important fact is that from the very beginning of OpenJDK source code, the default value of `LoopPercentProfileLimit` (logically) is 10 for all platforms. > So I suggest resetting `LoopPercentProfileLimit` to the original value (10) for x86, just as other platforms. > > I've noted that the review thread mentioned that JDK-8149421 would be beneficial for some SPECjvm2008 benchmarks [4]. 
> Then I run SPECjvm2008 with `LoopPercentProfileLimit=10` finding that there is no performance drop on x86. > So it won't revert JDK-8149421's opts for SPECjvm2008. > > To show the potential improvement of this change, I've made a jmh test in the patch. > Performance can be improved by 1.25x ~ 2.0x according to this micro benchmark. > > Any comments? > > Thanks. > Best regards, > Jie > > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/loopTransform.cpp#L908 > [2] https://github.com/openjdk/jdk8u/blob/master/hotspot/src/share/vm/opto/loopTransform.cpp#L673 > [3] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/loopTransform.cpp#L903 > [4] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2016-February/021205.html > > ratio > > before > > after Just for the record, `LoopPercentProfileLimit` was always set to `30` on x86: http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/5fefcbeda616#l6.7 Or am I missing something? EDIT: Okay, I've seen your comment explaining the details only now. ------------- PR: https://git.openjdk.java.net/jdk/pull/6142 From roland at openjdk.java.net Thu Oct 28 08:45:15 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Thu, 28 Oct 2021 08:45:15 GMT Subject: RFR: JDK-8275854: C2: assert(stride_con != 0) failed: missed some peephole opt [v3] In-Reply-To: References: Message-ID: On Tue, 26 Oct 2021 01:42:41 GMT, ?? wrote: >> `If subsume` optimization will eliminate `LongCountedLoopEndNode` node by mistake, which will lead to `PhaseIdealLoop` optimization crash. >> >> For example, the test of node 538 and node 553 will become the same after the first `PhaseIdealLoop` optimization. Node 555 is the back edge to the loop, and node 553 will be replaced by a `LongCountedLoopEndNode` node. >> image >> >> >> In the next `PhaseIdealLoop` optimization, node 538 find node 553 is redundant, and will subsume node 553. 
Then the `PhaseIdealLoop` optimization will crash, because there is no loop end node. >> image >> >> There are two way to fix the crash, the first is like the way in this pr, just exit `IFNode subsume` optimization when it's a `LongCountedLoopEndNode` node. The second possible fix is that exchange the dominating `IF` node with the `LongCountedLoopEndNode` node: >> >> diff --git a/src/hotspot/share/opto/ifnode.cpp b/src/hotspot/share/opto/ifnode.cpp >> index 38b40a6..31ff172 100644 >> --- a/src/hotspot/share/opto/ifnode.cpp >> +++ b/src/hotspot/share/opto/ifnode.cpp >> @@ -1674,6 +1674,21 @@ Node* IfNode::simple_subsuming(PhaseIterGVN* igvn) { >> } >> } >> >> + if (is_LongCountedLoopEnd()) { >> + set_req(0, dom->in(0)); >> + set_req(1, dom->in(1)); >> + dom->set_req(0, pre); >> + dom->set_req(1, igvn->intcon(is_always_true ? 1 : 0)); >> + Node* proj0 = raw_out(0); >> + Node* proj1 = raw_out(1); >> + Node* dom_proj0 = dom->raw_out(0); >> + Node* dom_proj1 = dom->raw_out(1); >> + dom_proj0->set_req(0, this); >> + dom_proj1->set_req(0, this); >> + proj0->set_req(0, dom); >> + proj1->set_req(0, dom); >> + } >> + >> if (bol->outcnt() == 0) { >> igvn->remove_dead_node(bol); // Kill the BoolNode. >> } >> diff --git a/src/hotspot/share/opto/loopnode.cpp b/src/hotspot/share/opto/loopnode.cpp >> index 6f7e34d..7955722 100644 >> --- a/src/hotspot/share/opto/loopnode.cpp >> +++ b/src/hotspot/share/opto/loopnode.cpp >> @@ -802,7 +802,7 @@ bool PhaseIdealLoop::transform_long_counted_loop(IdealLoopTree* loop, Node_List >> Node* back_control = head->in(LoopNode::LoopBackControl); >> >> // data nodes on back branch not supported >> - if (back_control->outcnt() > 1) { >> + if (back_control->outcnt() > 1 || back_control->Opcode() != Op_IfTrue) { >> return false; >> } > > ?? has updated the pull request incrementally with one additional commit since the last revision: > > Adjust the code style The optimization is valid AFAIU, so I don't think blocking it is the right fix. 
What about something like this: diff --git a/src/hotspot/share/opto/ifnode.cpp b/src/hotspot/share/opto/ifnode.cpp index 38b40a68b1f..b28a382ebe7 100644 --- a/src/hotspot/share/opto/ifnode.cpp +++ b/src/hotspot/share/opto/ifnode.cpp @@ -1721,6 +1721,14 @@ Node* IfProjNode::Identity(PhaseGVN* phase) { // will cause this node to be reprocessed once the dead branch is killed. in(0)->outcnt() == 1))) { // IfNode control + if (in(0)->is_BaseCountedLoopEnd()) { + Node* head = unique_ctrl_out(); + if (head != NULL && head->is_BaseCountedLoop() && head->in(LoopNode::LoopBackControl) == this) { + Node* new_head = new LoopNode(head->in(LoopNode::EntryControl), this); + phase->is_IterGVN()->register_new_node_with_optimizer(new_head); + phase->is_IterGVN()->replace_node(head, new_head); + } + } return in(0)->in(0); } // no progress That prevents the crash and gives the loop another chance to be optimized. ------------- PR: https://git.openjdk.java.net/jdk/pull/6099 From thartmann at openjdk.java.net Thu Oct 28 08:55:13 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 28 Oct 2021 08:55:13 GMT Subject: RFR: 8276066: Reset LoopPercentProfileLimit for x86 due to suboptimal performance In-Reply-To: References: Message-ID: On Wed, 27 Oct 2021 14:26:49 GMT, Jie Fu wrote: > Hi all, > > I'd like to reset the value of `LoopPercentProfileLimit` (from 30 to the original 10) for x86 to fix performance degradation. > > We had observed that for the same Java App, the performance of x86 is slower than that of aarch64. > But the x86's performance should not be so worse than the aarch64 according to some SPEC benchmark results. > > After some investigation, it seems that the slowness of x86 is caused by the different default settings of `LoopPercentProfileLimit` (30 for x86, but 10 for other platforms). > If we change `LoopPercentProfileLimit` from 30 to 10, x86 would run faster. 
> > In JDK-8149421, `LoopPercentProfileLimit` [1] was first added and set to be 30 for x86 and 10 for other platforms. > Logically, the default value of `LoopPercentProfileLimit` is 10 for all platforms even before JDK-8149421. > This is because when `LoopPercentProfileLimit=10`, `10.0` [2] equals `100.0 / LoopPercentProfileLimit` [3]. > So if we set `LoopPercentProfileLimit=10`, this unrolling rule [3] would be the same as the original design before JDK-8149421. > > One most important fact is that from the very beginning of OpenJDK source code, the default value of `LoopPercentProfileLimit` (logically) is 10 for all platforms. > So I suggest resetting `LoopPercentProfileLimit` to the original value (10) for x86, just as other platforms. > > I've noted that the review thread mentioned that JDK-8149421 would be beneficial for some SPECjvm2008 benchmarks [4]. > Then I run SPECjvm2008 with `LoopPercentProfileLimit=10` finding that there is no performance drop on x86. > So it won't revert JDK-8149421's opts for SPECjvm2008. > > To show the potential improvement of this change, I've made a jmh test in the patch. > Performance can be improved by 1.25x ~ 2.0x according to this micro benchmark. > > Any comments? > > Thanks. > Best regards, > Jie > > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/loopTransform.cpp#L908 > [2] https://github.com/openjdk/jdk8u/blob/master/hotspot/src/share/vm/opto/loopTransform.cpp#L673 > [3] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/loopTransform.cpp#L903 > [4] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2016-February/021205.html > > ratio > > before > > after I'll run this through our performance testing and report back. 
------------- PR: https://git.openjdk.java.net/jdk/pull/6142 From jiefu at openjdk.java.net Thu Oct 28 09:00:14 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Thu, 28 Oct 2021 09:00:14 GMT Subject: RFR: 8276066: Reset LoopPercentProfileLimit for x86 due to suboptimal performance In-Reply-To: References: Message-ID: On Thu, 28 Oct 2021 08:51:49 GMT, Tobias Hartmann wrote: > I'll run this through our performance testing and report back. Thanks @TobiHartmann . ------------- PR: https://git.openjdk.java.net/jdk/pull/6142 From duke at openjdk.java.net Thu Oct 28 09:43:11 2021 From: duke at openjdk.java.net (SUN Guoyun) Date: Thu, 28 Oct 2021 09:43:11 GMT Subject: RFR: 8275086: compiler/c2/irTests/TestPostParseCallDevirtualization.java fails when compiler1 is disabled [v2] In-Reply-To: <_I3yL3LY1DEJ__lP7j-pAHBJxPsH6QGjLKTfDDsVBaI=.bd4b5989-b540-4dd6-8cf4-35e07638db08@github.com> References: <_I3yL3LY1DEJ__lP7j-pAHBJxPsH6QGjLKTfDDsVBaI=.bd4b5989-b540-4dd6-8cf4-35e07638db08@github.com> Message-ID: <7JW8RU2Xg2-GFDvU4wlVozxfN4YE5TK8iqL3e5OuYkI=.6b68a127-7f31-486f-9824-dbdff8d557f0@github.com> On Wed, 27 Oct 2021 19:27:14 GMT, Igor Veresov wrote: >> SUN Guoyun has updated the pull request incrementally with one additional commit since the last revision: >> >> 8275086: compiler/c2/irTests/TestPostParseCallDevirtualization.java fails when compiler1 is disabled > > 0.14 is oddly low. I think we need to get to the root of this and figure out why it doesn't create the MDO when it should. Try running the test with -XX:+PrintTieredEvents and grep for testMethodHandleCallWithCCP. See what's happening there. Look at the mdo counters that it prints. What are the total counters when does it starts profiling (the mdo counters start increasing)? @veresov With `-XX:+PrintTieredEvents` we can see that, normally, when the invocation count of `java.lang.invoke.LambdaForm$DMH::invokeStatic` reaches 256, the method is compiled at tier 3 and an MDO is created. 
But when tiered compilation is turned off (`-XX:-TieredCompilation`), the MDO is not created until the invocation count reaches 1664, which is too late. Would it be more reasonable to lower the value of `Tier0ProfilingStartPercentage` or `Tier3InvocationThreshold`? ------------- PR: https://git.openjdk.java.net/jdk/pull/5903 From simonis at openjdk.java.net Thu Oct 28 10:31:42 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Thu, 28 Oct 2021 10:31:42 GMT Subject: RFR: JDK-8275865: Print deoptimization statistics in product builds [v3] In-Reply-To: <4XYU4uU8bqDF6dlLyJrYqpVIUXIiIOYnEC6sCZUPaB8=.40fc30be-43d6-45d0-a3cd-2e3d3ca6378e@github.com> References: <4XYU4uU8bqDF6dlLyJrYqpVIUXIiIOYnEC6sCZUPaB8=.40fc30be-43d6-45d0-a3cd-2e3d3ca6378e@github.com> Message-ID: <445eNoYMXmr8fy7jAS1bIP6teKLRyOQB844NMzOjWRc=.f8f108dc-0785-4ee3-baaa-e99decac4d51@github.com> > Deoptimization statistics are already gathered in product builds but for some (probably historical) reasons aren't printed to the VM/Compiler log. These statistics can be useful when analyzing the reasons for deoptimization and frequent recompilations. > > Because the statistics are already collected anyway, printing them at VM-exit if either `-XX:+LogCompilation` or `-XX:+LogVMOutput` are set won't introduce any runtime overhead.
Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: Simplified preprocessor condition and fixed indentation in test ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6103/files - new: https://git.openjdk.java.net/jdk/pull/6103/files/42f01aed..cef9099a Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6103&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6103&range=01-02 Stats: 36 lines in 2 files changed: 11 ins; 12 del; 13 mod Patch: https://git.openjdk.java.net/jdk/pull/6103.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6103/head:pull/6103 PR: https://git.openjdk.java.net/jdk/pull/6103 From simonis at openjdk.java.net Thu Oct 28 10:31:47 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Thu, 28 Oct 2021 10:31:47 GMT Subject: RFR: JDK-8275865: Print deoptimization statistics in product builds [v2] In-Reply-To: References: <4XYU4uU8bqDF6dlLyJrYqpVIUXIiIOYnEC6sCZUPaB8=.40fc30be-43d6-45d0-a3cd-2e3d3ca6378e@github.com> Message-ID: On Wed, 27 Oct 2021 16:11:23 GMT, Vladimir Kozlov wrote: >> Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix C1 case which doesn't print deoptimization statistics > > src/hotspot/share/runtime/java.cpp line 354: > >> 352: } >> 353: >> 354: #if defined(COMPILER2) || defined(INCLUDE_JVMCI) > > There is a shorter version: `#if COMPILER2_OR_JVMCI` > Otherwise changes are good. Thanks, fixed.
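The suggested `COMPILER2_OR_JVMCI` is shorter, but it also avoids a pitfall. This assumes that, as in HotSpot's `utilities/macros.hpp`, `INCLUDE_JVMCI` is always defined, either to 0 or to 1. If so, `defined(INCLUDE_JVMCI)` is true even in builds without JVMCI. The self-contained sketch below simulates a build with neither C2 nor JVMCI. The `DEMO_*` macros and the two `k*` constants are hypothetical stand-ins, not HotSpot names.

```cpp
// Simulated build configuration (assumption): JVMCI is disabled, and the
// INCLUDE_*-style macro is defined to 0 rather than left undefined.
#define DEMO_INCLUDE_JVMCI 0
// DEMO_COMPILER2 is deliberately left undefined (C2 not built in).

// Original condition: defined() only checks that the macro exists, so this
// branch is taken even though JVMCI is disabled in this configuration.
#if defined(DEMO_COMPILER2) || defined(DEMO_INCLUDE_JVMCI)
constexpr bool kOriginalConditionTaken = true;
#else
constexpr bool kOriginalConditionTaken = false;
#endif

// Value-based combined macro, in the spirit of COMPILER2_OR_JVMCI.
#if defined(DEMO_COMPILER2) || DEMO_INCLUDE_JVMCI
#define DEMO_COMPILER2_OR_JVMCI 1
#else
#define DEMO_COMPILER2_OR_JVMCI 0
#endif

// Testing the value respects the 0, so this branch is correctly skipped.
#if DEMO_COMPILER2_OR_JVMCI
constexpr bool kCombinedConditionTaken = true;
#else
constexpr bool kCombinedConditionTaken = false;
#endif
```

In general, `defined(X)` is the right test only for macros that are either defined or absent. For 0/1-valued feature macros, test the value.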
------------- PR: https://git.openjdk.java.net/jdk/pull/6103 From simonis at openjdk.java.net Thu Oct 28 10:31:52 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Thu, 28 Oct 2021 10:31:52 GMT Subject: RFR: JDK-8275865: Print deoptimization statistics in product builds [v2] In-Reply-To: References: <4XYU4uU8bqDF6dlLyJrYqpVIUXIiIOYnEC6sCZUPaB8=.40fc30be-43d6-45d0-a3cd-2e3d3ca6378e@github.com> Message-ID: On Thu, 28 Oct 2021 06:38:59 GMT, Tobias Hartmann wrote: >> Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix C1 case which doesn't print deoptimization statistics > > test/hotspot/jtreg/runtime/logging/DeoptStats.java line 46: > >> 44: public class DeoptStats { >> 45: >> 46: static class Value { > > We use +4 space indentation for Java files. Sorry, you're right. I've hacked this test in Emacs with my default of 2 spaces :) > test/hotspot/jtreg/runtime/logging/DeoptStats.java line 67: > >> 65: verify(args); >> 66: } >> 67: else { > > Line break should be removed. Fixed. ------------- PR: https://git.openjdk.java.net/jdk/pull/6103 From thartmann at openjdk.java.net Thu Oct 28 10:45:12 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 28 Oct 2021 10:45:12 GMT Subject: RFR: JDK-8275865: Print deoptimization statistics in product builds [v3] In-Reply-To: <445eNoYMXmr8fy7jAS1bIP6teKLRyOQB844NMzOjWRc=.f8f108dc-0785-4ee3-baaa-e99decac4d51@github.com> References: <4XYU4uU8bqDF6dlLyJrYqpVIUXIiIOYnEC6sCZUPaB8=.40fc30be-43d6-45d0-a3cd-2e3d3ca6378e@github.com> <445eNoYMXmr8fy7jAS1bIP6teKLRyOQB844NMzOjWRc=.f8f108dc-0785-4ee3-baaa-e99decac4d51@github.com> Message-ID: On Thu, 28 Oct 2021 10:31:42 GMT, Volker Simonis wrote: >> Deoptimization statistics are already gathered in product builds but for some (probably historical) reasons aren't printed to the VM/Compiler log. These statistics can be useful when analyzing the reasons for deoptimization and frequent recompilations.
>> >> Because the statistics are already collected anyway, printing them at VM-exit if either `-XX:+LogCompilation` or `-XX:+LogVMOutput` are set won't introduce any runtime overhead. > > Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: > > Simplified preprocessor condition and fixed indentation in test Looks good. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6103 From simonis at openjdk.java.net Thu Oct 28 11:25:11 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Thu, 28 Oct 2021 11:25:11 GMT Subject: RFR: JDK-8275865: Print deoptimization statistics in product builds [v3] In-Reply-To: References: <4XYU4uU8bqDF6dlLyJrYqpVIUXIiIOYnEC6sCZUPaB8=.40fc30be-43d6-45d0-a3cd-2e3d3ca6378e@github.com> <445eNoYMXmr8fy7jAS1bIP6teKLRyOQB844NMzOjWRc=.f8f108dc-0785-4ee3-baaa-e99decac4d51@github.com> Message-ID: On Thu, 28 Oct 2021 10:41:59 GMT, Tobias Hartmann wrote: >> Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: >> >> Simplified preprocessor condition and fixed indentation in test > > Looks good. Thanks for the reviews @TobiHartmann & @vnkozlov ! ------------- PR: https://git.openjdk.java.net/jdk/pull/6103 From shade at openjdk.java.net Thu Oct 28 11:49:23 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 28 Oct 2021 11:49:23 GMT Subject: RFR: 8273416: C2: assert(false) failed: bad AD file after JDK-8252372 with UseSSE={0,1} Message-ID: See the bug report for reproducer and failure message. I think the newly added `CastDD`/`CastFF` nodes should handle the extended `regDPR`/`regFPR` (which includes FPU "registers") instead of just `regD`/`regF` to avoid this mismatch error when `UseSSE < 2`. 
Unfortunately, we cannot just use `reg*PR` operands in existing match rules, because those operands are defined as `UseSSE < 2`, and using them as operands and `ideal_regs()` would break the matching on `UseSSE >= 2`. Therefore I had to add another pair of matches. Additional testing: - [x] Linux x86_32 `tier1` `-XX:UseAVX=0 -XX:UseSSE=0` - [x] Linux x86_32 `tier1` default - [x] Linux x86_64 `tier1` default ------------- Commit messages: - Specialize match rules for UseSSE < 2 - Attempt to fix Changes: https://git.openjdk.java.net/jdk/pull/5386/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5386&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8273416 Stats: 22 lines in 2 files changed: 20 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/5386.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5386/head:pull/5386 PR: https://git.openjdk.java.net/jdk/pull/5386 From jbhateja at openjdk.java.net Thu Oct 28 12:22:11 2021 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Thu, 28 Oct 2021 12:22:11 GMT Subject: RFR: 8276066: Reset LoopPercentProfileLimit for x86 due to suboptimal performance In-Reply-To: References: Message-ID: On Thu, 28 Oct 2021 08:34:09 GMT, Tobias Hartmann wrote: > Just for the record, `LoopPercentProfileLimit` was always set to `30` on x86: http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/5fefcbeda616#l6.7 > I am already working on optimizing post loop vectorization using vectorAPI masked operations and plan to enhance [SLP post loop ](https://bugs.openjdk.java.net/browse/JDK-8183390) after it.
------------- PR: https://git.openjdk.java.net/jdk/pull/6142 From simonis at openjdk.java.net Thu Oct 28 12:43:18 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Thu, 28 Oct 2021 12:43:18 GMT Subject: Integrated: JDK-8275865: Print deoptimization statistics in product builds In-Reply-To: <4XYU4uU8bqDF6dlLyJrYqpVIUXIiIOYnEC6sCZUPaB8=.40fc30be-43d6-45d0-a3cd-2e3d3ca6378e@github.com> References: <4XYU4uU8bqDF6dlLyJrYqpVIUXIiIOYnEC6sCZUPaB8=.40fc30be-43d6-45d0-a3cd-2e3d3ca6378e@github.com> Message-ID: On Mon, 25 Oct 2021 11:46:06 GMT, Volker Simonis wrote: > Deoptimization statistics are already gathered in product builds but for some (probably historical) reasons aren't printed to the VM/Compiler log. These statistics can be useful when analyzing the reasons for deoptimization and frequent recompilations. > > Because the statistics are already collected anyway, printing them at VM-exit if either `-XX:+LogCompilation` or `-XX:+LogVMOutput` are set won't introduce any runtime overhead. This pull request has now been integrated.
Changeset: a343fa87 Author: Volker Simonis URL: https://git.openjdk.java.net/jdk/commit/a343fa8766bb12188881319f06b1d93161cf1619 Stats: 86 lines in 2 files changed: 84 ins; 2 del; 0 mod 8275865: Print deoptimization statistics in product builds Reviewed-by: thartmann, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/6103 From duke at openjdk.java.net Thu Oct 28 13:41:29 2021 From: duke at openjdk.java.net (Tobias Holenstein) Date: Thu, 28 Oct 2021 13:41:29 GMT Subject: RFR: JDK-8275909: [JVMCI] c2v_readFieldValue use long instead of jlong for the offset parameter Message-ID: Changed the type of the displacement parameter from `long` to `jlong` in C2V_VMENTRY_NULL(jobject, readFieldValue, (JNIEnv* env, jobject, jobject object, jobject expected_type, long displacement, jboolean is_volatile, jobject kind_object)) ------------- Commit messages: - JDK-8275909: [JVMCI] c2v_readFieldValue use long instead of jlong for the offset parameter Changes: https://git.openjdk.java.net/jdk/pull/6158/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6158&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8275909 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/6158.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6158/head:pull/6158 PR: https://git.openjdk.java.net/jdk/pull/6158 From thartmann at openjdk.java.net Thu Oct 28 14:48:22 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 28 Oct 2021 14:48:22 GMT Subject: RFR: 8275448: [REDO] AArch64: Implement string_compare intrinsic in SVE [v2] In-Reply-To: References: <3lLU_Cnsh439U5TjNx9ADxXkvMDnfot4TxHbh7paUQw=.5c7ec35a-8456-4fe6-8b4f-1d7627026d8d@github.com> Message-ID: On Thu, 28 Oct 2021 00:35:33 GMT, TatWai Chong wrote: >> After JDK-8269559 was integrated there are failures in tier1 testing >> across Mac OS X 11.4 (aarch64) machines. Please see JDK-8275263.
>> >> This patch is NOT functional; rather, it is intended to verify potential >> toolchain issues, as the original patch passes testing on other >> platforms. >> >> In this patch, we remove the new SVE-related matching rules and register >> class introduced in the original patch to minimally affect the >> non-SVE part. > > TatWai Chong has updated the pull request incrementally with one additional commit since the last revision: > > Add the register class and description for this SVE intrinsic. All tests passed. ------------- PR: https://git.openjdk.java.net/jdk/pull/6072 From chagedorn at openjdk.java.net Thu Oct 28 15:08:32 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Thu, 28 Oct 2021 15:08:32 GMT Subject: RFR: 8275868: ciReplay: Inlining fails with "unloaded signature classes" due to wrong protection domains Message-ID: <16836GhLOzOFB_YHIIphRQbJicN3caZ5P4xOUmcUG5g=.5004acaf-acd1-4db3-9fe6-ed0e8eb9968f@github.com> Replay compilation can fail to inline a method which was inlined in the normal run due to unresolved classes in the signature of an inlinee. The reason is that ciReplay is not resolving Java API classes with the protection domain of the holder class of the method to be replay compiled. Compiler replay is currently only resolving classes without a protection domain (i.e. an empty handle): https://github.com/openjdk/jdk/blob/593401fe8b38bbb8d331a862818fe077af157fcb/src/hotspot/share/ci/ciReplay.cpp#L139-L142 A more detailed description can be found in the description of [JDK-8275868](https://bugs.openjdk.java.net/browse/JDK-8275868). This patch fixes that and takes the protection domain of the holder class of the method to be compiled to resolve all other classes used for ciReplay. The unloaded classes check is done in `ciMethod::has_unloaded_classes_in_signature()` and bypasses the whitelist introduced by JDK-8262912.
To test the various scenarios mentioned in the description of JDK-8275868, I've added some support to use `DumpReplay` to not require a crash. I parse the inlining information from the hotspot log file to check that ciReplay applies the same inlining decisions as the normal run. Thanks, Christian ------------- Commit messages: - 8275868: ciReplay: Inlining fails with "unloaded signature classes" due to wrong protection domains Changes: https://git.openjdk.java.net/jdk/pull/6159/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6159&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8275868 Stats: 438 lines in 5 files changed: 434 ins; 0 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/6159.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6159/head:pull/6159 PR: https://git.openjdk.java.net/jdk/pull/6159 From chagedorn at openjdk.java.net Thu Oct 28 15:18:14 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Thu, 28 Oct 2021 15:18:14 GMT Subject: RFR: JDK-8275909: [JVMCI] c2v_readFieldValue use long instead of jlong for the offset parameter In-Reply-To: References: Message-ID: On Thu, 28 Oct 2021 13:33:55 GMT, Tobias Holenstein wrote: > Changed the type of the displacement in from `long` to `jlong` in C2V_VMENTRY_NULL(jobject, readFieldValue, (JNIEnv* env, jobject, jobject object, jobject expected_type, long displacement, jboolean is_volatile, jobject kind_object)) Looks good and trivial. ------------- Marked as reviewed by chagedorn (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/6158 From dnsimon at openjdk.java.net Thu Oct 28 15:18:15 2021 From: dnsimon at openjdk.java.net (Doug Simon) Date: Thu, 28 Oct 2021 15:18:15 GMT Subject: RFR: JDK-8275909: [JVMCI] c2v_readFieldValue use long instead of jlong for the offset parameter In-Reply-To: References: Message-ID: On Thu, 28 Oct 2021 13:33:55 GMT, Tobias Holenstein wrote: > Changed the type of the displacement in from `long` to `jlong` in C2V_VMENTRY_NULL(jobject, readFieldValue, (JNIEnv* env, jobject, jobject object, jobject expected_type, long displacement, jboolean is_volatile, jobject kind_object)) Marked as reviewed by dnsimon (Committer). ------------- PR: https://git.openjdk.java.net/jdk/pull/6158 From duke at openjdk.java.net Thu Oct 28 15:23:10 2021 From: duke at openjdk.java.net (Tobias Holenstein) Date: Thu, 28 Oct 2021 15:23:10 GMT Subject: RFR: JDK-8275909: [JVMCI] c2v_readFieldValue use long instead of jlong for the offset parameter In-Reply-To: References: Message-ID: <-NqAIX7ppW-rGz3vi1iG_88iMYddLxetUP4almfFsI0=.00962f01-08bb-42db-8eb6-eab5295a60b9@github.com> On Thu, 28 Oct 2021 15:12:28 GMT, Christian Hagedorn wrote: >> Changed the type of the displacement in from `long` to `jlong` in C2V_VMENTRY_NULL(jobject, readFieldValue, (JNIEnv* env, jobject, jobject object, jobject expected_type, long displacement, jboolean is_volatile, jobject kind_object)) > > Looks good and trivial. @chhagedorn and @dougxc thanks for the review! 
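The distinction between `long` and `jlong` is not cosmetic. A Java `long` reaches native code as a `jlong`, which JNI defines as a 64-bit signed integer. C++ `long` is only 32 bits wide on LLP64 platforms such as 64-bit Windows. The sketch below uses `std::int64_t` and `std::int32_t` as stand-ins for `jlong` and an LLP64 `long`. These aliases are assumptions for illustration, not the real `jni.h` typedefs. The sketch shows which displacements a 32-bit parameter could not represent.

```cpp
#include <cstdint>
#include <limits>

// Stand-ins (assumptions): JNI's jlong is a 64-bit signed integer, and on
// LLP64 platforms such as 64-bit Windows a C++ `long` is 32 bits wide.
using jlong_like = std::int64_t;
using llp64_long = std::int32_t;

// True if a displacement survives being passed through a 32-bit `long`.
// Values outside this range would be truncated by such a parameter.
constexpr bool fits_in_llp64_long(jlong_like displacement) {
    return displacement >= std::numeric_limits<llp64_long>::min() &&
           displacement <= std::numeric_limits<llp64_long>::max();
}
```

On LP64 platforms (Linux, macOS) `long` is already 64 bits wide, so a mismatch like this can go unnoticed there.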
PR: https://git.openjdk.java.net/jdk/pull/6158 From iveresov at openjdk.java.net Thu Oct 28 16:01:21 2021 From: iveresov at openjdk.java.net (Igor Veresov) Date: Thu, 28 Oct 2021 16:01:21 GMT Subject: RFR: 8275086: compiler/c2/irTests/TestPostParseCallDevirtualization.java fails when compiler1 is disabled [v2] In-Reply-To: References: Message-ID: On Fri, 22 Oct 2021 08:12:27 GMT, SUN Guoyun wrote: >> Hi all, >> The jtreg test compiler/c2/irTests/TestPostParseCallDevirtualization.java fails in fastdebug mode on x86/aarch64/mips when "--with-jvm-features=-compiler1" is used. The failure output is: >> >>

>> One or more @IR rules failed:
>> 
>> Failed IR Rules (1)
>> ------------------
>> - Method "public int compiler.c2.irTests.TestPostParseCallDevirtualization.testMethodHandleCallWithCCP() throws java.lang.Throwable":
>>   * @IR rule 1: "@compiler.lib.ir_framework.IR(failOn={"#PRE#(\\\\d+(\\\\s){2}(CallStaticJava.*)+(\\\\s){2}===.*#IS_REPLACED#)", "invokeBasic"}, applyIf={}, applyIfAnd={}, applyIfOr={}, counts={"#PRE#(\\\\d+(\\\\s){2}(CallStaticJava.*)+(\\\\s){2}===.*#IS_REPLACED#)", "invokeStatic", "= 1"}, applyIfNot={})" 
>>     - failOn: Graph contains forbidden nodes:
>>         Regex 1: (\\d+(\\s){2}(CallStaticJava.*)+(\\s){2}===.*invokeBasic)
>>         Matched forbidden node:
>>           280  CallStaticJava  ===  5  6  7  8  1 ( 188  1  1  1  1  1  1 ) [[ 281  282  283  285 ]] # Static  java.lang.invoke.MethodHandle::invokeBasic
>>     - counts: Graph contains wrong number of nodes:
>>         Regex 1: (\\d+(\\s){2}(CallStaticJava.*)+(\\s){2}===.*invokeStatic)
>>         Expected 1 but found 0 nodes.
>> 
>>>>> Check stdout for compilation output of the failed methods
>> 
>> >> This is a patch to fix this problem. Please help review it. >> >> Thanks, >> Sun Guoyun > > SUN Guoyun has updated the pull request incrementally with one additional commit since the last revision: > > 8275086: compiler/c2/irTests/TestPostParseCallDevirtualization.java fails when compiler1 is disabled I think that's a pretty normal start of profiling in this configuration. We don't want to start profiling too early because it will have an adverse effect on startup. Historically, before tiered compilation, profiling would start after 3300 invocations plus back branches taken. It feels like the IR testing framework needs to be adjusted so that it warms the tests up longer (at least in this configuration). ------------- PR: https://git.openjdk.java.net/jdk/pull/5903 From iveresov at openjdk.java.net Thu Oct 28 19:34:27 2021 From: iveresov at openjdk.java.net (Igor Veresov) Date: Thu, 28 Oct 2021 19:34:27 GMT Subject: RFR: 8274855: vectorapi tests failing with assert(!vbox->is_Phi()) failed Message-ID: We need to handle the case when the allocation input to `VectorBoxNode` is a phi but the vector input is not, which can definitely be the case if the vector input has been value-numbered. It seems to be safe to do by construction because `VectorBoxNode` and `VectorBoxAllocation` come in a specific order as a result of expanding an intrinsic call. After that, if any of the inputs to VectorBoxNode are value-numbered they can only move up and are guaranteed to dominate.
------------- Commit messages: - Remove tests from the problem list - Handle the case when the allocation input to a VectorBoxNode is a phi Changes: https://git.openjdk.java.net/jdk/pull/6162/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6162&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8274855 Stats: 11 lines in 2 files changed: 8 ins; 3 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/6162.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6162/head:pull/6162 PR: https://git.openjdk.java.net/jdk/pull/6162 From kvn at openjdk.java.net Thu Oct 28 20:19:14 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 28 Oct 2021 20:19:14 GMT Subject: RFR: 8274855: vectorapi tests failing with assert(!vbox->is_Phi()) failed In-Reply-To: References: Message-ID: On Thu, 28 Oct 2021 19:24:43 GMT, Igor Veresov wrote: > We need to handle the case when the allocation input to `VectorBoxNode` is a phi but the vector input is not, which can definitely be the case if the vector input has been value-numbered. It seems to be safe to do by construction because `VectorBoxNode` and `VectorBoxAllocation` come in a specific order as a result of expanding an intrinsic call. After that, if any of the inputs to VectorBoxNode are value-numbered they can only move up and are guaranteed to dominate. Seems fine. src/hotspot/share/opto/vector.cpp line 318: > 316: new_phi = C->initial_gvn()->transform(new_phi); > 317: return new_phi; > 318: } else if (vbox->is_Phi() && (vect->is_Vector() || vect->is_LoadVector())) { Please add a comment explaining this case.
------------- PR: https://git.openjdk.java.net/jdk/pull/6162 From kvn at openjdk.java.net Thu Oct 28 20:22:12 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 28 Oct 2021 20:22:12 GMT Subject: RFR: 8275868: ciReplay: Inlining fails with "unloaded signature classes" due to wrong protection domains In-Reply-To: <16836GhLOzOFB_YHIIphRQbJicN3caZ5P4xOUmcUG5g=.5004acaf-acd1-4db3-9fe6-ed0e8eb9968f@github.com> References: <16836GhLOzOFB_YHIIphRQbJicN3caZ5P4xOUmcUG5g=.5004acaf-acd1-4db3-9fe6-ed0e8eb9968f@github.com> Message-ID: On Thu, 28 Oct 2021 15:01:03 GMT, Christian Hagedorn wrote: > Replay compilation can fail to inline a method which was inlined in the normal run due to unresolved classes in the signature of an inlinee. The reason is that ciReplay is not resolving Java API classes with the protection domain of the holder class of the method to be replay compiled. Compiler replay is currently only resolving classes without a protection domain (i.e. an empty handle): > https://github.com/openjdk/jdk/blob/593401fe8b38bbb8d331a862818fe077af157fcb/src/hotspot/share/ci/ciReplay.cpp#L139-L142 > > A more detailed description can be found in the description of [JDK-8275868](https://bugs.openjdk.java.net/browse/JDK-8275868). > > This patch fixes that and takes the protection domain of the holder class of the method to be compiled to resolve all other classes used for ciReplay. The unloaded classes check is done in `ciMethod::has_unloaded_classes_in_signature()` and bypasses the whitelist introduced by JDK-8262912. However, this is fine since the inlining decision is enforced by the inlining information in the replay file. > > To test the various scenarios mentioned in the description of JDK-8275868, I've added some support to use `DumpReplay` to not require a crash. I parse the inlining information from the hotspot log file to check that ciReplay applies the same inlining decisions as the normal run. > > Thanks, > Christian Good. 
------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6159 From kvn at openjdk.java.net Thu Oct 28 20:25:23 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 28 Oct 2021 20:25:23 GMT Subject: RFR: 8273416: C2: assert(false) failed: bad AD file after JDK-8252372 with UseSSE={0,1} In-Reply-To: References: Message-ID: On Tue, 7 Sep 2021 10:10:08 GMT, Aleksey Shipilev wrote: > See the bug report for reproducer and failure message. I think the newly added `CastDD`/`CastFF` nodes should handle the extended `regDPR`/`regFPR` (which includes FPU "registers") instead of just `regD`/`regF` to avoid this mismatch error when `UseSSE < 2`. > > Unfortunately, we cannot just use `reg*PR` operands in existing match rules, because those operands are defined as `UseSSE < 2`, and using them as operands and `ideal_regs()` would break the matching on `UseSSE >= 2`. Therefore I had to add another pair of matches. > > Additonal testing: > - [x] Linux x86_32 `tier1` `-XX:UseAVX=0 -XX:UseSSE=0` > - [x] Linux x86_32 `tier1` default > - [x] Linux x86_64 `tier1` default Good. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5386 From iveresov at openjdk.java.net Thu Oct 28 20:25:47 2021 From: iveresov at openjdk.java.net (Igor Veresov) Date: Thu, 28 Oct 2021 20:25:47 GMT Subject: RFR: 8274855: vectorapi tests failing with assert(!vbox->is_Phi()) failed [v2] In-Reply-To: References: Message-ID: > We need to handle the case when the allocation input to `VectorBoxNode` is a phi but the vector input is not, which can definitely be the case if the vector input has been value-numbered. It seems to be safe to do by construction because `VectorBoxNode` and `VectorBoxAllocation` come in a specific order as a result of expanding an intrinsic call. After that, if any of the inputs to VectorBoxNode are value-numbered they can only move up and are guaranteed to dominate. 
Igor Veresov has updated the pull request incrementally with two additional commits since the last revision: - Fix spelling. - Add comments. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6162/files - new: https://git.openjdk.java.net/jdk/pull/6162/files/1889bc60..fd4fbe19 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6162&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6162&range=00-01 Stats: 7 lines in 1 file changed: 7 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/6162.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6162/head:pull/6162 PR: https://git.openjdk.java.net/jdk/pull/6162 From dlong at openjdk.java.net Thu Oct 28 20:35:14 2021 From: dlong at openjdk.java.net (Dean Long) Date: Thu, 28 Oct 2021 20:35:14 GMT Subject: RFR: 8275868: ciReplay: Inlining fails with "unloaded signature classes" due to wrong protection domains In-Reply-To: <16836GhLOzOFB_YHIIphRQbJicN3caZ5P4xOUmcUG5g=.5004acaf-acd1-4db3-9fe6-ed0e8eb9968f@github.com> References: <16836GhLOzOFB_YHIIphRQbJicN3caZ5P4xOUmcUG5g=.5004acaf-acd1-4db3-9fe6-ed0e8eb9968f@github.com> Message-ID: On Thu, 28 Oct 2021 15:01:03 GMT, Christian Hagedorn wrote: > Replay compilation can fail to inline a method which was inlined in the normal run due to unresolved classes in the signature of an inlinee. The reason is that ciReplay is not resolving Java API classes with the protection domain of the holder class of the method to be replay compiled. Compiler replay is currently only resolving classes without a protection domain (i.e. an empty handle): > https://github.com/openjdk/jdk/blob/593401fe8b38bbb8d331a862818fe077af157fcb/src/hotspot/share/ci/ciReplay.cpp#L139-L142 > > A more detailed description can be found in the description of [JDK-8275868](https://bugs.openjdk.java.net/browse/JDK-8275868). 
> > This patch fixes that and takes the protection domain of the holder class of the method to be compiled to resolve all other classes used for ciReplay. The unloaded classes check is done in `ciMethod::has_unloaded_classes_in_signature()` and bypasses the whitelist introduced by JDK-8262912. However, this is fine since the inlining decision is enforced by the inlining information in the replay file. > > To test the various scenarios mentioned in the description of JDK-8275868, I've added some support to use `DumpReplay` to not require a crash. I parse the inlining information from the hotspot log file to check that ciReplay applies the same inlining decisions as the normal run. > > Thanks, > Christian Marked as reviewed by dlong (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/6159 From kvn at openjdk.java.net Thu Oct 28 21:30:13 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 28 Oct 2021 21:30:13 GMT Subject: RFR: 8274855: vectorapi tests failing with assert(!vbox->is_Phi()) failed [v2] In-Reply-To: References: Message-ID: On Thu, 28 Oct 2021 20:25:47 GMT, Igor Veresov wrote: >> We need to handle the case when the allocation input to `VectorBoxNode` is a phi but the vector input is not, which can definitely be the case if the vector input has been value-numbered. It seems to be safe to do by construction because `VectorBoxNode` and `VectorBoxAllocation` come in a specific order as a result of expanding an intrinsic call. After that, if any of the inputs to VectorBoxNode are value-numbered they can only move up and are guaranteed to dominate. > > Igor Veresov has updated the pull request incrementally with two additional commits since the last revision: > > - Fix spelling. > - Add comments. Good ------------- Marked as reviewed by kvn (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/6162 From pli at openjdk.java.net Fri Oct 29 02:10:11 2021 From: pli at openjdk.java.net (Pengfei Li) Date: Fri, 29 Oct 2021 02:10:11 GMT Subject: RFR: 8276066: Reset LoopPercentProfileLimit for x86 due to suboptimal performance In-Reply-To: References: Message-ID: On Thu, 28 Oct 2021 12:18:45 GMT, Jatin Bhateja wrote: >> Just for the record, `LoopPercentProfileLimit` was always set to `30` on x86: >> http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/5fefcbeda616#l6.7 >> >> Or am I missing something? >> >> EDIT: Okay, I've seen your comment explaining the details only now. > >> Just for the record, `LoopPercentProfileLimit` was always set to `30` on x86: http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/5fefcbeda616#l6.7 >> > I am already working on optimizing post loop vectorization using vectorAPI masked operations and plan to enhance [SLP post loop ](https://bugs.openjdk.java.net/browse/JDK-8183390) after it. @jatin-bhateja @TobiHartmann I have already done a patch to fix and re-enable post loop vectorization using masked operations. My patch fixes several issues and is fully tested. Now the post loop feature works on both x86 AVX-512 and AArch64 SVE. I haven't pushed my patch for review because the dependent patch (https://github.com/openjdk/jdk/pull/5873) has not been merged yet. ------------- PR: https://git.openjdk.java.net/jdk/pull/6142 From anhmdq at gmail.com Fri Oct 29 02:58:15 2021 From: anhmdq at gmail.com (Quân Anh Mai) Date: Fri, 29 Oct 2021 09:58:15 +0700 Subject: Ask for review: Optimise unsigned comparison pattern Message-ID: Hi, I have submitted a small patch that changes the patterns x +- Integer.MIN_VALUE <=> y +- Integer.MIN_VALUE to x u<=> y. This pattern is used in java.lang.Integer.compareUnsigned. I would appreciate it if someone could take a look at it. Thank you very much. The pull request is here: https://github.com/openjdk/jdk/pull/6101.
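The identity behind this pattern can be checked in isolation. In the JDK, `Integer.compareUnsigned(x, y)` is written as `Integer.compare(x + MIN_VALUE, y + MIN_VALUE)`. Adding `MIN_VALUE` with Java's wrapping `int` arithmetic flips the sign bit, so a signed comparison of the biased values orders the operands as unsigned. The C++ sketch below models Java `int` wraparound explicitly and checks that the biased signed comparison agrees with a direct unsigned one. The helper names are hypothetical, and the sketch does not reproduce the C2 transformation in the linked pull request.

```cpp
#include <cstdint>

// Interpret a 32-bit pattern as a two's-complement value without relying on
// implementation-defined narrowing conversions.
constexpr std::int32_t to_java_int(std::uint32_t bits) {
    return bits <= 0x7FFFFFFFu
        ? static_cast<std::int32_t>(bits)
        : static_cast<std::int32_t>(static_cast<std::int64_t>(bits) - 0x100000000LL);
}

// Java `int` addition wraps modulo 2^32, so add in unsigned arithmetic.
constexpr std::int32_t java_int_add(std::int32_t a, std::int32_t b) {
    return to_java_int(static_cast<std::uint32_t>(a) + static_cast<std::uint32_t>(b));
}

constexpr std::int32_t kIntMinValue = INT32_MIN;  // Integer.MIN_VALUE

// Three-way result with the -1/0/1 convention of Integer.compare.
constexpr int three_way(std::int64_t a, std::int64_t b) {
    return a < b ? -1 : (a == b ? 0 : 1);
}

// The pattern: bias both operands by MIN_VALUE, then compare as signed.
constexpr int compare_unsigned_biased(std::int32_t x, std::int32_t y) {
    return three_way(java_int_add(x, kIntMinValue), java_int_add(y, kIntMinValue));
}

// The replacement: a plain unsigned comparison (x u<=> y).
constexpr int compare_unsigned_direct(std::int32_t x, std::int32_t y) {
    return three_way(static_cast<std::uint32_t>(x), static_cast<std::uint32_t>(y));
}
```

For example, `-1` has the bit pattern `0xFFFFFFFF`, so both functions report it as greater than `1`. A plain signed comparison would report the opposite.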
Regards, Quân Anh From jbhateja at openjdk.java.net Fri Oct 29 05:07:11 2021 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Fri, 29 Oct 2021 05:07:11 GMT Subject: RFR: 8276066: Reset LoopPercentProfileLimit for x86 due to suboptimal performance In-Reply-To: References: Message-ID: On Thu, 28 Oct 2021 12:18:45 GMT, Jatin Bhateja wrote: >> Just for the record, `LoopPercentProfileLimit` was always set to `30` on x86: >> http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/5fefcbeda616#l6.7 >> >> Or am I missing something? >> >> EDIT: Okay, I've seen your comment explaining the details only now. > >> Just for the record, `LoopPercentProfileLimit` was always set to `30` on x86: http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/5fefcbeda616#l6.7 >> > I am already working on optimizing post loop vectorization using vectorAPI masked operations and plan to enhance [SLP post loop ](https://bugs.openjdk.java.net/browse/JDK-8183390) after it. > @jatin-bhateja @TobiHartmann I have already done a patch to fix and re-enable post loop vectorization using masked operations. My patch fixes several issues and is fully tested. Now the post loop feature works on both x86 AVX-512 and AArch64 SVE. I haven't pushed my patch for review because the dependent patch (#5873) has not been merged yet. Thanks for the heads-up @pfustc, I haven't started on the implementation yet.
------------- PR: https://git.openjdk.java.net/jdk/pull/6142 From shade at openjdk.java.net Fri Oct 29 06:17:21 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Fri, 29 Oct 2021 06:17:21 GMT Subject: RFR: JDK-8275909: [JVMCI] c2v_readFieldValue use long instead of jlong for the offset parameter In-Reply-To: References: Message-ID: On Thu, 28 Oct 2021 13:33:55 GMT, Tobias Holenstein wrote: > Changed the type of the displacement in from `long` to `jlong` in C2V_VMENTRY_NULL(jobject, readFieldValue, (JNIEnv* env, jobject, jobject object, jobject expected_type, long displacement, jboolean is_volatile, jobject kind_object)) Marked as reviewed by shade (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/6158 From duke at openjdk.java.net Fri Oct 29 06:20:13 2021 From: duke at openjdk.java.net (Tobias Holenstein) Date: Fri, 29 Oct 2021 06:20:13 GMT Subject: Integrated: JDK-8275909: [JVMCI] c2v_readFieldValue use long instead of jlong for the offset parameter In-Reply-To: References: Message-ID: On Thu, 28 Oct 2021 13:33:55 GMT, Tobias Holenstein wrote: > Changed the type of the displacement in from `long` to `jlong` in C2V_VMENTRY_NULL(jobject, readFieldValue, (JNIEnv* env, jobject, jobject object, jobject expected_type, long displacement, jboolean is_volatile, jobject kind_object)) This pull request has now been integrated. 
Changeset: e922023e Author: Tobias Holenstein Committer: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/e922023ec9a74e694a8180e678be19bc2720c346 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8275909: [JVMCI] c2v_readFieldValue use long instead of jlong for the offset parameter Reviewed-by: chagedorn, dnsimon, shade ------------- PR: https://git.openjdk.java.net/jdk/pull/6158 From chagedorn at openjdk.java.net Fri Oct 29 07:28:12 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Fri, 29 Oct 2021 07:28:12 GMT Subject: RFR: 8275868: ciReplay: Inlining fails with "unloaded signature classes" due to wrong protection domains In-Reply-To: <16836GhLOzOFB_YHIIphRQbJicN3caZ5P4xOUmcUG5g=.5004acaf-acd1-4db3-9fe6-ed0e8eb9968f@github.com> References: <16836GhLOzOFB_YHIIphRQbJicN3caZ5P4xOUmcUG5g=.5004acaf-acd1-4db3-9fe6-ed0e8eb9968f@github.com> Message-ID: On Thu, 28 Oct 2021 15:01:03 GMT, Christian Hagedorn wrote: > Replay compilation can fail to inline a method which was inlined in the normal run due to unresolved classes in the signature of an inlinee. The reason is that ciReplay is not resolving Java API classes with the protection domain of the holder class of the method to be replay compiled. Compiler replay is currently only resolving classes without a protection domain (i.e. an empty handle): > https://github.com/openjdk/jdk/blob/593401fe8b38bbb8d331a862818fe077af157fcb/src/hotspot/share/ci/ciReplay.cpp#L139-L142 > > A more detailed description can be found in the description of [JDK-8275868](https://bugs.openjdk.java.net/browse/JDK-8275868). > > This patch fixes that and takes the protection domain of the holder class of the method to be compiled to resolve all other classes used for ciReplay. The unloaded classes check is done in `ciMethod::has_unloaded_classes_in_signature()` and bypasses the whitelist introduced by JDK-8262912. 
However, this is fine since the inlining decision is enforced by the inlining information in the replay file. > > To test the various scenarios mentioned in the description of JDK-8275868, I've added some support to use `DumpReplay` to not require a crash. I parse the inlining information from the hotspot log file to check that ciReplay applies the same inlining decisions as the normal run. > > Thanks, > Christian Thanks Vladimir and Dean for your reviews! ------------- PR: https://git.openjdk.java.net/jdk/pull/6159 From ngasson at openjdk.java.net Fri Oct 29 07:44:47 2021 From: ngasson at openjdk.java.net (Nick Gasson) Date: Fri, 29 Oct 2021 07:44:47 GMT Subject: RFR: 8275847: Scheduling fails with "too many D-U pinch points" on small method [v2] In-Reply-To: <5z-HFwTvcqo_tge6dIMU4VZo-0UkInXAIKuh5D-fkxI=.b5f7a31b-717f-47b5-bfcc-90dbb223075e@github.com> References: <5z-HFwTvcqo_tge6dIMU4VZo-0UkInXAIKuh5D-fkxI=.b5f7a31b-717f-47b5-bfcc-90dbb223075e@github.com> Message-ID: <0VRE1Xz5B5o9M0DjdTd5KBL5YOXPcp8Od5vCpH96j34=.a0c38e4e-5a02-4adc-be8f-22c579f53d47@github.com> > Since around JDK 16 the following method cannot be compiled by C2 on AArch64: > > > public double mergeSync() { return Math.log(Math.sin(value)); } > > > (Reduced from a slightly larger benchmark.) > > > 811 416 ! 3 Test::mergeSync (61 bytes) > 813 417 ! 4 Test::mergeSync (61 bytes) > 816 417 ! 4 Test::mergeSync (61 bytes) COMPILE SKIPPED: too many D-U pinch points (retry at different tier) > 816 418 ! 1 Test::mergeSync (61 bytes) > > > Scheduling::anti_do_def() will create temporary Nodes for each OptoReg killed by the MachProjs from the two runtime leaf calls. After SVE support was added these runtime calls kill more registers, and the number of new Nodes added by anti_do_def exceeds an internal limit (which is based on the LRG map size and roughly proportional to the method size). > > X86 has the same problem if OptoScheduling is enabled because of the wide AVX registers. 
> > The fix here is to ignore OptoRegs which correspond to the high slots of wide vectors (i.e. slots above 64 bits). The scheduler doesn't run on methods where C->max_vector_size() > 8, so we know these kills can't affect the scheduling result. > > The added test fails on the current JDK with: > > > compiler.lib.ir_framework.shared.TestRunException: Could not compile public double > compiler.c2.irTests.TestScheduleSmallMethod.testSmallMethodTwoRuntimeCalls(double) at level C2 > after 10s. Last compilation level: 3 Nick Gasson has updated the pull request incrementally with one additional commit since the last revision: Remove dead uses of is_concrete ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6131/files - new: https://git.openjdk.java.net/jdk/pull/6131/files/dfa783f1..d9875679 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6131&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6131&range=00-01 Stats: 15 lines in 2 files changed: 0 ins; 10 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/6131.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6131/head:pull/6131 PR: https://git.openjdk.java.net/jdk/pull/6131 From ngasson at openjdk.java.net Fri Oct 29 07:44:48 2021 From: ngasson at openjdk.java.net (Nick Gasson) Date: Fri, 29 Oct 2021 07:44:48 GMT Subject: RFR: 8275847: Scheduling fails with "too many D-U pinch points" on small method [v2] In-Reply-To: References: <5z-HFwTvcqo_tge6dIMU4VZo-0UkInXAIKuh5D-fkxI=.b5f7a31b-717f-47b5-bfcc-90dbb223075e@github.com> Message-ID: On Thu, 28 Oct 2021 08:20:47 GMT, Tobias Hartmann wrote: > While looking at the usages of `is_concrete`, I found that all current usages outside of asserts are dead: > Looks like it's been like that since the initial public commit in 2007. I folded the `&& false`, `|| true` uses. 
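The filtering idea in the 8275847 fix (ignore kills of OptoReg slots above the low 64 bits of a wide vector register) can be illustrated with a toy count. This is a hypothetical Java model, not HotSpot code: it assumes 32-bit slots and a uniform register width, and it only shows why the number of temporary per-slot kill nodes shrinks from `regs * width/32` to `regs * 2`.

```java
public class HighSlotFilterDemo {
    static final int SLOT_BITS = 32; // assumed slot size
    static final int KEEP_BITS = 64; // slots at or above this bit offset are ignored

    // True for slots that lie above the low 64 bits of a register.
    static boolean isHighVectorSlot(int slotInRegister) {
        return slotInRegister * SLOT_BITS >= KEEP_BITS;
    }

    // Number of per-slot kill nodes needed when a call clobbers numRegs registers.
    static int killNodesNeeded(int numRegs, int regBits, boolean filterHighSlots) {
        int slotsPerReg = regBits / SLOT_BITS;
        int count = 0;
        for (int r = 0; r < numRegs; r++) {
            for (int s = 0; s < slotsPerReg; s++) {
                if (filterHighSlots && isHighVectorSlot(s)) {
                    // Per the PR, the scheduler only runs when max_vector_size() <= 8,
                    // so these high slots cannot affect the result.
                    continue;
                }
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Illustrative numbers only: 32 registers, 512 bits each.
        System.out.println(killNodesNeeded(32, 512, false)); // 512
        System.out.println(killNodesNeeded(32, 512, true));  // 64
    }
}
```

Since the internal node limit is roughly proportional to method size, cutting the per-call kill count this way is what lets a small method stay under it.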
------------- PR: https://git.openjdk.java.net/jdk/pull/6131 From shade at openjdk.java.net Fri Oct 29 08:29:16 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Fri, 29 Oct 2021 08:29:16 GMT Subject: RFR: 8273416: C2: assert(false) failed: bad AD file after JDK-8252372 with UseSSE={0,1} In-Reply-To: References: Message-ID: On Thu, 28 Oct 2021 20:21:49 GMT, Vladimir Kozlov wrote: >> See the bug report for reproducer and failure message. I think the newly added `CastDD`/`CastFF` nodes should handle the extended `regDPR`/`regFPR` (which includes FPU "registers") instead of just `regD`/`regF` to avoid this mismatch error when `UseSSE < 2`. >> >> Unfortunately, we cannot just use `reg*PR` operands in existing match rules, because those operands are defined as `UseSSE < 2`, and using them as operands and `ideal_regs()` would break the matching on `UseSSE >= 2`. Therefore I had to add another pair of matches. >> >> Additonal testing: >> - [x] Linux x86_32 `tier1` `-XX:UseAVX=0 -XX:UseSSE=0` >> - [x] Linux x86_32 `tier1` default >> - [x] Linux x86_64 `tier1` default > > Good. Thanks @vnkozlov. @rwestrel, you good with this? ------------- PR: https://git.openjdk.java.net/jdk/pull/5386 From duke at openjdk.java.net Fri Oct 29 08:43:09 2021 From: duke at openjdk.java.net (=?UTF-8?B?546L6LaF?=) Date: Fri, 29 Oct 2021 08:43:09 GMT Subject: RFR: JDK-8275854: C2: assert(stride_con != 0) failed: missed some peephole opt [v3] In-Reply-To: References: Message-ID: On Thu, 28 Oct 2021 06:48:28 GMT, Tobias Hartmann wrote: > The problem with such large and non-targeted regression tests is that they won't work for long. Other changes to C2 and/or HotSpot will change timing, profile information, IR shape, optimization sequence or other factors such that the issue will not reproduce anymore with that test. Often, the test also does not reproduce the issue in older JDK versions that are affected as well. 
> > We therefore usually run `creduce --not-c` on our generated tests to simplify them (see [creduce](https://embed.cs.utah.edu/creduce/)). You might want to increase the number of loop iterations in the main method first and also add `-Xbatch`. Thank you for your review. The test is indeed not targeted. I will try to reduce the test to a simpler case. ------------- PR: https://git.openjdk.java.net/jdk/pull/6099 From duke at openjdk.java.net Fri Oct 29 08:50:10 2021 From: duke at openjdk.java.net (=?UTF-8?B?546L6LaF?=) Date: Fri, 29 Oct 2021 08:50:10 GMT Subject: RFR: JDK-8275854: C2: assert(stride_con != 0) failed: missed some peephole opt [v3] In-Reply-To: References: Message-ID: On Thu, 28 Oct 2021 08:42:25 GMT, Roland Westrelin wrote: > The optimization is valid AFAIU, so I don't think blocking it is the right fix. What about something like this: > > ``` > diff --git a/src/hotspot/share/opto/ifnode.cpp b/src/hotspot/share/opto/ifnode.cpp > index 38b40a68b1f..b28a382ebe7 100644 > --- a/src/hotspot/share/opto/ifnode.cpp > +++ b/src/hotspot/share/opto/ifnode.cpp > @@ -1721,6 +1721,14 @@ Node* IfProjNode::Identity(PhaseGVN* phase) { > // will cause this node to be reprocessed once the dead branch is killed. > in(0)->outcnt() == 1))) { > // IfNode control > + if (in(0)->is_BaseCountedLoopEnd()) { > + Node* head = unique_ctrl_out(); > + if (head != NULL && head->is_BaseCountedLoop() && head->in(LoopNode::LoopBackControl) == this) { > + Node* new_head = new LoopNode(head->in(LoopNode::EntryControl), this); > + phase->is_IterGVN()->register_new_node_with_optimizer(new_head); > + phase->is_IterGVN()->replace_node(head, new_head); > + } > + } > return in(0)->in(0); > } > // no progress > ``` > > That prevents the crash and gives the loop another chance to be optimized. 
Replace the `CountedLoopNode` node with a new `Loop` node is more elegant, and will generate more optimized code as the second proposed version (which exchange the two if node), but your suggested code is more compact. I will change to your suggested version. ------------- PR: https://git.openjdk.java.net/jdk/pull/6099 From duke at openjdk.java.net Fri Oct 29 08:58:38 2021 From: duke at openjdk.java.net (=?UTF-8?B?546L6LaF?=) Date: Fri, 29 Oct 2021 08:58:38 GMT Subject: RFR: JDK-8275854: C2: assert(stride_con != 0) failed: missed some peephole opt [v4] In-Reply-To: References: Message-ID: > `If subsume` optimization will eliminate `LongCountedLoopEndNode` node by mistake, which will lead to `PhaseIdealLoop` optimization crash. > > For example, the test of node 538 and node 553 will become the same after the first `PhaseIdealLoop` optimization. Node 555 is the back edge to the loop, and node 553 will be replaced by a `LongCountedLoopEndNode` node. > image > > > In the next `PhaseIdealLoop` optimization, node 538 find node 553 is redundant, and will subsume node 553. Then the `PhaseIdealLoop` optimization will crash, because there is no loop end node. > image > > There are two way to fix the crash, the first is like the way in this pr, just exit `IFNode subsume` optimization when it's a `LongCountedLoopEndNode` node. The second possible fix is that exchange the dominating `IF` node with the `LongCountedLoopEndNode` node: > > diff --git a/src/hotspot/share/opto/ifnode.cpp b/src/hotspot/share/opto/ifnode.cpp > index 38b40a6..31ff172 100644 > --- a/src/hotspot/share/opto/ifnode.cpp > +++ b/src/hotspot/share/opto/ifnode.cpp > @@ -1674,6 +1674,21 @@ Node* IfNode::simple_subsuming(PhaseIterGVN* igvn) { > } > } > > + if (is_LongCountedLoopEnd()) { > + set_req(0, dom->in(0)); > + set_req(1, dom->in(1)); > + dom->set_req(0, pre); > + dom->set_req(1, igvn->intcon(is_always_true ? 
1 : 0)); > + Node* proj0 = raw_out(0); > + Node* proj1 = raw_out(1); > + Node* dom_proj0 = dom->raw_out(0); > + Node* dom_proj1 = dom->raw_out(1); > + dom_proj0->set_req(0, this); > + dom_proj1->set_req(0, this); > + proj0->set_req(0, dom); > + proj1->set_req(0, dom); > + } > + > if (bol->outcnt() == 0) { > igvn->remove_dead_node(bol); // Kill the BoolNode. > } > diff --git a/src/hotspot/share/opto/loopnode.cpp b/src/hotspot/share/opto/loopnode.cpp > index 6f7e34d..7955722 100644 > --- a/src/hotspot/share/opto/loopnode.cpp > +++ b/src/hotspot/share/opto/loopnode.cpp > @@ -802,7 +802,7 @@ bool PhaseIdealLoop::transform_long_counted_loop(IdealLoopTree* loop, Node_List > Node* back_control = head->in(LoopNode::LoopBackControl); > > // data nodes on back branch not supported > - if (back_control->outcnt() > 1) { > + if (back_control->outcnt() > 1 || back_control->Opcode() != Op_IfTrue) { > return false; > } 王超 has updated the pull request incrementally with one additional commit since the last revision: Change to the version of Roland Westrelin ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6099/files - new: https://git.openjdk.java.net/jdk/pull/6099/files/606fb7e2..5498cfd9 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6099&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6099&range=02-03 Stats: 11 lines in 1 file changed: 8 ins; 3 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/6099.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6099/head:pull/6099 PR: https://git.openjdk.java.net/jdk/pull/6099 From roland at openjdk.java.net Fri Oct 29 09:32:16 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Fri, 29 Oct 2021 09:32:16 GMT Subject: RFR: 8273416: C2: assert(false) failed: bad AD file after JDK-8252372 with UseSSE={0,1} In-Reply-To: References: Message-ID: <9YO7f0X6xXeVEfSRrfvuX0G3el4u4UXnqoRMaoZPpJ4=.b892a615-4e8b-468e-bc30-fbd14aef65c0@github.com> On Tue, 7 Sep 2021 10:10:08 GMT,
Aleksey Shipilev wrote: > See the bug report for reproducer and failure message. I think the newly added `CastDD`/`CastFF` nodes should handle the extended `regDPR`/`regFPR` (which includes FPU "registers") instead of just `regD`/`regF` to avoid this mismatch error when `UseSSE < 2`. > > Unfortunately, we cannot just use `reg*PR` operands in existing match rules, because those operands are defined as `UseSSE < 2`, and using them as operands and `ideal_regs()` would break the matching on `UseSSE >= 2`. Therefore I had to add another pair of matches. > > Additonal testing: > - [x] Linux x86_32 `tier1` `-XX:UseAVX=0 -XX:UseSSE=0` > - [x] Linux x86_32 `tier1` default > - [x] Linux x86_64 `tier1` default Looks good to me. ------------- Marked as reviewed by roland (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5386 From duke at openjdk.java.net Fri Oct 29 09:41:37 2021 From: duke at openjdk.java.net (SUN Guoyun) Date: Fri, 29 Oct 2021 09:41:37 GMT Subject: RFR: 8275086: compiler/c2/irTests/TestPostParseCallDevirtualization.java fails when compiler1 is disabled [v3] In-Reply-To: References: Message-ID: <1J1d_whrKEmMGGYm9ZfhuiQo7pq677jppXGfvEZrkNA=.821ff6a1-4d51-4069-b256-a1ca917d2030@github.com> > Hi all, > Jtreg test case compiler/c2/irTests/TestPostParseCallDevirtualization.java fails for fastdebug mode on x86/aarch64/mips architecture when "--with-jvm-features=-compiler1" be used. the failed info is: > >

> One or more @IR rules failed:
> 
> Failed IR Rules (1)
> ------------------
> - Method "public int compiler.c2.irTests.TestPostParseCallDevirtualization.testMethodHandleCallWithCCP() throws java.lang.Throwable":
>   * @IR rule 1: "@compiler.lib.ir_framework.IR(failOn={"#PRE#(\\\\d+(\\\\s){2}(CallStaticJava.*)+(\\\\s){2}===.*#IS_REPLACED#)", "invokeBasic"}, applyIf={}, applyIfAnd={}, applyIfOr={}, counts={"#PRE#(\\\\d+(\\\\s){2}(CallStaticJava.*)+(\\\\s){2}===.*#IS_REPLACED#)", "invokeStatic", "= 1"}, applyIfNot={})" 
>     - failOn: Graph contains forbidden nodes:
>         Regex 1: (\\d+(\\s){2}(CallStaticJava.*)+(\\s){2}===.*invokeBasic)
>         Matched forbidden node:
>           280  CallStaticJava  ===  5  6  7  8  1 ( 188  1  1  1  1  1  1 ) [[ 281  282  283  285 ]] # Static  java.lang.invoke.MethodHandle::invokeBasic
>     - counts: Graph contains wrong number of nodes:
>         Regex 1: (\\d+(\\s){2}(CallStaticJava.*)+(\\s){2}===.*invokeStatic)
>         Expected 1 but found 0 nodes.
> 
>>>> Check stdout for compilation output of the failed methods
> 
> > This is a patch to fix this problem. Please help review it. > > Thanks, > Sun Guoyun SUN Guoyun has updated the pull request incrementally with one additional commit since the last revision: 8275086: compiler/c2/irTests/TestPostParseCallDevirtualization.java fails when compiler1 is disabled ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5903/files - new: https://git.openjdk.java.net/jdk/pull/5903/files/9ffdbeaf..5c652578 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5903&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5903&range=01-02 Stats: 12 lines in 1 file changed: 2 ins; 8 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/5903.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5903/head:pull/5903 PR: https://git.openjdk.java.net/jdk/pull/5903 From duke at openjdk.java.net Fri Oct 29 09:48:33 2021 From: duke at openjdk.java.net (Mai =?UTF-8?B?xJDhurduZw==?= =?UTF-8?B?IA==?= =?UTF-8?B?UXXDom4=?= Anh) Date: Fri, 29 Oct 2021 09:48:33 GMT Subject: RFR: 8276162: Optimise unsigned comparison pattern Message-ID: This patch changes operations in the form `x +- Integer.MIN_VALUE <=> y +- Integer.MIN_VALUE`, which is a pattern used to do unsigned comparisons, into `x u<=> y`. In addition to being basic operations, they may be utilised to implement range checks such as the methods in `jdk.internal.util.Preconditions`, or in places where the compiler cannot deduce the non-negativeness of the bound as in `java.util.ArrayList`. Thank you very much. 
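The identity behind this pattern: adding (or, equivalently, subtracting) `Integer.MIN_VALUE` flips only the sign bit, which maps unsigned order onto signed order, so `x + Integer.MIN_VALUE < y + Integer.MIN_VALUE` holds exactly when `x` is below `y` as an unsigned value. A small Java check against the JDK's `Integer.compareUnsigned` and `Long.compareUnsigned` (the class and method names are illustrative only):

```java
public class UnsignedCompareDemo {
    // Biased signed comparison: flipping the sign bit maps unsigned order onto signed order.
    static boolean lessUnsignedViaBias(int x, int y) {
        return x + Integer.MIN_VALUE < y + Integer.MIN_VALUE;
    }

    static boolean lessUnsignedViaBias(long x, long y) {
        return x + Long.MIN_VALUE < y + Long.MIN_VALUE;
    }

    public static void main(String[] args) {
        int[] ints = {0, 1, -1, 42, -42, Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int x : ints) {
            for (int y : ints) {
                if (lessUnsignedViaBias(x, y) != (Integer.compareUnsigned(x, y) < 0)) {
                    throw new AssertionError(x + " vs " + y);
                }
            }
        }
        long[] longs = {0L, 1L, -1L, Long.MIN_VALUE, Long.MAX_VALUE};
        for (long x : longs) {
            for (long y : longs) {
                if (lessUnsignedViaBias(x, y) != (Long.compareUnsigned(x, y) < 0)) {
                    throw new AssertionError(x + " vs " + y);
                }
            }
        }
        // 0 is below 0xFFFFFFFF when both are read as unsigned.
        System.out.println(lessUnsignedViaBias(0, -1)); // true
    }
}
```

This is also why the pattern suits range checks: for a non-negative `length`, `Integer.compareUnsigned(index, length) < 0` rejects negative `index` values too, because they read as very large unsigned numbers.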
------------- Commit messages: - add microbenchmark - add long unsigned comparison - use min_jint instead of TypeInt::MIN->get_con() - remove long optimisation - unsigned comparison optimisation Changes: https://git.openjdk.java.net/jdk/pull/6101/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6101&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8276162 Stats: 94 lines in 2 files changed: 93 ins; 1 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/6101.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6101/head:pull/6101 PR: https://git.openjdk.java.net/jdk/pull/6101 From jiefu at openjdk.java.net Fri Oct 29 09:48:34 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Fri, 29 Oct 2021 09:48:34 GMT Subject: RFR: 8276162: Optimise unsigned comparison pattern In-Reply-To: References: Message-ID: <4_s8MXQKKkfSSnxQ21Y5JMG9djaaT3AEq6BKVJFRJc8=.66c224ba-336a-4eee-97f3-a0be127a0983@github.com> On Mon, 25 Oct 2021 10:15:42 GMT, Mai Đặng Quân Anh wrote: > This patch changes operations in the form `x +- Integer.MIN_VALUE <=> y +- Integer.MIN_VALUE`, which is a pattern used to do unsigned comparisons, into `x u<=> y`. > > In addition to being basic operations, they may be utilised to implement range checks such as the methods in `jdk.internal.util.Preconditions`, or in places where the compiler cannot deduce the non-negativeness of the bound as in `java.util.ArrayList`. > > Thank you very much. It would be better if you can provide a micro benchmark to show us the performance improvement. Thanks.
------------- PR: https://git.openjdk.java.net/jdk/pull/6101 From duke at openjdk.java.net Fri Oct 29 09:48:34 2021 From: duke at openjdk.java.net (Mai Đặng Quân Anh) Date: Fri, 29 Oct 2021 09:48:34 GMT Subject: RFR: 8276162: Optimise unsigned comparison pattern In-Reply-To: References: Message-ID: <8mMToQfY2jwnP4ASypeTpEGcS_RK4c0tQnkUe4uy9dc=.febcbe8f-109b-460b-97b7-0c7f8df8f078@github.com> On Mon, 25 Oct 2021 10:15:42 GMT, Mai Đặng Quân Anh wrote: > This patch changes operations in the form `x +- Integer.MIN_VALUE <=> y +- Integer.MIN_VALUE`, which is a pattern used to do unsigned comparisons, into `x u<=> y`. > > In addition to being basic operations, they may be utilised to implement range checks such as the methods in `jdk.internal.util.Preconditions`, or in places where the compiler cannot deduce the non-negativeness of the bound as in `java.util.ArrayList`. > > Thank you very much. I created a simple benchmark, the benchmark is run on Intel i7-7700HQ, the result is as follow: Before: Benchmark Mode Cnt Score Error Units App.runInt avgt 25 3.963 ± 0.181 ns/op App.runLong avgt 25 4.431 ± 0.101 ns/op After: Benchmark Mode Cnt Score Error Units App.runInt avgt 25 3.678 ± 0.192 ns/op App.runLong avgt 25 3.814 ± 0.085 ns/op This is the source code of the benchmark: package io.github.merykitty.simplebenchmark; import java.io.IOException; import java.util.concurrent.TimeUnit; import org.openjdk.jmh.annotations.*; import org.openjdk.jmh.infra.Blackhole; @BenchmarkMode(Mode.AverageTime) @State(Scope.Benchmark) @OutputTimeUnit(TimeUnit.NANOSECONDS) @Warmup(iterations = 5) @Measurement(iterations = 5) public class App { @CompilerControl(CompilerControl.Mode.DONT_INLINE) public long test(int arg0, int arg1) { return arg0 + Integer.MIN_VALUE < arg1 + Integer.MIN_VALUE ?
1 : 0; } @CompilerControl(CompilerControl.Mode.DONT_INLINE) public long test(long arg0, long arg1) { return arg0 + Long.MIN_VALUE < arg1 + Long.MIN_VALUE ? 1 : 0; } @Benchmark public void runInt() { test(0, -1); test(-1, 0); } @Benchmark public void runLong() { test(0L, -1L); test(-1L, 0L); } public static void main( String[] args ) throws IOException { org.openjdk.jmh.Main.main(args); } } Do I need to add the benchmark to the patch? If yes then where should I put it in? Thank you very much. I have just pushed the microbenchmark, I am not sure what to put in the copyright line, though. The check failure seems to be due to this PR does not refer to an existing issue. Thank you very much. ------------- PR: https://git.openjdk.java.net/jdk/pull/6101 From jiefu at openjdk.java.net Fri Oct 29 09:48:34 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Fri, 29 Oct 2021 09:48:34 GMT Subject: RFR: 8276162: Optimise unsigned comparison pattern In-Reply-To: <8mMToQfY2jwnP4ASypeTpEGcS_RK4c0tQnkUe4uy9dc=.febcbe8f-109b-460b-97b7-0c7f8df8f078@github.com> References: <8mMToQfY2jwnP4ASypeTpEGcS_RK4c0tQnkUe4uy9dc=.febcbe8f-109b-460b-97b7-0c7f8df8f078@github.com> Message-ID: On Fri, 29 Oct 2021 07:23:56 GMT, Mai Đặng Quân Anh wrote: > I created a simple benchmark, the benchmark is run on Intel i7-7700HQ, the result is as follow: > > ``` > Before: > Benchmark Mode Cnt Score Error Units > App.runInt avgt 25 3.963 ± 0.181 ns/op > App.runLong avgt 25 4.431 ± 0.101 ns/op > > After: > Benchmark Mode Cnt Score Error Units > App.runInt avgt 25 3.678 ± 0.192 ns/op > App.runLong avgt 25 3.814 ±
0.085 ns/op > ``` > > This is the source code of the benchmark: > > ``` > package io.github.merykitty.simplebenchmark; > > import java.io.IOException; > import java.util.concurrent.TimeUnit; > > import org.openjdk.jmh.annotations.*; > import org.openjdk.jmh.infra.Blackhole; > > @BenchmarkMode(Mode.AverageTime) > @State(Scope.Benchmark) > @OutputTimeUnit(TimeUnit.NANOSECONDS) > @Warmup(iterations = 5) > @Measurement(iterations = 5) > public class App { > @CompilerControl(CompilerControl.Mode.DONT_INLINE) > public long test(int arg0, int arg1) { > return arg0 + Integer.MIN_VALUE < arg1 + Integer.MIN_VALUE ? 1 : 0; > } > > @CompilerControl(CompilerControl.Mode.DONT_INLINE) > public long test(long arg0, long arg1) { > return arg0 + Long.MIN_VALUE < arg1 + Long.MIN_VALUE ? 1 : 0; > } > > @Benchmark > public void runInt() { > test(0, -1); > test(-1, 0); > } > > @Benchmark > public void runLong() { > test(0L, -1L); > test(-1L, 0L); > } > > public static void main( String[] args ) throws IOException { > org.openjdk.jmh.Main.main(args); > } > } > ``` > > Do I need to add the benchmark to the patch? If yes then where should I put it in? > > Thank you very much. I think you can put it under `test/micro/org/openjdk/bench/vm/compiler` with a more meaningful class name. Please note that the jcheck failed in your PR, which seems to prevent the RFR email from sending out. > I have just pushed the microbenchmark, I am not sure what to put in the copyright line, though. > > The check failure seems to be due to this PR does not refer to an existing issue. > > Thank you very much. Filed https://bugs.openjdk.java.net/browse/JDK-8276162 for you. Thanks. 
------------- PR: https://git.openjdk.java.net/jdk/pull/6101 From duke at openjdk.java.net Fri Oct 29 09:59:45 2021 From: duke at openjdk.java.net (=?UTF-8?B?546L6LaF?=) Date: Fri, 29 Oct 2021 09:59:45 GMT Subject: RFR: JDK-8275854: C2: assert(stride_con != 0) failed: missed some peephole opt [v5] In-Reply-To: References: Message-ID: <6TKcQDfEXQXmuLl0Wi1sZITmRwcEBzm25F7QDAWM150=.e65cd598-308a-4263-91b3-dc65522596e5@github.com> > `If subsume` optimization will eliminate `LongCountedLoopEndNode` node by mistake, which will lead to `PhaseIdealLoop` optimization crash. > > For example, the test of node 538 and node 553 will become the same after the first `PhaseIdealLoop` optimization. Node 555 is the back edge to the loop, and node 553 will be replaced by a `LongCountedLoopEndNode` node. > image > > > In the next `PhaseIdealLoop` optimization, node 538 find node 553 is redundant, and will subsume node 553. Then the `PhaseIdealLoop` optimization will crash, because there is no loop end node. > image > > There are two way to fix the crash, the first is like the way in this pr, just exit `IFNode subsume` optimization when it's a `LongCountedLoopEndNode` node. The second possible fix is that exchange the dominating `IF` node with the `LongCountedLoopEndNode` node: > > diff --git a/src/hotspot/share/opto/ifnode.cpp b/src/hotspot/share/opto/ifnode.cpp > index 38b40a6..31ff172 100644 > --- a/src/hotspot/share/opto/ifnode.cpp > +++ b/src/hotspot/share/opto/ifnode.cpp > @@ -1674,6 +1674,21 @@ Node* IfNode::simple_subsuming(PhaseIterGVN* igvn) { > } > } > > + if (is_LongCountedLoopEnd()) { > + set_req(0, dom->in(0)); > + set_req(1, dom->in(1)); > + dom->set_req(0, pre); > + dom->set_req(1, igvn->intcon(is_always_true ? 
1 : 0)); > + Node* proj0 = raw_out(0); > + Node* proj1 = raw_out(1); > + Node* dom_proj0 = dom->raw_out(0); > + Node* dom_proj1 = dom->raw_out(1); > + dom_proj0->set_req(0, this); > + dom_proj1->set_req(0, this); > + proj0->set_req(0, dom); > + proj1->set_req(0, dom); > + } > + > if (bol->outcnt() == 0) { > igvn->remove_dead_node(bol); // Kill the BoolNode. > } > diff --git a/src/hotspot/share/opto/loopnode.cpp b/src/hotspot/share/opto/loopnode.cpp > index 6f7e34d..7955722 100644 > --- a/src/hotspot/share/opto/loopnode.cpp > +++ b/src/hotspot/share/opto/loopnode.cpp > @@ -802,7 +802,7 @@ bool PhaseIdealLoop::transform_long_counted_loop(IdealLoopTree* loop, Node_List > Node* back_control = head->in(LoopNode::LoopBackControl); > > // data nodes on back branch not supported > - if (back_control->outcnt() > 1) { > + if (back_control->outcnt() > 1 || back_control->Opcode() != Op_IfTrue) { > return false; > } 王超 has updated the pull request incrementally with one additional commit since the last revision: Simplify the test ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6099/files - new: https://git.openjdk.java.net/jdk/pull/6099/files/5498cfd9..8c81c883 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6099&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6099&range=03-04 Stats: 210 lines in 1 file changed: 10 ins; 152 del; 48 mod Patch: https://git.openjdk.java.net/jdk/pull/6099.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6099/head:pull/6099 PR: https://git.openjdk.java.net/jdk/pull/6099 From shade at openjdk.java.net Fri Oct 29 10:17:20 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Fri, 29 Oct 2021 10:17:20 GMT Subject: RFR: 8276157: C2: Compiler stack overflow during escape analysis on Linux x86_32 Message-ID: See the bug for test details and analysis. I believe we just legitimately run out of stack in `fastdebug` builds. The fix is to increase the default stack size a bit.
Linux-S390, Windows-x86/AArch64 seems to do a similar thing. I can do a similar change in `globals_bsd_x86.hpp`, but that would be a blind change, as I don't have platforms to verify that change sanity. I would prefer to make a Linux-specific fix at this time. Additional testing: - [x] Failing test now passes on Linux x86_32 - [ ] Linux x86_32 fastdebug `tier1` ------------- Commit messages: - Fix Changes: https://git.openjdk.java.net/jdk/pull/6167/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6167&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8276157 Stats: 7 lines in 1 file changed: 6 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/6167.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6167/head:pull/6167 PR: https://git.openjdk.java.net/jdk/pull/6167 From chagedorn at openjdk.java.net Fri Oct 29 13:10:41 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Fri, 29 Oct 2021 13:10:41 GMT Subject: RFR: 8271056: C2: "assert(no_dead_loop) failed: dead loop detected" due to cmoving identity Message-ID: In the testcase, an unsafe cmoving identity is applied in `PhiNode::Identity()` after parsing which replaces a loop phi in a dead loop creating a dead data loop which triggers the assertion. The problem is that `PhiNode::Identity()` assumes that a cmoving identity is always safe because `PhiNode::Ideal()` handles unsafe cases and only leaves safe cases to `PhiNode::Identity()`: https://github.com/openjdk/jdk/blob/4c3491bfa5f296b80c56a37cb4fffd6497323ac2/src/hotspot/share/opto/cfgnode.cpp#L2051-L2055 However, the fix for [JDK-8268883 ](https://github.com/openjdk/jdk17/commit/6d8fc7249a3a1a2350c462f9c4fe38377856392f)added the following additional condition to wait for the region to be processed: https://github.com/openjdk/jdk/blob/4c3491bfa5f296b80c56a37cb4fffd6497323ac2/src/hotspot/share/opto/cfgnode.cpp#L2047-L2053 This skips the process of an unsafe case in `PhiNode::Ideal()` in the testcase. 
Afterwards, the unsafe case is replaced unconditionally in `PhiNode::Identity()` resulting in a dead data loop. I therefore propose to add the same check added in JDK-8268883 to `PhiNode::Identity()` to prevent that. Thanks, Christian ------------- Commit messages: - 8271056: C2: "assert(no_dead_loop) failed: dead loop detected" due to cmoving identity Changes: https://git.openjdk.java.net/jdk/pull/6172/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6172&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8271056 Stats: 87 lines in 2 files changed: 85 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/6172.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6172/head:pull/6172 PR: https://git.openjdk.java.net/jdk/pull/6172 From roland at openjdk.java.net Fri Oct 29 15:56:10 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Fri, 29 Oct 2021 15:56:10 GMT Subject: RFR: JDK-8275854: C2: assert(stride_con != 0) failed: missed some peephole opt [v5] In-Reply-To: <6TKcQDfEXQXmuLl0Wi1sZITmRwcEBzm25F7QDAWM150=.e65cd598-308a-4263-91b3-dc65522596e5@github.com> References: <6TKcQDfEXQXmuLl0Wi1sZITmRwcEBzm25F7QDAWM150=.e65cd598-308a-4263-91b3-dc65522596e5@github.com> Message-ID: On Fri, 29 Oct 2021 09:59:45 GMT, 王超 wrote: >> `If subsume` optimization will eliminate `LongCountedLoopEndNode` node by mistake, which will lead to `PhaseIdealLoop` optimization crash. >> >> For example, the test of node 538 and node 553 will become the same after the first `PhaseIdealLoop` optimization. Node 555 is the back edge to the loop, and node 553 will be replaced by a `LongCountedLoopEndNode` node. >> image >> >> >> In the next `PhaseIdealLoop` optimization, node 538 find node 553 is redundant, and will subsume node 553. Then the `PhaseIdealLoop` optimization will crash, because there is no loop end node.
>> image >> >> There are two way to fix the crash, the first is like the way in this pr, just exit `IFNode subsume` optimization when it's a `LongCountedLoopEndNode` node. The second possible fix is that exchange the dominating `IF` node with the `LongCountedLoopEndNode` node: >> >> diff --git a/src/hotspot/share/opto/ifnode.cpp b/src/hotspot/share/opto/ifnode.cpp >> index 38b40a6..31ff172 100644 >> --- a/src/hotspot/share/opto/ifnode.cpp >> +++ b/src/hotspot/share/opto/ifnode.cpp >> @@ -1674,6 +1674,21 @@ Node* IfNode::simple_subsuming(PhaseIterGVN* igvn) { >> } >> } >> >> + if (is_LongCountedLoopEnd()) { >> + set_req(0, dom->in(0)); >> + set_req(1, dom->in(1)); >> + dom->set_req(0, pre); >> + dom->set_req(1, igvn->intcon(is_always_true ? 1 : 0)); >> + Node* proj0 = raw_out(0); >> + Node* proj1 = raw_out(1); >> + Node* dom_proj0 = dom->raw_out(0); >> + Node* dom_proj1 = dom->raw_out(1); >> + dom_proj0->set_req(0, this); >> + dom_proj1->set_req(0, this); >> + proj0->set_req(0, dom); >> + proj1->set_req(0, dom); >> + } >> + >> if (bol->outcnt() == 0) { >> igvn->remove_dead_node(bol); // Kill the BoolNode. >> } >> diff --git a/src/hotspot/share/opto/loopnode.cpp b/src/hotspot/share/opto/loopnode.cpp >> index 6f7e34d..7955722 100644 >> --- a/src/hotspot/share/opto/loopnode.cpp >> +++ b/src/hotspot/share/opto/loopnode.cpp >> @@ -802,7 +802,7 @@ bool PhaseIdealLoop::transform_long_counted_loop(IdealLoopTree* loop, Node_List >> Node* back_control = head->in(LoopNode::LoopBackControl); >> >> // data nodes on back branch not supported >> - if (back_control->outcnt() > 1) { >> + if (back_control->outcnt() > 1 || back_control->Opcode() != Op_IfTrue) { >> return false; >> } > > 王超 has updated the pull request incrementally with one additional commit since the last revision: > > Simplify the test That looks good to me. ------------- Marked as reviewed by roland (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/6099 From shade at openjdk.java.net Fri Oct 29 17:18:27 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Fri, 29 Oct 2021 17:18:27 GMT Subject: RFR: 8276105: C2: Conv(D|F)2(I|L)Nodes::Ideal should handle rounding correctly Message-ID: Happens now in master:

$ CONF=linux-x86-server-fastdebug make run-test TEST=compiler/loopopts/superword/CoLocatePack.java TEST_VM_OPTS="-XX:UseAVX=0 -XX:UseSSE=0"
...

CompileCommand: compileonly compiler/loopopts/superword/CoLocatePack.test bool compileonly = true
191 ConvF2L === _ 714 [[ 193 ]] !jvms: CoLocatePack::test @ bci:30 (line 70)
# To suppress the following error report, specify this argument
# after -XX: or in .hotspotrc: SuppressErrorAt=/phaseX.cpp:1128
#
# A fatal error has been detected by the Java Runtime Environment:
#
# Internal Error (/home/shade/trunks/jdk/src/hotspot/share/opto/phaseX.cpp:1128), pid=1717516, tid=1717532
# fatal error: modified node was not processed by IGVN.transform_old()

After JDK-8266950 (always `strictfp`), the rounding paths in the `Conv(D|F)2(I|L)Nodes::Ideal` methods are taken more frequently to round float/double inputs when low SSE is enabled. On those paths, we call `set_req` to rewire the current node, but still return `NULL` from `::Ideal`. I believe that is incorrect, per the explanation in `node.cpp`: `NULL` indicates that no graph change was made, and `this` should be returned when the node was modified. So GVN predictably barfs.
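The `Ideal()` return-value contract described above can be sketched with a toy model. `ToyNode`, `ToyConv`, and `ideal_result_consistent` are hypothetical names that only mimic the shape of the `set_req()`/`Ideal()` protocol; they are not HotSpot code, and the consistency check is a simplified stand-in for the IGVN verification that fired here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, heavily simplified model of a C2 node (not HotSpot's Node).
struct ToyNode {
  std::vector<ToyNode*> in;  // input edges
  explicit ToyNode(std::size_t n_inputs = 0) : in(n_inputs, nullptr) {}
  void set_req(std::size_t i, ToyNode* n) { in[i] = n; }
};

// Hypothetical stand-in for a ConvF2L-style node whose Ideal() rewires its
// value input to a rounding node.
struct ToyConv : ToyNode {
  ToyNode* round_node;  // node to rewire to (assumed to exist already)

  ToyConv(ToyNode* input, ToyNode* round) : ToyNode(2), round_node(round) {
    assert(round != nullptr);
    set_req(1, input);
  }

  // Contract as described in the PR:
  //   nullptr -> no graph change was made
  //   this    -> this node was modified in place
  ToyNode* Ideal() {
    if (in[1] != round_node) {
      set_req(1, round_node);  // in-place modification...
      return this;             // ...must be reported (the bug returned nullptr)
    }
    return nullptr;            // already rewired: nothing changed
  }
};

// Toy version of the IGVN sanity check: changing a node's inputs while
// returning nullptr ("no change") is inconsistent.
bool ideal_result_consistent(ToyConv& n) {
  std::vector<ToyNode*> before = n.in;
  ToyNode* result = n.Ideal();
  bool modified = (n.in != before);
  return !(modified && result == nullptr);
}
```

A second call to `Ideal()` on the same node returns `nullptr`, since the input is already rewired and nothing changes; returning `nullptr` on the first call would make `ideal_result_consistent` report `false`.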
Additional testing: - [x] Linux x86_32 `tier1` with `-XX:UseAVX=0 -XX:UseSSE=0` (now pass) - [x] Linux x86_32 `tier1` default (still pass) - [ ] Linux x86_64 `tier1` default ------------- Commit messages: - Fix Changes: https://git.openjdk.java.net/jdk/pull/6176/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6176&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8276105 Stats: 16 lines in 1 file changed: 8 ins; 0 del; 8 mod Patch: https://git.openjdk.java.net/jdk/pull/6176.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6176/head:pull/6176 PR: https://git.openjdk.java.net/jdk/pull/6176 From iveresov at openjdk.java.net Fri Oct 29 18:06:23 2021 From: iveresov at openjdk.java.net (Igor Veresov) Date: Fri, 29 Oct 2021 18:06:23 GMT Subject: Integrated: 8274855: vectorapi tests failing with assert(!vbox->is_Phi()) failed In-Reply-To: References: Message-ID: On Thu, 28 Oct 2021 19:24:43 GMT, Igor Veresov wrote: > We need to handle the case when the allocation input to `VectorBoxNode` is a phi but the vector input is not, which can definitely be the case if the vector input has been value-numbered. It seems to be safe to do by construction because `VectorBoxNode` and `VectorBoxAllocation` come in a specific order as a result of expanding an intrinsic call. After that, if any of the inputs to VectorBoxNode are value-numbered they can only move up and are guaranteed to dominate. This pull request has now been integrated. 
Changeset: 5021a12c Author: Igor Veresov URL: https://git.openjdk.java.net/jdk/commit/5021a12ceada3192e81e2c06b556e7c80cd6cf31 Stats: 18 lines in 2 files changed: 15 ins; 3 del; 0 mod 8274855: vectorapi tests failing with assert(!vbox->is_Phi()) failed Reviewed-by: kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/6162 From iveresov at openjdk.java.net Fri Oct 29 18:14:13 2021 From: iveresov at openjdk.java.net (Igor Veresov) Date: Fri, 29 Oct 2021 18:14:13 GMT Subject: RFR: 8275086: compiler/c2/irTests/TestPostParseCallDevirtualization.java fails when compiler1 is disabled [v3] In-Reply-To: <1J1d_whrKEmMGGYm9ZfhuiQo7pq677jppXGfvEZrkNA=.821ff6a1-4d51-4069-b256-a1ca917d2030@github.com> References: <1J1d_whrKEmMGGYm9ZfhuiQo7pq677jppXGfvEZrkNA=.821ff6a1-4d51-4069-b256-a1ca917d2030@github.com> Message-ID: On Fri, 29 Oct 2021 09:41:37 GMT, SUN Guoyun wrote: >> Hi all, >> The jtreg test case compiler/c2/irTests/TestPostParseCallDevirtualization.java fails in fastdebug mode on x86/aarch64/mips when "--with-jvm-features=-compiler1" is used. The failure output is: >>

>> One or more @IR rules failed:
>> 
>> Failed IR Rules (1)
>> ------------------
>> - Method "public int compiler.c2.irTests.TestPostParseCallDevirtualization.testMethodHandleCallWithCCP() throws java.lang.Throwable":
>>   * @IR rule 1: "@compiler.lib.ir_framework.IR(failOn={"#PRE#(\\\\d+(\\\\s){2}(CallStaticJava.*)+(\\\\s){2}===.*#IS_REPLACED#)", "invokeBasic"}, applyIf={}, applyIfAnd={}, applyIfOr={}, counts={"#PRE#(\\\\d+(\\\\s){2}(CallStaticJava.*)+(\\\\s){2}===.*#IS_REPLACED#)", "invokeStatic", "= 1"}, applyIfNot={})" 
>>     - failOn: Graph contains forbidden nodes:
>>         Regex 1: (\\d+(\\s){2}(CallStaticJava.*)+(\\s){2}===.*invokeBasic)
>>         Matched forbidden node:
>>           280  CallStaticJava  ===  5  6  7  8  1 ( 188  1  1  1  1  1  1 ) [[ 281  282  283  285 ]] # Static  java.lang.invoke.MethodHandle::invokeBasic
>>     - counts: Graph contains wrong number of nodes:
>>         Regex 1: (\\d+(\\s){2}(CallStaticJava.*)+(\\s){2}===.*invokeStatic)
>>         Expected 1 but found 0 nodes.
>> 
>>>>> Check stdout for compilation output of the failed methods
>> 
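To see why both checks in the report failed, the two printed regexes can be applied to the reported node. This is a sketch only: it assumes the doubled backslashes in the log are an escaping artifact (so `\\d` means `\d`), uses C++ `std::regex` (ECMAScript syntax) rather than `java.util.regex`, which should agree for these simple patterns, and `ir_line_matches` is a hypothetical helper, not part of the IR test framework.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns true if the IR-dump line contains a match for the rule's regex.
bool ir_line_matches(const std::string& dump_line, const std::string& pattern) {
  return std::regex_search(dump_line, std::regex(pattern));
}

// "Regex 1" from the failOn and counts checks, with backslashes un-doubled.
const std::string kFailOnInvokeBasic =
    R"re((\d+(\s){2}(CallStaticJava.*)+(\s){2}===.*invokeBasic))re";
const std::string kCountInvokeStatic =
    R"re((\d+(\s){2}(CallStaticJava.*)+(\s){2}===.*invokeStatic))re";

// The node reported as "Matched forbidden node", copied from the log.
const std::string kReportedNode =
    "280  CallStaticJava  ===  5  6  7  8  1 ( 188  1  1  1  1  1  1 ) "
    "[[ 281  282  283  285 ]] # Static  java.lang.invoke.MethodHandle::invokeBasic";
```

The remaining `invokeBasic` call matches the `failOn` pattern, while the same line does not match the `invokeStatic` pattern, which is consistent with `counts` finding 0 nodes instead of the expected 1.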
>> >> This is a patch to fix this problem. Please help review it. >> >> Thanks, >> Sun Guoyun > > SUN Guoyun has updated the pull request incrementally with one additional commit since the last revision: > > 8275086: compiler/c2/irTests/TestPostParseCallDevirtualization.java fails when compiler1 is disabled So, now 5000 warmup iterations help? You mentioned previously that it had to be 20000 ? What changed? ------------- PR: https://git.openjdk.java.net/jdk/pull/5903 From kvn at openjdk.java.net Fri Oct 29 19:23:11 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 29 Oct 2021 19:23:11 GMT Subject: RFR: 8276105: C2: Conv(D|F)2(I|L)Nodes::Ideal should handle rounding correctly In-Reply-To: References: Message-ID: On Fri, 29 Oct 2021 16:38:11 GMT, Aleksey Shipilev wrote: > Happens now in master: > > > $ CONF=linux-x86-server-fastdebug make run-test TEST=compiler/loopopts/superword/CoLocatePack.java TEST_VM_OPTS="-XX:UseAVX=0 -XX:UseSSE=0" > ... > > CompileCommand: compileonly compiler/loopopts/superword/CoLocatePack.test bool compileonly = true > 191 ConvF2L === _ 714 [[ 193 ]] !jvms: CoLocatePack::test @ bci:30 (line 70) > # To suppress the following error report, specify this argument > # after -XX: or in .hotspotrc: SuppressErrorAt=/phaseX.cpp:1128 > # > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (/home/shade/trunks/jdk/src/hotspot/share/opto/phaseX.cpp:1128), pid=1717516, tid=1717532 > # fatal error: modified node was not processed by IGVN.transform_old() > > > After JDK-8266950 (always `strictfp`), the paths in `Conv(D|F)2(I|L)Nodes::Ideal`-s start to be taken more frequently to round float/double inputs when low SSE is enabled. On those paths, we call `set_req` to rewire current node, but we still return `NULL` from `::Ideal`. I believe that is incorrect, as per `node.cpp` explanation: `NULL` indicates no graph change was done, and `this` should be returned when modification happened. 
So GVN predictably barfs. > > Additional testing: > - [x] Linux x86_32 `tier1` with `-XX:UseAVX=0 -XX:UseSSE=0` (now pass) > - [x] Linux x86_32 `tier1` default (still pass) > - [x] Linux x86_64 `tier1` default Correct. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6176 From kvn at openjdk.java.net Fri Oct 29 19:32:10 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 29 Oct 2021 19:32:10 GMT Subject: RFR: 8276157: C2: Compiler stack overflow during escape analysis on Linux x86_32 In-Reply-To: References: Message-ID: On Fri, 29 Oct 2021 10:06:26 GMT, Aleksey Shipilev wrote: > See the bug for test details and analysis. I believe we just legitimately run out of stack in `fastdebug` builds. The fix is to increase the default stack size a bit. Linux-S390 and Windows-x86/AArch64 seem to do a similar thing. > > I can make a similar change in `globals_bsd_x86.hpp`, but that would be a blind change, as I don't have the platforms to verify that the change is sane. I would prefer to make a Linux-specific fix at this time. > > Additional testing: > - [x] Failing test now passes on Linux x86_32 > - [x] Linux x86_32 fastdebug `tier1` I consider this change a workaround. I am fine with it. Does EA find non-escaping allocations when the test passes (with the bigger stack)? To actually fix the issue we would need to rewrite the recursive method `ConnectionGraph::find_inst_mem()` as a non-recursive method using `Node_Stack` or other C2 structures. Please file an RFE. Maybe also add a check that it is not an infinite recursion. ------------- Marked as reviewed by kvn (Reviewer).
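The suggestion of replacing recursion with an explicit stack can be illustrated with a generic graph walk. This is a hypothetical sketch: `GNode` and both functions are invented for illustration and do not reproduce the logic of `ConnectionGraph::find_inst_mem()`; a `std::vector` plays the role a `Node_Stack` would play in C2, and a visited set serves as the guard against infinite recursion that the review asks for.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical toy graph node with input edges.
struct GNode {
  std::vector<GNode*> inputs;
};

// Recursive form: native stack depth grows with the length of the input chain.
int count_reachable_recursive(GNode* n, std::unordered_set<GNode*>& seen) {
  if (n == nullptr || !seen.insert(n).second) return 0;  // visited set stops cycles
  int count = 1;
  for (GNode* in : n->inputs) count += count_reachable_recursive(in, seen);
  return count;
}

// Iterative form: an explicit heap-allocated stack replaces the call stack.
int count_reachable_iterative(GNode* root) {
  std::unordered_set<GNode*> seen;
  std::vector<GNode*> stack;
  if (root != nullptr) stack.push_back(root);
  int count = 0;
  while (!stack.empty()) {
    GNode* n = stack.back();
    stack.pop_back();
    if (!seen.insert(n).second) continue;  // already visited: also guards against cycles
    ++count;
    for (GNode* in : n->inputs) {
      if (in != nullptr) stack.push_back(in);
    }
  }
  return count;
}
```

Both versions visit the same nodes; the iterative one keeps its working state on the heap, so a very long input chain does not consume one native frame per node the way the recursive one does.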
PR: https://git.openjdk.java.net/jdk/pull/6167 From xliu at openjdk.java.net Fri Oct 29 23:16:21 2021 From: xliu at openjdk.java.net (Xin Liu) Date: Fri, 29 Oct 2021 23:16:21 GMT Subject: RFR: 8276105: C2: Conv(D|F)2(I|L)Nodes::Ideal should handle rounding correctly In-Reply-To: References: Message-ID: On Fri, 29 Oct 2021 16:38:11 GMT, Aleksey Shipilev wrote: > Happens now in master: > > > $ CONF=linux-x86-server-fastdebug make run-test TEST=compiler/loopopts/superword/CoLocatePack.java TEST_VM_OPTS="-XX:UseAVX=0 -XX:UseSSE=0" > ... > > CompileCommand: compileonly compiler/loopopts/superword/CoLocatePack.test bool compileonly = true > 191 ConvF2L === _ 714 [[ 193 ]] !jvms: CoLocatePack::test @ bci:30 (line 70) > # To suppress the following error report, specify this argument > # after -XX: or in .hotspotrc: SuppressErrorAt=/phaseX.cpp:1128 > # > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (/home/shade/trunks/jdk/src/hotspot/share/opto/phaseX.cpp:1128), pid=1717516, tid=1717532 > # fatal error: modified node was not processed by IGVN.transform_old() > > > After JDK-8266950 (always `strictfp`), the paths in `Conv(D|F)2(I|L)Nodes::Ideal`-s start to be taken more frequently to round float/double inputs when low SSE is enabled. On those paths, we call `set_req` to rewire current node, but we still return `NULL` from `::Ideal`. I believe that is incorrect, as per `node.cpp` explanation: `NULL` indicates no graph change was done, and `this` should be returned when modification happened. So GVN predictably barfs. > > Additional testing: > - [x] Linux x86_32 `tier1` with `-XX:UseAVX=0 -XX:UseSSE=0` (now pass) > - [x] Linux x86_32 `tier1` default (still pass) > - [x] Linux x86_64 `tier1` default LGTM. Prior code style in ConvD2INode::Ideal is obscure. This patch also makes it look better. 
------------- PR: https://git.openjdk.java.net/jdk/pull/6176 From kvn at openjdk.java.net Fri Oct 29 23:49:14 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 29 Oct 2021 23:49:14 GMT Subject: RFR: 8271056: C2: "assert(no_dead_loop) failed: dead loop detected" due to cmoving identity In-Reply-To: References: Message-ID: On Fri, 29 Oct 2021 13:02:11 GMT, Christian Hagedorn wrote: > In the testcase, an unsafe cmoving identity is applied in `PhiNode::Identity()` after parsing; it replaces a loop phi in a dead loop, creating a dead data loop that triggers the assertion. The problem is that `PhiNode::Identity()` assumes that a cmoving identity is always safe because `PhiNode::Ideal()` handles unsafe cases and only leaves safe cases to `PhiNode::Identity()`: > https://github.com/openjdk/jdk/blob/4c3491bfa5f296b80c56a37cb4fffd6497323ac2/src/hotspot/share/opto/cfgnode.cpp#L2051-L2055 > > However, the fix for [JDK-8268883](https://github.com/openjdk/jdk17/commit/6d8fc7249a3a1a2350c462f9c4fe38377856392f) added the following additional condition to wait for the region to be processed: > https://github.com/openjdk/jdk/blob/4c3491bfa5f296b80c56a37cb4fffd6497323ac2/src/hotspot/share/opto/cfgnode.cpp#L2047-L2053 > > This skips processing of an unsafe case in `PhiNode::Ideal()` in the testcase. Afterwards, the unsafe case is replaced unconditionally in `PhiNode::Identity()`, resulting in a dead data loop. > > I therefore propose adding the same check from JDK-8268883 to `PhiNode::Identity()` to prevent that. > > Thanks, > Christian Looks good. Thanks! ------------- Marked as reviewed by kvn (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/6172 From stuefe at openjdk.java.net Sat Oct 30 06:38:24 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Sat, 30 Oct 2021 06:38:24 GMT Subject: RFR: JDK-8276175: codestrings.validate_vm gtest still broken on ppc64 after JDK-8276046 Message-ID: Frustratingly, JDK-8276046 failed to work because the #ifdef PPC was added to the existing group of #ifdefs at the very start of test_codestrings.cpp. PPC is not a primary macro, however; it is defined by macros.hpp if either PPC32 or PPC64 is set. Therefore the check only works after macros.hpp has been included (it works for ZERO and PRODUCT because those are primary macros). This fix revives my original fix, which uses the DISABLED_... prefix on the test functions. That seems to be the usual way to disable tests anyway. ------------- Commit messages: - Use DISABLED_ prefix to switch off test on ppc Changes: https://git.openjdk.java.net/jdk/pull/6174/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6174&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8276175 Stats: 8 lines in 1 file changed: 5 ins; 3 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/6174.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6174/head:pull/6174 PR: https://git.openjdk.java.net/jdk/pull/6174
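The include-order pitfall described above can be reproduced with the preprocessor alone. This is a simplified model, not the actual test_codestrings.cpp or macros.hpp: it assumes the build defines only the primary macro `PPC64`, and the middle block only approximates how `PPC` is derived from `PPC32`/`PPC64`.

```cpp
#include <cassert>

// Start from a clean slate in case the host compiler predefines any of these.
#undef PPC
#undef PPC32
#undef PPC64

// Assumption: the build system defines only the primary macro.
#define PPC64 1

// A check placed before the derived macro exists sees nothing.
#ifdef PPC
constexpr bool kPpcSeenEarly = true;
#else
constexpr bool kPpcSeenEarly = false;
#endif

// Simplified stand-in for the derivation done in macros.hpp (not its verbatim text).
#if defined(PPC32) || defined(PPC64)
#ifndef PPC
#define PPC 1
#endif
#endif

// The same check placed after the derivation sees PPC.
#ifdef PPC
constexpr bool kPpcSeenLate = true;
#else
constexpr bool kPpcSeenLate = false;
#endif
```

For the replacement approach, googletest treats any test whose name begins with `DISABLED_` as disabled: it is still compiled but is not run by default.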